Slow response
I am pretty new to Solr and this is my first post to this list, so please forgive me if I make any glaring errors.

Here's my problem. When I do a search using the Solr admin interface for a term that I know does not exist in my index, the QTime is about 1ms. However, if I add facets to the search, the response takes more than 20 seconds (and sometimes longer) to return. Here is the slow URL:

/select?qf=AUTHOR_t+SUBJECT_t+TITLE_t&wt=xml&f.AUTHOR_facet.facet.sort=true&f.FORMAT_t.facet.limit=25&start=0&facet=true&facet.mincount=1&q=frak&f.FORMAT_t.facet.mincount=1&f.ITYPE_facet.facet.mincount=1&f.SUBJECT_facet.facet.limit=25&facet.field=AUTHOR_facet&facet.field=FORMAT_t&facet.field=LANGUAGE_t&facet.field=PUBDATE_t&facet.field=SUBJECT_facet&facet.field=AGENCY_facet&facet.field=ITYPE_facet&f.AGENCY_facet.facet.sort=true&f.AGENCY_facet.facet.limit=-1&rows=10&f.ITYPE_facet.facet.limit=-1&f.ITYPE_facet.facet.sort=true&f.AUTHOR_facet.facet.limit=25&f.LANGUAGE_t.facet.sort=true&f.PUBDATE_t.facet.limit=-1&f.AGENCY_facet.facet.mincount=1&f.AUTHOR_facet.facet.mincount=1&fl=*&fl=score&qt=dismax&version=2.2&f.SUBJECT_facet.facet.sort=true&f.SUBJECT_facet.facet.mincount=1&f.PUBDATE_t.facet.sort=false&f.FORMAT_t.facet.sort=true&f.LANGUAGE_t.facet.limit=25&f.LANGUAGE_t.facet.mincount=1&f.PUBDATE_t.facet.mincount=1

I am pretty sure I can't be the first to ask this question, but I can't seem to find anything online with the answer.

Thanks for your help.

Aaron
|
|
Re: Slow response
On 9/6/07, Aaron Hammond <aaron.hammond@...> wrote:
> I am pretty new to Solr and this is my first post to this list so please
> forgive me if I make any glaring errors.
>
> Here's my problem. When I do a search using the Solr admin interface for
> a term that I know does not exist in my index the QTime is about 1ms.
> However, if I add facets to the search the response takes more than 20
> seconds (and sometimes longer) to return. Here is the slow URL -

The cost of faceting on multi-valued fields depends more on the number of terms in the field (and their distribution) than on the number of hits for a query.

That said, perhaps faceting should be able to bail out if there are no hits.

Is your question more about why faceting takes so long in general, or why it takes so long if there are no results? If you haven't already, try optimizing your index; that helps faceting in general. How many docs do you have in your index?

As a side note, the way multi-valued faceting currently works, it's actually normally faster if the query returns a large number of hits.

-Yonik
|
|
RE: Slow response
Thank you for your response; this does shed some light on the subject. Our basic question was why we were seeing slower responses the smaller our result set got.

Currently we are searching about 1.2 million documents, with each source document about 2KB, but we do duplicate some of the data. I bumped up my filterCache to 5 million, and the second search I did for a non-indexed term came back in 2.1 seconds, so that is much improved. I am a little concerned about having this value so high, but this is our problem and we will play with it.

I do have a few follow-up questions. First, regarding the filterCache: once a single search has been done and facets requested, as long as new facets aren't requested and the size is large enough, the filters will remain in the cache, correct?

Also, you mention that faceting is more a "function of the number of terms in the field". The two fields causing our problems are Authors and Subjects. If we divided up the data that made these facets into more specific fields (primary author, secondary author, etc.), would this perform better? The number of facet fields would increase, but there should be fewer unique terms for any given facet.

Thanks again for all your help.

Aaron

-----Original Message-----
From: yseeley@... [mailto:yseeley@...] On Behalf Of Yonik Seeley
Sent: Thursday, September 06, 2007 4:17 PM
To: solr-user@...
Subject: Re: Slow response
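On the concern about a filterCache size of 5 million: a rough upper bound on memory can be worked out by assuming that every cached filter is a plain bitset with one bit per document. That assumption is deliberately pessimistic and is not taken from this thread; Solr can store small document sets more compactly, so real usage is usually lower. The function names below are made up for the illustration.

```python
# Back-of-envelope upper bound on filterCache memory.
# Assumption (not from the thread): every cached filter is a full bitset
# holding one bit per document in the index.

def bitset_bytes(num_docs: int) -> int:
    """Bytes needed for a bitset with one bit per document."""
    return (num_docs + 7) // 8


def worst_case_cache_bytes(num_docs: int, cached_filters: int) -> int:
    """Total bytes if every cached filter were a full bitset."""
    return bitset_bytes(num_docs) * cached_filters


NUM_DOCS = 1_200_000  # index size mentioned in this thread

print(bitset_bytes(NUM_DOCS))  # 150000 bytes per full bitset

# How the bound grows with the number of filters actually held in the cache.
for cached in (1_000, 10_000, 100_000):
    gib = worst_case_cache_bytes(NUM_DOCS, cached) / 2**30
    print(f"{cached:>7} cached filters -> at most {gib:.2f} GiB")
```

The cache size setting is a maximum number of entries, so what matters for memory is how many unique facet terms actually end up cached, not the configured limit itself.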
|
|
Re: Slow response
On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote:
> Also, you mention that faceting is more a "function of the number of
> terms in the field". The 2 fields causing our problems are
> Authors and Subjects. If we divided up the data that made these facets
> into more specific fields (Primary author, secondary author, etc.)
> would this perform better? So the number of facet fields would increase
> but the unique terms for a given facet should be less.

There are essentially two facet computation strategies:

1. cached bitsets: a bitset for each term is generated and intersected with the query result bitset. This is more general and performs well up to a few thousand terms.

2. field enumeration: cache the field contents, and generate counts using this data. Relatively independent of the number of unique terms, but requires at most a single facet value per field per document.

So, if you factor author into Primary author/Secondary author, where each is guaranteed to have only one value per doc, this could greatly accelerate your faceting. There are probably fewer unique subjects, so strategy 1 is likely fine.

To use strategy 2, just make sure that multiValued="false" is set for those fields in schema.xml.

-Mike
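The two strategies described above can be sketched in a few lines of Python. This is a toy model, not Solr's implementation: the function names and the sample data are invented for the illustration, Python integers stand in for bitsets, and a plain list stands in for the cached field contents.

```python
from collections import Counter


def build_term_bitsets(docs):
    """Strategy 1 setup: one bitset (a Python int) per unique term."""
    bitsets = {}
    for doc_id, values in enumerate(docs):
        for term in values:
            bitsets[term] = bitsets.get(term, 0) | (1 << doc_id)
    return bitsets


def facet_by_bitsets(term_bitsets, hits):
    """Strategy 1: intersect every term's bitset with the result bitset.

    Returns the counts and the number of intersections performed.
    """
    hit_bits = 0
    for doc_id in hits:
        hit_bits |= 1 << doc_id
    counts = {}
    intersections = 0
    for term, bits in term_bitsets.items():
        intersections += 1
        n = bin(bits & hit_bits).count("1")
        if n:
            counts[term] = n
    return counts, intersections


def facet_by_field_cache(field_cache, hits):
    """Strategy 2: look up the single cached value of each hit and tally it."""
    counts = Counter()
    for doc_id in hits:
        value = field_cache[doc_id]
        if value is not None:
            counts[value] += 1
    return dict(counts)


# Toy index: exactly one primary author per document (single-valued field).
primary_author = ["asimov", "clarke", "asimov", "le guin", "clarke", "asimov"]
term_bitsets = build_term_bitsets([[a] for a in primary_author])

hits = [0, 2, 4]
counts1, work1 = facet_by_bitsets(term_bitsets, hits)
counts2 = facet_by_field_cache(primary_author, hits)
print(counts1 == counts2)  # True: both strategies agree on the counts

# With no hits, strategy 1 still visits every unique term,
# while strategy 2 has nothing to iterate over.
_, work_empty = facet_by_bitsets(term_bitsets, [])
print(work_empty)  # 3
```

In this model the work done by strategy 1 grows with the number of unique terms regardless of how many documents match, which is consistent with the slow zero-hit query reported at the start of the thread.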
|
|
Re: Slow response
On 6-Sep-07, at 3:25 PM, Mike Klaas wrote:
> To use strategy 2, just make sure that multiValued="false" is set
> for those fields in schema.xml

I forgot to mention that strategy 2 also requires that the field produce a single token for each doc, i.e. that it not be tokenized into multiple terms (see http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3)

-Mike
|
|
Re: Slow response
Hi Mike,

Thanks for clarifying what has been a bit of a black box to me.

A couple of questions, to increase my understanding, if you don't mind.

If I am only using fields with multiValued="false", with a type of "string" or "integer" (untokenized), does Solr automatically use approach 2? Or is this something I have to actively configure?

And is approach 2 better than 1? Or vice versa? Or is the answer "it depends"? :-)

If, as I suspect, the answer is "it depends", are there any general guidelines on when to use one approach or the other?

Thanks,

Tom
|
|
Re: Slow response
On 14-Sep-07, at 3:38 PM, Tom Hill wrote:
> If I am only using fields with multiValued="false", with a type of
> "string" or "integer" (untokenized), does solr automatically use
> approach 2? Or is this something I have to actively configure?

It'll happen automatically.

> And is approach 2 better than 1? Or vice versa? Or is the answer "it
> depends"? :-)

It depends :)

> If, as I suspect, the answer was "it depends", are there any general
> guidelines on when to use one approach or the other?

Yeah, it usually depends on how many unique facet values there are, how many documents are returned in the query, and how much memory you have. 1 is usually faster when there are few terms; 2 is usually faster when there are many terms.

Things can be further complicated by additional parameters, like facet.enum.cache.minDf (http://wiki.apache.org/solr/SimpleFacetParameters#head-3ea6fc5d1056447295c38c9675e35ce06fd95f97)

-Mike
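As a rough illustration of what facet.enum.cache.minDf controls: the Solr documentation describes it as the minimum document frequency a term must have before the filterCache is used for that term. The sketch below models that idea only. The function name, the dictionary standing in for the filterCache, and the sample postings are all hypothetical, and Solr's real code differs.

```python
def facet_with_min_df(postings, hits, filter_cache, min_df=0):
    """Count facet values, caching doc sets only for terms with df >= min_df.

    postings:     term -> list of matching doc ids (a stand-in for the index)
    filter_cache: dict used as a stand-in for Solr's filterCache
    """
    hit_set = set(hits)
    counts = {}
    for term, doc_ids in postings.items():
        if len(doc_ids) >= min_df:
            # Frequent term: reuse the cached doc set, or build and cache it.
            cached = filter_cache.get(term)
            if cached is None:
                cached = filter_cache[term] = frozenset(doc_ids)
            n = len(cached & hit_set)
        else:
            # Rare term: walk its postings directly and cache nothing.
            n = sum(1 for doc_id in doc_ids if doc_id in hit_set)
        if n:
            counts[term] = n
    return counts


# Invented sample data: three subject terms with different frequencies.
postings = {"fiction": [0, 1, 2, 3, 4], "history": [1, 3], "maps": [5]}
cache = {}
counts = facet_with_min_df(postings, hits=[1, 2, 5], filter_cache=cache, min_df=2)

print(counts)         # {'fiction': 2, 'history': 1, 'maps': 1}
print(sorted(cache))  # ['fiction', 'history'] ("maps" is below min_df)
```

Raising the threshold keeps rare terms out of the cache, trading some per-query work for a smaller memory footprint, which is relevant when a field has a very large number of unique terms.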