On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote:
> Thank you for your response; this does shed some light on the subject.
> Our basic question was why were we seeing slower responses the smaller
> our result set got.
>
> Currently we are searching about 1.2 million documents, each source
> document about 2KB, but we do duplicate some of the data. I bumped up
> my filterCache to 5 million, and the 2nd search I did for a non-indexed
> term came back in 2.1 seconds, so that is much improved. I am a little
> concerned about having this value so high, but this is our problem and
> we will play with it.
>
> I do have a few follow-up questions. First, regarding the filterCache:
> once a single search has been done and facets requested, as long as
> new facets aren't requested and the cache size is large enough, the
> filters will remain in the cache, correct?
>
> Also, you mention that faceting is more a "function of the number of
> terms in the field". The 2 fields causing our problems are Authors and
> Subjects. If we divided up the data that made these facets into more
> specific fields (primary author, secondary author, etc.), would this
> perform better? The number of facet fields would increase, but the
> number of unique terms for a given facet should be smaller.
There are essentially two facet computation strategies:
1. cached bitsets: a bitset for each term is generated and
intersected with the query result bitset. This is more general and
performs well up to a few thousand terms.
2. field enumeration: cache the field contents, and generate counts
using this data. Relatively independent of the number of unique terms,
but requires at most a single facet value per field per document.
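To make the difference concrete, here is a toy Python model of the two
strategies. It is a sketch, not Solr's actual code: the index is a
hypothetical in-memory dict, bitsets are plain Python ints, and the
"field cache" is a list holding one value per document. Strategy 1 does
one intersection per unique term; strategy 2 does one lookup per
matching document, which is why it needs a single value per document.

```python
from collections import Counter

# Hypothetical toy index: doc id -> list of values for one facet field.
docs = {
    0: ["Smith"],
    1: ["Jones"],
    2: ["Smith"],
    3: ["Lee"],
    4: ["Jones"],
}

def build_term_bitsets(field_values):
    """Strategy 1 setup: one bitset (a Python int) per unique term."""
    bitsets = {}
    for doc_id, values in field_values.items():
        for term in values:
            bitsets[term] = bitsets.get(term, 0) | (1 << doc_id)
    return bitsets

def facet_by_bitsets(bitsets, result_bits):
    """Strategy 1: intersect every term's bitset with the query result bitset.
    Work grows with the number of unique terms."""
    counts = {}
    for term, bits in bitsets.items():
        n = bin(bits & result_bits).count("1")
        if n:
            counts[term] = n
    return counts

def build_field_cache(field_values, num_docs):
    """Strategy 2 setup: cache exactly one value per document (None if absent).
    Refuses multi-valued documents, mirroring the single-value requirement."""
    cache = [None] * num_docs
    for doc_id, values in field_values.items():
        if len(values) > 1:
            raise ValueError(f"doc {doc_id} has multiple values")
        if values:
            cache[doc_id] = values[0]
    return cache

def facet_by_field_cache(cache, result_docs):
    """Strategy 2: walk the matching docs and count their cached values.
    Work grows with the number of matches, not the number of unique terms."""
    return dict(Counter(cache[d] for d in result_docs if cache[d] is not None))

result_docs = [0, 2, 4]  # docs matching some query
result_bits = sum(1 << d for d in result_docs)

bitsets = build_term_bitsets(docs)
print(facet_by_bitsets(bitsets, result_bits))   # {'Smith': 2, 'Jones': 1}
cache = build_field_cache(docs, len(docs))
print(facet_by_field_cache(cache, result_docs))  # {'Smith': 2, 'Jones': 1}
```

Both approaches give the same counts; the difference is where the cost
goes. With hundreds of thousands of distinct authors, the loop in
facet_by_bitsets dominates, while facet_by_field_cache only touches the
documents that matched.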
So, if you factor author into Primary author/Secondary author, where
each is guaranteed to only have one value per doc, this could greatly
accelerate your faceting. There are probably fewer unique subjects,
so strategy 1 is likely fine.
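A rough sketch of that restructuring, again in plain Python rather than
Solr: a multi-valued author list is split into two single-valued fields
before indexing, so each field can be counted with one cached value per
document. The field names author_primary and author_secondary are
hypothetical, and this sketch simply drops any author past the second.

```python
from collections import Counter

# Hypothetical records: "authors" is multi-valued, in order of precedence.
records = [
    {"id": 0, "authors": ["Smith", "Jones"]},
    {"id": 1, "authors": ["Lee"]},
    {"id": 2, "authors": ["Smith", "Lee"]},
]

def split_authors(record):
    """Split the author list into single-valued fields (hypothetical names).
    Authors beyond the second are dropped in this sketch."""
    authors = record["authors"]
    return {
        "id": record["id"],
        "author_primary": authors[0] if len(authors) > 0 else None,
        "author_secondary": authors[1] if len(authors) > 1 else None,
    }

def facet_counts(docs, field, matching_ids):
    """Field-enumeration style counting: one value per doc for this field."""
    by_id = {d["id"]: d[field] for d in docs}
    return dict(Counter(by_id[i] for i in matching_ids if by_id[i] is not None))

split = [split_authors(r) for r in records]
matches = [0, 1, 2]
print(facet_counts(split, "author_primary", matches))    # {'Smith': 2, 'Lee': 1}
print(facet_counts(split, "author_secondary", matches))  # {'Jones': 1, 'Lee': 1}
```

The trade-off is that a per-author total across both roles (e.g. "Lee"
appears once as primary and once as secondary) now has to be summed from
the two facets, and documents with more than two authors need more fields
or lose information.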
To use strategy 2, just make sure that multiValued="false" is set for
those fields in schema.xml.
-Mike