On Sun, Jun 28, 2009 at 9:08 PM, Nigel<
nigelspleen@...> wrote:
>> Unfortunately the TermInfos must still be hit to look up the
>> freq/proxOffset in the postings files.
>
> But for that data you only have to hit the TermInfos for the terms you're
> searching, correct? So, assuming that there are vastly more terms in the
> index than you ever actually use in a query, we could rely on the LRU cache
> to keep the queried TermInfos around, rather than loading all of them
> up-front. This was a hypothesis based on some tracing through the code but
> not a lot of knowledge of Lucene internals, so please steer me back to
> reality if necessary. (-:
Right, it's only the terms in your query that need to be looked up.
There's already an LRU cache for the Term -> TermInfos lookup, in
TermInfosReader (hard wired to size 1024). It was created as a "short
term" cache so that queries that look up the same term twice (eg once
for idf and once to get freq/prox postings position), would result in
only one real lookup. You could try privately experimenting w/ that
value to see if you can get a reasonable it rate?
Lucene only loads the "indexed" terms (= every 128th) into RAM, by default.
Mike
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail:
java-user-unsubscribe@...
For additional commands, e-mail:
java-user-help@...