Highlighting is very slow

by Andrew Clegg

Hi everyone,

I'm experimenting with highlighting for the first time, and it seems shockingly slow for some queries.

For example, this query:

http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on

takes 313ms. But when I add highlighting:

http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=*&fl=id

it takes 305212 ms, i.e. about 5 minutes!
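
The two requests differ only in the three trailing parameters (`hl=true`, `hl.fl=*`, `fl=id`). A minimal Python sketch that builds both URLs from one shared parameter set makes that difference explicit. The helper name `select_url` is hypothetical, and `server:8080` is just the placeholder host from the URLs above:

```python
from urllib.parse import urlencode

BASE = "http://server:8080/solr/select/"

# Parameters shared by both requests in the post above.
COMMON = {
    "q": "transferase",
    "qt": "dismax",
    "version": "2.2",
    "start": 0,
    "rows": 10,
    "indent": "on",
}

def select_url(extra=None):
    """Return the select URL for COMMON plus any extra parameters."""
    params = dict(COMMON)
    params.update(extra or {})
    # safe="*" keeps hl.fl=* readable instead of percent-encoding the asterisk
    return BASE + "?" + urlencode(params, safe="*")

plain_url = select_url()
highlight_url = select_url({"hl": "true", "hl.fl": "*", "fl": "id"})
```

Fetching each URL with the same client and timing it would isolate the cost that highlighting adds on top of the plain query.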

Some of my documents are slightly large -- the 10 hits for that query contain between 362 bytes and 1.4 megabytes of text each. All fields are stored and indexed, and most are termvectored. But this doesn't seem excessively large!

Has anyone else seen this sort of behaviour before? This is with a nightly from 2009-10-26.

All suggestions would be appreciated. My schema and config files are attached...

schema.xml
solrconfig.xml

Thanks (once again),

Andrew.

Re: Highlighting is very slow

by hossman


: Has anyone else seen this sort of behaviour before? This is with a nightly
: from 2009-10-26.

have you tried hl.usePhraseHighlighter=false ? ...

http://old.nabble.com/Highlighting-performance-between-1.3-and-1.4rc-to26190790.html

...it doesn't seem like it should be affecting you for a simple term
query, but i'm not sure.
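
One way to try this is to append `hl.usePhraseHighlighter=false` to the highlighted request from the original post. A small Python sketch; the helper name `with_param` is hypothetical and not part of any Solr client:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_param(url, name, value):
    """Return url with query parameter `name` set to `value`."""
    parts = urlsplit(url)
    # Drop any existing value for `name`, then append the new one.
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != name
    ]
    params.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(params, safe="*")))

highlight_url = (
    "http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2"
    "&start=0&rows=10&indent=on&hl=true&hl.fl=*&fl=id"
)
test_url = with_param(highlight_url, "hl.usePhraseHighlighter", "false")
```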



-Hoss


Re: Highlighting is very slow

by markrmiller

It should be the same speed either way for a term query. The
highlighter is going to be slow in general for a 1 MB+ doc. It
processes a token at a time. The fast vector highlighter is much
faster in those cases and should be in the next release. It handles
fewer query types though.
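
To see why document size matters, here is a toy Python illustration (not Lucene's actual code). The first function walks every token of the text on each request, as a token-at-a-time highlighter does. The second jumps straight to character offsets that are assumed to have been stored at index time, which is the general idea behind highlighting from term vectors. Both function names and the sample text are hypothetical.

```python
import re

def highlight_by_scanning(text, term):
    """Walk every token in the text; cost grows with document length."""
    out, last = [], 0
    for m in re.finditer(r"\w+", text):
        if m.group().lower() == term:
            out.append(text[last:m.start()])
            out.append("<em>" + m.group() + "</em>")
            last = m.end()
    out.append(text[last:])
    return "".join(out)

def highlight_from_offsets(text, offsets):
    """Use precomputed (start, end) offsets; no pass over all tokens."""
    out, last = [], 0
    for start, end in sorted(offsets):
        out.append(text[last:start])
        out.append("<em>" + text[start:end] + "</em>")
        last = end
    out.append(text[last:])
    return "".join(out)

doc = "A transferase moves groups; each transferase is an enzyme."
# Offsets that an index with stored term offsets could return for "transferase".
stored = [(2, 13), (33, 44)]

scanned = highlight_by_scanning(doc, "transferase")
looked_up = highlight_from_offsets(doc, stored)
```

Both calls produce the same marked-up string here; the difference is that the scanning version has to visit every token of a 1 MB document, while the offset version only touches the matches.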

- Mark

http://www.lucidimagination.com (mobile)

On Nov 4, 2009, at 1:26 PM, Chris Hostetter <hossman_lucene@...> wrote:

> : Has anyone else seen this sort of behaviour before? This is with a nightly
> : from 2009-10-26.
>
> have you tried hl.usePhraseHighlighter=false ? ...
>
> http://old.nabble.com/Highlighting-performance-between-1.3-and-1.4rc-to26190790.html
>
> ...it doesn't seem like it should be affecting you for a simple term
> query, but i'm not sure.
>
> -Hoss

Re: Highlighting is very slow

by Andrew Clegg


Indeed -- it actually went slightly slower, but only by a few seconds, which I suspect is within normal variance.

I'll hold out for the new version then -- it's certainly not mission critical.

Thanks,

Andrew.

Re: Highlighting is very slow

by Nicolas Dessaigne

Hi Andrew,

Alternatively, you could use a copyField with a maxChars limit as your
highlighting field. Works well in my case.

See https://issues.apache.org/jira/browse/SOLR-538

Nicolas

2009/11/5 Andrew Clegg <andrew.clegg@...>

> Indeed -- it actually went slightly slower but only by a few seconds, I
> suspect that's within normal variance.
>
> I'll hold out for the new version then -- it's certainly not mission
> critical.
>
> Thanks,
>
> Andrew.

Re: Highlighting is very slow

by Andrew Clegg

Nicolas Dessaigne wrote:
> Alternatively, you could use a copyfield with a maxChars limit as your
> highlighting field. Works well in my case.

Thanks for the tip. We did think about doing something similar (only enabling highlighting for certain shorter fields), but we decided that users might be confused if search terms were sometimes snippeted+highlighted and sometimes not. (A brief run-through with a single user suggested this, although that's not statistically significant...) So we decided to avoid highlighting altogether until we can do it across the board.

Cheers,

Andrew.

Re: Highlighting is very slow

by Nicolas Dessaigne

I'm afraid there is no perfect solution for this problem, as you may always
have very long documents that will result in long response times, even with
a faster implementation (see https://issues.apache.org/jira/browse/SOLR-1268).

The only way to avoid confusing users and to ensure acceptable response
times is to truncate the indexed field. This way, every document returned
can be highlighted... but you'll miss matches in long documents!

If you don't control the length of the documents and need highlighting, then
either you don't highlight all documents or you don't find all documents. I
think that a fairly large copyField (maybe 50k?) is usually enough for most
documents to be highlighted, but that depends on your corpus.
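
The trade-off can be shown with a small Python simulation. It only mimics the truncation that a maxChars limit on a copyField would apply and is not Solr code; the 50,000-character limit is the "maybe 50k?" figure from above, and the function name and sample documents are hypothetical.

```python
MAX_CHARS = 50_000  # the "maybe 50k?" limit suggested above

def highlightable(text, term, max_chars=MAX_CHARS):
    """True if `term` occurs within the first max_chars characters,
    i.e. inside the part a truncated highlighting field would keep."""
    return term in text[:max_chars]

short_doc = "transferase activity " * 10
long_doc = "x" * 60_000 + " transferase"

print(highlightable(short_doc, "transferase"))  # match lies inside the limit
print(highlightable(long_doc, "transferase"))   # match lies past the limit
```

Searching the full field still finds `long_doc`, but its only match lies beyond the limit, so the truncated field has nothing to highlight for it.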

Good luck ;)
Nicolas


2009/11/9 Andrew Clegg <andrew.clegg@...>

> Thanks for the tip. We did think about doing something similar (only
> enabling highlighting for certain shorter fields) but we decided that
> perhaps users would be confused if search terms were sometimes
> snippeted+highlighted and sometimes not. (A brief run through with a single
> user suggested this, although that's not statistically significant...) So
> we decided to avoid highlighting altogether until we can do it across the
> board.
>
> Cheers,
>
> Andrew.