Highlighting performance between 1.3 and 1.4rc

by Jake Brownell

Hi,

The fix MarkM provided yesterday for the problem I reported encountering with the highlighter appears to be working--I installed the Lucene 2.9.1 rc4 artifacts.

Now I'm running into an oddity regarding performance. Our integration test is running slower than it used to. I've placed some average timings below. I'll try to describe what the test does in the hopes that someone will have some insight.

The indexing time represents the time it takes to load and index/commit ~43 books. The test then does two sets of searches.

A basic search is a dismax search across several fields, including the text of the book. It searches for either the exact title (in quotes) or the ISBN. Highlighting is enabled on the field that holds the text of the book.

An advanced search uses a nested dismax query (inside a normal Lucene query) to search for either the exact title (in quotes) or the ISBN. The main difference is that the title is only matched against fields related to titles, not authors, text of the book, etc. Highlighting is enabled against the text of the book.

The indexing time remained fairly constant. I ran with and without highlighting enabled, to see how much it was contributing. I am most interested in the jumps in time between 1.3 and 1.4 for the highlighting time.

With highlighting enabled:

Solr 1.3
Indexing: 40161ms
Basic: 12407ms
Advanced: 1106ms

Solr 1.4 rc
Indexing: 41734ms
Basic: 26346ms
Advanced: 17067ms

Without any highlighting:

Solr 1.3
Indexing: 41186ms
Basic: 1024ms
Advanced: 265ms

Solr 1.4 rc
Indexing: 40981ms
Basic: 883ms
Advanced: 356ms
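
One way to read these numbers is to subtract each "without highlighting" time from the matching "with highlighting" time, which gives a rough estimate of the time spent in the highlighter itself. The sketch below does that arithmetic in Python; it assumes the averages above are representative and that the two runs are otherwise comparable. The dictionary layout and function name are made up for illustration.

```python
# Average timings in milliseconds, copied from the figures above.
timings_ms = {
    "1.3": {
        "hl": {"basic": 12407, "advanced": 1106},
        "no_hl": {"basic": 1024, "advanced": 265},
    },
    "1.4rc": {
        "hl": {"basic": 26346, "advanced": 17067},
        "no_hl": {"basic": 883, "advanced": 356},
    },
}


def highlighting_overhead(version, search):
    """Estimate highlighter cost: time with highlighting minus time without."""
    t = timings_ms[version]
    return t["hl"][search] - t["no_hl"][search]


for search in ("basic", "advanced"):
    old = highlighting_overhead("1.3", search)
    new = highlighting_overhead("1.4rc", search)
    # Ratio of estimated highlighter cost, 1.4 rc relative to 1.3.
    print(f"{search}: {old} ms -> {new} ms ({new / old:.1f}x)")
```

By this estimate the highlighting cost roughly doubles for the basic search (11383 ms to 25463 ms) and grows about twentyfold for the advanced search (841 ms to 16711 ms).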

FWIW, the integration test uses an embedded Solr server.

I suppose I should also ask whether there are any general tips for speeding up highlighting?

Thanks,
Jake

Re: Highlighting performance between 1.3 and 1.4rc

by markrmiller

The 1.4 highlighter is now slower if you have multi-term queries or
phrase queries. You can get the old behavior (which is faster) if you
pass hl.usePhraseHighlighter=false, but you will not get correct phrase
highlighting, and multi-term queries (e.g. prefix/wildcard/range) won't
highlight.
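
For an HTTP request, the switch is passed as a request parameter, written in full as hl.usePhraseHighlighter alongside the other hl.* highlighting parameters. The Python sketch below only builds the request URL. The host, the query text, and the field name "text" are placeholders rather than values from this thread, and the helper function is made up for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, hl_field, use_phrase_highlighter):
    """Build a Solr /select URL with highlighting parameters (illustrative helper)."""
    params = {
        "q": query,
        "defType": "dismax",
        "hl": "true",
        "hl.fl": hl_field,  # field to highlight; "text" below is a placeholder
        "hl.usePhraseHighlighter": "true" if use_phrase_highlighter else "false",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and query, for illustration only.
url = build_select_url("http://localhost:8983/solr", '"Moby Dick"', "text", False)
print(url)
```

With an embedded server there is no URL, but the same parameter names would presumably be set on the query object instead.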

- Mark

http://www.lucidimagination.com (mobile)

On Nov 3, 2009, at 8:18 PM, Jake Brownell <jakeb@...> wrote:

> [...]

RE: Highlighting performance between 1.3 and 1.4rc

by Jake Brownell

Thanks Mark, that did bring the time back down. I'll have to investigate a little more and weigh the pros and cons of each to determine which best suits our needs.

Jake

-----Original Message-----
From: Mark Miller [mailto:markrmiller@...]
Sent: Tuesday, November 03, 2009 11:23 PM
To: solr-user@...
Cc: solr-user@...
Subject: Re: Highlighting performance between 1.3 and 1.4rc

[...]

Re: Highlighting performance between 1.3 and 1.4rc

by Peter Wolanin-2

Trying to clarify when the new behavior is useful: if I'm using the
dismax handler, would it make sense to always default to
hl.usePhraseHighlighter=false?

-Peter

On Wed, Nov 4, 2009 at 1:42 AM, Jake Brownell <jakeb@...> wrote:

> [...]

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia, Inc.
peter.wolanin@...

Re: Highlighting performance between 1.3 and 1.4rc

by markrmiller

If the query doesn't have clauses where it would matter (positional,
phrase, multi-term), it's just as fast either way.
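
A client that wants to choose the setting per request could guess from the raw query string whether any such clause is present. The Python sketch below is only a rough heuristic based on common Lucene query syntax (quoted phrases, * and ? wildcards, [a TO b] ranges). It is not Lucene's query parser, and the function name is made up for illustration.

```python
import re

# Rough patterns for the clause types named above; NOT Lucene's query parser.
_PHRASE = re.compile(r'"[^"]+"')                    # "exact title", "a b"~3
_WILDCARD_OR_PREFIX = re.compile(r'\w[*?]|[*?]\w')  # mob*, te?t
_RANGE = re.compile(r'[\[{][^\]}]*\sTO\s[^\]}]*[\]}]')  # [2000 TO 2009]


def may_need_phrase_highlighter(query):
    """Return True if the raw query string looks like it has a phrase,
    wildcard/prefix, or range clause (heuristic, may misjudge edge cases)."""
    return any(p.search(query) for p in (_PHRASE, _WILDCARD_OR_PREFIX, _RANGE))


print(may_need_phrase_highlighter('"Moby Dick"'))    # quoted phrase
print(may_need_phrase_highlighter("9780142437247"))  # plain term
```

A plain ISBN lookup would not trip any of the patterns, while a quoted exact title would.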

- Mark

http://www.lucidimagination.com (mobile)

On Nov 6, 2009, at 8:35 PM, Peter Wolanin <peter.wolanin@...> wrote:

> [...]