[jira] Created: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org

Explore performance of multi-PQ vs single-PQ sorting API
--------------------------------------------------------

                 Key: LUCENE-1997
                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: 2.9
            Reporter: Michael McCandless
            Assignee: Michael McCandless


Spinoff from the recent "lucene 2.9 sorting algorithm" thread on java-dev,
where a simpler (non-segment-based) comparator API is proposed that
gathers results into multiple PQs (one per segment) and then merges
them at the end.
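For readers new to the two approaches, here is a minimal Python sketch of the difference. It is not John's multi-PQ code (which is Java); hits are modeled as hypothetical (sort_key, doc_id) tuples, and heapq.nsmallest stands in for Lucene's bounded priority queue.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical hit model: (sort_key, global_doc_id). Tuples compare by key
# first and then by doc id, which gives a deterministic tie-break.
Hit = Tuple[str, int]


def single_pq_top_n(segments: List[List[Hit]], n: int) -> List[Hit]:
    """Single-PQ: hits from every segment compete in one bounded queue."""
    return heapq.nsmallest(n, itertools.chain.from_iterable(segments))


def multi_pq_top_n(segments: List[List[Hit]], n: int) -> List[Hit]:
    """Multi-PQ: one bounded queue per segment, merged at the end."""
    # nsmallest returns a sorted list, which is what heapq.merge expects.
    per_segment = [heapq.nsmallest(n, seg) for seg in segments]
    return list(itertools.islice(heapq.merge(*per_segment), n))


if __name__ == "__main__":
    segs = [[("b", 0), ("d", 1)], [("a", 2), ("c", 3)], [("a", 4)]]
    print(single_pq_top_n(segs, 3))  # [('a', 2), ('a', 4), ('b', 0)]
    print(multi_pq_top_n(segs, 3))   # [('a', 2), ('a', 4), ('b', 0)]
```

Both functions return the same hits; what differs is the work. Multi-PQ fills one queue of size N per segment and then merges up to (number of segments x N) entries, while single-PQ only ever holds N.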

I started from John's multi-PQ code and worked it into
contrib/benchmark so that we could run perf tests.  Then I generified
the Python script I use for running search benchmarks (in
contrib/benchmark/sortBench.py).

The script first creates indexes with 1M docs each (one based on
SortableSingleDocSource, and one based on Wikipedia, if available).  Then
it runs various combinations:

  * Index with 20 balanced segments vs index with the "normal" log
    segment size

  * Queries with different numbers of hits (only for wikipedia index)

  * Different top N

  * Different sorts (by title, for wikipedia, and by random string,
    random int, and country for the random index)

For each test, 7 search rounds are run and the best QPS is kept.  The
script runs singlePQ then multiPQ, records the resulting best QPS
for each, and produces a table (in Jira format) as output.
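The reporting step amounts to something like the following sketch. The function names are hypothetical and this is not the actual sortBench.py code; the percentage is (new - old) / old, where "old" is singlePQ and "new" is multiPQ, and the row layout mirrors the Jira tables posted on this issue.

```python
from typing import Sequence

HEADER = "||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||"


def best_qps(round_qps: Sequence[float]) -> float:
    """Keep the best (highest) QPS out of the search rounds for one test."""
    return max(round_qps)


def pct_change(qps_old: float, qps_new: float) -> float:
    """Percent change of multiPQ ("new") relative to singlePQ ("old")."""
    return 100.0 * (qps_new - qps_old) / qps_old


def jira_row(seg: str, query: str, hits: int, sort: str, top_n: int,
             qps_old: float, qps_new: float) -> str:
    """Format one result as a Jira table row: green for gains, red otherwise."""
    pct = pct_change(qps_old, qps_new)
    color = "green" if pct > 0 else "red"
    return (f"|{seg}|{query}|{hits}|{sort}|{top_n}|{qps_old:.2f}|{qps_new:.2f}"
            f"|{{color:{color}}}{pct:.1f}%{{color}}|")


if __name__ == "__main__":
    print(HEADER)
    # |log|1|318481|title|10|114.26|112.40|{color:red}-1.6%{color}|
    print(jira_row("log", "1", 318481, "title", 10, 114.26, 112.40))
```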


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...


[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


     [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1997:
---------------------------------------

    Attachment: LUCENE-1997.patch

Attached patch.

Note that the patch is based on the 2.9.x branch, so first check out 2.9.x,
apply the patch, then:

  cd contrib/benchmark
  ant compile
  <edit constants @ top of sortBench.py>
  python -u sortBench.py -run results
  python -u sortBench.py -report results

The important constants are INDEX_DIR_BASE (where the created indexes
are stored) and WIKI_FILE (which points to a .tar.bz2 or .tar export of
Wikipedia; if this file can't be found, the script just skips the
Wikipedia tests).  You can also change INDEX_NUM_DOCS and INDEX_NUM_THREADS.

If you don't have the wiki export downloaded, that's fine... the
script should just run the tests based on the random index.
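A sketch of what the edited constants and the skip-if-missing behavior might look like. The constant names come from the notes above, but every value is a placeholder to adapt to your machine, and index_sources is a hypothetical helper, not a function in the patch.

```python
import os
from typing import List

# Placeholder values; only the names are taken from the patch notes.
INDEX_DIR_BASE = "/tmp/sortBench/indices"  # where created indexes are stored
WIKI_FILE = "/data/enwiki.tar.bz2"         # .tar.bz2 or .tar Wikipedia export
INDEX_NUM_DOCS = 1000000                   # 1M docs per index
INDEX_NUM_THREADS = 1                      # placeholder thread count
DO_BALANCED = False                        # also test the 20-balanced-segment index


def index_sources(wiki_file: str = WIKI_FILE) -> List[str]:
    """Hypothetical helper: sources to benchmark, skipping a missing Wikipedia export."""
    sources = ["random"]  # SortableSingleDocSource-based index
    if os.path.exists(wiki_file):
        sources.append("wiki")
    return sources
```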


[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767870#action_12767870 ]

Michael McCandless commented on LUCENE-1997:
--------------------------------------------

OK, I ran sortBench.py on an OpenSolaris 2009.06 box, Java 1.6.0_13.

It'd be great if others with more mainstream platforms (Linux,
Windows) could run this and post back.

Raw results (only ran on the log-sized segments):

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|318481|title|10|114.26|112.40|{color:red}-1.6%{color}|
|log|1|318481|title|25|117.59|110.08|{color:red}-6.4%{color}|
|log|1|318481|title|50|116.22|106.96|{color:red}-8.0%{color}|
|log|1|318481|title|100|114.48|100.07|{color:red}-12.6%{color}|
|log|1|318481|title|500|103.16|73.98|{color:red}-28.3%{color}|
|log|1|318481|title|1000|95.60|57.85|{color:red}-39.5%{color}|
|log|<all>|1000000|title|10|95.71|109.41|{color:green}14.3%{color}|
|log|<all>|1000000|title|25|111.56|101.73|{color:red}-8.8%{color}|
|log|<all>|1000000|title|50|110.56|98.84|{color:red}-10.6%{color}|
|log|<all>|1000000|title|100|104.09|93.02|{color:red}-10.6%{color}|
|log|<all>|1000000|title|500|93.36|66.67|{color:red}-28.6%{color}|
|log|<all>|1000000|title|1000|97.07|50.03|{color:red}-48.5%{color}|
|log|<all>|1000000|rand string|10|118.10|109.63|{color:red}-7.2%{color}|
|log|<all>|1000000|rand string|25|107.68|102.33|{color:red}-5.0%{color}|
|log|<all>|1000000|rand string|50|107.12|100.37|{color:red}-6.3%{color}|
|log|<all>|1000000|rand string|100|110.63|95.17|{color:red}-14.0%{color}|
|log|<all>|1000000|rand string|500|79.97|72.09|{color:red}-9.9%{color}|
|log|<all>|1000000|rand string|1000|76.82|54.67|{color:red}-28.8%{color}|
|log|<all>|1000000|country|10|129.49|103.63|{color:red}-20.0%{color}|
|log|<all>|1000000|country|25|111.74|102.60|{color:red}-8.2%{color}|
|log|<all>|1000000|country|50|108.82|100.90|{color:red}-7.3%{color}|
|log|<all>|1000000|country|100|108.01|96.84|{color:red}-10.3%{color}|
|log|<all>|1000000|country|500|97.60|72.02|{color:red}-26.2%{color}|
|log|<all>|1000000|country|1000|85.19|54.56|{color:red}-36.0%{color}|
|log|<all>|1000000|rand int|10|151.75|110.37|{color:red}-27.3%{color}|
|log|<all>|1000000|rand int|25|138.06|109.15|{color:red}-20.9%{color}|
|log|<all>|1000000|rand int|50|135.40|106.49|{color:red}-21.4%{color}|
|log|<all>|1000000|rand int|100|108.30|101.86|{color:red}-5.9%{color}|
|log|<all>|1000000|rand int|500|94.45|73.42|{color:red}-22.3%{color}|
|log|<all>|1000000|rand int|1000|88.30|54.71|{color:red}-38.0%{color}|

Some observations:
 
  * MultiPQ seems to be generally slower, though it is faster in
    one case: topN = 10, sorting by title.  It's only faster with
    the *:* (MatchAllDocsQuery) query, not with the TermQuery for
    term=1, which is odd.

  * MultiPQ slows down, relatively, as topN increases.

  * Sorting by int acts differently: MultiPQ is quite a bit slower
    across the board, except for topN=100.
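One plausible contributor to the "slows down as topN increases" observation can be shown under a deliberately simplified model (random float keys, one hit per document, counting queue insertions only, ignoring Lucene's comparator costs). Each per-segment queue has to fill and churn on its own, so multi-PQ performs at least as many insertions as single-PQ, and the extra insertions plus the final merge of (segments x N) entries grow with N. The 100,000 docs and 20 segments below are assumptions for illustration, not the benchmark's log-sized index.

```python
import heapq
import random
from typing import List, Sequence, Tuple


def count_insertions(keys: Sequence[float], n: int) -> int:
    """Count how many keys enter a bounded queue that keeps the n smallest."""
    heap: List[float] = []  # negated keys, so -heap[0] is the worst key kept
    inserts = 0
    for k in keys:
        if len(heap) < n:
            heapq.heappush(heap, -k)
            inserts += 1
        elif k < -heap[0]:
            heapq.heapreplace(heap, -k)  # evict the worst key, insert k
            inserts += 1
    return inserts


def compare(num_docs: int, num_segments: int, n: int,
            seed: int = 0) -> Tuple[int, int, int]:
    """Return (single-PQ insertions, multi-PQ insertions, entries to merge)."""
    rng = random.Random(seed)
    keys = [rng.random() for _ in range(num_docs)]
    # Contiguous doc-id ranges; bounds[-1] == num_docs, so every key is covered.
    bounds = [num_docs * i // num_segments for i in range(num_segments + 1)]
    segments = [keys[bounds[i]:bounds[i + 1]] for i in range(num_segments)]
    single = count_insertions(keys, n)
    multi = sum(count_insertions(seg, n) for seg in segments)
    merge_entries = sum(min(n, len(seg)) for seg in segments)
    return single, multi, merge_entries


if __name__ == "__main__":
    for top_n in (10, 100, 1000):
        print(top_n, compare(100_000, 20, top_n))
```

In this model the number of collected hits stays fixed while the extra insertions and the merge size grow with N, which is consistent with, though not proof of, the widening gap in the table.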


[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


     [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1997:
---------------------------------------

    Attachment: LUCENE-1997.patch

New patch attached:

  * Turn off testing on the balanced index by default (set DO_BALANCED to True if you want to change this)

  * Minor formatting fixes in generating the report

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769039#action_12769039 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Results from John Wang:

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|<all>|1000000|rand string|10|91.76|108.63|{color:green}18.4%{color}|
|log|<all>|1000000|rand string|25|92.39|106.79|{color:green}15.6%{color}|
|log|<all>|1000000|rand string|50|91.30|104.02|{color:green}13.9%{color}|
|log|<all>|1000000|rand string|500|86.16|63.27|{color:red}-26.6%{color}|
|log|<all>|1000000|rand string|1000|76.92|64.85|{color:red}-15.7%{color}|
|log|<all>|1000000|country|10|92.42|108.78|{color:green}17.7%{color}|
|log|<all>|1000000|country|25|92.60|106.26|{color:green}14.8%{color}|
|log|<all>|1000000|country|50|92.64|103.76|{color:green}12.0%{color}|
|log|<all>|1000000|country|500|83.92|50.30|{color:red}-40.1%{color}|
|log|<all>|1000000|country|1000|74.78|46.59|{color:red}-37.7%{color}|
|log|<all>|1000000|rand int|10|114.03|114.85|{color:green}0.7%{color}|
|log|<all>|1000000|rand int|25|113.77|112.92|{color:red}-0.7%{color}|
|log|<all>|1000000|rand int|50|113.36|109.56|{color:red}-3.4%{color}|
|log|<all>|1000000|rand int|500|103.90|66.29|{color:red}-36.2%{color}|
|log|<all>|1000000|rand int|1000|89.52|70.67|{color:red}-21.1%{color}|

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769042#action_12769042 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

Hah!  Thanks for posting that, Mark!   Much easier to read. :)

Hey John, can you comment with your hardware specs on this, so it can be recorded for posterity? ;)

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769045#action_12769045 ]

John Wang commented on LUCENE-1997:
-----------------------------------

My machine HW spec:

  Model Name: MacBook Pro
  Model Identifier: MacBookPro3,1
  Processor Name: Intel Core 2 Duo
  Processor Speed: 2.4 GHz
  Number Of Processors: 1
  Total Number Of Cores: 2
  L2 Cache: 4 MB
  Memory: 4 GB
  Bus Speed: 800 MHz

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769051#action_12769051 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Another run:

I made the changes to the int/string comparators to do the faster compare.
Java 1.5.0_20
Laptop
Quad Core - 2.0 GHz
Ubuntu 9.10 Kernel 2.6.31
4 GB RAM

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|317925|title|10|87.38|75.42|{color:red}-13.7%{color}|
|log|1|317925|title|25|86.55|74.49|{color:red}-13.9%{color}|
|log|1|317925|title|50|90.49|71.90|{color:red}-20.5%{color}|
|log|1|317925|title|100|88.07|83.08|{color:red}-5.7%{color}|
|log|1|317925|title|500|76.67|54.34|{color:red}-29.1%{color}|
|log|1|317925|title|1000|69.29|38.54|{color:red}-44.4%{color}|
|log|<all>|1000000|title|10|109.01|92.78|{color:red}-14.9%{color}|
|log|<all>|1000000|title|25|108.30|89.43|{color:red}-17.4%{color}|
|log|<all>|1000000|title|50|107.19|85.86|{color:red}-19.9%{color}|
|log|<all>|1000000|title|100|94.84|80.25|{color:red}-15.4%{color}|
|log|<all>|1000000|title|500|78.84|49.10|{color:red}-37.7%{color}|
|log|<all>|1000000|title|1000|72.52|26.90|{color:red}-62.9%{color}|
|log|<all>|1000000|rand string|10|115.32|101.53|{color:red}-12.0%{color}|
|log|<all>|1000000|rand string|25|115.22|91.82|{color:red}-20.3%{color}|
|log|<all>|1000000|rand string|50|114.40|89.70|{color:red}-21.6%{color}|
|log|<all>|1000000|rand string|100|91.30|81.04|{color:red}-11.2%{color}|
|log|<all>|1000000|rand string|500|76.31|43.94|{color:red}-42.4%{color}|
|log|<all>|1000000|rand string|1000|67.33|28.29|{color:red}-58.0%{color}|
|log|<all>|1000000|country|10|115.40|101.46|{color:red}-12.1%{color}|
|log|<all>|1000000|country|25|115.06|92.15|{color:red}-19.9%{color}|
|log|<all>|1000000|country|50|114.03|90.06|{color:red}-21.0%{color}|
|log|<all>|1000000|country|100|99.30|80.07|{color:red}-19.4%{color}|
|log|<all>|1000000|country|500|75.64|43.44|{color:red}-42.6%{color}|
|log|<all>|1000000|country|1000|66.05|27.94|{color:red}-57.7%{color}|
|log|<all>|1000000|rand int|10|118.47|109.30|{color:red}-7.7%{color}|
|log|<all>|1000000|rand int|25|118.72|99.37|{color:red}-16.3%{color}|
|log|<all>|1000000|rand int|50|118.25|95.14|{color:red}-19.5%{color}|
|log|<all>|1000000|rand int|100|97.57|83.39|{color:red}-14.5%{color}|
|log|<all>|1000000|rand int|500|86.55|46.21|{color:red}-46.6%{color}|
|log|<all>|1000000|rand int|1000|78.23|28.94|{color:red}-63.0%{color}|



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769053#action_12769053 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

While Java5 numbers are still important, I'd say that Java6 (-server of course) should be weighted far more heavily?  That must be what a majority of people are running in production for new systems?


[jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769051#action_12769051 ]

Mark Miller edited comment on LUCENE-1997 at 10/23/09 4:29 AM:
---------------------------------------------------------------

Another run:

I made the changes to the int/string comparators to do the faster compare.
Java 1.5.0_20
Laptop - 64bit OS - 64bit JVM - 64bit
Quad Core - 2.0 GHz
Ubuntu 9.10 Kernel 2.6.31
4 GB RAM

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|317925|title|10|87.38|75.42|{color:red}-13.7%{color}|
|log|1|317925|title|25|86.55|74.49|{color:red}-13.9%{color}|
|log|1|317925|title|50|90.49|71.90|{color:red}-20.5%{color}|
|log|1|317925|title|100|88.07|83.08|{color:red}-5.7%{color}|
|log|1|317925|title|500|76.67|54.34|{color:red}-29.1%{color}|
|log|1|317925|title|1000|69.29|38.54|{color:red}-44.4%{color}|
|log|<all>|1000000|title|10|109.01|92.78|{color:red}-14.9%{color}|
|log|<all>|1000000|title|25|108.30|89.43|{color:red}-17.4%{color}|
|log|<all>|1000000|title|50|107.19|85.86|{color:red}-19.9%{color}|
|log|<all>|1000000|title|100|94.84|80.25|{color:red}-15.4%{color}|
|log|<all>|1000000|title|500|78.84|49.10|{color:red}-37.7%{color}|
|log|<all>|1000000|title|1000|72.52|26.90|{color:red}-62.9%{color}|
|log|<all>|1000000|rand string|10|115.32|101.53|{color:red}-12.0%{color}|
|log|<all>|1000000|rand string|25|115.22|91.82|{color:red}-20.3%{color}|
|log|<all>|1000000|rand string|50|114.40|89.70|{color:red}-21.6%{color}|
|log|<all>|1000000|rand string|100|91.30|81.04|{color:red}-11.2%{color}|
|log|<all>|1000000|rand string|500|76.31|43.94|{color:red}-42.4%{color}|
|log|<all>|1000000|rand string|1000|67.33|28.29|{color:red}-58.0%{color}|
|log|<all>|1000000|country|10|115.40|101.46|{color:red}-12.1%{color}|
|log|<all>|1000000|country|25|115.06|92.15|{color:red}-19.9%{color}|
|log|<all>|1000000|country|50|114.03|90.06|{color:red}-21.0%{color}|
|log|<all>|1000000|country|100|99.30|80.07|{color:red}-19.4%{color}|
|log|<all>|1000000|country|500|75.64|43.44|{color:red}-42.6%{color}|
|log|<all>|1000000|country|1000|66.05|27.94|{color:red}-57.7%{color}|
|log|<all>|1000000|rand int|10|118.47|109.30|{color:red}-7.7%{color}|
|log|<all>|1000000|rand int|25|118.72|99.37|{color:red}-16.3%{color}|
|log|<all>|1000000|rand int|50|118.25|95.14|{color:red}-19.5%{color}|
|log|<all>|1000000|rand int|100|97.57|83.39|{color:red}-14.5%{color}|
|log|<all>|1000000|rand int|500|86.55|46.21|{color:red}-46.6%{color}|
|log|<all>|1000000|rand int|1000|78.23|28.94|{color:red}-63.0%{color}|



      was (Author: markrmiller@...):
    Another run:

I made the changes to int/string comparator to do the faster compare.
Java 1.5.0_20
Laptop
Quad Core - 2.0 Ghz
Ubuntu 9.10 Kernel 2.6.31
4 GB RAM

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|317925|title|10|87.38|75.42|{color:red}-13.7%{color}|
|log|1|317925|title|25|86.55|74.49|{color:red}-13.9%{color}|
|log|1|317925|title|50|90.49|71.90|{color:red}-20.5%{color}|
|log|1|317925|title|100|88.07|83.08|{color:red}-5.7%{color}|
|log|1|317925|title|500|76.67|54.34|{color:red}-29.1%{color}|
|log|1|317925|title|1000|69.29|38.54|{color:red}-44.4%{color}|
|log|<all>|1000000|title|10|109.01|92.78|{color:red}-14.9%{color}|
|log|<all>|1000000|title|25|108.30|89.43|{color:red}-17.4%{color}|
|log|<all>|1000000|title|50|107.19|85.86|{color:red}-19.9%{color}|
|log|<all>|1000000|title|100|94.84|80.25|{color:red}-15.4%{color}|
|log|<all>|1000000|title|500|78.84|49.10|{color:red}-37.7%{color}|
|log|<all>|1000000|title|1000|72.52|26.90|{color:red}-62.9%{color}|
|log|<all>|1000000|rand string|10|115.32|101.53|{color:red}-12.0%{color}|
|log|<all>|1000000|rand string|25|115.22|91.82|{color:red}-20.3%{color}|
|log|<all>|1000000|rand string|50|114.40|89.70|{color:red}-21.6%{color}|
|log|<all>|1000000|rand string|100|91.30|81.04|{color:red}-11.2%{color}|
|log|<all>|1000000|rand string|500|76.31|43.94|{color:red}-42.4%{color}|
|log|<all>|1000000|rand string|1000|67.33|28.29|{color:red}-58.0%{color}|
|log|<all>|1000000|country|10|115.40|101.46|{color:red}-12.1%{color}|
|log|<all>|1000000|country|25|115.06|92.15|{color:red}-19.9%{color}|
|log|<all>|1000000|country|50|114.03|90.06|{color:red}-21.0%{color}|
|log|<all>|1000000|country|100|99.30|80.07|{color:red}-19.4%{color}|
|log|<all>|1000000|country|500|75.64|43.44|{color:red}-42.6%{color}|
|log|<all>|1000000|country|1000|66.05|27.94|{color:red}-57.7%{color}|
|log|<all>|1000000|rand int|10|118.47|109.30|{color:red}-7.7%{color}|
|log|<all>|1000000|rand int|25|118.72|99.37|{color:red}-16.3%{color}|
|log|<all>|1000000|rand int|50|118.25|95.14|{color:red}-19.5%{color}|
|log|<all>|1000000|rand int|100|97.57|83.39|{color:red}-14.5%{color}|
|log|<all>|1000000|rand int|500|86.55|46.21|{color:red}-46.6%{color}|
|log|<all>|1000000|rand int|1000|78.23|28.94|{color:red}-63.0%{color}|
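The "Pct change" column appears to be the relative change of "QPS new" against "QPS old", i.e. (new - old) / old * 100. The sketch below recomputes three rows from the table above to show how the column can be checked. `PctChange` and `pctChange` are hypothetical names; this is not code from sortBench.py.

```java
import java.util.Locale;

public class PctChange {
    /** Relative change of "QPS new" vs "QPS old", in percent (assumed column definition). */
    static double pctChange(double qpsOld, double qpsNew) {
        return (qpsNew - qpsOld) / qpsOld * 100.0;
    }

    public static void main(String[] args) {
        // {QPS old, QPS new} pairs copied from rows of the table above.
        double[][] rows = { {87.38, 75.42}, {72.52, 26.90}, {78.23, 28.94} };
        for (double[] r : rows) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of platform locale.
            System.out.printf(Locale.ROOT, "%.2f -> %.2f : %.1f%%%n", r[0], r[1], pctChange(r[0], r[1]));
        }
    }
}
```

Running it prints `-13.7%`, `-62.9%` and `-63.0%`. These match the table's values for those rows, which is consistent with the assumed formula.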


 

> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>
>                 Key: LUCENE-1997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1997.patch, LUCENE-1997.patch
>
>
> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them in the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests.  Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/sortBench.py).
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available).  Then
> it runs various combinations:
>   * Index with 20 balanced segments vs index with the "normal" log
>     segment size
>   * Queries with different numbers of hits (only for wikipedia index)
>   * Different top N
>   * Different sorts (by title, for wikipedia, and by random string,
>     random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept.  The
> script runs singlePQ then multiPQ, and records the resulting best QPS
> for each and produces table (in Jira format) as output.
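The sketch below contrasts the two strategies on a toy index: single-PQ versus multi-PQ top-N collection.

It rests on several assumptions. None of it is Lucene's `FieldComparator` API or John's actual patch.
* Each segment is an `int[]` of per-document sort values.
* Hits are sorted ascending, with ties broken by doc id.
* `PqSortSketch`, `Hit`, `singlePq` and `multiPq` are hypothetical names.

Both variants should return the same top N. They differ in how much queue work they do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PqSortSketch {

    /** One hit: global doc id plus the int value it is sorted by (simplified model). */
    static final class Hit {
        final int doc;
        final int value;
        Hit(int doc, int value) { this.doc = doc; this.value = value; }
    }

    /** Ascending by sort value; ties go to the lower doc id. */
    static final Comparator<Hit> ASC =
        Comparator.<Hit>comparingInt(h -> h.value).thenComparingInt(h -> h.doc);

    /** A bounded queue ordered so that the worst retained hit sits on top. */
    static PriorityQueue<Hit> newQueue() {
        return new PriorityQueue<>(ASC.reversed());
    }

    /** Keeps only the n best (smallest) hits seen so far. */
    static void offer(PriorityQueue<Hit> pq, Hit hit, int n) {
        if (n <= 0) return;
        if (pq.size() < n) {
            pq.add(hit);
        } else if (ASC.compare(hit, pq.peek()) < 0) {
            pq.poll();   // evict the current worst hit
            pq.add(hit);
        }
    }

    /** Returns the retained hits best-first. */
    static List<Hit> sorted(PriorityQueue<Hit> pq) {
        List<Hit> out = new ArrayList<>(pq);
        out.sort(ASC);
        return out;
    }

    /** Single PQ: one queue for the whole index; local doc ids are rebased by docBase. */
    static List<Hit> singlePq(int[][] segments, int n) {
        PriorityQueue<Hit> pq = newQueue();
        int docBase = 0;
        for (int[] seg : segments) {
            for (int i = 0; i < seg.length; i++) {
                offer(pq, new Hit(docBase + i, seg[i]), n);
            }
            docBase += seg.length;
        }
        return sorted(pq);
    }

    /** Multi PQ: one queue per segment, then a merge of all per-segment winners. */
    static List<Hit> multiPq(int[][] segments, int n) {
        List<PriorityQueue<Hit>> perSegment = new ArrayList<>();
        int docBase = 0;
        for (int[] seg : segments) {
            PriorityQueue<Hit> pq = newQueue();
            for (int i = 0; i < seg.length; i++) {
                offer(pq, new Hit(docBase + i, seg[i]), n);
            }
            perSegment.add(pq);
            docBase += seg.length;
        }
        // Merge: up to (number of segments * n) candidates compete for n slots.
        PriorityQueue<Hit> merged = newQueue();
        for (PriorityQueue<Hit> pq : perSegment) {
            for (Hit h : pq) {
                offer(merged, h, n);
            }
        }
        return sorted(merged);
    }

    /** Extracts the doc ids in rank order. */
    static int[] docIds(List<Hit> hits) {
        int[] ids = new int[hits.size()];
        for (int i = 0; i < ids.length; i++) ids[i] = hits.get(i).doc;
        return ids;
    }

    public static void main(String[] args) {
        int[][] segments = { {5, 3, 9}, {1, 7}, {4, 2, 8} };
        System.out.println(Arrays.toString(docIds(singlePq(segments, 3))));
        System.out.println(Arrays.toString(docIds(multiPq(segments, 3))));
    }
}
```

Both lines print `[3, 6, 1]`. In this sketch the multi-PQ variant fills a queue of up to N entries per segment and then merges up to segments × N candidates. That extra work grows with N. Whether this explains the larger slowdowns at high Top N in the tables is an inference; the benchmarks do not isolate it.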

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...


[jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769055#action_12769055 ]

Mark Miller edited comment on LUCENE-1997 at 10/23/09 4:37 AM:
---------------------------------------------------------------

Hey John, did you pull from a wiki dump or use the random index?

*edit*

NM - that explains your shortened table (no wiki results). I got it.

      was (Author: markrmiller@...):
    Hey John, did you pull from a wiki dump or use the random index?
 



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769055#action_12769055 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Hey John, did you pull from a wiki dump or use the random index?



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769056#action_12769056 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

Java6 is standard in production servers, since when?  What justified lucene staying java1.4 for so long if this is the case?  In my own experience, my last job only moved to Java 1.5 a year ago, and my current company is still on 1.5. I've seen that be pretty common, and I'm in the Valley, where things update pretty quickly.



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769058#action_12769058 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

I would say that, of course, results on Linux and Solaris should be weighted more highly than results on Macs, because while I love my Mac, I've yet to see a production cluster running on MacBook Pros... :)



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769059#action_12769059 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

bq. Java6 is standard in production servers, since when?

Maybe I'm wrong... it was just a guess. It's just what I've seen most customers deploying new projects on.

bq. What justified lucene staying java1.4 for so long if this is the case?

The decision of what JVM a business should use to deploy their new app is a very different one than what Lucene should require.
A minority of users may be justification enough to avoid requiring a new JVM... unless the benefits are really that huge. Lucene does not target the JVM that most people will be deploying on - if that were the case, I have a feeling we'd be switching to Java6 instead of Java5.



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769060#action_12769060 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Same system, Java 1.6.0_15

||Seg size||Query||Tot hits||Sort||Top N||QPS old (singlePQ)||QPS new (multiPQ)||Pct change||
|log|1|317925|title|10|105.46|97.11|{color:red}-7.9%{color}|
|log|1|317925|title|25|109.08|98.34|{color:red}-9.8%{color}|
|log|1|317925|title|50|108.01|93.99|{color:red}-13.0%{color}|
|log|1|317925|title|100|105.79|84.08|{color:red}-20.5%{color}|
|log|1|317925|title|500|91.12|50.28|{color:red}-44.8%{color}|
|log|1|317925|title|1000|80.51|33.59|{color:red}-58.3%{color}|
|log|<all>|1000000|title|10|113.89|105.39|{color:red}-7.5%{color}|
|log|<all>|1000000|title|25|113.14|102.13|{color:red}-9.7%{color}|
|log|<all>|1000000|title|50|111.30|96.51|{color:red}-13.3%{color}|
|log|<all>|1000000|title|100|86.77|83.86|{color:red}-3.4%{color}|
|log|<all>|1000000|title|500|78.00|42.15|{color:red}-46.0%{color}|
|log|<all>|1000000|title|1000|70.50|27.02|{color:red}-61.7%{color}|
|log|<all>|1000000|rand string|10|107.78|106.09|{color:red}-1.6%{color}|
|log|<all>|1000000|rand string|25|103.09|102.53|{color:red}-0.5%{color}|
|log|<all>|1000000|rand string|50|106.42|95.17|{color:red}-10.6%{color}|
|log|<all>|1000000|rand string|100|86.28|85.41|{color:red}-1.0%{color}|
|log|<all>|1000000|rand string|500|76.69|37.76|{color:red}-50.8%{color}|
|log|<all>|1000000|rand string|1000|68.48|22.95|{color:red}-66.5%{color}|
|log|<all>|1000000|country|10|103.36|106.79|{color:green}3.3%{color}|
|log|<all>|1000000|country|25|103.43|102.69|{color:red}-0.7%{color}|
|log|<all>|1000000|country|50|102.93|94.97|{color:red}-7.7%{color}|
|log|<all>|1000000|country|100|108.49|85.71|{color:red}-21.0%{color}|
|log|<all>|1000000|country|500|80.87|38.23|{color:red}-52.7%{color}|
|log|<all>|1000000|country|1000|67.24|22.79|{color:red}-66.1%{color}|
|log|<all>|1000000|rand int|10|120.59|112.03|{color:red}-7.1%{color}|
|log|<all>|1000000|rand int|25|119.80|107.49|{color:red}-10.3%{color}|
|log|<all>|1000000|rand int|50|119.96|98.84|{color:red}-17.6%{color}|
|log|<all>|1000000|rand int|100|88.58|89.24|{color:green}0.7%{color}|
|log|<all>|1000000|rand int|500|83.50|40.13|{color:red}-51.9%{color}|
|log|<all>|1000000|rand int|1000|74.80|23.83|{color:red}-68.1%{color}|
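According to the issue description, each QPS figure in these tables is the best of 7 search rounds. The sketch below shows that best-of-N measurement in Java. `BestOfRounds`, `bestQps` and the toy workload are hypothetical and are not taken from sortBench.py, which is a Python driver. Keeping the best round rather than the mean presumably filters out rounds disturbed by JIT compilation or GC.

```java
import java.util.function.IntSupplier;

public class BestOfRounds {
    /** Written after each round so the measured work is not optimized away. */
    static volatile long sink;

    /**
     * Times `rounds` rounds of `queriesPerRound` calls to `runQuery` and returns the
     * highest queries-per-second observed. `runQuery` is a hypothetical stand-in for
     * executing one sorted search and returning, e.g., its hit count.
     */
    static double bestQps(int rounds, int queriesPerRound, IntSupplier runQuery) {
        double best = 0.0;
        for (int r = 0; r < rounds; r++) {
            long sum = 0;
            long start = System.nanoTime();
            for (int q = 0; q < queriesPerRound; q++) {
                sum += runQuery.getAsInt();
            }
            long elapsedNanos = Math.max(1L, System.nanoTime() - start);
            sink = sum;
            double qps = queriesPerRound * 1e9 / elapsedNanos;
            best = Math.max(best, qps);  // keep only the best round
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy workload standing in for a real search: sum a small array.
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 7;
        double qps = bestQps(7, 200, () -> {
            int s = 0;
            for (int v : data) s += v;
            return s;
        });
        System.out.printf("best of 7 rounds: %.1f QPS%n", qps);
    }
}
```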




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769085#action_12769085 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

bq. Java6 is standard in production servers, since when?

bq. Maybe I'm wrong... it was just a guess. It's just what I've seen most customers deploying new projects on.

That's my impression too - Java 1.6 is mainly just a bug fix and performance release and has been out for a while, so it's usually the choice I've seen.
Sounds like Uwe thinks it's more buggy though, so who knows if that's a good idea :)



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769088#action_12769088 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

John, what happened to your topn:100 results?



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769089#action_12769089 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

There was a bad stretch in Java6... they plopped in a major JVM upgrade (not just bug fixes) and there were bugs.  I think that's been behind us for a little while now though.  If someone were starting a project today, I'd recommend the latest Java6 JVM.



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769090#action_12769090 ]

John Wang commented on LUCENE-1997:
-----------------------------------

bq. topn:100

I had made changes to sortBench.py to look at each run, and forgot to add 100 back in :) My bad.


