|
|
[jira] Created: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
--------------------------------------------------------

Key: LUCENE-1997
URL: https://issues.apache.org/jira/browse/LUCENE-1997
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Michael McCandless

Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, where a simpler (non-segment-based) comparator API is proposed that gathers results into multiple PQs (one per segment) and then merges them in the end.

I started from John's multi-PQ code and worked it into contrib/benchmark so that we could run perf tests. Then I generified the Python script I use for running search benchmarks (in contrib/benchmark/sortBench.py).

The script first creates indexes with 1M docs (based on SortableSingleDocSource, and based on wikipedia, if available). Then it runs various combinations:

* Index with 20 balanced segments vs index with the "normal" log segment size
* Queries with different numbers of hits (only for wikipedia index)
* Different top N
* Different sorts (by title, for wikipedia, and by random string, random int, and country for the random index)

For each test, 7 search rounds are run and the best QPS is kept. The script runs singlePQ then multiPQ, records the resulting best QPS for each, and produces a table (in Jira format) as output.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...
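To make the two collection strategies concrete, here is a minimal Python sketch of what "single PQ" versus "multiple PQs (one per segment), merged at the end" could look like. It is not the Lucene code or John's patch: the segment layout (a doc base plus a list of int sort values), the ascending sort order, and all function names are assumptions made for illustration only.

```python
import heapq
import itertools
import random


def _offer(heap, n, value, doc_id):
    """Keep the n smallest (value, doc_id) pairs in a bounded heap.

    Entries are stored negated so heap[0] is always the current worst hit.
    """
    entry = (-value, -doc_id)
    if len(heap) < n:
        heapq.heappush(heap, entry)
    elif entry > heap[0]:
        # New hit beats the current worst one: replace it.
        heapq.heapreplace(heap, entry)


def _drain(heap):
    """Return the heap contents as ascending (value, doc_id) pairs."""
    return sorted((-v, -d) for v, d in heap)


def top_n_single_pq(segments, n):
    """One queue shared by every segment (hypothetical layout)."""
    heap = []
    for doc_base, values in segments:
        for i, value in enumerate(values):
            _offer(heap, n, value, doc_base + i)
    return _drain(heap)


def top_n_multi_pq(segments, n):
    """One queue per segment, merged once all segments are collected."""
    per_segment = []
    for doc_base, values in segments:
        heap = []
        for i, value in enumerate(values):
            _offer(heap, n, value, doc_base + i)
        per_segment.append(_drain(heap))
    # heapq.merge lazily merges the already-sorted per-segment results.
    return list(itertools.islice(heapq.merge(*per_segment), n))


def make_segments(sizes, seed=0):
    """Build segments of random int sort values (illustrative data only)."""
    rng = random.Random(seed)
    segments, doc_base = [], 0
    for size in sizes:
        segments.append((doc_base, [rng.randrange(1000) for _ in range(size)]))
        doc_base += size
    return segments
```

Both functions return the same hits; the difference the benchmark measures is only in how much work is done per collected document and in the final merge, which in this sketch touches up to N entries from every segment.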
|
|
[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1997:
---------------------------------------

Attachment: LUCENE-1997.patch

Attached patch. Note that patch is based on 2.9.x branch, so first checkout 2.9.x, apply the patch, then:

  cd contrib/benchmark
  ant compile
  <edit constants @ top of sortBench.py>
  python -u sortBench.py -run results
  python -u sortBench.py -report results

The important constants are INDEX_DIR_BASE (where created indexes are stored) and WIKI_FILE (points to .tar.bz2 or .tar export of wikipedia; if this file can't be found the script just skips the wikipedia tests). You can also change INDEX_NUM_DOCS and INDEX_NUM_THREADS.

If you don't have the wiki export downloaded, that's fine... the script should just run the tests based on the random index.
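For readers without the patch at hand, the constants described above might look roughly like the following. Only the four constant names come from the message; every value, the helper function, and the "random"/"wiki" labels are hypothetical placeholders, not the contents of sortBench.py.

```python
import os

# Constant names are from the message above; all values are placeholders.
INDEX_DIR_BASE = "/tmp/sortbench-indexes"    # where created indexes are stored
WIKI_FILE = "/tmp/enwiki-export.tar.bz2"     # .tar.bz2 or .tar wikipedia export
INDEX_NUM_DOCS = 1000000                     # the issue mentions 1M docs
INDEX_NUM_THREADS = 1                        # placeholder thread count


def sources_to_test(wiki_file=WIKI_FILE):
    """Return which indexes to benchmark (hypothetical helper).

    The random index is always tested; the wikipedia tests are skipped
    when the export file cannot be found, as the message describes.
    """
    sources = ["random"]
    if os.path.exists(wiki_file):
        sources.append("wiki")
    return sources
```

With a missing export, `sources_to_test("/no/such/file.tar.bz2")` yields only the random index, which matches the "that's fine" case described in the message.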
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767870#action_12767870 ]

Michael McCandless commented on LUCENE-1997:
--------------------------------------------

OK I ran sortBench.py on opensolaris 2009.06 box, Java 1.6.0_13.

It'd be great if others with more mainstream platforms (Linux, Windows) could run this and post back.

Raw results (only ran on the log-sized segments):

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|318481|title|10|114.26|112.40|{color:red}-1.6%{color}|
|log|1|318481|title|25|117.59|110.08|{color:red}-6.4%{color}|
|log|1|318481|title|50|116.22|106.96|{color:red}-8.0%{color}|
|log|1|318481|title|100|114.48|100.07|{color:red}-12.6%{color}|
|log|1|318481|title|500|103.16|73.98|{color:red}-28.3%{color}|
|log|1|318481|title|1000|95.60|57.85|{color:red}-39.5%{color}|
|log|<all>|1000000|title|10|95.71|109.41|{color:green}14.3%{color}|
|log|<all>|1000000|title|25|111.56|101.73|{color:red}-8.8%{color}|
|log|<all>|1000000|title|50|110.56|98.84|{color:red}-10.6%{color}|
|log|<all>|1000000|title|100|104.09|93.02|{color:red}-10.6%{color}|
|log|<all>|1000000|title|500|93.36|66.67|{color:red}-28.6%{color}|
|log|<all>|1000000|title|1000|97.07|50.03|{color:red}-48.5%{color}|
|log|<all>|1000000|rand string|10|118.10|109.63|{color:red}-7.2%{color}|
|log|<all>|1000000|rand string|25|107.68|102.33|{color:red}-5.0%{color}|
|log|<all>|1000000|rand string|50|107.12|100.37|{color:red}-6.3%{color}|
|log|<all>|1000000|rand string|100|110.63|95.17|{color:red}-14.0%{color}|
|log|<all>|1000000|rand string|500|79.97|72.09|{color:red}-9.9%{color}|
|log|<all>|1000000|rand string|1000|76.82|54.67|{color:red}-28.8%{color}|
|log|<all>|1000000|country|10|129.49|103.63|{color:red}-20.0%{color}|
|log|<all>|1000000|country|25|111.74|102.60|{color:red}-8.2%{color}|
|log|<all>|1000000|country|50|108.82|100.90|{color:red}-7.3%{color}|
|log|<all>|1000000|country|100|108.01|96.84|{color:red}-10.3%{color}|
|log|<all>|1000000|country|500|97.60|72.02|{color:red}-26.2%{color}|
|log|<all>|1000000|country|1000|85.19|54.56|{color:red}-36.0%{color}|
|log|<all>|1000000|rand int|10|151.75|110.37|{color:red}-27.3%{color}|
|log|<all>|1000000|rand int|25|138.06|109.15|{color:red}-20.9%{color}|
|log|<all>|1000000|rand int|50|135.40|106.49|{color:red}-21.4%{color}|
|log|<all>|1000000|rand int|100|108.30|101.86|{color:red}-5.9%{color}|
|log|<all>|1000000|rand int|500|94.45|73.42|{color:red}-22.3%{color}|
|log|<all>|1000000|rand int|1000|88.30|54.71|{color:red}-38.0%{color}|

Some observations:

* MultiPQ seems like it's generally slower, though it is faster in one case, when topN = 10, sorting by title. It's only faster with the *:* (MatchAllDocsQuery) query, not with the TermQuery for term=1, which is odd.
* MultiPQ slows down, relatively, as topN increases.
* Sorting by int acts differently: MultiPQ is quite a bit slower across the board, except for topN=100
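The "Pct change" column in these tables is consistent with (QPS new - QPS old) / QPS old, taking QPS old as singlePQ and QPS new as multiPQ. A small Python sketch of how one result row could be produced from the per-round numbers follows; the function names and the choice to color a zero change green are assumptions, and this is not the reporting code from sortBench.py.

```python
def best_qps(round_qps):
    """Keep the best (highest) QPS seen across the search rounds of a test."""
    return max(round_qps)


def pct_change(qps_old, qps_new):
    """Relative change of the new QPS against the old one, in percent."""
    return 100.0 * (qps_new - qps_old) / qps_old


def jira_row(seg, query, hits, sort, top_n, qps_old, qps_new):
    """Format one result as a Jira wiki-markup table row.

    Negative changes are colored red, everything else green (assumption:
    the tables in this thread never show an exact zero).
    """
    pct = pct_change(qps_old, qps_new)
    color = "red" if pct < 0 else "green"
    return "|%s|%s|%d|%s|%d|%.2f|%.2f|{color:%s}%.1f%%{color}|" % (
        seg, query, hits, sort, top_n, qps_old, qps_new, color, pct)
```

For example, `jira_row("log", "1", 318481, "title", 10, 114.26, 112.40)` reproduces the first row of the table above.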
|
|
[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1997:
---------------------------------------

Attachment: LUCENE-1997.patch

New patch attached:

* Turn off testing on the balanced index by default (set DO_BALANCED to True if you want to change this)
* Minor formatting fixes in generating the report
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769039#action_12769039 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Results from John Wang:

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|<all>|1000000|rand string|10|91.76|108.63|{color:green}18.4%{color}|
|log|<all>|1000000|rand string|25|92.39|106.79|{color:green}15.6%{color}|
|log|<all>|1000000|rand string|50|91.30|104.02|{color:green}13.9%{color}|
|log|<all>|1000000|rand string|500|86.16|63.27|{color:red}-26.6%{color}|
|log|<all>|1000000|rand string|1000|76.92|64.85|{color:red}-15.7%{color}|
|log|<all>|1000000|country|10|92.42|108.78|{color:green}17.7%{color}|
|log|<all>|1000000|country|25|92.60|106.26|{color:green}14.8%{color}|
|log|<all>|1000000|country|50|92.64|103.76|{color:green}12.0%{color}|
|log|<all>|1000000|country|500|83.92|50.30|{color:red}-40.1%{color}|
|log|<all>|1000000|country|1000|74.78|46.59|{color:red}-37.7%{color}|
|log|<all>|1000000|rand int|10|114.03|114.85|{color:green}0.7%{color}|
|log|<all>|1000000|rand int|25|113.77|112.92|{color:red}-0.7%{color}|
|log|<all>|1000000|rand int|50|113.36|109.56|{color:red}-3.4%{color}|
|log|<all>|1000000|rand int|500|103.90|66.29|{color:red}-36.2%{color}|
|log|<all>|1000000|rand int|1000|89.52|70.67|{color:red}-21.1%{color}|
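When comparing runs from different machines, it can help to read the Jira-format rows back and average the change per top N. The sketch below does that in Python for a few rows copied from John's table; the parsing approach and all names are illustrative assumptions rather than part of the benchmark script.

```python
from collections import defaultdict


def parse_row(line):
    """Split one Jira table row (|a|b|...|) into typed fields."""
    cells = line.strip().strip("|").split("|")
    seg, query, hits, sort, top_n, qps_old, qps_new, _pct = cells
    return {
        "seg": seg,
        "query": query,
        "hits": int(hits),
        "sort": sort,
        "top_n": int(top_n),
        "qps_old": float(qps_old),
        "qps_new": float(qps_new),
    }


def mean_change_by_top_n(lines):
    """Average percent change (new vs old QPS) for each top N value."""
    changes = defaultdict(list)
    for line in lines:
        if line.startswith("||"):
            continue  # header row
        row = parse_row(line)
        pct = 100.0 * (row["qps_new"] - row["qps_old"]) / row["qps_old"]
        changes[row["top_n"]].append(pct)
    return {n: sum(v) / len(v) for n, v in sorted(changes.items())}


# Rows copied from the table above (top N = 10 and top N = 1000 only).
SAMPLE = [
    "||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||",
    "|log|<all>|1000000|rand string|10|91.76|108.63|{color:green}18.4%{color}|",
    "|log|<all>|1000000|country|10|92.42|108.78|{color:green}17.7%{color}|",
    "|log|<all>|1000000|rand int|10|114.03|114.85|{color:green}0.7%{color}|",
    "|log|<all>|1000000|rand string|1000|76.92|64.85|{color:red}-15.7%{color}|",
    "|log|<all>|1000000|country|1000|74.78|46.59|{color:red}-37.7%{color}|",
    "|log|<all>|1000000|rand int|1000|89.52|70.67|{color:red}-21.1%{color}|",
]
```

Calling `mean_change_by_top_n(SAMPLE)` gives a positive average for top N = 10 and a negative one for top N = 1000 on these rows.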
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769042#action_12769042 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

Hah! Thanks for posting that, Mark! Much easier to read. :)

Hey John, can you comment with your hardware specs on this, so it can be recorded for posterity? ;)
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769045#action_12769045 ]

John Wang commented on LUCENE-1997:
-----------------------------------

My machine HW spec:

Model Name: MacBook Pro
Model Identifier: MacBookPro3,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2.4 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 4 GB
Bus Speed: 800 MHz
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769051#action_12769051 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Another run:

I made the changes to int/string comparator to do the faster compare.

Java 1.5.0_20
Laptop
Quad Core - 2.0 GHz
Ubuntu 9.10 Kernel 2.6.31
4 GB RAM

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|317925|title|10|87.38|75.42|{color:red}-13.7%{color}|
|log|1|317925|title|25|86.55|74.49|{color:red}-13.9%{color}|
|log|1|317925|title|50|90.49|71.90|{color:red}-20.5%{color}|
|log|1|317925|title|100|88.07|83.08|{color:red}-5.7%{color}|
|log|1|317925|title|500|76.67|54.34|{color:red}-29.1%{color}|
|log|1|317925|title|1000|69.29|38.54|{color:red}-44.4%{color}|
|log|<all>|1000000|title|10|109.01|92.78|{color:red}-14.9%{color}|
|log|<all>|1000000|title|25|108.30|89.43|{color:red}-17.4%{color}|
|log|<all>|1000000|title|50|107.19|85.86|{color:red}-19.9%{color}|
|log|<all>|1000000|title|100|94.84|80.25|{color:red}-15.4%{color}|
|log|<all>|1000000|title|500|78.84|49.10|{color:red}-37.7%{color}|
|log|<all>|1000000|title|1000|72.52|26.90|{color:red}-62.9%{color}|
|log|<all>|1000000|rand string|10|115.32|101.53|{color:red}-12.0%{color}|
|log|<all>|1000000|rand string|25|115.22|91.82|{color:red}-20.3%{color}|
|log|<all>|1000000|rand string|50|114.40|89.70|{color:red}-21.6%{color}|
|log|<all>|1000000|rand string|100|91.30|81.04|{color:red}-11.2%{color}|
|log|<all>|1000000|rand string|500|76.31|43.94|{color:red}-42.4%{color}|
|log|<all>|1000000|rand string|1000|67.33|28.29|{color:red}-58.0%{color}|
|log|<all>|1000000|country|10|115.40|101.46|{color:red}-12.1%{color}|
|log|<all>|1000000|country|25|115.06|92.15|{color:red}-19.9%{color}|
|log|<all>|1000000|country|50|114.03|90.06|{color:red}-21.0%{color}|
|log|<all>|1000000|country|100|99.30|80.07|{color:red}-19.4%{color}|
|log|<all>|1000000|country|500|75.64|43.44|{color:red}-42.6%{color}|
|log|<all>|1000000|country|1000|66.05|27.94|{color:red}-57.7%{color}|
|log|<all>|1000000|rand int|10|118.47|109.30|{color:red}-7.7%{color}|
|log|<all>|1000000|rand int|25|118.72|99.37|{color:red}-16.3%{color}|
|log|<all>|1000000|rand int|50|118.25|95.14|{color:red}-19.5%{color}|
|log|<all>|1000000|rand int|100|97.57|83.39|{color:red}-14.5%{color}|
|log|<all>|1000000|rand int|500|86.55|46.21|{color:red}-46.6%{color}|
|log|<all>|1000000|rand int|1000|78.23|28.94|{color:red}-63.0%{color}|
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769053#action_12769053 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

While Java5 numbers are still important, I'd say that Java6 (-server of course) should be weighted far heavier? That must be what a majority of people are running in production for new systems?
|
|
[jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769051#action_12769051 ] Mark Miller edited comment on LUCENE-1997 at 10/23/09 4:29 AM: --------------------------------------------------------------- Another run: I made the changes to int/string comparator to do the faster compare. Java 1.5.0_20 Laptop - 64bit OS - 64bit JVM - 64bit Quad Core - 2.0 Ghz Ubuntu 9.10 Kernel 2.6.31 4 GB RAM ||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change|| |log|1|317925|title|10|87.38|75.42|{color:red}-13.7%{color}| |log|1|317925|title|25|86.55|74.49|{color:red}-13.9%{color}| |log|1|317925|title|50|90.49|71.90|{color:red}-20.5%{color}| |log|1|317925|title|100|88.07|83.08|{color:red}-5.7%{color}| |log|1|317925|title|500|76.67|54.34|{color:red}-29.1%{color}| |log|1|317925|title|1000|69.29|38.54|{color:red}-44.4%{color}| |log|<all>|1000000|title|10|109.01|92.78|{color:red}-14.9%{color}| |log|<all>|1000000|title|25|108.30|89.43|{color:red}-17.4%{color}| |log|<all>|1000000|title|50|107.19|85.86|{color:red}-19.9%{color}| |log|<all>|1000000|title|100|94.84|80.25|{color:red}-15.4%{color}| |log|<all>|1000000|title|500|78.84|49.10|{color:red}-37.7%{color}| |log|<all>|1000000|title|1000|72.52|26.90|{color:red}-62.9%{color}| |log|<all>|1000000|rand string|10|115.32|101.53|{color:red}-12.0%{color}| |log|<all>|1000000|rand string|25|115.22|91.82|{color:red}-20.3%{color}| |log|<all>|1000000|rand string|50|114.40|89.70|{color:red}-21.6%{color}| |log|<all>|1000000|rand string|100|91.30|81.04|{color:red}-11.2%{color}| |log|<all>|1000000|rand string|500|76.31|43.94|{color:red}-42.4%{color}| |log|<all>|1000000|rand string|1000|67.33|28.29|{color:red}-58.0%{color}| |log|<all>|1000000|country|10|115.40|101.46|{color:red}-12.1%{color}| 
|log|<all>|1000000|country|25|115.06|92.15|{color:red}-19.9%{color}|
|log|<all>|1000000|country|50|114.03|90.06|{color:red}-21.0%{color}|
|log|<all>|1000000|country|100|99.30|80.07|{color:red}-19.4%{color}|
|log|<all>|1000000|country|500|75.64|43.44|{color:red}-42.6%{color}|
|log|<all>|1000000|country|1000|66.05|27.94|{color:red}-57.7%{color}|
|log|<all>|1000000|rand int|10|118.47|109.30|{color:red}-7.7%{color}|
|log|<all>|1000000|rand int|25|118.72|99.37|{color:red}-16.3%{color}|
|log|<all>|1000000|rand int|50|118.25|95.14|{color:red}-19.5%{color}|
|log|<all>|1000000|rand int|100|97.57|83.39|{color:red}-14.5%{color}|
|log|<all>|1000000|rand int|500|86.55|46.21|{color:red}-46.6%{color}|
|log|<all>|1000000|rand int|1000|78.23|28.94|{color:red}-63.0%{color}|

was (Author: markrmiller@...):
Another run: I made the changes to int/string comparator to do the faster compare.

Java 1.5.0_20
Laptop Quad Core - 2.0 Ghz
Ubuntu 9.10 Kernel 2.6.31
4 GB RAM

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|317925|title|10|87.38|75.42|{color:red}-13.7%{color}|
|log|1|317925|title|25|86.55|74.49|{color:red}-13.9%{color}|
|log|1|317925|title|50|90.49|71.90|{color:red}-20.5%{color}|
|log|1|317925|title|100|88.07|83.08|{color:red}-5.7%{color}|
|log|1|317925|title|500|76.67|54.34|{color:red}-29.1%{color}|
|log|1|317925|title|1000|69.29|38.54|{color:red}-44.4%{color}|
|log|<all>|1000000|title|10|109.01|92.78|{color:red}-14.9%{color}|
|log|<all>|1000000|title|25|108.30|89.43|{color:red}-17.4%{color}|
|log|<all>|1000000|title|50|107.19|85.86|{color:red}-19.9%{color}|
|log|<all>|1000000|title|100|94.84|80.25|{color:red}-15.4%{color}|
|log|<all>|1000000|title|500|78.84|49.10|{color:red}-37.7%{color}|
|log|<all>|1000000|title|1000|72.52|26.90|{color:red}-62.9%{color}|
|log|<all>|1000000|rand string|10|115.32|101.53|{color:red}-12.0%{color}|
|log|<all>|1000000|rand string|25|115.22|91.82|{color:red}-20.3%{color}|
|log|<all>|1000000|rand string|50|114.40|89.70|{color:red}-21.6%{color}|
|log|<all>|1000000|rand string|100|91.30|81.04|{color:red}-11.2%{color}|
|log|<all>|1000000|rand string|500|76.31|43.94|{color:red}-42.4%{color}|
|log|<all>|1000000|rand string|1000|67.33|28.29|{color:red}-58.0%{color}|
|log|<all>|1000000|country|10|115.40|101.46|{color:red}-12.1%{color}|
|log|<all>|1000000|country|25|115.06|92.15|{color:red}-19.9%{color}|
|log|<all>|1000000|country|50|114.03|90.06|{color:red}-21.0%{color}|
|log|<all>|1000000|country|100|99.30|80.07|{color:red}-19.4%{color}|
|log|<all>|1000000|country|500|75.64|43.44|{color:red}-42.6%{color}|
|log|<all>|1000000|country|1000|66.05|27.94|{color:red}-57.7%{color}|
|log|<all>|1000000|rand int|10|118.47|109.30|{color:red}-7.7%{color}|
|log|<all>|1000000|rand int|25|118.72|99.37|{color:red}-16.3%{color}|
|log|<all>|1000000|rand int|50|118.25|95.14|{color:red}-19.5%{color}|
|log|<all>|1000000|rand int|100|97.57|83.39|{color:red}-14.5%{color}|
|log|<all>|1000000|rand int|500|86.55|46.21|{color:red}-46.6%{color}|
|log|<all>|1000000|rand int|1000|78.23|28.94|{color:red}-63.0%{color}|
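For readers following the tables, the two collection strategies compared in this issue (one priority queue shared across all segments, versus one queue per segment merged at the end) can be sketched roughly as below. This is only an illustration in Python, not Lucene's implementation: the function names `top_n_single_pq` and `top_n_multi_pq` and the `(sort_value, doc_id)` tuple layout are hypothetical, and the sketch ignores the comparator API and any per-segment value handling that the real code needs.

```python
import heapq
from itertools import chain, islice

# Each "segment" is a list of hypothetical (sort_value, doc_id) hits.

def top_n_single_pq(segments, n):
    # One bounded queue shared by every segment: all hits compete directly.
    return heapq.nsmallest(n, chain.from_iterable(segments))

def top_n_multi_pq(segments, n):
    # One bounded queue per segment; each yields that segment's sorted top n.
    per_segment = [heapq.nsmallest(n, seg) for seg in segments]
    # Merge the already-sorted per-segment results and keep the global top n.
    return list(islice(heapq.merge(*per_segment), n))

segments = [
    [(5, 0), (1, 1)],
    [(3, 2), (2, 3)],
    [(4, 4)],
]
print(top_n_single_pq(segments, 3))
print(top_n_multi_pq(segments, 3))
```

Both functions return the same hits. The point of the benchmark is which one gets there faster, which this sketch does not attempt to measure.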
|
|
[jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769055#action_12769055 ]

Mark Miller edited comment on LUCENE-1997 at 10/23/09 4:37 AM:
---------------------------------------------------------------

Hey John, did you pull from a wiki dump or use the random index?

*edit* NM - that explains your shortened table - no wiki results - I got it.

was (Author: markrmiller@...):
Hey John, did you pull from a wiki dump or use the random index?
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769055#action_12769055 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Hey John, did you pull from a wiki dump or use the random index?
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769056#action_12769056 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

Java6 is standard in production servers, since when? What justified lucene staying java1.4 for so long if this is the case? In my own experience, my last job only moved to java1.5 a year ago, and at my current company, we're still on 1.5, and I've seen that be pretty common, and I'm in the Valley, where things update pretty quickly.
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769058#action_12769058 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

I would say that results on Linux and Solaris should of course be weighted more heavily than results on Macs, because while I love my Mac, I've yet to see a production cluster running on MacBook Pros... :)
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769059#action_12769059 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

bq. Java6 is standard in production servers, since when?

Maybe I'm wrong... it was just a guess. It's just what I've seen most customers deploying new projects on.

bq. What justified lucene staying java1.4 for so long if this is the case?

The decision of what JVM a business should use to deploy their new app is a very different one than what Lucene should require. A minority of users may be justification enough to avoid requiring a new JVM... unless the benefits are really that huge. Lucene does not target the JVM that most people will be deploying on - if that were the case, I have a feeling we'd be switching to Java6 instead of Java5.
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769060#action_12769060 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Same system, Java 1.6.0_15

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|317925|title|10|105.46|97.11|{color:red}-7.9%{color}|
|log|1|317925|title|25|109.08|98.34|{color:red}-9.8%{color}|
|log|1|317925|title|50|108.01|93.99|{color:red}-13.0%{color}|
|log|1|317925|title|100|105.79|84.08|{color:red}-20.5%{color}|
|log|1|317925|title|500|91.12|50.28|{color:red}-44.8%{color}|
|log|1|317925|title|1000|80.51|33.59|{color:red}-58.3%{color}|
|log|<all>|1000000|title|10|113.89|105.39|{color:red}-7.5%{color}|
|log|<all>|1000000|title|25|113.14|102.13|{color:red}-9.7%{color}|
|log|<all>|1000000|title|50|111.30|96.51|{color:red}-13.3%{color}|
|log|<all>|1000000|title|100|86.77|83.86|{color:red}-3.4%{color}|
|log|<all>|1000000|title|500|78.00|42.15|{color:red}-46.0%{color}|
|log|<all>|1000000|title|1000|70.50|27.02|{color:red}-61.7%{color}|
|log|<all>|1000000|rand string|10|107.78|106.09|{color:red}-1.6%{color}|
|log|<all>|1000000|rand string|25|103.09|102.53|{color:red}-0.5%{color}|
|log|<all>|1000000|rand string|50|106.42|95.17|{color:red}-10.6%{color}|
|log|<all>|1000000|rand string|100|86.28|85.41|{color:red}-1.0%{color}|
|log|<all>|1000000|rand string|500|76.69|37.76|{color:red}-50.8%{color}|
|log|<all>|1000000|rand string|1000|68.48|22.95|{color:red}-66.5%{color}|
|log|<all>|1000000|country|10|103.36|106.79|{color:green}3.3%{color}|
|log|<all>|1000000|country|25|103.43|102.69|{color:red}-0.7%{color}|
|log|<all>|1000000|country|50|102.93|94.97|{color:red}-7.7%{color}|
|log|<all>|1000000|country|100|108.49|85.71|{color:red}-21.0%{color}|
|log|<all>|1000000|country|500|80.87|38.23|{color:red}-52.7%{color}|
|log|<all>|1000000|country|1000|67.24|22.79|{color:red}-66.1%{color}|
|log|<all>|1000000|rand int|10|120.59|112.03|{color:red}-7.1%{color}|
|log|<all>|1000000|rand int|25|119.80|107.49|{color:red}-10.3%{color}|
|log|<all>|1000000|rand int|50|119.96|98.84|{color:red}-17.6%{color}|
|log|<all>|1000000|rand int|100|88.58|89.24|{color:green}0.7%{color}|
|log|<all>|1000000|rand int|500|83.50|40.13|{color:red}-51.9%{color}|
|log|<all>|1000000|rand int|1000|74.80|23.83|{color:red}-68.1%{color}|
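The "Pct change" column in these tables appears to be the ordinary relative change from "QPS old" to "QPS new", rendered as a colored Jira cell. The sketch below reproduces that arithmetic for two rows of the Java 1.6.0_15 run. The helper names `pct_change` and `jira_cell` are hypothetical, and this is not taken from sortBench.py, whose actual code is not shown in this thread.

```python
def pct_change(qps_old, qps_new):
    # Relative change of the new QPS against the old one, in percent.
    return 100.0 * (qps_new - qps_old) / qps_old

def jira_cell(qps_old, qps_new):
    # Assumed convention, inferred from the tables: red for a slowdown,
    # green otherwise, with one decimal place.
    pct = pct_change(qps_old, qps_new)
    color = "red" if pct < 0 else "green"
    return "{color:%s}%.1f%%{color}" % (color, pct)

print(jira_cell(105.46, 97.11))   # title, top 10, single-term query
print(jira_cell(103.36, 106.79))  # country, top 10, all docs
```

Run against those two rows, the helper yields `{color:red}-7.9%{color}` and `{color:green}3.3%{color}`, matching the table.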
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769085#action_12769085 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

bq. Java6 is standard in production servers, since when?

bq. Maybe I'm wrong... it was just a guess. It's just what I've seen most customers deploying new projects on.

That's my impression too - Java 1.6 is mainly just a bug fix and performance release and has been out for a while, so it's usually the choice I've seen. Sounds like Uwe thinks it's more buggy though, so who knows if that's a good idea :)
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769088#action_12769088 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

John, what happened to your topn:100 results?
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769089#action_12769089 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

There was a bad stretch in Java6... they plopped in a major JVM upgrade (not just bug fixes) and there were bugs. I think that's been behind us for a little while now though. If someone were starting a project today, I'd recommend the latest Java6 JVM.
|
|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769090#action_12769090 ]

John Wang commented on LUCENE-1997:
-----------------------------------

bq. topn:100

I had made changes to sortBench.py to look at each run, and forgot to add 100 back in :) My bad.