9 Messages
[VOTE] Commit LUCENE-843 (IndexWriter performance gains)

Hi,

I'd like to commit LUCENE-843.

The patch has gone through a number of iterations, but the final version that's there now (take9) is quite a bit cleaner & simpler than the ones leading up to it, and I believe it's ready.

It provides solid indexing performance gains (between 2X and 8X), but it is somewhat more complex than the current "single doc per segment" approach, and it does introduce a change to the index format (only when autoCommit=false) whereby multiple segments can share a single set of term vector & stored fields files.

Given that it's such a big change, I think (?) it's appropriate to ask for a vote (only PMC member votes are binding) to make sure we have consensus that this is net/net a good change for Lucene.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...
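A simplified way to picture "multiple segments can share a single set of term vector & stored fields files" is sketched below. It assumes each flushed segment records which shared store it writes into and the offset of its first document within that store; the names (Segment, docStoreSegment, docStoreOffset, docStorePosition) are illustrative assumptions, not the patch's actual classes or file-format fields.

```java
import java.util.List;

/**
 * Simplified, hypothetical model of several segments sharing one "doc store"
 * (the stored fields and term vector files). Names are illustrative only.
 */
public class SharedDocStoreSketch {

    /** A flushed segment whose stored fields/term vectors live in a shared store. */
    record Segment(String name, int docCount, String docStoreSegment, int docStoreOffset) {}

    /** Maps a segment-local document number to its entry in the shared store. */
    static int docStorePosition(Segment seg, int localDocId) {
        if (localDocId < 0 || localDocId >= seg.docCount()) {
            throw new IllegalArgumentException(
                    "doc " + localDocId + " is not in segment " + seg.name());
        }
        return seg.docStoreOffset() + localDocId;
    }

    public static void main(String[] args) {
        // Assumed scenario: three flushes in one autoCommit=false session. Each flush
        // makes its own segment, but all append stored fields and term vectors
        // to the shared store named "_0".
        List<Segment> segments = List.of(
                new Segment("_0", 1000, "_0", 0),
                new Segment("_1", 1200, "_0", 1000),
                new Segment("_2", 800, "_0", 2200));
        for (Segment s : segments) {
            System.out.println(s.name() + ": local doc 0 -> entry "
                    + docStorePosition(s, 0) + " of store " + s.docStoreSegment());
        }
    }
}
```

In this model each segment's documents occupy a contiguous range of the shared files, so locating a stored document is a single addition rather than a lookup in a per-segment file.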
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)

On 7/2/07, Michael McCandless <lucene@...> wrote:
> I'd like to commit LUCENE-843.

+1
Awesome job!

> The patch has gone through a number of iterations but the final
> version that's there now (take9) is quite a bit cleaner & simpler than
> the ones leading up to it and I believe ready.
>
> It provides solid indexing performance gains (between 2X-8X), but, it
> is somewhat more complex than the current "single doc per segment"
> approach and it does introduce a change to the index format (only when
> autoCommit=false) whereby multiple segments can share a single set of
> term vector & stored fields files.

I'll miss the elegant single-doc approach that's been with us for so long, but one can't ignore the magnitude of these performance gains.

> Given that it's such a big change I think (?) it's appropriate to ask
> for a vote (only PMC member votes are binding) to make sure we have
> consensus that this is net/net a good change for Lucene.

IMO, there's no need to be that formal. A simple vote on the dev list is enough (non-committer votes are welcome and carry weight too), and if there's a consensus then everything is good.

-Yonik
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)

On Jul 2, 2007, at 9:35 AM, Michael McCandless wrote:

> Hi,
>
> I'd like to commit LUCENE-843.
>
> The patch has gone through a number of iterations but the final
> version that's there now (take9) is quite a bit cleaner & simpler than
> the ones leading up to it and I believe ready.
>
> It provides solid indexing performance gains (between 2X-8X), but, it
> is somewhat more complex than the current "single doc per segment"
> approach and it does introduce a change to the index format (only when
> autoCommit=false) whereby multiple segments can share a single set of
> term vector & stored fields files.
>

+0 for now; I will try to review tonight or tomorrow night. From what I gather from reading the issue, etc., it sounds great, and you and others have put a lot of hard work into it. Also, from some benchmarking I have done, it seems to sit well with the notion of optimizing merge factor, etc. based on the amount of memory available.
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)

Also, is it worth considering a couple of things?

1. Do a build version release (i.e. 2.2.1) prior to committing; that way we could isolate this change and do a separate release for 2.3. I don't want to do releases just for the sake of releases, but I think we should at least prepare people for the fact that the next release (i.e. the one containing 843) has a significant change. I don't think this patch warrants a major revision tick, but it does make sense to have people really scrutinize it and to let them know that there are significant gains to be had.

2. Or, at a minimum, tag the trunk right before committing. I just find explicit tags make it easier to roll back or compare diffs if need be.

Note these suggestions are by no means a judgment of the quality of the patch, just some precautions before such a big change.

-Grant

On Jul 2, 2007, at 1:31 PM, Grant Ingersoll wrote:

> +0 for now, I will try to review tonight or tomorrow night. From
> what I gather from reading the issue, etc. it sounds great and you
> and others have put a lot of hard work into it. Also, from some
> benchmarking I have done, it seems to sit well with the notion of
> optimizing merge factor, etc. based on the amount of memory available.
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)

On 7/2/07, Grant Ingersoll <gsingers@...> wrote:
> 2. or, at a minimum, do a tag of the trunk right before committing.
> I just find explicit tags make it easier to rollback or compare diffs
> if need be

You can always use an explicit revision number, which is easy to find out from the bug, or you can even find the closest one by time:

    svn info -r {2006-11-10T00:03:00Z} http://svn.apache.org/repos/asf

-Yonik
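To script Yonik's suggestion for an arbitrary timestamp, a small Java sketch that builds the same `svn info` command is below. The class and method names are hypothetical; it relies only on svn's `{DATE}` revision syntax shown above, which svn resolves to the most recent revision as of that date. Passing the arguments as a list (for example to `ProcessBuilder`) rather than as one shell string avoids any quoting questions around the braces.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class SvnDateRevision {
    // svn accepts a date in braces wherever a revision is expected; print it in UTC.
    private static final DateTimeFormatter SVN_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    /** Hypothetical helper: arguments for "svn info" at the revision current as of {@code when}. */
    static List<String> svnInfoAt(Instant when, String repoUrl) {
        return List.of("svn", "info", "-r", "{" + SVN_DATE.format(when) + "}", repoUrl);
    }

    public static void main(String[] args) {
        List<String> cmd = svnInfoAt(Instant.parse("2006-11-10T00:03:00Z"),
                "http://svn.apache.org/repos/asf");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires an svn client on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```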
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)

+1 This is great work! Commit it.

Doug
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)

On Jul 2, 2007, at 4:18 PM, Yonik Seeley wrote:

> On 7/2/07, Grant Ingersoll <gsingers@...> wrote:
>> 2. or, at a minimum, do a tag of the trunk right before committing.
>> I just find explicit tags make it easier to rollback or compare diffs
>> if need be
>
> You can always use an explicit revision number, which is easy to find
> out from the bug, or you can even find the closest by time:
>

Yeah, I know you can do that; I just sometimes like explicit tags for things of this magnitude.
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)

Mike,

Nice piece of work here. One caveat: I think you mentioned you needed to update fileformats.xml (don't forget to generate the site and commit those changes too), but I don't see that in the patch.

Also, do you see any downsides to this patch? Do you think it would ever be the case that a user would not benefit from it? If so, it would probably be useful to document them.

Other than that, I am +1.

Cheers,
Grant
Re: [VOTE] Commit LUCENE-843 (IndexWriter performance gains)

Ahh, right, I will update fileformats.xml & re-build the HTML/PDF (with Forrest 0.8) before committing.

The only downside I see now is that if you flush by RAM (which gives the best performance), you have to be very careful to work around LUCENE-845 by also setting maxBufferedDocs to something "around" the right number. However, this downside should go away once we resolve LUCENE-845 (which is next on my stack, after the "multiple writers over NFS" work that's in progress now!).

I will also plant a tag just before committing.

Thanks for reviewing, everyone! I will give it another day or so and then commit.

Mike
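One rough way to pick a maxBufferedDocs value "around" the right number when flushing by RAM is to divide the RAM buffer size by an estimate of how much RAM each buffered document takes, so the document-count limit would trigger at about the same point as the RAM limit. The sketch below is a hypothetical back-of-the-envelope heuristic, not part of the LUCENE-843 patch; it assumes you have measured the per-document RAM figure for your own data, and the class and method names are made up for illustration.

```java
/**
 * Hypothetical heuristic (not part of Lucene): estimate how many documents a
 * RAM-triggered flush would write, so maxBufferedDocs can be set "around" that
 * number. avgBytesPerDoc is an assumed, user-measured figure.
 */
public class MaxBufferedDocsEstimate {

    static int estimateMaxBufferedDocs(double ramBufferMB, long avgBytesPerDoc) {
        if (ramBufferMB <= 0 || avgBytesPerDoc <= 0) {
            throw new IllegalArgumentException("both inputs must be positive");
        }
        long bufferBytes = (long) (ramBufferMB * 1024 * 1024);
        long docs = bufferBytes / avgBytesPerDoc; // integer division rounds down
        // Clamp to [1, Integer.MAX_VALUE] since the writer setting is an int.
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1L, docs));
    }

    public static void main(String[] args) {
        // Example: a 32 MB RAM buffer, assuming ~8 KB of buffered RAM per document.
        int docs = estimateMaxBufferedDocs(32.0, 8 * 1024);
        System.out.println("maxBufferedDocs ~ " + docs); // prints "maxBufferedDocs ~ 4096"
        // The result would then be handed to IndexWriter.setMaxBufferedDocs(int).
    }
}
```

Because real documents vary in size, the estimate is only a starting point; the point is to keep the two limits in the same neighborhood until LUCENE-845 is resolved.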