[jira] Created: (LUCENE-1737) Always use bulk-copy when merging stored fields and term vectors


by JIRA jira@apache.org


Always use bulk-copy when merging stored fields and term vectors
----------------------------------------------------------------

                 Key: LUCENE-1737
                 URL: https://issues.apache.org/jira/browse/LUCENE-1737
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.9


Lucene has nice optimizations in place during merging of stored fields
(LUCENE-1043) and term vectors (LUCENE-1120) whereby the bytes are
bulk copied to the new segment.  This is much faster than decoding &
rewriting one document at a time.
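To make the speed difference concrete, here is a minimal sketch of what "bulk copy" means, assuming a deliberately simplified, hypothetical layout (not Lucene's actual stored-fields file format): all documents' encoded bytes are concatenated in one array, and a `docStarts` offset table records where each document begins. Copying a run of documents is then a single raw byte-range write with no per-document decoding. The names `BulkCopySketch`, `Segment`, and `bulkCopy` are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical, simplified stored-fields segment (NOT Lucene's real format):
// the encoded bytes of all documents are concatenated in `data`, and
// docStarts[i] is where document i begins. docStarts has numDocs + 1
// entries; the last entry equals data.length.
public class BulkCopySketch {
    static final class Segment {
        final byte[] data;
        final int[] docStarts;

        Segment(byte[] data, int[] docStarts) {
            this.data = data;
            this.docStarts = docStarts;
        }

        int numDocs() {
            return docStarts.length - 1;
        }
    }

    // Copies documents [from, to) of src into out with one write call:
    // the bytes are never decoded into fields and re-encoded.
    static void bulkCopy(Segment src, int from, int to, ByteArrayOutputStream out) {
        int start = src.docStarts[from];
        int end = src.docStarts[to];
        out.write(src.data, start, end - start);
    }

    public static void main(String[] args) {
        // Three documents of 2, 3 and 1 bytes respectively.
        Segment seg = new Segment(new byte[] {1, 2, 3, 4, 5, 6}, new int[] {0, 2, 5, 6});
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        bulkCopy(seg, 0, seg.numDocs(), merged);
        System.out.println(Arrays.toString(merged.toByteArray()));
        // prints: [1, 2, 3, 4, 5, 6]
    }
}
```

The copied bytes still carry the source segment's field numbers, so copying them verbatim is only safe when those numbers name the same fields in the new segment.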

However the optimization is rather brittle: it relies on the mapping
of field name to field number being the same ("congruent") in each
segment being merged and in the new merged segment.

Unfortunately, the field mapping will be congruent only if the app
adds the same fields in precisely the same order to each document.
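A small sketch of why field order matters, assuming (as a simplification) that a segment numbers its fields in order of first appearance, and that the merged segment numbers fields by walking each source segment's fields in number order. The names `CongruenceSketch`, `numberFields`, `mergeNumbering`, and `isCongruent` are hypothetical and model only the name-to-number mapping. Two segments whose documents add `title` and `body` in opposite orders get swapped numbers, so only the first is congruent with the merged mapping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene's FieldInfos: models only name -> number.
public class CongruenceSketch {

    // Numbers a segment's fields in order of first appearance across its documents.
    static Map<String, Integer> numberFields(List<List<String>> docs) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String field : doc) {
                numbers.putIfAbsent(field, numbers.size()); // next free number
            }
        }
        return numbers;
    }

    // Builds the merged numbering by visiting each source segment's fields in
    // number order (insertion order equals number order for numberFields output).
    static Map<String, Integer> mergeNumbering(List<Map<String, Integer>> segments) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map<String, Integer> segment : segments) {
            for (String field : segment.keySet()) {
                merged.putIfAbsent(field, merged.size());
            }
        }
        return merged;
    }

    // A source segment is congruent if every one of its fields keeps the
    // same number in the merged segment; only then can its bytes be copied raw.
    static boolean isCongruent(Map<String, Integer> segment, Map<String, Integer> merged) {
        for (Map.Entry<String, Integer> e : segment.entrySet()) {
            if (!e.getValue().equals(merged.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> segA = numberFields(List.of(List.of("title", "body")));
        Map<String, Integer> segB = numberFields(List.of(List.of("body", "title")));
        Map<String, Integer> merged = mergeNumbering(List.of(segA, segB));
        System.out.println(isCongruent(segA, merged) + " " + isCongruent(segB, merged));
        // prints: true false
    }
}
```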

I think we should fix IndexWriter so that it always gives a field the
same number it was assigned in the past.  I.e., when writing a new
segment, we pre-seed the field numbers from past segments.  All other
aspects of FieldInfo would remain fully dynamic.
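A minimal sketch of the proposed pre-seeding, using a hypothetical `PreSeededNumbering` class as a stand-in for state IndexWriter would carry across segments (it is not an existing Lucene API). Only the name-to-number assignment is modeled; the other FieldInfo attributes are left out. With the shared map, a segment whose documents add `body` before `title` still gets the same numbers as an earlier segment that added `title` first.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for state the writer would keep across segments;
// models only field name -> number, not the other FieldInfo attributes.
public class PreSeededNumbering {
    private final Map<String, Integer> globalNumbers = new HashMap<>();

    // Reuses the number assigned in any past segment; a brand-new name
    // gets the next unused number.
    int numberFor(String field) {
        Integer n = globalNumbers.get(field);
        if (n == null) {
            n = globalNumbers.size();
            globalNumbers.put(field, n);
        }
        return n;
    }

    // Numbers one new segment's fields, pre-seeded from all earlier segments.
    Map<String, Integer> writeSegment(List<List<String>> docs) {
        Map<String, Integer> segment = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String field : doc) {
                segment.put(field, numberFor(field));
            }
        }
        return segment;
    }

    public static void main(String[] args) {
        PreSeededNumbering writer = new PreSeededNumbering();
        Map<String, Integer> segA = writer.writeSegment(List.of(List.of("title", "body")));
        Map<String, Integer> segB = writer.writeSegment(List.of(List.of("body", "title")));
        System.out.println(segA + " " + segB);
        // prints: {title=0, body=1} {body=1, title=0}
    }
}
```

One consequence of this design is that a segment's field numbers can have gaps: a segment that lacks some field simply never uses that field's number, which is why the per-segment mapping here is a map rather than a dense array.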

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...


[jira] Updated: (LUCENE-1737) Always use bulk-copy when merging stored fields and term vectors

by JIRA jira@apache.org



     [ https://issues.apache.org/jira/browse/LUCENE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1737:
---------------------------------------

    Fix Version/s:     (was: 2.9)

Clearing 2.9 fix version.
