[jira] Created: (SOLR-1536) Support for TokenFilters that may modify input documents


[jira] Created: (SOLR-1536) Support for TokenFilters that may modify input documents

by JIRA jira@apache.org

Support for TokenFilters that may modify input documents
--------------------------------------------------------

                 Key: SOLR-1536
                 URL: https://issues.apache.org/jira/browse/SOLR-1536
             Project: Solr
          Issue Type: New Feature
          Components: Analysis
    Affects Versions: 1.5
            Reporter: Andrzej Bialecki
         Attachments: altering.patch

In some scenarios it is useful to create or modify fields in the input document based on the analysis of other fields in the same document. This need arises, e.g., when indexing multilingual documents or when doing NLP tasks such as named-entity recognition (NER). However, this is currently not possible.

This issue provides an implementation of this functionality that consists of the following parts:

* DocumentAlteringFilterFactory - abstract superclass indicating that TokenFilters created from this factory may modify fields in a SolrInputDocument.

* TypeAsFieldFilterFactory - example implementation that illustrates this concept, with a JUnit test.

* DocumentBuilder modifications to support this functionality.
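
The patch itself is not reproduced in this thread, so the following is only a minimal sketch of the idea, written with plain Java stand-ins rather than the real Lucene/Solr classes. `Token`, `DocumentAlteringFilter`, and `TypeAsFieldSketch` are hypothetical names, the document is modelled as a simple map, and the behaviour (copying each token's text into a field named after the token's type) is an assumption guessed from the name TypeAsFieldFilterFactory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an analyzed token: its text plus a type label (e.g. assigned by NER).
final class Token {
    final String text;
    final String type;

    Token(String text, String type) {
        this.text = text;
        this.type = type;
    }
}

// Hypothetical stand-in for a filter that may alter the input document while filtering tokens.
interface DocumentAlteringFilter {
    List<Token> filter(List<Token> tokens, Map<String, List<String>> doc);
}

// Assumed behaviour, not taken from the patch: each token's text is added
// to a document field whose name is a prefix followed by the token's type.
final class TypeAsFieldSketch implements DocumentAlteringFilter {
    private final String prefix;

    TypeAsFieldSketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public List<Token> filter(List<Token> tokens, Map<String, List<String>> doc) {
        for (Token t : tokens) {
            // Create the target field on first use, then append the token text.
            doc.computeIfAbsent(prefix + t.type, k -> new ArrayList<>()).add(t.text);
        }
        return tokens; // the token stream itself passes through unchanged
    }
}
```

In this sketch the filter only adds fields as a side effect; how the real patch hands the document to the filter is handled by the DocumentBuilder changes and is not shown here.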

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1536) Support for TokenFilters that may modify input documents

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated SOLR-1536:
-----------------------------------

    Attachment: altering.patch



[jira] Commented: (SOLR-1536) Support for TokenFilters that may modify input documents

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774057#action_12774057 ]

Otis Gospodnetic commented on SOLR-1536:
----------------------------------------

Is this better than writing a custom UpdateRequestProcessor that takes a field value from the incoming SolrInputDocument (SID), does something to it, removes the original field, and adds the modified version back to the SID?
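
For comparison, the update-processor alternative described above can be sketched roughly as follows. This is a simplified illustration only: `FieldRewritingProcessorSketch` is a hypothetical class, the document is modelled as a plain map, and none of this uses the real Solr UpdateRequestProcessor API.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a custom update processor; not the real Solr API.
final class FieldRewritingProcessorSketch {
    private final String fieldName;
    private final UnaryOperator<String> transform;

    FieldRewritingProcessorSketch(String fieldName, UnaryOperator<String> transform) {
        this.fieldName = fieldName;
        this.transform = transform;
    }

    // Takes the field value, removes the original field, and adds the modified version back.
    void process(Map<String, String> doc) {
        String original = doc.remove(fieldName);
        if (original != null) {
            doc.put(fieldName, transform.apply(original));
        }
    }
}
```

Here the transformation is fixed in code when the processor is constructed, so a different processing step would mean writing or wiring up another processor.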




[jira] Commented: (SOLR-1536) Support for TokenFilters that may modify input documents

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774162#action_12774162 ]

Andrzej Bialecki commented on SOLR-1536:
----------------------------------------

My opinion may be biased, but I'll try to be as objective as I can ;) I think it's better, because it gives you much more flexibility in building analysis and indexing chains without coding. If we went with an UpdateRequestProcessor, you would have to implement a new one whenever your analysis chain changes. With the approach in this patch it's just a matter of configuration, rather than implementing as many custom update processors as there are possible combinations.
