[jira] Created: (TIKA-317) Annotation-based Tika configuration

Annotation-based Tika configuration
-----------------------------------

Key: TIKA-317
URL: https://issues.apache.org/jira/browse/TIKA-317
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Priority: Minor
Fix For: 0.5

I'd like to simplify Tika configuration and make it easier to customize by pushing the information in tika-config.xml to Parser annotations and Java SPI service files.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (TIKA-317) Annotation-based Tika configuration

[ https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774521#action_12774521 ]

Chris A. Mattmann commented on TIKA-317:
----------------------------------------

Hey Jukka: could you explain how this will be simpler? I, personally, like the tika-config.xml file. Details, please :)

[jira] Commented: (TIKA-317) Annotation-based Tika configuration

[ https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774534#action_12774534 ]

Benson Margulies commented on TIKA-317:
---------------------------------------

I'm with Jukka. I needed to replace one processor. Having to copy and modify the XML file, and then forever maintain my mutant version as new Tika releases change the rest of the contents that I don't want to change, is not a good prospect.

[jira] Commented: (TIKA-317) Annotation-based Tika configuration

[ https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774542#action_12774542 ]

Jukka Zitting commented on TIKA-317:
------------------------------------

As Benson mentioned, a pretty typical deployment scenario is one where you want to extend Tika with a few custom Parser classes. Currently you'd either need to maintain a custom version of the full configuration file, or do some CompositeParser magic to inject your custom parsers at runtime. Neither option is ideal.

Another concern of mine is that the current configuration mechanism disconnects the list of supported media types from the parser implementation class. It would be better if that list was maintained in the same Java source file instead of in the XML configuration.

Thinking further, there's some interest in making Tika easy to use in more dynamic environments like an OSGi container where new parser components may be added to or removed from the system at any time. A static configuration file does not work that well in such situations.

So my idea is to move the list of media types supported by a Parser class to a class annotation (or perhaps a getSupportedTypes() method that would work better with composite parsers) and replace the tika-config.xml file with a META-INF/services/org.apache.tika.parser.Parser file that simply lists all the Parser implementations within that jar file.
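
For readers following the thread, the class-annotation idea above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the @SupportedTypes annotation, the two parser classes, and the buildRegistry helper are hypothetical names, not part of Tika's API, and the thread does not settle on an annotation versus a getSupportedTypes() method.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical annotation (not a Tika API): lists the media types a parser handles.
@Retention(RetentionPolicy.RUNTIME) // must be retained at runtime to be read reflectively
@Target(ElementType.TYPE)
@interface SupportedTypes {
    String[] value();
}

// Hypothetical parser classes that keep their media type list in the same source file.
@SupportedTypes({"application/pdf"})
class SketchPdfParser {}

@SupportedTypes({"text/html", "application/xhtml+xml"})
class SketchHtmlParser {}

final class AnnotationConfigSketch {
    /** Builds a media type to parser class map by reading the annotation from each class. */
    static Map<String, Class<?>> buildRegistry(List<Class<?>> parserClasses) {
        Map<String, Class<?>> registry = new LinkedHashMap<>();
        for (Class<?> parserClass : parserClasses) {
            SupportedTypes types = parserClass.getAnnotation(SupportedTypes.class);
            if (types == null) {
                continue; // classes without the annotation are skipped
            }
            for (String mediaType : types.value()) {
                registry.put(mediaType, parserClass);
            }
        }
        return registry;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> registry = buildRegistry(
                Arrays.<Class<?>>asList(SketchPdfParser.class, SketchHtmlParser.class));
        registry.forEach((type, cls) -> System.out.println(type + " -> " + cls.getSimpleName()));
    }
}
```

The point of the sketch is only that the media type list travels with the class, so any configuration mechanism that can find the class can also find its types without a second copy of the list.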

[jira] Commented: (TIKA-317) Annotation-based Tika configuration

[ https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774543#action_12774543 ]

Chris A. Mattmann commented on TIKA-317:
----------------------------------------

Thanks for the additional detail, Jukka, but I fail to see how co-locating metadata with code (as in the case of JDK annotations) is any better a mechanism than separating such configuration out into an XML file. Also, what is the difference between having the information in the tika-config.xml file versus locating (some of) that information in a META-INF/services/o.a.tika.parser.Parser file? I guess I just need to understand more b/c I'm missing something?

[jira] Commented: (TIKA-317) Annotation-based Tika configuration

[ https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774546#action_12774546 ]

Jukka Zitting commented on TIKA-317:
------------------------------------

Re: co-locating metadata with code: Doing so makes it easier to support multiple different configuration mechanisms (default Tika config, programmatic configuration, OSGi services, IoC containers, etc.) as you don't need to duplicate the media type lists for each different way of configuring things.

Re: tika-config.xml vs. META-INF/services/...: The service provider mechanism [1] makes it easy to add custom parser implementations without having to maintain a separate copy of the full Tika configuration file. You could for example create a my-custom-parsers.jar file with a META-INF/services/o.a.tika.parser.Parser file that lists only your custom parser classes. When you add that jar to the classpath, Tika would then automatically pick up those parsers in addition to the standard parser classes from the tika-parsers jar.

[1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html#Service Provider
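
As a rough illustration of the service provider lookup described above, here is a minimal sketch that uses java.util.ServiceLoader (available since Java 6) to read provider files. The thread does not say which lookup API Tika would use, so that choice is an assumption, and PluggableParser, BuiltInTextParser, and availableParsers are hypothetical stand-ins rather than Tika classes. In the proposal the standard parsers would also come from a services file in the tika-parsers jar; the sketch hard-codes one built-in parser only to stay self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for a parser service interface (not the real Tika interface).
interface PluggableParser {
    String name();
}

// Stands in for a parser that ships with the library itself.
class BuiltInTextParser implements PluggableParser {
    public String name() {
        return "built-in-text";
    }
}

final class ServiceDiscoverySketch {
    /** Returns the built-in parser name followed by any parsers registered via provider files. */
    static List<String> availableParsers(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        names.add(new BuiltInTextParser().name());
        // ServiceLoader reads every META-INF/services/<interface name> file visible to
        // the class loader and instantiates the classes listed there. If no such file
        // is on the classpath, the loop body simply never runs.
        for (PluggableParser parser : ServiceLoader.load(PluggableParser.class, loader)) {
            names.add(parser.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(availableParsers(ServiceDiscoverySketch.class.getClassLoader()));
    }
}
```

With this sketch, a custom jar would contribute parsers by including a provider file named after the interface (here META-INF/services/PluggableParser, since the sketch uses the default package) with one implementation class name per line. Each listed class needs to be public and have a public no-argument constructor for ServiceLoader to instantiate it.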

[jira] Updated: (TIKA-317) Annotation-based Tika configuration

[ https://issues.apache.org/jira/browse/TIKA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-317:
-------------------------------

Fix Version/s: (was: 0.5)

Postponing to after 0.5