[jira] Created: (TIKA-298) CompositeParser.getParser() should use mimetype hierarchy when falling back


by JIRA jira@apache.org

CompositeParser.getParser() should use mimetype hierarchy when falling back
---------------------------------------------------------------------------

                 Key: TIKA-298
                 URL: https://issues.apache.org/jira/browse/TIKA-298
             Project: Tika
          Issue Type: Improvement
    Affects Versions: 0.4
            Reporter: Ken Krugler


CompositeParser.getParser() doesn't use supertypes when falling back: if it can't find a parser for the exact mimetype, it goes straight to the fallback parser.

So, for example, if the file mimetype is application/<whatever>+xml and no parser is registered for it, you get the default "do nothing" parser instead of the XML parser.
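
To make the requested behaviour concrete, here is a minimal sketch of a lookup that walks up the supertype chain before giving up. It is not Tika's CompositeParser code: the class name, the string-based parser registry, and the superTypeOf() rule (any type ending in +xml has application/xml as its supertype) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Tika's actual CompositeParser implementation.
class SupertypeFallbackSketch {

    // Hypothetical registry: media type string -> parser name.
    private final Map<String, String> parsers = new HashMap<>();
    private final String fallback;

    SupertypeFallbackSketch(String fallback) {
        this.fallback = fallback;
    }

    void register(String type, String parserName) {
        parsers.put(type, parserName);
    }

    // Assumed supertype rule: application/<whatever>+xml -> application/xml.
    // Returns null when the type has no known supertype.
    static String superTypeOf(String type) {
        return type.endsWith("+xml") ? "application/xml" : null;
    }

    // Try the exact type first, then each supertype in turn,
    // and only then return the fallback parser.
    String getParser(String type) {
        for (String t = type; t != null; t = superTypeOf(t)) {
            String parser = parsers.get(t);
            if (parser != null) {
                return parser;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        SupertypeFallbackSketch composite = new SupertypeFallbackSketch("EmptyParser");
        composite.register("application/xml", "XMLParser");
        System.out.println(composite.getParser("application/rss+xml"));
        System.out.println(composite.getParser("image/png"));
    }
}
```

With only application/xml registered, a lookup for application/rss+xml resolves to the XML parser, while a type with no registered ancestor still ends up at the fallback.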


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TIKA-298) CompositeParser.getParser() should use mimetype hierarchy when falling back

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/TIKA-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764471#action_12764471 ]

Ken Krugler commented on TIKA-298:
----------------------------------

Jukka said on the mailing list:

========================================================
Note that the MimeType.getSuperType() method already does some
of this and we have related supertype settings stored in the
tika-mimetypes.xml configuration. The type registry could also be told
about the +xml convention and related implicit supertype settings like
the ones encoded in the MediaType.isSpecializationOf() method.

(Note that we currently have both MimeType and MediaType classes for
similar purposes. This is due to an ongoing redesign of the mime type
registry. For now it's probably best to work on the MimeType class
until the redesign is more complete.)
========================================================
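
One way to picture the suggestion above is a registry that consults explicitly configured supertypes first and then applies the +xml convention as an implicit rule. The sketch below is an assumption-laden illustration, not the MimeType or MediaType API: the class, its methods, and the example type application/x-example are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical type registry; not Tika's MimeType/MediaType classes.
class TypeRegistrySketch {

    // Explicit supertype settings, standing in for configured entries.
    private final Map<String, String> declared = new HashMap<>();

    void declareSuperType(String type, String superType) {
        declared.put(type, superType);
    }

    // Explicit settings win; otherwise apply the assumed +xml convention.
    String getSuperType(String type) {
        String explicit = declared.get(type);
        if (explicit != null) {
            return explicit;
        }
        if (type.endsWith("+xml")) {
            return "application/xml";
        }
        return null;
    }

    // True if 'ancestor' appears anywhere on the supertype chain of 'type'.
    // The 'seen' set guards against cycles in the explicit declarations.
    boolean isDescendantOf(String type, String ancestor) {
        Set<String> seen = new HashSet<>();
        String t = getSuperType(type);
        while (t != null && seen.add(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
            t = getSuperType(t);
        }
        return false;
    }
}
```

Keeping the implicit rule inside the registry means callers such as a parser lookup only need a single getSuperType() call and do not have to know about the +xml convention themselves.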



[jira] Updated: (TIKA-298) CompositeParser.getParser() should use mimetype hierarchy when falling back

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/TIKA-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-298:
-----------------------------------

    Component/s: parser

- set fix component
