[jira] Created: (TIKA-257) Uncorrect mime-type detection for ooxml

by JIRA jira@apache.org

Incorrect MIME type detection for OOXML
---------------------------------------

                 Key: TIKA-257
                 URL: https://issues.apache.org/jira/browse/TIKA-257
             Project: Tika
          Issue Type: Bug
          Components: general
    Affects Versions: 0.4
            Reporter: Maxim Valyanskiy


MimeTypes detects docx files (and other Office Open XML documents) as 'application/zip' when the file does not have the proper extension, e.g. when the document is piped in without a file name:

$ java -jar tika-app/target/tika-app-0.4-SNAPSHOT.jar -m /home/maxcom/download-tmp/proto.docx
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
resourceName: proto.docx

$ cat /home/maxcom/download-tmp/proto.docx | java -jar tika-app/target/tika-app-0.4-SNAPSHOT.jar -m
Content-Type: application/zip

This breaks text extraction when the filename is not known.
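Without a filename, detection can only rely on the leading bytes, and every OOXML file is a ZIP archive that starts with the ZIP local file header signature (PK\x03\x04), so a magic-byte match alone cannot tell a docx from a plain zip. One possible refinement, sketched below under stated assumptions, is to open the archive and look for OOXML markers. `OoxmlSniffer` and `refineZip` are hypothetical names, not part of Tika's API, and mapping the top-level `word/`, `xl/` and `ppt/` folders to a MIME type is a heuristic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/**
 * Hypothetical helper (not Tika's API): refines a generic "application/zip"
 * guess by inspecting the archive's entry names for OOXML markers.
 */
public class OoxmlSniffer {

    static final String ZIP = "application/zip";
    static final String DOCX =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    static final String PPTX =
        "application/vnd.openxmlformats-officedocument.presentationml.presentation";

    /**
     * Intended for streams whose magic bytes already matched ZIP.
     * Reads through all entries and closes the given stream.
     */
    static String refineZip(InputStream in) throws IOException {
        boolean hasContentTypes = false;
        String mainPart = null;
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("[Content_Types].xml")) {
                    // OOXML packages carry this part at the archive root.
                    hasContentTypes = true;
                } else if (mainPart == null) {
                    // Heuristic: the first Office folder seen decides the type.
                    if (name.startsWith("word/")) {
                        mainPart = DOCX;
                    } else if (name.startsWith("xl/")) {
                        mainPart = XLSX;
                    } else if (name.startsWith("ppt/")) {
                        mainPart = PPTX;
                    }
                }
            }
        }
        // Fall back to the generic ZIP type if the markers are missing.
        return (hasContentTypes && mainPart != null) ? mainPart : ZIP;
    }
}
```

Note that `ZipInputStream` consumes (and here closes) the input, so a caller that also wants to extract text from a piped document would need to buffer it first, for example into a byte array, and give the parser a fresh stream over the same bytes.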

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

 « Return to Thread: [jira] Created: (TIKA-257) Uncorrect mime-type detection for ooxml