[jira] Updated: (TIKA-255) Embedded Visio Content Crashes PPT Parser

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/TIKA-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Weekly updated TIKA-255:
------------------------------

    Attachment: extract-tika.ppt

This PPT file is valid but crashes Tika 0.4 nightly:

@sfx22001:~/tika-reactor# java -jar tika-app/target/tika-app-0.4-SNAPSHOT.jar /home/dew/extract-tika.ppt
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@61c80b01
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:85)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:116)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:57)
Caused by: java.lang.NullPointerException
        at org.apache.poi.hslf.model.SimpleShape.getClientRecords(SimpleShape.java:322)
        at org.apache.poi.hslf.model.SimpleShape.getClientDataRecord(SimpleShape.java:307)
        at org.apache.poi.hslf.model.TextShape.getPlaceholderAtom(TextShape.java:547)
        at org.apache.poi.hslf.model.Sheet.getPlaceholder(Sheet.java:408)
        at org.apache.poi.hslf.model.HeadersFooters.isVisible(HeadersFooters.java:244)
        at org.apache.poi.hslf.model.HeadersFooters.isHeaderVisible(HeadersFooters.java:148)
        at org.apache.poi.hslf.extractor.PowerPointExtractor.getText(PowerPointExtractor.java:173)
        at org.apache.poi.hslf.extractor.PowerPointExtractor.getText(PowerPointExtractor.java:162)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:88)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
        ... 3 more
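
For readers of the trace above: the top-level TikaException only wraps the failure, and the NullPointerException under "Caused by" is the actual fault inside POI. The sketch below mimics that wrapping pattern and shows one way to recover the innermost cause. It is a minimal illustration, not Tika code: ParseFailure, parseGuarded and rootCause are hypothetical names, and ParseFailure merely stands in for a checked exception like TikaException.

```java
import java.util.function.Supplier;

public class RootCauseSketch {

    // Hypothetical stand-in for a checked parse exception such as TikaException.
    static class ParseFailure extends Exception {
        ParseFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical wrapper: runs a parser and converts any RuntimeException
    // into a checked ParseFailure, keeping the original exception as the cause.
    static String parseGuarded(Supplier<String> parser) throws ParseFailure {
        try {
            return parser.get();
        } catch (RuntimeException e) {
            throw new ParseFailure("Unexpected RuntimeException from parser", e);
        }
    }

    // Walks the cause chain down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            // Simulate a parser that fails with a NullPointerException.
            parseGuarded(() -> { throw new NullPointerException(); });
        } catch (ParseFailure e) {
            // Prints the class name of the innermost cause.
            System.out.println(rootCause(e).getClass().getName());
        }
    }
}
```

A caller that logs the root cause rather than only the wrapper message makes it easier to tell that the fault lies in the underlying library.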


> Embedded Visio Content Crashes PPT Parser
> -----------------------------------------
>
>                 Key: TIKA-255
>                 URL: https://issues.apache.org/jira/browse/TIKA-255
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.4
>         Environment: Debian 5.0.1
>            Reporter: David Weekly
>         Attachments: extract-tika.ppt
>
>
> The attached PPT is a valid file but crashes Tika. It contains embedded Visio data, which may be the cause of the issue.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.