[jira] Created: (TIKA-316) Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)


[jira] Created: (TIKA-316) Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)

by JIRA jira@apache.org

Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)
------------------------------------------------------------------------------------------------

                 Key: TIKA-316
                 URL: https://issues.apache.org/jira/browse/TIKA-316
             Project: Tika
          Issue Type: Bug
    Affects Versions: 0.4, 0.5
         Environment: Windows Server 2003 SP2, JRE 1.6.0_16, tika-app, Visio 2003
            Reporter: Mike Hays


tika-app (0.4 and the 0.5 nightly) throws the following exception when attempting to parse a Visio 2003 file (other versions may be affected):

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@145e044
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:123)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:103)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:176)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:63)
Caused by: java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
        at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
        at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
        at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
        at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
        at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
        at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:95)
        at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:52)
        at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:49)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:118)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
        ... 3 more
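
The trace above has two layers: a wrapper exception raised on the Tika side and a "Caused by" exception raised inside POI. As a minimal sketch of how calling code could dig out the innermost cause when triaging a failure like this, the following walks the getCause() chain. The class name RootCauseDemo and the two stand-in exceptions are hypothetical and use only the JDK; this is not Tika or POI code.

```java
class RootCauseDemo {
    /** Follows getCause() links until the innermost throwable is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Guard against a throwable that reports itself as its own cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the two layers seen in the stack trace.
        Throwable inner = new IllegalArgumentException(
                "Found a chunk with a negative length, which isn't allowed");
        Throwable outer = new Exception("Unexpected RuntimeException from parser", inner);

        Throwable root = rootCause(outer);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Reporting the root cause rather than the wrapper makes it easier to see which library a failure should be filed against.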

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TIKA-316) Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/TIKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Hays updated TIKA-316:
---------------------------

    Attachment: repro-TIKA-316.vsd

To reproduce, run either of:
java -jar tika-app-0.4.jar repro-TIKA-316.vsd
java -jar tika-app-0.5-SNAPSHOT.jar repro-TIKA-316.vsd



[jira] Updated: (TIKA-316) Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/TIKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-316:
-----------------------------------

    Component/s: cli

- set fix component



[jira] Updated: (TIKA-316) Parsing Visio diagrams with tika-app causes TikaException (Found a chunk with a negative length)

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/TIKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-316:
-------------------------------

    Component/s:     (was: cli)
                 parser

Looks like this is caused by some underlying POI issue, i.e. the HDGF code in POI fails to interpret this file correctly.

It would be great if someone could report this issue upstream to POI and add a reference to that issue here.
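
To make the error message more concrete, here is a minimal sketch of one common way a parser can end up with a negative length: a 4-byte length field is read as a signed int, so any value with the high bit set (for example because the field was read at the wrong offset) comes out negative. This is only an illustration under assumptions; it is not POI's HDGF code, the little-endian field layout is assumed, and the names ChunkLengthDemo, readSignedLength and checkedLength are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ChunkLengthDemo {
    /** Reads a 4-byte little-endian length field as a signed int. */
    static int readSignedLength(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
    }

    /** Hypothetical validation step: rejects lengths that cannot describe a real chunk. */
    static int checkedLength(byte[] data, int offset) {
        int length = readSignedLength(data, offset);
        if (length < 0) {
            throw new IllegalArgumentException(
                    "Found a chunk with a negative length, which isn't allowed");
        }
        if (length > data.length - offset - 4) {
            throw new IllegalArgumentException(
                    "Chunk length " + length + " runs past the end of the stream");
        }
        return length;
    }

    public static void main(String[] args) {
        // A well-formed field: length 2, followed by 2 payload bytes.
        byte[] good = {0x02, 0x00, 0x00, 0x00, 0x41, 0x42};
        System.out.println("good length = " + checkedLength(good, 0));

        // The same kind of field with the high bit set reads as a negative int.
        byte[] bad = {0x00, 0x00, 0x00, (byte) 0x80};
        try {
            checkedLength(bad, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check like this fails fast instead of allocating or seeking with a bogus size. Whether the real problem is a misread offset or an unsupported chunk layout would need to be confirmed against the attached file upstream.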
