[jira] Created: (TIKA-290) org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16

by JIRA jira@apache.org

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16
---------------------------------------------------------------------------------------------------------------------

                 Key: TIKA-290
                 URL: https://issues.apache.org/jira/browse/TIKA-290
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.4
         Environment: Windows XP / jdk1.6.0_15
            Reporter: MRIT64
            Priority: Minor
             Fix For: 0.4


This is just for information (I am testing Tika).

I am using tika-app-0.4.jar out of the box.
I get the runtime error below:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16

with an ANSI text file containing:
azerty

123456789012345 6789012345678901 2345678901234567890123456789 0123456789012345678901234567890123 456789012345678901234567890123456 789012345678901234567890123456789012345678901 2345678901234567890123456789 012345678901234567890123456 7890123456789012345 678901234567890123456789012345 6789012345678901234567890

1234567890123456789012 345678901234567890123456789012345 6789012345678901234567890123456789012345678901234 567890123456789012345678901234567890123456789012345678901234 56789012345678901234567890123456789012345678901234567890123456789012345 78901234567890123456789012345678901234 56789012345678901234567890TOOLONGTOKEN

qwerty.

It works if the file is saved as UTF-8, or if I delete some lines from the ANSI file. I don't know why.

Best regards


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TIKA-290) org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/TIKA-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-290:
-------------------------------

    Fix Version/s:     (was: 0.4)

Do you have a full stack trace of the exception?



[jira] Commented: (TIKA-290) org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/TIKA-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760348#action_12760348 ]

MRIT64 commented on TIKA-290:
-----------------------------

Trace:

java -jar tika-app-0.4.jar test.txt
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@1ac1fe4
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:116)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:57)
Caused by: java.lang.NullPointerException
        at java.io.Reader.<init>(Unknown Source)
        at java.io.BufferedReader.<init>(Unknown Source)
        at java.io.BufferedReader.<init>(Unknown Source)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
        ... 3 more
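
A note for readers following the trace: the "Caused by" frames are consistent with a null Reader reaching the BufferedReader constructor, because the java.io.Reader constructor rejects a null argument. The thread does not say why the reader was null inside TXTParser, so the sketch below only reproduces the JDK behaviour. The class and method names are made up for illustration and are not Tika code.

```java
import java.io.BufferedReader;
import java.io.Reader;

// Illustration only: not Tika code. Shows that wrapping a null Reader
// fails inside the java.io.Reader constructor, as in the trace above.
class NullReaderDemo {

    // Returns true if constructing a BufferedReader around null throws an NPE.
    static boolean wrappingNullThrowsNpe() {
        Reader source = null; // stands in for a reader that was never created
        try {
            new BufferedReader(source);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on null reader: " + wrappingNullThrowsNpe());
    }
}
```

A caller that may receive no reader would normally check for null first and report a clearer error than a bare NullPointerException.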




[jira] Updated: (TIKA-290) org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/TIKA-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MRIT64 updated TIKA-290:
------------------------

    Attachment: test.txt

The attached file is the one that causes the problem.

Best regards



[jira] Issue Comment Edited: (TIKA-290) org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/TIKA-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760348#action_12760348 ]

MRIT64 edited comment on TIKA-290 at 9/28/09 12:13 PM:
-------------------------------------------------------

Trace:

java -jar tika-app-0.4.jar test.txt
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@1ac1fe4
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:116)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:57)
Caused by: java.lang.NullPointerException
        at java.io.Reader.<init>(Unknown Source)
        at java.io.BufferedReader.<init>(Unknown Source)
        at java.io.BufferedReader.<init>(Unknown Source)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
        ... 3 more



 



[jira] Resolved: (TIKA-290) org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/TIKA-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-290.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.5
         Assignee: Jukka Zitting

The NPE issue was already fixed in current trunk, but the test file still threw an IOException because of an unsupported character encoding. It looks like the ICU4J encoding detection code we use opts for some weird encodings as the "best match" when the input matches multiple different encodings.

I solved the immediate problem in revision 820956 by only accepting encodings that are actually supported by the Java runtime.
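
The idea of accepting only runtime-supported encodings can be sketched roughly as follows. This is not the code from revision 820956. It assumes the detector yields a ranked list of candidate charset names (here a plain list of strings, with a deliberately made-up unsupported name), and it uses only java.nio.charset.Charset from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Arrays;
import java.util.List;

// Illustration only: picks the best-ranked candidate the JVM can decode.
class SupportedCharsetFilter {

    // Returns the first candidate name supported by this JVM, or null if none is.
    static String firstSupported(List<String> rankedCandidates) {
        for (String name : rankedCandidates) {
            try {
                if (Charset.isSupported(name)) {
                    return name;
                }
            } catch (IllegalCharsetNameException e) {
                // A syntactically invalid name is treated as unsupported.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical detector output, best match first.
        List<String> candidates =
                Arrays.asList("x-hypothetical-unsupported", "ISO-8859-1", "UTF-8");
        System.out.println(firstSupported(candidates));
    }
}
```

Returning null when nothing is supported leaves the fallback (for example a default encoding) to the caller, which is one possible design among several.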

The solution is still not ideal as Tika now reports the test file as using the ISO-8859-2 encoding. I guess we need to come up with some better detection heuristics for cases like this. I'll follow up on tika-dev@; for now I'm resolving this issue as fixed.
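
To illustrate why such input can match several encodings at once: the content quoted in the report appears to be ASCII-only, and ASCII bytes decode to the same text under many single-byte encodings and under UTF-8. The sketch below is an assumption-labelled demonstration of that ambiguity, not Tika's or ICU4J's detection logic; the class and method names are hypothetical, and charsets the JVM does not provide are skipped.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: compares how the same bytes decode under several charsets.
class AsciiAmbiguityDemo {

    // Returns true if every supported charset in the list decodes the bytes
    // to the same text as ISO-8859-1 does.
    static boolean sameTextInAll(byte[] data, String... charsetNames) {
        String reference = new String(data, StandardCharsets.ISO_8859_1);
        for (String name : charsetNames) {
            if (!Charset.isSupported(name)) {
                continue; // skip charsets this JVM does not provide
            }
            String decoded = new String(data, Charset.forName(name));
            if (!decoded.equals(reference)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // ASCII-only sample in the spirit of the reported file.
        byte[] ascii = "azerty 1234567890 qwerty.".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sameTextInAll(ascii, "windows-1252", "ISO-8859-2", "UTF-8"));

        // A single non-ASCII byte, where the encodings start to disagree.
        byte[] nonAscii = {(byte) 0xB1};
        System.out.println(sameTextInAll(nonAscii, "ISO-8859-2", "UTF-8"));
    }
}
```

For ASCII-only input the reported encoding name may be wrong while the extracted text is still identical, which is why the remaining problem is mainly one of detection heuristics.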



[jira] Commented: (TIKA-290) org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/TIKA-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761670#action_12761670 ]

MRIT64 commented on TIKA-290:
-----------------------------

Hi

Thanks for your investigations and for the fix.

For information, the test file was created with Notepad on Windows XP and saved in the default file format (ANSI).

Best regards
