Re: pdf2html conversion outputs garbage (null Blob)

by Thierry Delprat-2

Sean Radford wrote:

> Sorted!
>
> Using the jdb command-line debugger, I ascertained that the pdftohtml
> command was being set up and called as expected:
>
> pdftohtml -c -noframes
> '/tmp/cmdLineBasedConverter8814426520292516367.tmp'
> /tmp/pdf2html_1246551942656/index.html 2>&1"
>
> The problem is with the version of pdftohtml (poppler-0.5.4) that
> installs from the Red Hat Network.
>
> Have replaced with the latest version (0.10.7), compiling from source,
> and all now works.
>
> http://jira.nuxeo.org/browse/NXP-3821 updated accordingly and can be
> closed off.
Thx for the debugging work Sean :)
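
For anyone hitting the same symptom with a distribution package, here is a
minimal sketch (not Nuxeo code) of a guard that compares an installed
poppler version against the one reported to work above. The class and
method names are hypothetical, and 0.10.7 is only the version confirmed in
this thread, not a verified minimum. The comparison has to be numeric per
component: compared as plain strings, "0.5.4" would sort after "0.10.7".
The command list mirrors the call quoted above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class PdfToHtmlVersionCheck {

    // Assumption: the version reported to work in this thread,
    // not a documented minimum requirement.
    static final String KNOWN_GOOD = "0.10.7";

    /** Compares dotted numeric versions component by component. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "0.10" equals "0.10.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True if the given version is at least the known-good one. */
    static boolean isKnownGood(String version) {
        return compareVersions(version, KNOWN_GOOD) >= 0;
    }

    /** Builds the argument list seen in the debugger session. */
    static List<String> buildCommand(String inFile, String outFile) {
        return Arrays.asList("pdftohtml", "-c", "-noframes", inFile, outFile);
    }

    public static void main(String[] args) {
        System.out.println("0.5.4 known good? " + isKnownGood("0.5.4"));
        System.out.println(buildCommand("in.pdf", "out/index.html"));
    }
}
```

How the installed version string is obtained is left out on purpose, since
the output format of the tool's version option may differ between releases.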

>
> Thanks,
>
> Sean
>
>
> Tiry wrote:
>> Can you please create a Jira ticket for this and attach all the details
>> you have gathered? There are already similar tasks (NXP-3656 and NXP-3654).
>>
>> Thx for your tests and feedback.
>>
>> Sean Radford wrote:
>>> OOo_3.1.0_LinuxX86-64_install_en-US
>>>
>>> But I'm not so worried about this just now as I'm using the older
>>> style port 8100 to communicate to OO.
>>>
>>> Looking into the test, I have noticed that the tmp directory created
>>> (pdf2html_XXXXXX) only contains help.png and no index.html. I found
>>> index.html in the base directory of nuxeo-platform-convert.
>>>
>>> I then looked at the /tmp/pdf2html_XXXX created from within Nuxeo DM
>>> and yep, no index.html file. I have found index.html in the root of
>>> the user that is running JBoss, i.e. user jboss
>>>
>>> On the failing box, I'm running Nuxeo from a service init script,
>>> which runs as root and calls:
>>>
>>> su - jboss -c "/opt/tnuxeo/nuxeo-5.2/bin/run.sh -b 127.0.0.1 >
>>> /dev/null &"
>>>
>>> On the non-failing box it is not a service and simply run as a
>>> standard user.
>>>
>>> Is that any use? I'll continue to investigate myself, but have to go
>>> to a meeting now, so will come back to it in a few hours...
>>>
>>> Sean
>>>
>>>
>>> Tiry wrote:
>>>> Please check that your OpenOffice installation is 64-bit.
>>>> Please also give us the OpenOffice version you are using.
>>>>
>>>> Tiry
>>>>
>>>> Sean Radford wrote:
>>>>> P.S. Both systems are 64-bit cpus running under JDK1.6.0_14
>>>>> (64bit), though the failing box is RHEL-5.5 and the other
>>>>> Kubuntu-9.0.4
>>>>>
>>>>> Sean Radford wrote:
>>>>>> Hi Tiry,
>>>>>>
>>>>>> On the box that fails:
>>>>>>
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> Test set: org.nuxeo.ecm.platform.convert.tests.DocumentTestUtils
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
>>>>>> 0.003 sec
>>>>>>
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> Test set:
>>>>>> org.nuxeo.ecm.platform.convert.tests.TestAnyToPDFConverters
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
>>>>>> 3.702 sec <<< FAILURE!
>>>>>> testAnyToTextConverter(org.nuxeo.ecm.platform.convert.tests.TestAnyToPDFConverters)  
>>>>>> Time elapsed: 3.679 sec  <<< ERROR!
>>>>>> org.nuxeo.ecm.core.convert.api.ConversionException: Error in
>>>>>> JODConverter
>>>>>>    at
>>>>>> org.nuxeo.ecm.platform.convert.plugins.JODBasedConverter.convert(JODBasedConverter.java:407)
>>>>>>
>>>>>>    at
>>>>>> org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:172)
>>>>>>
>>>>>>    at
>>>>>> org.nuxeo.ecm.platform.convert.tests.TestAnyToPDFConverters.doTestPDFConverter(TestAnyToPDFConverters.java:55)
>>>>>>
>>>>>>    at
>>>>>> org.nuxeo.ecm.platform.convert.tests.TestAnyToPDFConverters.testAnyToTextConverter(TestAnyToPDFConverters.java:77)
>>>>>>
>>>>>> Caused by: org.nuxeo.ecm.core.api.WrappedException: Exception:
>>>>>> com.artofsolving.jodconverter.openoffice.connection.OpenOfficeException.
>>>>>> message: conversion failed: could not save output document; OOo
>>>>>> errorCode: 525
>>>>>>    at
>>>>>> com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter.loadAndExport(OpenOfficeDocumentConverter.java:142)
>>>>>>
>>>>>>    at
>>>>>> com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter.convertInternal(OpenOfficeDocumentConverter.java:120)
>>>>>>
>>>>>>    at
>>>>>> com.artofsolving.jodconverter.openoffice.converter.AbstractOpenOfficeDocumentConverter.convert(AbstractOpenOfficeDocumentConverter.java:104)
>>>>>>
>>>>>>    at
>>>>>> org.nuxeo.ecm.platform.convert.plugins.JODBasedConverter.convert(JODBasedConverter.java:388)
>>>>>>
>>>>>>    ... 29 more
>>>>>> Caused by: org.nuxeo.ecm.core.api.WrappedException: Exception:
>>>>>> com.sun.star.task.ErrorCodeIOException. message:
>>>>>>    at
>>>>>> com.sun.star.lib.uno.environments.remote.Job.remoteUnoRequestRaisedException(Job.java:187)
>>>>>>
>>>>>>    at
>>>>>> com.sun.star.lib.uno.environments.remote.Job.execute(Job.java:153)
>>>>>>    at
>>>>>> com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:349)
>>>>>>
>>>>>>    at
>>>>>> com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:318)
>>>>>>
>>>>>>    at
>>>>>> com.sun.star.lib.uno.environments.remote.JavaThreadPool.enter(JavaThreadPool.java:106)
>>>>>>
>>>>>>    at
>>>>>> com.sun.star.lib.uno.bridges.java_remote.java_remote_bridge.sendRequest(java_remote_bridge.java:657)
>>>>>>
>>>>>>    at
>>>>>> com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.request(ProxyFactory.java:159)
>>>>>>
>>>>>>    at
>>>>>> com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.invoke(ProxyFactory.java:141)
>>>>>>
>>>>>>    at $Proxy18.storeToURL(Unknown Source)
>>>>>>    at
>>>>>> com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter.storeDocument(OpenOfficeDocumentConverter.java:156)
>>>>>>
>>>>>>    at
>>>>>> com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter.loadAndExport(OpenOfficeDocumentConverter.java:140)
>>>>>>
>>>>>>    ... 32 more
>>>>>>
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> Test set: org.nuxeo.ecm.platform.convert.tests.TestPDFToHtml
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
>>>>>> 0.323 sec <<< FAILURE!
>>>>>> testConverter(org.nuxeo.ecm.platform.convert.tests.TestPDFToHtml)  
>>>>>> Time elapsed: 0.318 sec  <<< FAILURE!
>>>>>> junit.framework.AssertionFailedError: expected:<2> but was:<1>
>>>>>>    at junit.framework.Assert.fail(Assert.java:47)
>>>>>>    at junit.framework.Assert.failNotEquals(Assert.java:280)
>>>>>>    at junit.framework.Assert.assertEquals(Assert.java:64)
>>>>>>    at junit.framework.Assert.assertEquals(Assert.java:198)
>>>>>>    at junit.framework.Assert.assertEquals(Assert.java:204)
>>>>>>    at
>>>>>> org.nuxeo.ecm.platform.convert.tests.TestPDFToHtml.testConverter(TestPDFToHtml.java:95)
>>>>>>
>>>>>>    at
>>>>>> org.nuxeo.ecm.platform.convert.tests.TestPDFToHtml.testConverter(TestPDFToHtml.java:95)
>>>>>>
>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>    at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>
>>>>>>    at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>
>>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>    at junit.framework.TestCase.runTest(TestCase.java:164)
>>>>>>    at org.jmock.core.VerifyingTestCase.runBare(Unknown Source)
>>>>>>    at junit.framework.TestResult$1.protect(TestResult.java:106)
>>>>>>    at junit.framework.TestResult.runProtected(TestResult.java:124)
>>>>>>    at junit.framework.TestResult.run(TestResult.java:109)
>>>>>>    at junit.framework.TestCase.run(TestCase.java:120)
>>>>>>    at junit.framework.TestSuite.runTest(TestSuite.java:230)
>>>>>>    at junit.framework.TestSuite.run(TestSuite.java:225)
>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>    at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>
>>>>>>    at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>
>>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>    at
>>>>>> org.apache.maven.surefire.junit.JUnitTestSet.execute(JUnitTestSet.java:213)
>>>>>>
>>>>>>    at
>>>>>> org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:138)
>>>>>>
>>>>>>    at
>>>>>> org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:125)
>>>>>>
>>>>>>    at org.apache.maven.surefire.Surefire.run(Surefire.java:132)
>>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>    at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>
>>>>>>    at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>
>>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>    at
>>>>>> org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:290)
>>>>>>
>>>>>>    at
>>>>>> org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:818)
>>>>>>
>>>>>>
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> Test set: org.nuxeo.ecm.platform.convert.tests.TestPDFToImage
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
>>>>>> 0.75 sec
>>>>>>
>>>>>>
>>>>>> I'm not worried about TestAnyToPDFConverters failing, as that
>>>>>> is fine within the application running over port 8100.
>>>>>>
>>>>>> TestPDFToHtml passes OK on my other box.
>>>>>>
>>>>>> I'll have a look at the code in a moment myself and see what I
>>>>>> can deduce. The problem is that the failing box is remote and has
>>>>>> no debug tools...
>>>>>>
>>>>>> Help much appreciated - I'm setting up the box as a demo of Nuxeo
>>>>>> for a potential client and this feature just happens to be
>>>>>> something they want to see!
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Tiry wrote:
>>>>>>> Can you check which of the unit tests for the PDF converters
>>>>>>> (nuxeo-platform-convert) run on your box?
>>>>>>>
>>>>>>> Sean Radford wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> One of my 5.2 boxes is not converting PDFs to HTML correctly:
>>>>>>>> garbage characters are just output to the screen in the preview
>>>>>>>> frame.
>>>>>>>>
>>>>>>>> I am unable to run it through the debugger, but it appears that
>>>>>>>> the BlobHolder being passed into
>>>>>>>> ConversionServiceImpl#convert(...) has a null Blob.
>>>>>>>>
>>>>>>>> As a result PDF2HtmlConverter#getCmdBlobParameters(...) does
>>>>>>>> not set 'inFilePath' and so this is not available in
>>>>>>>> CommandLineBasedConverter#execOnBlob(...).
>>>>>>>>
>>>>>>>> Thus the commandline being run is 'pdftohtml OUTPUT_DIR', hence
>>>>>>>> the garbage...
>>>>>>>>
>>>>>>>> Any ideas what is going on?
>>>>>>>>
>>>>>>>> This is happening for all PDFs (including any new ones added).
>>>>>>>> They all download correctly, so they are being stored OK.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Sean
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
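
A related defensive check, as a minimal sketch and not Nuxeo code: the
messages quoted above report that index.html ended up outside the
/tmp/pdf2html_XXXX directory, so the expected output file can be verified
right after the command returns, rather than passing an incomplete result
on to the preview. ConversionOutputCheck and requireOutput are hypothetical
names, and java.nio.file assumes Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class ConversionOutputCheck {

    /**
     * Returns the expected output file, or fails loudly if the
     * converter did not write it into the given directory.
     */
    static Path requireOutput(Path outDir, String fileName) throws IOException {
        Path expected = outDir.resolve(fileName);
        if (!Files.isRegularFile(expected)) {
            throw new IOException("Converter did not write " + expected
                    + "; check the converter version and the working"
                    + " directory of the process");
        }
        return expected;
    }
}
```

A caller would invoke requireOutput(outDir, "index.html") once the external
process has exited, and treat the exception as a conversion failure.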

_______________________________________________
ECM mailing list
ECM@...
http://lists.nuxeo.com/mailman/listinfo/ecm
To unsubscribe, go to http://lists.nuxeo.com/mailman/options/ecm
