Re: pdf2html conversion outputs garbage (null Blob)

by Sean Radford-3

Sorted!

Using the jdb command-line debugger, I ascertained that the pdftohtml
command was being set up and called as expected:

pdftohtml -c -noframes
'/tmp/cmdLineBasedConverter8814426520292516367.tmp'
/tmp/pdf2html_1246551942656/index.html 2>&1"
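
For anyone wanting to reproduce the call outside Nuxeo, here is a minimal
Java sketch. The class and method names (PdfToHtmlCall, buildCommand,
producedIndex) are made up for illustration and are not Nuxeo APIs; only the
pdftohtml arguments are taken from the command above. It also refuses to
build a command without an input path, which is the failure originally
suspected in this thread (a bare 'pdftohtml OUTPUT_DIR' call).

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for reproducing the pdftohtml call by hand. */
final class PdfToHtmlCall {

    /** Builds the argument list seen in the debugger; rejects missing paths. */
    static List<String> buildCommand(String inFilePath, String outDirPath) {
        if (inFilePath == null || inFilePath.isEmpty()) {
            throw new IllegalArgumentException("input PDF path is missing");
        }
        if (outDirPath == null || outDirPath.isEmpty()) {
            throw new IllegalArgumentException("output directory is missing");
        }
        return Arrays.asList("pdftohtml", "-c", "-noframes", inFilePath,
                new File(outDirPath, "index.html").getPath());
    }

    /** True if the converter left an index.html in the expected directory. */
    static boolean producedIndex(File outDir) {
        return new File(outDir, "index.html").isFile();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PdfToHtmlCall <input.pdf> <output-dir>");
            return;
        }
        Process process = new ProcessBuilder(buildCommand(args[0], args[1]))
                .redirectErrorStream(true) // merge stderr into stdout, like 2>&1
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                .start();
        int exitCode = process.waitFor();
        System.out.println("exit code: " + exitCode);
        System.out.println("index.html in output dir: "
                + producedIndex(new File(args[1])));
    }
}
```

If index.html is reported as missing from the output directory, it is worth
looking in the working directory of the process that ran the command.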

The problem is with the version of pdftohtml (poppler-0.5.4) that
installs from the Red Hat Network.

I have replaced it with the latest version (0.10.7), compiled from source,
and everything now works.
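
One thing to watch when checking versions like these in code: compared as
plain strings, "0.5.4" sorts after "0.10.7". A small sketch of a numeric
comparison follows. VersionCheck and compareVersions are hypothetical names,
and 0.10.7 is only the version reported as working here, not a confirmed
minimum; how you obtain the installed version string is left out because the
tool's version output differs between releases.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of Nuxeo or poppler. */
final class VersionCheck {

    /** Compares dotted numeric versions component by component. */
    static int compareVersions(String a, String b) {
        int[] left = parse(a);
        int[] right = parse(b);
        int length = Math.max(left.length, right.length);
        for (int i = 0; i < length; i++) {
            // Missing trailing components count as zero, so "0.10" == "0.10.0".
            int l = i < left.length ? left[i] : 0;
            int r = i < right.length ? right[i] : 0;
            if (l != r) {
                return Integer.compare(l, r);
            }
        }
        return 0;
    }

    private static int[] parse(String version) {
        return Arrays.stream(version.trim().split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        String installed = args.length > 0 ? args[0] : "0.5.4";
        String knownGood = "0.10.7"; // assumption: the version that worked in this thread
        if (compareVersions(installed, knownGood) < 0) {
            System.out.println(installed + " is older than " + knownGood);
        } else {
            System.out.println(installed + " is at least " + knownGood);
        }
    }
}
```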

I have updated http://jira.nuxeo.org/browse/NXP-3821 accordingly; it can be
closed.

Thanks,

Sean


Tiry wrote:

> Can you please create a Jira ticket for this and attach all your findings?
> There are already similar tasks (NXP-3656 and NXP-3654).
>
> Thx for your tests and feedback.
>
> Sean Radford wrote:
>> OOo_3.1.0_LinuxX86-64_install_en-US
>>
>> But I'm not so worried about this just now as I'm using the older
>> style port 8100 to communicate to OO.
>>
>> Looking into the test, I have noticed that the tmp directory created
>> (pdf2html_XXXXXX) contains only help.png and no index.html. I found
>> index.html in the base directory of nuxeo-platform-convert.
>>
>> I then looked at the /tmp/pdf2html_XXXX created from within Nuxeo DM
>> and yep, no index.html file. I have found index.html in the root of
>> the user that is running JBoss, i.e. user jboss
>>
>> On the failing box, I'm running Nuxeo from a service init script,
>> which runs as root and calls:
>>
>> su - jboss -c "/opt/tnuxeo/nuxeo-5.2/bin/run.sh -b 127.0.0.1 >
>> /dev/null &"
>>
>> On the non-failing box it is not a service and simply run as a
>> standard user.
>>
>> Is that any use? I'll continue to investigate myself, but have to go
>> to a meeting now, so will come back to it in a few hours...
>>
>> Sean
>>
>>
>> Tiry wrote:
>>> Please check that your OpenOffice version is 64-bit.
>>> Please also give us the OpenOffice version you are using.
>>>
>>> Tiry
>>>
>>> Sean Radford wrote:
>>>> P.S. Both systems are 64-bit cpus running under JDK1.6.0_14
>>>> (64bit), though the failing box is RHEL-5.5 and the other
>>>> Kubuntu-9.0.4
>>>>
>>>> Sean Radford wrote:
>>>>> Hi Tiry,
>>>>>
>>>>> On the box that fails:
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> Test set: org.nuxeo.ecm.platform.convert.tests.DocumentTestUtils
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
>>>>> 0.003 sec
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> Test set: org.nuxeo.ecm.platform.convert.tests.TestAnyToPDFConverters
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
>>>>> 3.702 sec <<< FAILURE!
>>>>> testAnyToTextConverter(org.nuxeo.ecm.platform.convert.tests.TestAnyToPDFConverters)  
>>>>> Time elapsed: 3.679 sec  <<< ERROR!
>>>>> org.nuxeo.ecm.core.convert.api.ConversionException: Error in
>>>>> JODConverter
>>>>>    at
>>>>> org.nuxeo.ecm.platform.convert.plugins.JODBasedConverter.convert(JODBasedConverter.java:407)
>>>>>
>>>>>    at
>>>>> org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:172)
>>>>>
>>>>>    at
>>>>> org.nuxeo.ecm.platform.convert.tests.TestAnyToPDFConverters.doTestPDFConverter(TestAnyToPDFConverters.java:55)
>>>>>
>>>>>    at
>>>>> org.nuxeo.ecm.platform.convert.tests.TestAnyToPDFConverters.testAnyToTextConverter(TestAnyToPDFConverters.java:77)
>>>>>
>>>>> Caused by: org.nuxeo.ecm.core.api.WrappedException: Exception:
>>>>> com.artofsolving.jodconverter.openoffice.connection.OpenOfficeException.
>>>>> message: conversion failed: could not save output document; OOo
>>>>> errorCode: 525
>>>>>    at
>>>>> com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter.loadAndExport(OpenOfficeDocumentConverter.java:142)
>>>>>
>>>>>    at
>>>>> com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter.convertInternal(OpenOfficeDocumentConverter.java:120)
>>>>>
>>>>>    at
>>>>> com.artofsolving.jodconverter.openoffice.converter.AbstractOpenOfficeDocumentConverter.convert(AbstractOpenOfficeDocumentConverter.java:104)
>>>>>
>>>>>    at
>>>>> org.nuxeo.ecm.platform.convert.plugins.JODBasedConverter.convert(JODBasedConverter.java:388)
>>>>>
>>>>>    ... 29 more
>>>>> Caused by: org.nuxeo.ecm.core.api.WrappedException: Exception:
>>>>> com.sun.star.task.ErrorCodeIOException. message:
>>>>>    at
>>>>> com.sun.star.lib.uno.environments.remote.Job.remoteUnoRequestRaisedException(Job.java:187)
>>>>>
>>>>>    at
>>>>> com.sun.star.lib.uno.environments.remote.Job.execute(Job.java:153)
>>>>>    at
>>>>> com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:349)
>>>>>
>>>>>    at
>>>>> com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:318)
>>>>>
>>>>>    at
>>>>> com.sun.star.lib.uno.environments.remote.JavaThreadPool.enter(JavaThreadPool.java:106)
>>>>>
>>>>>    at
>>>>> com.sun.star.lib.uno.bridges.java_remote.java_remote_bridge.sendRequest(java_remote_bridge.java:657)
>>>>>
>>>>>    at
>>>>> com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.request(ProxyFactory.java:159)
>>>>>
>>>>>    at
>>>>> com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.invoke(ProxyFactory.java:141)
>>>>>
>>>>>    at $Proxy18.storeToURL(Unknown Source)
>>>>>    at
>>>>> com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter.storeDocument(OpenOfficeDocumentConverter.java:156)
>>>>>
>>>>>    at
>>>>> com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter.loadAndExport(OpenOfficeDocumentConverter.java:140)
>>>>>
>>>>>    ... 32 more
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> Test set: org.nuxeo.ecm.platform.convert.tests.TestPDFToHtml
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
>>>>> 0.323 sec <<< FAILURE!
>>>>> testConverter(org.nuxeo.ecm.platform.convert.tests.TestPDFToHtml)  
>>>>> Time elapsed: 0.318 sec  <<< FAILURE!
>>>>> junit.framework.AssertionFailedError: expected:<2> but was:<1>
>>>>>    at junit.framework.Assert.fail(Assert.java:47)
>>>>>    at junit.framework.Assert.failNotEquals(Assert.java:280)
>>>>>    at junit.framework.Assert.assertEquals(Assert.java:64)
>>>>>    at junit.framework.Assert.assertEquals(Assert.java:198)
>>>>>    at junit.framework.Assert.assertEquals(Assert.java:204)
>>>>>    at
>>>>> org.nuxeo.ecm.platform.convert.tests.TestPDFToHtml.testConverter(TestPDFToHtml.java:95)
>>>>>
>>>>>    at
>>>>> org.nuxeo.ecm.platform.convert.tests.TestPDFToHtml.testConverter(TestPDFToHtml.java:95)
>>>>>
>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>    at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>
>>>>>    at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>
>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>    at junit.framework.TestCase.runTest(TestCase.java:164)
>>>>>    at org.jmock.core.VerifyingTestCase.runBare(Unknown Source)
>>>>>    at junit.framework.TestResult$1.protect(TestResult.java:106)
>>>>>    at junit.framework.TestResult.runProtected(TestResult.java:124)
>>>>>    at junit.framework.TestResult.run(TestResult.java:109)
>>>>>    at junit.framework.TestCase.run(TestCase.java:120)
>>>>>    at junit.framework.TestSuite.runTest(TestSuite.java:230)
>>>>>    at junit.framework.TestSuite.run(TestSuite.java:225)
>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>    at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>
>>>>>    at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>
>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>    at
>>>>> org.apache.maven.surefire.junit.JUnitTestSet.execute(JUnitTestSet.java:213)
>>>>>
>>>>>    at
>>>>> org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:138)
>>>>>
>>>>>    at
>>>>> org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:125)
>>>>>
>>>>>    at org.apache.maven.surefire.Surefire.run(Surefire.java:132)
>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>    at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>
>>>>>    at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>
>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>    at
>>>>> org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:290)
>>>>>
>>>>>    at
>>>>> org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:818)
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> Test set: org.nuxeo.ecm.platform.convert.tests.TestPDFToImage
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
>>>>> 0.75 sec
>>>>>
>>>>>
>>>>> I'm not worried about the TestAnyToPDFConverters failing as that is
>>>>> fine within the application running over port 8100.
>>>>>
>>>>> TestPDFToHtml passes OK on my other box.
>>>>>
>>>>> I'll have a look at the code in a moment myself and see what I can
>>>>> deduce - problem is that the failing box is remote and no debug
>>>>> tools...
>>>>>
>>>>> Help much appreciated - I'm setting up the box as a demo of Nuxeo
>>>>> for a potential client and this feature just happens to be
>>>>> something they want to see!
>>>>>
>>>>> Regards,
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Tiry wrote:
>>>>>> Can you check which of the unit tests for the PDF converter
>>>>>> (nuxeo-platform-convert) run on your box?
>>>>>>
>>>>>> Sean Radford wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> One of my 5.2 boxes is not converting PDFs to HTML correctly:
>>>>>>> garbage characters are just output to the screen in the preview
>>>>>>> frame.
>>>>>>>
>>>>>>> I am unable to run it through the debugger, but it appears that
>>>>>>> the BlobHolder being passed into
>>>>>>> ConversionServiceImpl#convert(...) has a null Blob.
>>>>>>>
>>>>>>> As a result PDF2HtmlConverter#getCmdBlobParameters(...) does not
>>>>>>> set 'inFilePath' and so this is not available in
>>>>>>> CommandLineBasedConverter#execOnBlob(...).
>>>>>>>
>>>>>>> Thus the commandline being run is 'pdftohtml OUTPUT_DIR', hence
>>>>>>> the garbage...
>>>>>>>
>>>>>>> Any ideas what is going on?
>>>>>>>
>>>>>>> This is happening for all PDFs (including any new ones added).
>>>>>>> They all download correctly, so they are being stored OK.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Sean
>>>>>>>


--
Dr. Sean Radford, MBBS, MSc
http://www.tacola.com/
t: +44 (0) 8700 671 490
m: +44 (0) 7802 24 24 86

