
RE: Plugin extracting text from docs

by Martin Gregorie-2

On Thu, 2009-07-02 at 14:15 -0400, Rosenbaum, Larry M. wrote:
> > And, please tell me of problems.
>  
> > pdftohtml is imho not found in gentoo, but pdf2html is maybe the same ?
>
> It appears that "pdftohtml" is only available as a Windows executable
> (on Sourceforge).  I need something that will run on Solaris.
>
Fedora 10 uses pdftotext, which can output plain text or simple HTML.

Wikipedia says it's Open Source and related to the Poppler library, which
is derived from the xpdf code base.

It seems to be available from Foo Labs: http://www.foolabs.com
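
If it helps, here is a rough sketch of how a script might call pdftotext and
capture its output. It is only an illustration under a few assumptions: that
pdftotext is on the PATH, that your build accepts "-" as the output file
(meaning stdout), and that it supports the -htmlmeta option for simple HTML,
which I believe not every build does. The function names
build_pdftotext_cmd and extract_text are made up for this example and are not
part of any plugin.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, html=False):
    """Build the pdftotext argument list (hypothetical helper).

    The trailing "-" asks pdftotext to write to stdout instead of a file.
    """
    cmd = ["pdftotext"]
    if html:
        # Assumption: this build of pdftotext supports -htmlmeta,
        # which wraps the text in a simple HTML page.
        cmd.append("-htmlmeta")
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, html=False):
    """Run pdftotext on pdf_path and return its output as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, html),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Replace undecodable bytes rather than failing on odd encodings.
    return result.stdout.decode("utf-8", errors="replace")
```

Something like extract_text("attachment.pdf") should then give the plain
text, and extract_text("attachment.pdf", html=True) the HTML variant, provided
the assumptions above hold on your system.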
 

Martin

