RE: Plugin extracting text from docs

by Rosenbaum, Larry M.

> From: Jonas Eckerman [mailto:jonas_lists@...]
>
> Rosenbaum, Larry M. wrote:
>
> > It appears that "pdftohtml" is only available as a Windows executable
> > (on Sourceforge).
>
> If you want a precompiled executable it seems Windows is the only
> platform, but AFAICS the source code is also available at
> http://sourceforge.net/projects/pdftohtml/files/

I have found that the Xpdf package, on which pdftohtml is based, includes a pdftotext command-line utility.  If you build it with the "--without-x" option, you get only the command-line utilities and none of the X Window System components, which eliminates the need to install a bunch of font software.
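
For anyone who wants to call pdftotext from a script, here is a rough Python sketch of how that might look. It is only an illustration, not code from any existing plugin. The function names are made up for this example, and it assumes pdftotext is on the PATH, that passing "-" as the output file sends the text to stdout, and that the build supports "-enc UTF-8".

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, executable="pdftotext"):
    """Build the argument list for a pdftotext call (hypothetical helper).

    Assumes the usual "pdftotext [options] PDF-file [text-file]" form,
    where a text-file of "-" writes the extracted text to stdout.
    """
    return [executable, "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path, executable="pdftotext"):
    """Run pdftotext on pdf_path and return the extracted text as a string."""
    # Fail early with a clear message if the utility is not installed.
    if shutil.which(executable) is None:
        raise FileNotFoundError(executable + " not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, executable),
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    # Undecodable bytes are replaced rather than raising an error.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("attachment.pdf") should then return the document text, with "attachment.pdf" standing in for whatever file you need to process. Passing the arguments as a list avoids going through a shell, which seems safer when the file names come from untrusted mail.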
