Excessive PDF output file size


by Stephen Clouse

I am currently working on replacing PDFLib with FOP in an existing application and have hit a roadblock due to the file sizes of the PDFs FOP is producing.

If you point your web browser to http://warpcore.org/fop/ you will find two versions of the first page of a report, one from the existing PDFLib-generated file, the other as rendered by FOP.  The FOP version is almost 4 times larger (16KB vs 4.2KB).

The full report (51 pages of the same style output) is even worse, 633KB vs. 86KB (over 7 times larger).  I thought it might have something to do with the table borders, with FOP rendering individual border segments whereas the PDFLib version is basically lines being drawn manually, but even with all borders suppressed (reducing it to text-only) the FOP version is close to 400KB.  I have used FOP plenty in the past and not encountered such an issue, but I have also never done anything this heavy on tables.

Is there anything you can recommend to get this file down to a reasonable size?  If it's something where FOP needs to be optimized, where can I start looking?  (I'm definitely not opposed to doing some development work on FOP but I haven't had need to work with the source code at all to date.)

--
Stephen Clouse <stephenclouse@...>

Re: Excessive PDF output file size

by Jeremias Maerki

First of all, I don't think there's anything you can do short of
changing FOP to bring the PDF sizes down. I don't really feel like going
into all the details, but:

- XSL-FO offers quite a lot of border functionality, which results in a
large number of PDF commands to paint the borders if no border segment
merging is done (which could be quite difficult to do and is pretty much
impossible before a certain bug [1] is fixed).

- Each table-cell is specified to produce a reference area, which
currently results in a q/cm/<content>/Q sequence. State handling then
causes additional commands (like font selection) for each reference area,
which could be avoided if no q/Q were used. But getting rid of that
would have fallout of its own.
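To make the border point a bit more concrete, here is a rough Java sketch of what border segment merging means in the simplest case: horizontal segments on the same line, with the same width and colour, that touch or overlap are joined, so a row of N cell borders becomes one stroked path instead of N. The Segment record, the string colour operand and the method names are hypothetical and are not taken from FOP's renderer; the sketch ignores the genuinely hard parts (border conflict resolution between adjacent cells, non-solid styles, page breaks).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not FOP code: join touching horizontal border segments
// that share y, line width and colour, so each table row line is stroked once.
public class BorderMerge {

    // One solid horizontal border segment from x1 to x2 at height y (PDF units).
    // rgb is a PDF colour operand string such as "0 0 0".
    record Segment(double y, double x1, double x2, double width, String rgb) {}

    static List<Segment> merge(List<Segment> input) {
        List<Segment> sorted = new ArrayList<>(input);
        // Group by line and style first, then order along the line by x1.
        sorted.sort(Comparator.comparingDouble(Segment::y)
                .thenComparing(Segment::rgb)
                .thenComparingDouble(Segment::width)
                .thenComparingDouble(Segment::x1));
        List<Segment> out = new ArrayList<>();
        Segment cur = null;
        for (Segment s : sorted) {
            boolean sameLineAndStyle = cur != null
                    && cur.y() == s.y()
                    && cur.width() == s.width()
                    && cur.rgb().equals(s.rgb());
            if (sameLineAndStyle && s.x1() <= cur.x2()) {
                // Touching or overlapping: extend the current run.
                cur = new Segment(cur.y(), cur.x1(), Math.max(cur.x2(), s.x2()),
                        cur.width(), cur.rgb());
            } else {
                if (cur != null) {
                    out.add(cur);
                }
                cur = s;
            }
        }
        if (cur != null) {
            out.add(cur);
        }
        return out;
    }

    // Emits one "set colour, set width, move, line, stroke" sequence per segment.
    static String toPdfOperators(List<Segment> segments) {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            sb.append(String.format(Locale.ROOT, "%s RG %.2f w %.2f %.2f m %.2f %.2f l S\n",
                    s.rgb(), s.width(), s.x1(), s.y(), s.x2(), s.y()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bottom border of one table row, emitted cell by cell (four segments).
        List<Segment> row = List.of(
                new Segment(700, 50, 150, 0.5, "0 0 0"),
                new Segment(700, 150, 250, 0.5, "0 0 0"),
                new Segment(700, 250, 350, 0.5, "0 0 0"),
                new Segment(700, 350, 450, 0.5, "0 0 0"));
        System.out.print("before:\n" + toPdfOperators(row));
        System.out.print("after:\n" + toPdfOperators(merge(row)));
    }
}
```

In a real renderer the merge would have to run after border conflicts are resolved and per page, which is where the difficulty lies.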

The end result would be a considerable increase in complexity in the
rendering source code, at the very least.
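To get a feel for what the per-cell q/cm/<content>/Q wrapping costs, the following Java sketch writes the same table text into an uncompressed content stream two ways: once with every cell wrapped in its own saved/restored graphics state and the font re-selected inside each cell (the extra state commands described above), and once as an idealized flat stream that selects the font once and positions each cell with an absolute text matrix. Both strings are made up for illustration and are not copied from real FOP output; the column and row pitch are arbitrary, and since content streams are normally Flate-compressed in the final PDF, the real difference will be smaller than the raw byte counts suggest.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical illustration, not real FOP output: the same table text written
// two ways into an (uncompressed) PDF content stream.
public class CellStreamSize {

    // Each cell is its own reference area: save state (q), translate (cm),
    // re-select the font inside the cell, draw the text, restore state (Q).
    static String wrapped(int rows, int cols) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                sb.append("q\n")
                  .append(String.format(Locale.ROOT, "1 0 0 1 %.1f %.1f cm\n", x(c), y(r)))
                  .append("BT\n/F1 10 Tf\n(cell) Tj\nET\n")
                  .append("Q\n");
            }
        }
        return sb.toString();
    }

    // Idealized flat stream: one text object, font selected once, each cell
    // positioned with an absolute text matrix (Tm).
    static String flat(int rows, int cols) {
        StringBuilder sb = new StringBuilder("BT\n/F1 10 Tf\n");
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                sb.append(String.format(Locale.ROOT, "1 0 0 1 %.1f %.1f Tm\n(cell) Tj\n", x(c), y(r)));
            }
        }
        return sb.append("ET\n").toString();
    }

    static double x(int col) { return 50 + col * 90; }   // made-up column pitch
    static double y(int row) { return 750 - row * 14; }  // made-up row pitch

    static int bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        int rows = 50, cols = 6; // roughly one dense page of table cells
        int w = bytes(wrapped(rows, cols));
        int f = bytes(flat(rows, cols));
        System.out.printf(Locale.ROOT, "wrapped: %d bytes, flat: %d bytes, ratio %.2f%n",
                w, f, (double) w / f);
    }
}
```

Both variants paint the text at the same positions; the difference is purely in how much state is saved, restored and re-established per cell.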

Anyway, I guess with the new intermediate format [2] you could actually
write an optimizer (specialized for your use case) on the IF XML level
to get rid of many <g> elements generated by the reference areas from
the table-cells.
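Along those lines, a very small Java skeleton of such an IF post-processor might look like this. It uses only the JDK's DOM and JAXP APIs, assumes the IF namespace URI shown in the code (check it against a real IF file), and only unwraps <g> elements that have no attributes at all. That is the trivial case: the <g> elements written for table-cells presumably carry a transform (the counterpart of the cm in the q/cm/Q sequence), and folding that transform into the child coordinates is the real work such an optimizer would have to do.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical skeleton of an IF post-processor: unwraps <g> elements that carry
// no attributes. Folding transform="translate(...)" into child coordinates is
// deliberately left out.
public class IfGroupFlattener {

    // Assumed IF namespace URI; verify it against a real IF file before use.
    static final String IF_NS = "http://xmlgraphics.apache.org/fop/intermediate";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Replaces every attribute-less IF <g> with its children; returns how many were removed.
    static int unwrapPlainGroups(Document doc) {
        NodeList live = doc.getElementsByTagNameNS(IF_NS, "g");
        // Snapshot first: the NodeList is live and changes as the tree is edited.
        Element[] groups = new Element[live.getLength()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = (Element) live.item(i);
        }
        int removed = 0;
        for (Element g : groups) {
            if (g.hasAttributes()) {
                continue; // transform/clip present: keep this group
            }
            Node parent = g.getParentNode();
            while (g.getFirstChild() != null) {
                parent.insertBefore(g.getFirstChild(), g); // move child up one level
            }
            parent.removeChild(g);
            removed++;
        }
        return removed;
    }

    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up fragment; real IF files have more structure around this.
        String xml = "<page xmlns=\"" + IF_NS + "\">"
                + "<g><g transform=\"translate(1000,2000)\"><text x=\"0\" y=\"0\">cell</text></g></g>"
                + "</page>";
        Document doc = parse(xml);
        System.out.println("removed " + unwrapPlainGroups(doc) + " <g> element(s)");
        System.out.println(serialize(doc));
    }
}
```

Running it as a separate pass between IF generation and rendering keeps FOP itself untouched, which is the appeal of doing this on the IF XML level.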

[1] http://markmail.org/message/2jmh4pvwae2kjgkw
[2] http://xmlgraphics.apache.org/fop/trunk/intermediate.html

On 02.10.2009 00:38:45 Stephen Clouse wrote:

Jeremias Maerki