
Upcoming Zend_Pdf Enhancements

by Willie Alberty

For the last few months I have been working with Kevin McArthur on a  
comprehensive PDF generation project for a client [Streamflow] who has  
some pretty advanced layout needs. The project is nearing completion  
and we have been discussing the possibility of contributing large  
portions of the code back to the Zend Framework as improvements to  
Zend_Pdf.

In light of several recent postings to fw-general and fw-formats, as  
well as a few encouraging proposals recently submitted to the Wiki, we  
would like to formally announce our plans and describe the new  
functionality at a high level here.

We will be submitting proposals in the coming weeks that describe  
these new components in more detail along with fully-functional  
reference implementations. Our hope is to join forces with other  
interested developers to help fast-track these proposals through the  
feedback and approval process, write tests, user documentation, and  
examples, and exercise the code as much as possible.

We're really proud of this work and are excited to share it with the  
community. We believe that these enhancements will further establish  
Zend_Pdf's role as the gold standard for PDF generation using PHP.


Text Layout Engine
------------------

"How do I wrap long lines of text?" This is probably the most commonly-
asked question regarding Zend_Pdf. I'm pleased to report that not only  
have we solved the problem of text-wrapping, but a whole host of  
others as well. The new engine provides fully-automatic text layout,  
and has customization hooks in a variety of places.

Line breaks are calculated using the Unicode Line Breaking Algorithm  
(UAX #14), providing linguistically-appropriate line breaks, not just  
at whitespace characters.
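
To give a rough feel for what "not just at whitespace" means, here is a minimal sketch in Python. It is an illustration only, not the engine's code: it handles three cases (after spaces, after a hyphen inside a word, and between two CJK ideographs), which is a tiny subset of UAX #14. The function names and the character-count width limit are assumptions made for the example.

```python
def is_ideograph(ch):
    # Assumption: only the CJK Unified Ideographs block counts here.
    return "\u4e00" <= ch <= "\u9fff"


def break_opportunities(text):
    """Indexes at which a new line may start (simplified, NOT full UAX #14)."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == " " and cur != " ":
            breaks.append(i)      # after a run of spaces
        elif prev == "-" and cur.isalpha():
            breaks.append(i)      # after a hyphen inside a word
        elif is_ideograph(prev) and is_ideograph(cur):
            breaks.append(i)      # between two ideographs
    return breaks


def wrap(text, max_chars):
    """Greedy line filling: break at the last opportunity that still fits."""
    lines, start, last_break = [], 0, None
    for pos in break_opportunities(text) + [len(text)]:
        too_long = len(text[start:pos].rstrip()) > max_chars
        if too_long and last_break is not None:
            lines.append(text[start:last_break].rstrip())
            start = last_break
        last_break = pos
    lines.append(text[start:].rstrip())
    return lines


print(wrap("well-known line breaking", 10))
print(wrap("well-known line", 7))
```

A real implementation would measure glyph widths in the selected font rather than counting characters, and would consult the Unicode line-breaking classes for every character pair.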

Paragraph styles allow you to specify left-, center-, and right-
alignment, as well as full justification, line leading, line height,  
line multiple (double-space, triple-space, etc.), pre- and post-  
paragraph spacing, left- and right-side margins, and first-line  
indentation. Paragraph styles also support left-, center-, right-, and  
decimal-aligned tab stops, with or without leaders, for intra-line  
alignment needs.
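
As an example of what a decimal-aligned tab stop does, the sketch below (Python, for illustration only) positions each value so that its decimal point lands on the tab stop. It assumes a monospaced font where every character is one column wide; the function names are made up for this example and are not part of the proposed API.

```python
def decimal_column(text, tab_stop):
    """Column where `text` must start so its decimal point sits at tab_stop."""
    point = text.find(".")
    if point == -1:
        point = len(text)  # no decimal point: align the end of the number
    return tab_stop - point


def align_decimal(values, tab_stop):
    # Pad each value with spaces; a negative column simply yields no padding.
    return [" " * decimal_column(v, tab_stop) + v for v in values]


for row in align_decimal(["12.50", "1234.5", "7"], 6):
    print(row)
```

With proportional fonts the same idea applies, except that the width of the text before the decimal point is measured in font units instead of columns.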

In addition to the left-to-right line sweep used by most Latin-based  
scripts, right-to-left line sweep is also supported, and is  
automatically detected by the layout engine; you never need to supply  
strings in reverse character order for right-to-left text layout.

The layout engine is based around the concept of an attributed string.  
These are Unicode strings of unlimited length, and fully support the  
entire Unicode character set, including characters outside the Basic  
Multilingual Plane (BMP).

Attributed strings allow you to assign stylistic attributes to  
arbitrary ranges of characters within the string. These attributes are  
used by typesetters to determine the specific look and location for  
every character. This means that you can make unlimited style changes  
within a block of text, even changing styles character-by-character if  
desired.
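
The idea of an attributed string can be sketched roughly as follows. This is a simplified Python illustration, not the proposed PHP classes: the class name, method names, and attribute keys are all hypothetical. Attributes are stored per character range, and later ranges override earlier ones where they overlap.

```python
class AttributedString:
    def __init__(self, text):
        self.text = text
        self._ranges = []  # (start, end, attributes), applied in order

    def set_attributes(self, start, length, **attributes):
        """Apply attributes to `length` characters beginning at `start`."""
        self._ranges.append((start, start + length, attributes))

    def attributes_at(self, index):
        """Merge every range covering `index`; later ranges win."""
        merged = {}
        for start, end, attrs in self._ranges:
            if start <= index < end:
                merged.update(attrs)
        return merged

    def style_runs(self):
        """Split the text into maximal runs sharing identical attributes."""
        runs = []
        for i, ch in enumerate(self.text):
            attrs = self.attributes_at(i)
            if runs and runs[-1][1] == attrs:
                runs[-1][0] += ch
            else:
                runs.append([ch, attrs])
        return [(text, attrs) for text, attrs in runs]


s = AttributedString("Hello world")
s.set_attributes(0, 11, font="Helvetica", size=12)
s.set_attributes(6, 5, underline=True)
for text, attrs in s.style_runs():
    print(repr(text), attrs)
```

A typesetter would walk these runs in order, switching font, color, and so on only at run boundaries.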

The layout engine automatically manages all of these style changes,  
applying them as necessary when drawing the text on the page. The  
following style attributes are supported:

  - Font
  - Font size
  - Fill color
  - Stroke width and color
  - Underline and strikethrough
  - Super- and sub-script
  - Background color

You can add your own custom attributes as well, which you can use in  
your own subclasses to completely customize the layout engine's  
behavior.

These attributed strings will eventually be shared with Zend_Rtf  
(recently proposed by Andries Seutens), as each attributed string is  
essentially a self-contained RTF document. This opens up the  
possibility for generating fully-styled PDF or RTF output from the  
same source with only a couple of lines of code. It will also  
eventually be possible to use existing styled RTF documents as the  
basis for PDF text drawing, eliminating the need to manually apply  
style attributes in your PHP code.

A layout manager class is responsible for drawing these attributed  
strings. It lays out the text in a series of arbitrarily-shaped text  
containers, automatically moving from one to the next as each is  
filled. Rectangular and circular containers will be provided, but you  
can easily create your own custom containers for other shapes or to  
flow text around images.

Multi-column output is as easy as creating two adjacent text  
containers on the same page. Text containers don't even need to be on  
the same PDF page: you can start your text in a small container on  
page 1, then continue it on page 17.

Callback functions are provided to allow you to create additional text  
containers as needed, which can be located on new pages. This is  
useful if you do not know the length of the text you are drawing ahead  
of time, or if you want to adapt your layout on-the-fly.
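
The container flow with an on-demand callback might look something like the sketch below. It is a Python illustration under heavy simplification: containers are modeled by a line capacity instead of a real shape, and the names TextContainer, flow, and make_container are hypothetical rather than taken from the actual code.

```python
class TextContainer:
    """A region that holds a fixed number of lines (stand-in for a shape)."""

    def __init__(self, page, max_lines):
        self.page = page
        self.max_lines = max_lines
        self.lines = []

    def is_full(self):
        return len(self.lines) >= self.max_lines


def flow(lines, containers, make_container):
    """Fill containers in order; call make_container when all are full."""
    containers = list(containers)
    current = 0
    for line in lines:
        # Skip past containers that have no room left.
        while current < len(containers) and containers[current].is_full():
            current += 1
        if current == len(containers):
            # The callback receives the number of containers created so far.
            containers.append(make_container(len(containers)))
        containers[current].lines.append(line)
    return containers


first = [TextContainer(page=1, max_lines=2)]
result = flow(
    ["a", "b", "c", "d", "e"],
    first,
    lambda count: TextContainer(page=count + 1, max_lines=2),
)
for c in result:
    print(c.page, c.lines)
```

Here the callback places each new container on a fresh page, which is the typical case when the length of the text is not known in advance.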

You can also use multiple layout managers on a single page, allowing  
you to create complex multi-page flows for a series of text runs.  
These can be useful for creating page headers and footers, or for  
running stories side-by-side in a newsletter.


Drawing Model
-------------

Three new primitive geometry classes allow you to precisely define  
drawing locations, sizes, and regions. They also provide a host of  
convenience functions allowing for calculation, conversion,  
intersection testing, etc.:

  - Point: x and y coordinate
  - Size: height and width
  - Rectangle: combination of a point and size
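
A rough Python sketch of how these three primitives relate is shown below. The field and method names are assumptions for illustration; the actual PHP classes may be organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Size:
    width: float
    height: float


@dataclass(frozen=True)
class Rectangle:
    origin: Point  # corner with the smallest x and y
    size: Size

    @property
    def max_x(self):
        return self.origin.x + self.size.width

    @property
    def max_y(self):
        return self.origin.y + self.size.height

    def contains(self, p):
        return (self.origin.x <= p.x <= self.max_x
                and self.origin.y <= p.y <= self.max_y)

    def intersects(self, other):
        # Rectangles that merely share an edge are not counted as overlapping.
        return (self.origin.x < other.max_x and other.origin.x < self.max_x
                and self.origin.y < other.max_y and other.origin.y < self.max_y)


a = Rectangle(Point(0, 0), Size(100, 50))
b = Rectangle(Point(90, 40), Size(20, 20))
print(a.intersects(b), a.contains(Point(10, 10)))
```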

PDF pages are drawn using a series of content streams, which contain  
all of the low-level drawing commands. Zend_Pdf_Page currently manages  
its own private content stream.

We've separated content streams from Zend_Pdf_Page, promoting them to  
first-class objects. This allows us to use these content streams as  
templates that can be reused again and again, either on a single page  
or multiple pages. Templates can greatly reduce PDF file sizes and  
improve memory use and performance in PDF viewer applications.
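
The file-size benefit comes from storing a shared stream once and referencing it from each page. The Python sketch below illustrates only that sharing idea; the class names are hypothetical, and it does not write a real PDF (where such reusable content is typically stored as a form XObject).

```python
class ContentStream:
    """A first-class block of low-level drawing commands."""

    def __init__(self, commands):
        self.commands = commands


class Page:
    def __init__(self):
        self.streams = []  # references to streams, not copies

    def draw_template(self, stream):
        self.streams.append(stream)


def stored_streams(pages):
    """Streams that would need to be written out: each distinct object once."""
    unique = {}
    for page in pages:
        for stream in page.streams:
            unique[id(stream)] = stream
    return list(unique.values())


letterhead = ContentStream("0 0 m 100 0 l S")  # a single stroked line
pages = [Page() for _ in range(3)]
for page in pages:
    page.draw_template(letterhead)

print(len(stored_streams(pages)))  # one stream shared by three pages
```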

It is also possible to create a template from any page in an existing  
PDF document. You can then reuse the template in the same PDF, or even  
copy it to a new PDF document, where you can use it as a page  
background, draw it as a thumbnail, perform imposition, etc.


Performance and Memory
----------------------

We've also made numerous performance and memory-usage improvements  
throughout the code. Most data is now lazily-loaded, allowing you to  
manipulate very large documents (thousands or millions of individual  
objects, hundreds of megabytes or even gigabytes in size) with a very  
low memory footprint.
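
Lazy loading in general can be sketched as below. This is a generic Python illustration of the technique, not a description of how the Zend_Pdf internals do it; the LazyObject class and its loader callback are assumptions for the example.

```python
class LazyObject:
    """Defers an expensive load until the value is first requested."""

    def __init__(self, loader):
        self._loader = loader
        self._value = None
        self._loaded = False

    @property
    def value(self):
        if not self._loaded:
            self._value = self._loader()  # runs once, on first access
            self._loaded = True
        return self._value


load_count = 0


def parse_object_42():
    # Stand-in for seeking to a byte offset in the file and parsing there.
    global load_count
    load_count += 1
    return {"Type": "Page"}


obj = LazyObject(parse_object_42)
print(load_count)       # nothing has been parsed yet
print(obj.value)
print(obj.value, load_count)  # second access reuses the cached value
```

A document built this way only needs to keep an index of object locations in memory, paying the parsing cost for the objects that are actually touched.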


Future Enhancements
-------------------

All of this new functionality lays the groundwork for even more  
powerful enhancements down the road:

  - Top-to-bottom line sweep for Asian scripts
  - Bi-directional text (for Hebrew, Arabic, and others)
  - Bulleted and numbered text lists
  - HTML-inspired inline text tables
  - Inline attachments (for example, images that flow with text)
  - Advanced typographic features such as tracking, pairwise kerning,  
ligatures, etc.
  - Hyphenation support
  - Glyph substitution using fallback fonts
  - and more...


Again, we're really excited to be sharing this code with the  
community. We'll be creating the proposals for the various components  
in the coming weeks and announcing them on the fw-formats list when  
they're ready for review. In the meantime, if you have any high-level  
questions, please don't hesitate to ask.

--

Willie Alberty, Owner
Spenlen Media
willie@...

http://www.spenlen.com/
