« Return to Thread: (output) encoding support in doxia-sink-api

Re: (output) encoding support in doxia-sink-api

by Benjamin Bentmann :: Rate this Message:

Reply to Author | View in Thread

Hervé BOUTEMY wrote:

> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
> interface to improve encoding support:
> - Sink createSink( File outputDir, String outputName, String encoding )

Sounds good. This would enable the caller to configure the desired
output encoding for text-based sinks like XHTML.

> - Sink createSink( Writer writer )
> [...]
> 2. some formats embed images, like RTF or PDF, then need direct stream access
> to write binary data
>
> Then I think "Sink createSink( Writer writer )" should be removed.

Makes sense, too. One can setup a writer on top of a stream but the
reverse is not possible. So it's sensible to not limit the API to
writers and support binary-based sinks.

It might however be convenient to create an AbstractTextSinkFactory from
which all/most text-based sinks could inherit. For instance,
XhtmlSinkFactory and XdocSinkFactory look pretty much the same. In more
detail, how about

   public abstract class AbstractTextSinkFactory
   {
     /**
       * @param writer The writer for the sink output, never
<code>null</code>.
       * @param encoding The character encoding used by the writer.
       */
     protected abstract Sink createSink( Writer writer, String encoding )
       throws IOException;

     public Sink createSink( File outputDir, String outputName )
     {
       return createSink( outputDir, outputName, "UTF-8" );
     }

     public Sink createSink( File outputDir, String outputName, String
encoding )
     {
       // check args, create out dir, yadayada
       Writer writer = WriterFactory.newWriter( new File( outputDir,
outputName ), encoding );
       return createSink( writer, encoding );
     }
   }

Based on this, subclasses could be trimmed down to

   public class XHtmlSinkFactory
   {
     protected abstract Sink createSink( Writer writer, String encoding )
     {
       return new XhtmlSink( writer, encoding );
     }
   }

The encoding parameter passed in here is apparently only informative. It
would merely allow the sinks to determine the encoding, e.g. for the XML
declaration.

> Or if we want an API without filename, this method could be transformed into
> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
> output, String encoding ).

Unless we actually have need for this (testing with in-memory streams?),
I would keep the interface slim to ease its implementation.

Another question might be: Does the relation
   1 sink -> 1 output file/stream
always hold or are there formats that might need to output multiple
files? In the later case, createSink( OutputStream ) would be troublesome.


Benjamin

 « Return to Thread: (output) encoding support in doxia-sink-api