(output) encoding support in doxia-sink-api

View: New views
10 Messages — Rating Filter:   Alert me  

(output) encoding support in doxia-sink-api

by Hervé BOUTEMY :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi folks,

For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
interface to improve encoding support:
- Sink createSink( File outputDir, String outputName, String encoding )
- Sink createSink( Writer writer )
See [1] for the full interface.

I worked with Vincent to implement output encoding in Doxia, and we faced
problems that lead us think that forcing a fixed encoding was the right
approach to have something simple and reliable: with UTF-8 as this fixed
encoding, this didn't limit end-users from any country in the world.

But now, I'm convinced it's not the right approach and API:
1. some formats need to output the encoding (like HTML, or XML): we need an
encoding parameter, as we can't get it from a Writer instance
2. some formats embed images, like RTF or PDF, then need direct stream access
to write binary data

Then I think "Sink createSink( Writer writer )" should be removed.
Or if we want an API without filename, this method could be transformed into
Sink createSink( OutputStream output ) + Sink createSink( OutputStream
output, String encoding ).

Any objection or idea?

Hervé

[1]
http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup

Re: (output) encoding support in doxia-sink-api

by Benjamin Bentmann :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hervé BOUTEMY wrote:

> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
> interface to improve encoding support:
> - Sink createSink( File outputDir, String outputName, String encoding )

Sounds good. This would enable the caller to configure the desired
output encoding for text-based sinks like XHTML.

> - Sink createSink( Writer writer )
> [...]
> 2. some formats embed images, like RTF or PDF, then need direct stream access
> to write binary data
>
> Then I think "Sink createSink( Writer writer )" should be removed.

Makes sense, too. One can setup a writer on top of a stream but the
reverse is not possible. So it's sensible to not limit the API to
writers and support binary-based sinks.

It might however be convenient to create an AbstractTextSinkFactory from
which all/most text-based sinks could inherit. For instance,
XhtmlSinkFactory and XdocSinkFactory look pretty much the same. In more
detail, how about

   public abstract class AbstractTextSinkFactory
   {
     /**
       * @param writer The writer for the sink output, never
<code>null</code>.
       * @param encoding The character encoding used by the writer.
       */
     protected abstract Sink createSink( Writer writer, String encoding )
       throws IOException;

     public Sink createSink( File outputDir, String outputName )
     {
       return createSink( outputDir, outputName, "UTF-8" );
     }

     public Sink createSink( File outputDir, String outputName, String
encoding )
     {
       // check args, create out dir, yadayada
       Writer writer = WriterFactory.newWriter( new File( outputDir,
outputName ), encoding );
       return createSink( writer, encoding );
     }
   }

Based on this, subclasses could be trimmed down to

   public class XHtmlSinkFactory
   {
     protected abstract Sink createSink( Writer writer, String encoding )
     {
       return new XhtmlSink( writer, encoding );
     }
   }

The encoding parameter passed in here is apparently only informative. It
would merely allow the sinks to determine the encoding, e.g. for the XML
declaration.

> Or if we want an API without filename, this method could be transformed into
> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
> output, String encoding ).

Unless we actually have need for this (testing with in-memory streams?),
I would keep the interface slim to ease its implementation.

Another question might be: Does the relation
   1 sink -> 1 output file/stream
always hold or are there formats that might need to output multiple
files? In the later case, createSink( OutputStream ) would be troublesome.


Benjamin

Re: (output) encoding support in doxia-sink-api

by Hervé BOUTEMY :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Le vendredi 07 novembre 2008, Benjamin Bentmann a écrit :
> It might however be convenient to create an AbstractTextSinkFactory from
> which all/most text-based sinks could inherit.
Seems a good idea.
This would ensure too that every text-based sinks have the same default
encoding: it gives some consistency. Is it too limitating? I don't think.

> > Or if we want an API without filename, this method could be transformed
> > into Sink createSink( OutputStream output ) + Sink createSink(
> > OutputStream output, String encoding ).
>
> Unless we actually have need for this (testing with in-memory streams?),
if we want to generate and zip or transfer on-the-fly, for example. It's only
an general idea: I don't really have any plans. I was just guessing that if a
method with a Writer was added, some had something in mind.

> I would keep the interface slim to ease its implementation.
ok with that

> Another question might be: Does the relation
>    1 sink -> 1 output file/stream
> always hold or are there formats that might need to output multiple
> files? In the later case, createSink( OutputStream ) would be troublesome.
AFAIK, we always have 1 sink -> 1 file/stream
but I don't have sufficient experience with Doxia use-cases to affirm that.

>
>
> Benjamin
thanks for your comments

Hervé


Re: (output) encoding support in doxia-sink-api

by Vincent Siveton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

2008/11/7 Hervé BOUTEMY <herve.boutemy@...>:
> Hi folks,
>
> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
> interface to improve encoding support:
> - Sink createSink( File outputDir, String outputName, String encoding )
> - Sink createSink( Writer writer )
> See [1] for the full interface.

+1

>
> I worked with Vincent to implement output encoding in Doxia, and we faced
> problems that lead us think that forcing a fixed encoding was the right
> approach to have something simple and reliable: with UTF-8 as this fixed
> encoding, this didn't limit end-users from any country in the world.
>
> But now, I'm convinced it's not the right approach and API:
> 1. some formats need to output the encoding (like HTML, or XML): we need an
> encoding parameter, as we can't get it from a Writer instance

see also DOXIA-185

> 2. some formats embed images, like RTF or PDF, then need direct stream access
> to write binary data
>
> Then I think "Sink createSink( Writer writer )" should be removed.

I prefer deprecated for backward compatibility issue.

> Or if we want an API without filename, this method could be transformed into
> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
> output, String encoding ).

+1

Thanks Hervé for that.

Vincent

> Any objection or idea?
>
> Hervé
>
> [1]
> http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup
>

Re: (output) encoding support in doxia-sink-api

by Vincent Siveton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Benjamin

2008/11/7 Benjamin Bentmann <benjamin.bentmann@...>:

[SNIP]

> It might however be convenient to create an AbstractTextSinkFactory from
> which all/most text-based sinks could inherit. For instance,
> XhtmlSinkFactory and XdocSinkFactory look pretty much the same. In more
> detail, how about

+1 too

Vincent

Re: (output) encoding support in doxia-sink-api

by Benjamin Bentmann :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Vincent Siveton wrote:

>> Then I think "Sink createSink( Writer writer )" should be removed.
>
> I prefer deprecated for backward compatibility issue.

The source code says "@since 1.0-beta-1", i.e. this method was never
part of a released Doxia version, right? Then we should be able to
safely removed it. Having both "@since 1.0-beta-1" and "@deprecated
since 1.0-beta-1" on a method would be quite strange ;-)


Benjamin

Re: (output) encoding support in doxia-sink-api

by Vincent Siveton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Other comments which maybe related to this thread:

* Parser needs at least 2 parsing: one for macro and another one for
processing, and now a third one to validate XML (DOXIA-263)
* Sink uses some time StringWriter to play with the writed content and
to create a valid content (DOXIA-177)

So, maybe it will more easy to have directly String in input/output
instead of stream.

Cheers,

Vincent


2008/11/7 Vincent Siveton <vincent.siveton@...>:

> Hi,
>
> 2008/11/7 Hervé BOUTEMY <herve.boutemy@...>:
>> Hi folks,
>>
>> For the next 1.0-beta-1 version, 2 methods have been added to SinkFactory
>> interface to improve encoding support:
>> - Sink createSink( File outputDir, String outputName, String encoding )
>> - Sink createSink( Writer writer )
>> See [1] for the full interface.
>
> +1
>
>>
>> I worked with Vincent to implement output encoding in Doxia, and we faced
>> problems that lead us think that forcing a fixed encoding was the right
>> approach to have something simple and reliable: with UTF-8 as this fixed
>> encoding, this didn't limit end-users from any country in the world.
>>
>> But now, I'm convinced it's not the right approach and API:
>> 1. some formats need to output the encoding (like HTML, or XML): we need an
>> encoding parameter, as we can't get it from a Writer instance
>
> see also DOXIA-185
>
>> 2. some formats embed images, like RTF or PDF, then need direct stream access
>> to write binary data
>>
>> Then I think "Sink createSink( Writer writer )" should be removed.
>
> I prefer deprecated for backward compatibility issue.
>
>> Or if we want an API without filename, this method could be transformed into
>> Sink createSink( OutputStream output ) + Sink createSink( OutputStream
>> output, String encoding ).
>
> +1
>
> Thanks Hervé for that.
>
> Vincent
>
>> Any objection or idea?
>>
>> Hervé
>>
>> [1]
>> http://svn.apache.org/viewvc/maven/doxia/doxia/trunk/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/SinkFactory.java?view=markup
>>
>

Re: (output) encoding support in doxia-sink-api

by Hervé BOUTEMY :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> > Then I think "Sink createSink( Writer writer )" should be removed.
>
> I prefer deprecated for backward compatibility issue.

this method was added in 1.0-beta-1-SNAPSHOT: 1.0-beta-1 hasn't been released
yet. Is there really any backward compatibility issue here?


Re: (output) encoding support in doxia-sink-api

by Vincent Siveton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2008/11/7 Benjamin Bentmann <benjamin.bentmann@...>:

> Vincent Siveton wrote:
>
>>> Then I think "Sink createSink( Writer writer )" should be removed.
>>
>> I prefer deprecated for backward compatibility issue.
>
> The source code says "@since 1.0-beta-1", i.e. this method was never part of
> a released Doxia version, right? Then we should be able to safely removed
> it. Having both "@since 1.0-beta-1" and "@deprecated since 1.0-beta-1" on a
> method would be quite strange ;-)

mmh seems logical  :)

Vincent

>
> Benjamin
>

Re: (output) encoding support in doxia-sink-api

by Vincent Siveton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> Another question might be: Does the relation
>  1 sink -> 1 output file/stream
> always hold or are there formats that might need to output multiple files?
> In the later case, createSink( OutputStream ) would be troublesome.

The relation should persist.

Cheers,

Vincent