On Jan 23, 2009, at 3:24 PM, Hervé BOUTEMY wrote:
> the problem with such an auto-detection in a tool like Doxia used by
> maven-site-plugin is that if the guessed encoding is not right, you
> can't do anything
I was thinking that manually specifying a particular encoding would
override the autodetection feature.
> (or you have to configure it, which is what you wanted to avoid)
If autodetection guesses wrong (and I maintain that it would seldom
guess wrong), having to configure it those few times would be better
than having to configure it all the time, which is what UTF-8 users
have to do now.
>> Another issue is that without autodetection, supporting more than one
>> type of character encoding for the APT files in a Maven project is
>> impossible.
> same remark as before: what if the encoding guessed from a file
> is wrong?
The error rate would go from all the time to some of the time, which
is still a win. Again, I'm assuming that autodetection is optional and
enabled by default; if it causes problems it could be disabled,
reverting to the same behavior as before.
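To make the precedence concrete, here is a minimal sketch in Java of what I have in mind. It is not Doxia code: the names AptEncodingGuesser, choose and isValidUtf8 are hypothetical, and "autodetection" is assumed here to mean nothing more than a strict UTF-8 validity check with ISO-8859-1 as the fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Doxia.
final class AptEncodingGuesser {
    private AptEncodingGuesser() {}

    /**
     * Picks the charset used to read an APT file.
     *
     * @param content    raw bytes of the file
     * @param configured explicitly configured encoding, or null if none
     * @param autodetect whether guessing is enabled (assumed on by default)
     */
    static Charset choose(byte[] content, Charset configured, boolean autodetect) {
        // 1. A manually specified encoding always overrides the guess.
        if (configured != null) {
            return configured;
        }
        // 2. Otherwise, guess UTF-8 only if the bytes decode without error.
        if (autodetect && isValidUtf8(content)) {
            return StandardCharsets.UTF_8;
        }
        // 3. Autodetection disabled or not UTF-8: same behavior as before.
        return StandardCharsets.ISO_8859_1;
    }

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] content) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Pure ASCII files pass the UTF-8 check, which is harmless since they decode identically either way. The guess can still be wrong for an ISO-8859-1 file whose bytes happen to form valid UTF-8, and that is the case where configuring the encoding by hand would still be needed.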
> There are a lot of Maven plugins today that complain if you don't
> configure default encoding: it is a simple property to add in your
> POM. Doesn't it meet your needs?
The problem is that I have many dozens of POMs, and I have to declare
the encoding in all of them. Is there some way of configuring the
encoding globally, perhaps in settings.xml?
>> In light of this, I suggest changing Doxia's APT handling so that it
>> defaults to UTF-8 rather than ISO-8859-1. Not only will this help
>> UTF-8 users (who may be a majority),
> do you have figures, or is it a guess?
It's a guess, though there's circumstantial evidence pointing to the
rise of UTF-8. It's definitely growing on the web [1], and text
editors I've used, such as Eclipse on Linux and TextMate on Mac OS X,
default to UTF-8. I'm actually surprised UTF-8 hasn't been adopted
more quickly because it solves so many issues. But I worry that we're
never going to get there if modern applications continue to require
native file encodings by default.
Trevor
[1] http://www.w3.org/QA/2008/05/utf8-web-growth.html