Re: Character encoding for APT files

by Trevor Harmon

On Jan 23, 2009, at 3:24 PM, Hervé BOUTEMY wrote:

> the problem with such an auto-detection in a tool like Doxia used by
> maven-site-plugin is that if the guessed encoding is not right, you
> can't do anything

I was thinking that manually specifying a particular encoding would  
override the autodetection feature.

> (or you have to configure it, which is what you wanted to avoid)

If autodetection guesses wrong (and I maintain that it would seldom  
guess wrong), having to configure it those few times would be better  
than having to configure it all the time, which is what UTF-8 users  
have to do now.

>> Another issue is that without autodetection, supporting more than one
>> type of character encoding for the APT files in a Maven project is
>> impossible.
> same remark as before: what if the encoding guessed from a file
> is wrong?

The error rate would go from all the time to some of the time, which  
is still a win. Again, I'm assuming that autodetection is optional and  
enabled by default; if it causes problems it could be disabled,  
reverting to the same behavior as before.
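
To make the proposal concrete, here is a rough sketch of the precedence I
have in mind. It is not Doxia code: the class and method names are
hypothetical, and the "detection" is just one simple heuristic I am
assuming for illustration (strict UTF-8 validation, falling back to
ISO-8859-1). An explicitly configured encoding always wins, and turning
autodetection off gives the current ISO-8859-1 behavior.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper for illustration; nothing here is part of Doxia. */
final class AptEncodingGuess {

    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset LATIN_1 = Charset.forName("ISO-8859-1");

    /** Returns true if the bytes decode as UTF-8 with no malformed sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /**
     * Picks the charset for one file.
     * 1. A configured encoding (non-null) overrides everything.
     * 2. Otherwise, if autodetection is on and the bytes are valid UTF-8, use UTF-8.
     * 3. Otherwise fall back to ISO-8859-1, as happens today.
     */
    static Charset chooseCharset(byte[] content, Charset configured, boolean autodetect) {
        if (configured != null) {
            return configured;
        }
        if (autodetect && isValidUtf8(content)) {
            return UTF_8;
        }
        return LATIN_1;
    }
}
```

The wrong guesses with this heuristic would be ISO-8859-1 files whose
accented characters happen to form valid UTF-8 sequences, which should be
rare, and those are exactly the cases where setting the encoding by hand
would still work.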

> There are a lot of Maven plugins today that complain if you don't configure
> default encoding: it is a simple property to add in your POM. Doesn't it meet
> your needs?

The problem is that I have many dozens of POMs, and I have to declare  
the encoding in all of them. Is there some way of configuring the  
encoding globally, perhaps in settings.xml?

>> In light of this, I suggest changing Doxia's APT handling so that it
>> defaults to UTF-8 rather than ISO-8859-1. Not only will this help
>> UTF-8 users (who may be a majority),
> do you have figures, or is it a guess?

It's a guess, though there's circumstantial evidence pointing to the  
rise of UTF-8. It's definitely growing on the web [1], and text  
editors I've used, such as Eclipse on Linux and TextMate on Mac OS X,  
default to UTF-8. I'm actually surprised UTF-8 hasn't been adopted
more quickly because it solves so many issues. But I worry that we're
never going to get there if modern applications continue to require
native file encodings by default.

Trevor

[1] http://www.w3.org/QA/2008/05/utf8-web-growth.html
