On Jan 22, 2009, at 4:50 PM, Hervé BOUTEMY wrote:
> Sorry, I was working on other things and missed this discussion.
> I just commented (and closed as "Not A Bug" :) ) the issue.
I agree that autodetecting is not a bullet-proof feature, but an
absolute guarantee is not required in this case. I share Jason van
Zyl's view: "If it's right most of the time, and it saves the user
from having to know or worry about it then yes I would use it." [1]
Another issue is that, without autodetection, a single Maven project
cannot contain APT files in more than one character encoding.
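
To make the idea concrete, here is a minimal sketch of the kind of
autodetection I have in mind: try a strict UTF-8 decode and fall back to
ISO-8859-1 if the bytes are not well-formed UTF-8. The class and method
names (AptEncodingGuess, guess) are hypothetical and are not part of
Doxia; this is an illustration, not a proposed patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an existing Doxia class.
final class AptEncodingGuess {

    /** Returns UTF-8 if the bytes are well-formed UTF-8, otherwise ISO-8859-1. */
    static Charset guess(byte[] content) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strictUtf8.decode(ByteBuffer.wrap(content));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: fall back to the current default.
            return StandardCharsets.ISO_8859_1;
        }
    }
}
```

This guess can be wrong: some ISO-8859-1 byte sequences happen to be
valid UTF-8 as well. Pure ASCII files are reported as UTF-8, which is
harmless because both encodings agree on ASCII.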
That said, if autodetection is simply out of the question, let me
suggest a different tack. Doxia appears to require ISO-8859-1 for APT
files by default. This is a Western-centric encoding that lacks
support for Asian languages. It is also no longer maintained. According to
Wikipedia:
"The ISO/IEC working group responsible for maintaining eight-bit coded
character sets disbanded and ceased all maintenance of ISO 8859,
including ISO 8859-1, in order to concentrate on the Universal
Character Set and Unicode." [2]
I would also say that, with the increasing popularity of UTF-8, users
already run into more encoding problems because Doxia favors ISO-8859-1
than they would because of bad autodetection. In other words,
autodetection might be wrong some of the time, but for many users,
ISO-8859-1 is wrong all of the time.
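
As a small illustration of what such users see, the sketch below decodes
UTF-8 bytes the way a reader fixed to ISO-8859-1 would. The class and
method names (WrongDefaultDemo, readAsLatin1) are made up for this
example and do not reflect Doxia's actual code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Doxia.
final class WrongDefaultDemo {

    /** Decodes bytes as a reader hard-wired to ISO-8859-1 would. */
    static String readAsLatin1(byte[] content) {
        return new String(content, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Hervé" saved as UTF-8: the é becomes the two bytes 0xC3 0xA9.
        byte[] utf8 = "Herv\u00e9".getBytes(StandardCharsets.UTF_8);

        // Each byte is mapped to its own character, so the result has
        // six characters: "Herv" followed by U+00C3 and U+00A9.
        String garbled = readAsLatin1(utf8);
        System.out.println(garbled.length());
        System.out.println(garbled);
    }
}
```

Because every byte value is a valid ISO-8859-1 character, this decode
never fails; the text is silently garbled rather than rejected.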
In light of this, I suggest changing Doxia's APT handling so that it
defaults to UTF-8 rather than ISO-8859-1. Not only will this help
UTF-8 users (who may be a majority), it will also help increase
Maven's acceptance in the Asian world, a trend that is already
happening [3].
I can work on a patch for this, if there's a chance it will be accepted.
Trevor
[1] http://www.nabble.com/Re%3A--VOTE--POM-Element-for-Source-File-Encoding-p16566779.html
[2] http://en.wikipedia.org/wiki/ISO_8859-1
[3] http://blogs.sonatype.com/people/2008/07/apache-maven-the-definitive-chinese-guide/