gb2312 encoding

gb2312 encoding

by alex242

Hello,

I have some problems with HTML documents in GB2312 (Simplified Chinese) encoding. After running them through Tidy I get unreadable characters. An example document: http://www.chemspider.com/ArticlesHandler.ashx?type=art&id=69

My Tidy config file:

output-file: res.html
error-file: error.txt
char-encoding: utf8
output-bom: yes
output-encoding: utf8

I've already spent several days trying to solve this problem without any success... so if somebody can give some advice on how to work with different encodings in Tidy, it would be much appreciated.

best regards,
Alex

Re: gb2312 encoding

by Arnaud Desitter-2

Hi,

You need to convert your files from whatever encoding they are in
(GB2312 in your case) to UTF-8. iconv is an option.
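
As an illustration of the conversion step described above, here is a minimal Python sketch that decodes GB2312 bytes and re-encodes them as UTF-8 before the file is handed to Tidy. The function names `gb2312_to_utf8` and `convert_file` are hypothetical and not part of Tidy or iconv. It also assumes the input really is GB2312. Pages labelled that way sometimes contain characters from the larger GBK or GB18030 sets, in which case passing `"gb18030"` as the source encoding may work better.

```python
from pathlib import Path


def gb2312_to_utf8(data: bytes, source_encoding: str = "gb2312") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    # Strict decoding (the default) raises UnicodeDecodeError on bytes that
    # are not valid in the source encoding, rather than passing them through.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


def convert_file(src: Path, dst: Path, source_encoding: str = "gb2312") -> None:
    """Read src as raw bytes, convert them, and write the UTF-8 result to dst."""
    dst.write_bytes(gb2312_to_utf8(src.read_bytes(), source_encoding))


if __name__ == "__main__":
    # Stand-in for the contents of a GB2312-encoded HTML file.
    sample = "简体中文".encode("gb2312")
    converted = gb2312_to_utf8(sample)
    print(converted.decode("utf-8"))
```

On the command line, the rough equivalent is `iconv -f GB2312 -t UTF-8 in.html > out.html` (file names are placeholders). The converted file should then match the `char-encoding: utf8` setting in the config above.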

Regards,

2008/8/4 alex242 <pshenichnov@...>:

>
> Hello,
>
> I have some problems with HTML documents in GB2312 (Simplified Chinese)
> encoding. After running them through Tidy I get unreadable characters.
> An example document: http://www.chemspider.com/ArticlesHandler.ashx?type=art&id=69
>
> my Tidy config file:
>
> output-file: res.html
> error-file: error.txt
> char-encoding: utf8
> output-bom: yes
> output-encoding: utf8
>
> I've already spent several days trying to solve this problem without any
> success... so if somebody can give some advice on how to work with different
> encodings in Tidy, it would be much appreciated.
>
> best regards,
> Alex
> --
> View this message in context: http://www.nabble.com/gb2312-encoding-tp18803906p18803906.html
> Sent from the w3.org - html-tidy mailing list archive at Nabble.com.



Re: gb2312 encoding

by Eric Frost

Alex,

Try this: http://www.iconv.com/iconv.htm

I never knew there were so many character sets!

I almost had ASCII memorized at one point programming Commodores...

Eric

______________________________________
Eric Frost, PhD
http://www.mappoint2009.com
http://www.pushpintool.com


--------------------------------------------------
From: "Arnaud Desitter" <arnaud02@...>
Sent: Monday, August 04, 2008 10:08 AM
To: "alex242" <pshenichnov@...>
Cc: <html-tidy@...>
Subject: Re: gb2312 encoding

>
> Hi,
>
> You need to convert your files from whatever encoding they are in
> (GB2312 in your case) to UTF-8. iconv is an option.
>
> Regards,