RE: How to compile c++ code without strip off utf-8 BOM?

Hi Tao Wang,

My test.cpp source is UTF-8 with BOM.

If I compile it like this...

    g++ -x c++ <(xxd -g 1 -s 3 test.cpp | xxd -g 1 -s -3 -r) -o a.out

... that strips out the first three bytes of the file. For test.cpp, those bytes happen to be the BOM (ef bb bf).

You may want to create a little 'stripBOM' program that behaves like 'cat', but gobbles the BOM if present.

Or you could use awk, sed, perl, or your favorite-text-munging-tool-of-choice to perform the same conversion. I just used xxd because it was quick, for illustrative purposes. (There's probably a more suitable unix tool than xxd for this kind of cat-with-offset, but you'd want something that filters out the BOM rather than always skipping three bytes.)

HTH,
--Eljay
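The 'stripBOM' filter suggested above could look roughly like the following. This is only a minimal sketch: the name `strip_bom` and both overloads are hypothetical, and it assumes the input is a byte stream whose only possible signature is the UTF-8 BOM (ef bb bf) at the very start.

```cpp
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copy `in` to `out` like 'cat', but drop a leading
// UTF-8 BOM (bytes EF BB BF) if, and only if, the stream starts with one.
void strip_bom(std::istream& in, std::ostream& out) {
    static const char kBom[] = "\xEF\xBB\xBF";

    // Peek at the first three bytes; shorter inputs cannot contain a BOM.
    char head[3];
    in.read(head, sizeof head);
    const std::streamsize got = in.gcount();
    const bool has_bom = (got == 3) && std::memcmp(head, kBom, 3) == 0;
    if (!has_bom) {
        out.write(head, got);  // not a BOM: pass the bytes through untouched
    }

    // Copy the remainder of the stream unchanged.
    char buf[4096];
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        out.write(buf, in.gcount());
    }
}

// Convenience overload for in-memory text (also hypothetical).
std::string strip_bom(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    strip_bom(in, out);
    return out.str();
}
```

Wrapped in a `main` that calls `strip_bom(std::cin, std::cout)` and built as, say, `stripBOM`, it could replace the xxd pipeline: `g++ -x c++ <(./stripBOM < test.cpp) -o a.out`. Unlike the fixed three-byte offset, a file without a BOM passes through unchanged.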
Re: How to compile c++ code without strip off utf-8 BOM?

> Hi Tao Wang,
>
> My test.cpp source is UTF-8 with BOM.
>
> If I compile it like this...
>
>     g++ -x c++ <(xxd -g 1 -s 3 test.cpp | xxd -g 1 -s -3 -r) -o a.out
>
> ... that strips out the first three bytes at the beginning. For test.cpp, this happens to be the BOM (ef bb bf) at the beginning.
>
> You may want to create a little 'stripBOM' program that behaves like 'cat', but gobbles the BOM if present.
>
> Or you could use awk, sed, perl, or your favorite-text-munging-tool-of-choice to perform the same conversion. I just used xxd because it was quick, for illustrative purposes. (There's probably a more suitable unix tool than xxd for this kind of cat-with-offset, but you'd want something that filters out BOM rather than always offsetting.)
>
> HTH,
> --Eljay

Hi Eljay and Tao Wang,

I have experienced the same problem working in a multi-platform environment with a shared repository. In my case the source files have no BOM (they are stored on the server in the Windows machines' native encoding), so my solution was to add -finput-charset=WINDOWS-1252 to gcc's command line.

Unfortunately, it seems that iconv has no way to insert or remove the BOM, so that approach doesn't help Tao Wang. Eljay's solution isn't always viable either: it only filters the file given on the command line, so if the source file #includes a header that has a BOM, the compilation still fails.

I think there are two possible ways out:

1) Automatically execute a conversion command (like uconv --remove-signature) at checkouts/commits
2) Install a modified libiconv with an additional character set "UTF8-BOM"

Best regards
Dario
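Option 1 amounts to rewriting every source and header file without its BOM before the compiler sees it. As a rough illustration of that idea (a stand-in, not uconv itself), the C++17 sketch below walks a checkout and strips the BOM in place. The function names `remove_bom_in_place` and `remove_boms_under`, and the list of extensions, are assumptions made for this example.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: rewrite `path` without its leading UTF-8 BOM.
// Returns true only if a BOM was found and the file was rewritten.
bool remove_bom_in_place(const fs::path& path) {
    std::ifstream in(path, std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    in.close();

    // Files shorter than three bytes, or not starting with EF BB BF, are left alone.
    if (data.compare(0, 3, "\xEF\xBB\xBF") != 0) {
        return false;
    }

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(data.data() + 3, static_cast<std::streamsize>(data.size() - 3));
    return static_cast<bool>(out);
}

// Hypothetical helper: strip the BOM from every C++ source or header file
// below `root` (the extension list is an assumption). Returns the number of
// files that were changed.
std::size_t remove_boms_under(const fs::path& root) {
    std::size_t changed = 0;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        const fs::path ext = entry.path().extension();
        if (ext == ".cpp" || ext == ".h" || ext == ".hpp") {
            if (remove_bom_in_place(entry.path())) ++changed;
        }
    }
    return changed;
}
```

Because headers are rewritten as well, the #include case is covered. The cost is that the working copy now differs from the repository, so the inverse step (re-adding the BOM) would be needed at commit time if the Windows side depends on it.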