pgsql2shp : Encoding headache


by Arnaud Lesauvage-7

Hi all!

I have a UTF8-encoded shapefile and a UTF8-encoded, PostGIS-enabled
database. I want my shapefile to be encoded in WIN1252, and a particular
field to be in uppercase.

Since I am on Windows, I don't have an iconv executable. Therefore, I am
trying to:
- dump the shapefile with shp2pgsql to an SQL text file
- load the resulting SQL file into a PostgreSQL table
- dump this table into a shapefile with pgsql2shp (in WIN1252)

To load the shapefile into PostgreSQL, I had to dump it without
specifying a '-W' argument, set my client_encoding to UTF8, then load
the SQL file into PostgreSQL.

If I look at the data with pgAdmin (with client_encoding set to UTF8),
it looks good: accents are fine, special characters are fine.

To dump the data to a WIN1252-encoded shapefile, pgsql2shp has no
encoding argument, so I set my client encoding to WIN1252 through the
environment variable PGCLIENTENCODING.

If I just dump the table this way, it seems to be fine. This command
works:
pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT
mytext, mygeom FROM mytemptable"
->  [621679 rows]
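
For anyone scripting this step, here is a minimal Python sketch of the same invocation. It only assembles the command line and the environment, using the flags and names from the command above; `build_dump_command` and its parameters are hypothetical names chosen for illustration, not part of pgsql2shp.

```python
import os


def build_dump_command(out_file, user, geom_column, database, query,
                       client_encoding="WIN1252"):
    """Return (argv, env) for a pgsql2shp run with a forced client encoding."""
    # Copy the current environment and add PGCLIENTENCODING, the variable
    # used in the message above to select the client encoding.
    env = dict(os.environ, PGCLIENTENCODING=client_encoding)
    argv = ["pgsql2shp", "-f", out_file, "-u", user, "-g", geom_column,
            database, query]
    return argv, env


argv, env = build_dump_command(
    "myouput.shp", "postgres", "mygeom", "mydatabase",
    "SELECT mytext, mygeom FROM mytemptable",
)
print(argv)
print(env["PGCLIENTENCODING"])
# To actually run it (needs pgsql2shp on PATH and a reachable database):
#   import subprocess; subprocess.run(argv, env=env, check=True)
```

Passing the query as a single list element avoids shell quoting problems with the embedded double quotes.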

But then, if I dump it through a query to put my field in uppercase, I
get the error 'character 0xc29f of encoding UTF8 has no equivalent in
WIN1252' (my own translation; the message is in French).
The command is simply:
pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT
upper(mytext) as mytext, mygeom FROM mytemptable"

So I guess there is a problem with my 'upper' conversion, but I have no
idea what this 0xc29f character could be.

Any help would be greatly appreciated.
Thanks a lot !

--
Arnaud Lesauvage
_______________________________________________
postgis-users mailing list
postgis-users@...
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: pgsql2shp : Encoding headache

by InterRob

Does that last query (invoking the upper() function) actually run well when executed in pgsql console?


Rob


Re: pgsql2shp : Encoding headache

by Arnaud Lesauvage-7

InterRob wrote:
> Does that last query (invoking the upper() function) actually run well when
> executed in pgsql console?

Hi Rob.
No, if I issue a "SET client_encoding TO win1252;" before running
"SELECT upper(myfield) FROM mytable", I get the same error.

Arnaud

Re: [GENERAL] pgsql2shp : Encoding headache

by Arnaud Lesauvage-7

Raymond O'Donnell wrote:
> If it's any help to you, you can get iconv (and a bunch of other helpful
> stuff) from GnuWin32:
>
>   http://gnuwin32.sourceforge.net/

Thanks for your help Raymond.

I tried iconv, but I have other problems now.
I still have to load the file into PostgreSQL, because the shapefile's
data file (.dbf) is associated with an index file, and I have to use
pgsql2shp to rebuild it.
I'll try some more, though. Maybe iconv before shp2pgsql, then load with
client_encoding set to WIN1252, then dump.
I'll see how it goes.
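
Where no iconv binary is at hand, the re-encoding step can also be sketched in a few lines of Python. This assumes the file being converted is plain text (for example the SQL dump written by shp2pgsql), not the binary .dbf itself; `transcode_file` is a hypothetical helper name, and Python's `cp1252` codec is used as a stand-in for WIN1252.

```python
from pathlib import Path


def transcode_file(src, dst, src_encoding="utf-8", dst_encoding="cp1252"):
    """Re-encode a text file, failing loudly on characters the target lacks."""
    # Work on bytes so that line endings are preserved exactly.
    text = Path(src).read_bytes().decode(src_encoding)
    # errors="strict" raises UnicodeEncodeError instead of silently dropping
    # characters that have no equivalent in the target encoding.
    Path(dst).write_bytes(text.encode(dst_encoding, errors="strict"))
```

A call such as `transcode_file("dump_utf8.sql", "dump_win1252.sql")` (made-up file names) either writes the converted file or stops with an error that names the offending character and its position.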

Arnaud

Re: [GENERAL] pgsql2shp : Encoding headache

by Arnaud Lesauvage-7

Arnaud Lesauvage wrote:
> But then, if I dump it through a query to have my field in uppercase, I
> get an error 'character 0xc29f of encoding UTF8 has no equivalent in
> WIN1252' (translated by myself, the message is in French)
> The command is simply :
> pgsql2shp -f myouput.shp -u postgres -g mygeom mydatabase "SELECT
> upper(mytext) as mytext, mygeom FROM mytemptable"

OK, I narrowed down the problem to the WIN1252 encoding.
Using LATIN1 or LATIN9 for instance works correctly.
Since my application seems to work with LATIN9, I'll go with it.

I am still perplexed, though. What is this 0xc29f character? An internet
search tells me that it is some Kanji character. I am quite confident
that if this is true, it would not convert any better to LATIN9 than to
WIN1252.

Also, doing a search like:
SELECT * FROM mytable WHERE upper(myfield) ILIKE u&'%c29f%';
gives me 0 results.
Am I wrong to think that the error 'character 0xc29f of UTF8' relates to
the character with code point C29F in UTF8?
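
The question can be checked outside the database. In PostgreSQL conversion errors of this kind, the hexadecimal value is the byte sequence of the character in the source encoding, not its code point. The Python sketch below decodes those two bytes and tries the three target encodings mentioned in this thread; Python's `latin-1`, `iso8859-15` and `cp1252` codecs are used here as stand-ins for PostgreSQL's LATIN1, LATIN9 and WIN1252 conversions, which is an assumption of the sketch.

```python
raw = bytes.fromhex("c29f")        # the two bytes named in the error message
char = raw.decode("utf-8")         # they form one UTF-8 encoded character
print(f"U+{ord(char):04X}")        # U+009F, a C1 control character

# Try the three target encodings mentioned in the thread.
results = {}
for codec in ("latin-1", "iso8859-15", "cp1252"):
    try:
        results[codec] = char.encode(codec)
    except UnicodeEncodeError:
        results[codec] = None      # no equivalent in this encoding
print(results)

# For comparison: code point U+C29F is a different character, and it takes
# three bytes in UTF-8.
print("\uc29f".encode("utf-8"))    # b'\xec\x8a\x9f'

# Byte 0x9F does exist in cp1252, but there it means U+0178, not U+009F.
y = b"\x9f".decode("cp1252")
print(f"U+{ord(y):04X}")           # U+0178
```

Under these codecs, U+009F encodes to the single byte 0x9F in the two ISO 8859 encodings and has no equivalent in cp1252, which is consistent with the dump working in LATIN1 and LATIN9 but not in WIN1252. It also suggests why the ILIKE query above finds nothing: without a backslash escape, the pattern contains the literal text "c29f" rather than the character itself.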

Thanks again for your help and for any light you can shed on this matter.

Arnaud


Re: [GENERAL] pgsql2shp : Encoding headache

by InterRob

I would do this last query searching for the 0xC29F character WITHOUT the upper() function on the source table, in the native (to table) UTF8 client encoding. No result either?
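
The same check can be sketched client-side in Python, once the values have been fetched with client_encoding set to UTF8 (the fetching itself is not shown). The helper name `unencodable_chars` and the sample values are made up for illustration, Python's `cp1252` codec stands in for WIN1252, and Python's `str.upper()` is not guaranteed to behave like PostgreSQL's upper() under a given server locale.

```python
def unencodable_chars(values, encoding="cp1252"):
    """Map each offending value to the characters the target encoding lacks."""
    report = {}
    for value in values:
        bad = []
        for ch in value:
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                bad.append(f"U+{ord(ch):04X}")
        if bad:
            report[value] = bad
    return report


# Made-up sample values; the last one contains the control character U+009F.
rows = ["Caf\u00e9", "L'Ha\u00ff-les-Roses", "broken\u009fvalue"]

# Check the stored values first, then the uppercased ones, as suggested above.
print(ascii(unencodable_chars(rows)))
print(ascii(unencodable_chars([r.upper() for r in rows])))
```

Comparing the two reports shows whether a problematic character is already present in the source table or only appears after the uppercase conversion.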


Rob
