escaped unicode characters as aliases


by Andrew Dalke


The following records from alternateNames.txt contain escaped Unicode
characters in the alternate name field:

2296413 788578  mk      Lubaništa   1
58902   113646  fa      Tabrīz      1
2181093 2540850 ar      Warzāzāt
2181103 6533373 fa      Meydān-e Emām Khomeinī

of which these

Tabrīz
Warzazāt
Warzāzāt

are also listed in cities1000.txt as alternatenames.
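
One way to find rows like these is to scan the alternate name column for HTML character references. This is only a sketch: it assumes tab-separated columns in the order shown above (alternateNameId, geonameid, isolanguage, alternate name), and it assumes the escapes are HTML numeric or named references. The sample rows and the `&#353;` spelling are made up for illustration and are not copied from the file.

```python
import html
import re

# Matches numeric (&#353; or &#x161;) and named (&scaron;) character references.
CHAR_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def find_escaped_names(lines):
    """Yield (geonameid, raw_name, decoded_name) for rows whose
    alternate name contains an HTML character reference."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip rows that do not have a name column
        geonameid, name = fields[1], fields[3]
        if CHAR_REF.search(name):
            yield geonameid, name, html.unescape(name)


# Hypothetical sample rows in the assumed column order.
sample = [
    "2296413\t788578\tmk\tLubani&#353;ta\t1",
    "1\t2\ten\tPlain Name\t",
]
results = list(find_escaped_names(sample))
```

To run it over the real file, pass an open file object instead of `sample`, for example `open("alternateNames.txt", encoding="utf-8")`.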

-- Andrew Dalke <dalke@...>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "GeoNames" group.
To post to this group, send email to geonames@...
To unsubscribe from this group, send email to geonames+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/geonames?hl=en
-~----------~----~----~----~------~----~------~--~---


Re: escaped unicode characters as aliases

by Marc Wick


Please don't hesitate to correct it:
http://www.geonames.org/manual.html

Marc




Re: escaped unicode characters as aliases

by Andrew Dalke


On Nov 8, 7:34 am, Marc Wick <m...@...> wrote:
> Please don't hesitate to correct it: http://www.geonames.org/manual.html

I had not actually known that as an option, as I've been working with
the data files directly.

I tried to do that and found the first difficulty was finding the
records those fields referred to. All I had was:

2296413 788578  mk      Lubaništa   1
58902   113646  fa      Tabrīz      1
2181093 2540850 ar      Warzāzāt
2181103 6533373 fa      Meydān-e Emām Khomeinī

Neither the main geonames.org page nor the advanced search page allows
lookup by geonameid 788578; both treat it as some sort of postal code.
I could not find a match in cities1000.txt for that geonameid (though
I could have searched in allCountries.txt), and searching for
"Lubaništa" returned a place in Mexico, which made no sense.
(The language code is mk, for Macedonian.)

I used Google and found out that records are accessible through URLs
like
  http://www.geonames.org/6533373
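
For the four rows above, the record URLs can be generated from the second column. A minimal sketch, assuming the second column is the geonameid and that the URL pattern holds for every record; `record_url` is a hypothetical helper name:

```python
def record_url(geonameid):
    """Return the record page URL for a geonameid, following the
    http://www.geonames.org/<geonameid> pattern."""
    # int() rejects anything that is not a plain number.
    return f"http://www.geonames.org/{int(geonameid)}"


# geonameids taken from the second column of the four rows.
ids = [788578, 113646, 2540850, 6533373]
urls = [record_url(i) for i in ids]
```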

I used that to go to each record, and in every case the alternate name
string is displayed correctly.

This suggests some sort of import/export Unicode encoding problem,
likely one which has been fixed but which has left remnants in the
system somewhere.

I do not think I am able to edit and fix these four cases.

Best regards,

-- Andrew Dalke <dalke@...>



Re: escaped unicode characters as aliases

by Marc Wick


It is an html encoding and it will therefore display correctly on an
html page.
All that needed to be done was to click on the edit link and save again.
I did it for you.

Best

Marc




Re: escaped unicode characters as aliases

by Andrew Dalke


On Nov 8, 4:51 pm, Marc Wick <m...@...> wrote:
> It is an html encoding and it will therefore display correctly on an
> html page.

But it shouldn't, unless the source data is supposed to be HTML
encoded - which it isn't.

Take for example geonameid 6619831, which is for the Victoria & Albert
Museum. Go to that page and view source and you'll see an error in the
HTML:

<meta name="description" content="Victoria & Albert Museum England
Kensington and Chelsea, United Kingdom, museum" />

This is a bug. The '&' should be &amp; and you can see that the title
field is properly escaped:

<title>Victoria &amp; Albert Museum, United Kingdom</title>

Potentially this opens the geonames server up to various sorts of well-
known attacks.
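
The usual fix for the meta tag is to escape the text before placing it in an attribute value. Here is a minimal sketch using Python's standard `html` module. The `meta_description` helper is hypothetical and says nothing about how the GeoNames pages are actually generated.

```python
import html


def meta_description(content):
    """Build a <meta> description tag with the attribute value escaped."""
    # quote=True also escapes " and ', which matters inside attribute values.
    escaped = html.escape(content, quote=True)
    return '<meta name="description" content="%s" />' % escaped


tag = meta_description("Victoria & Albert Museum")
hostile = meta_description('"> testing')
```

With the value escaped, a name containing `">` stays inside the attribute instead of closing the tag early.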

I tried to test this by adding a place named '"> testing' at the
stadium in Boden. That gives me the error message

error while saving:
Cannot parse ' country:SE names:" names:testing fcode:STDM( fc:S )':
Lexical error at line 1, column 54. Encountered: after : "\"
names:testing fcode:STDM( fc:S )"

That implies that special characters aren't being properly escaped
before going into the database. If I use two double quotes, it works,
that is

   ""> testing

I'm disconcerted about that. Perhaps there is also a possible
injection attack on the database? I also can't figure out where the
record went so I can delete it. (Or more properly, use the correct
name.) Please feel free to delete the record if it was created, or for
that matter wipe my account if this was too improper.
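
The error text suggests the name is being spliced into a query string. Whether that is a search query or a database statement is not clear from the message. As a generic illustration of the safer pattern, and not a description of the GeoNames stack, the sketch below passes the name as a bound parameter using Python's `sqlite3` module. The `place` table, its columns, and `add_place` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE place (name TEXT, fcode TEXT)")


def add_place(conn, name, fcode):
    # The ? placeholders send the values separately from the SQL text,
    # so quotes inside `name` cannot change the statement.
    conn.execute("INSERT INTO place (name, fcode) VALUES (?, ?)", (name, fcode))


add_place(conn, '"> testing', "STDM")
add_place(conn, "x'); DROP TABLE place; --", "STDM")
rows = conn.execute("SELECT name FROM place ORDER BY rowid").fetchall()
```

Both names are stored verbatim, and the table still exists afterwards.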


> All that needed to be done was to click on the edit link and save again.
> I did it for you.

Thanks. But that shouldn't have been the right solution if data
conversion is correctly done all the way through, which is why I
didn't consider doing it.

-- Andrew Dalke <dalke@...>
