escaped unicode characters as aliases

The following entries in alternateNames.txt contain escaped Unicode characters:

2296413 788578 mk Lubaništa 1
58902 113646 fa Tabrīz 1
2181093 2540850 ar Warzāzāt
2181103 6533373 fa Meydān-e Emām Khomeinī

Of these, the following are also listed in cities1000.txt as alternatenames:

Tabrīz
Warzazāt
Warzāzāt

-- Andrew Dalke <dalke@...>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "GeoNames" group.
To post to this group, send email to geonames@...
To unsubscribe from this group, send email to geonames+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/geonames?hl=en
-~----------~----~----~----~------~----~------~--~---
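A minimal Python sketch for finding rows like these in the dump. It assumes the escaped form is an HTML character reference such as &#353; (the archived message only shows the decoded text, so the exact escape form is an assumption), and that the first four tab-separated columns of alternateNames.txt are alternateNameId, geonameid, isolanguage and alternate name. find_escaped_names and AltName are hypothetical helper names, and the sample rows are illustrative only.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Complete HTML character references: decimal (&#353;), hex (&#x161;) or named (&scaron;).
# A bare "&" followed by a space, as in "Victoria & Albert Museum", does not match.
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


class AltName(NamedTuple):
    alternate_name_id: str
    geonameid: str
    isolanguage: str
    name: str


def find_escaped_names(lines: Iterable[str]) -> Iterator[AltName]:
    """Yield alternateNames.txt rows whose name field contains an HTML character reference."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        row = AltName(*fields[:4])
        if ENTITY_RE.search(row.name):
            yield row


# Usage with hypothetical sample rows (assumed escaped form, plus a clean name).
sample = [
    "2296413\t788578\tmk\tLubani&#353;ta\t1\n",
    "0\t6619831\ten\tVictoria & Albert Museum\n",
]
for row in find_escaped_names(sample):
    print(row.geonameid, row.isolanguage, row.name)  # prints: 788578 mk Lubani&#353;ta
```

Passing an open file object (opened with encoding="utf-8") instead of the sample list streams the real dump line by line.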

Re: escaped unicode characters as aliases

Please don't hesitate to correct it: http://www.geonames.org/manual.html

Marc

Re: escaped unicode characters as aliases

On Nov 8, 7:34 am, Marc Wick <m...@...> wrote:
> Please don't hesitate to correct it: http://www.geonames.org/manual.html

I hadn't known that was an option, as I've been working with the data files directly.

I tried to do that and found that the first difficulty was finding the records those fields referred to. All I had was:

2296413 788578 mk Lubaništa 1
58902 113646 fa Tabrīz 1
2181093 2540850 ar Warzāzāt
2181103 6533373 fa Meydān-e Emām Khomeinī

Neither the main geonames.org page nor the advanced search page allows lookup by geonameid 788578; they treat it as some sort of postal code. I could not find a match in cities1000.txt for that geonameid (though I could have searched in allCountries.txt), and searching for "Lubaništa" pointed to some place in Mexico, which made no sense. (The language code is mk, for Macedonian.)

Using Google, I found that records are accessible through URLs like

http://www.geonames.org/6533373

I used that to go to each record, and in every case the alternate name string is displayed correctly.

This suggests some sort of import/export Unicode encoding problem, likely one which has been fixed but with remnants of it still in the system somewhere.

I do not think I am able to edit and fix these four cases.

Best regards,

-- Andrew Dalke <dalke@...>
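The geonameid lookup can also be done offline against the dump. A minimal sketch, assuming allCountries.txt is UTF-8 and tab-separated with geonameid in the first column; find_by_geonameid and record_url are hypothetical helper names, and the URL pattern is the one given in this message.

```python
from typing import Iterable, List, Optional

# Per-record page URL pattern, as found via Google in the message.
GEONAMES_RECORD_URL = "http://www.geonames.org/{geonameid}"


def record_url(geonameid: int) -> str:
    """Return the web page URL for a geonameid."""
    return GEONAMES_RECORD_URL.format(geonameid=geonameid)


def find_by_geonameid(lines: Iterable[str], geonameid: int) -> Optional[List[str]]:
    """Return the tab-separated fields of the first row whose first column matches, else None."""
    wanted = str(geonameid)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == wanted:
            return fields
    return None
```

With a local copy, find_by_geonameid(open("allCountries.txt", encoding="utf-8"), 788578) scans the file one line at a time, so the multi-gigabyte dump never has to be loaded into memory at once.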

Re: escaped unicode characters as aliases

It is an html encoding and it will therefore display correctly on an html page. All that needed to be done was to click on the edit link and save again. I did it for you.

Best
Marc
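Marc's point can be illustrated: a browser decodes an HTML character reference, but a program reading the plain-text dump sees the literal characters. A minimal sketch of a dump-side cleanup, assuming the escapes are standard HTML character references (the archive does not show the raw bytes). It decodes the name column only when a complete, semicolon-terminated reference is present, because Python's html.unescape follows HTML5 rules and also decodes some legacy references without a trailing semicolon (for example "&amp"). normalize_name and normalize_alternate_name_line are hypothetical names.

```python
import html
import re

# Complete references only: &#353; / &#x161; / &scaron;
STRICT_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def normalize_name(raw: str) -> str:
    """Decode a name that contains at least one complete HTML character reference."""
    if STRICT_REF.search(raw):
        return html.unescape(raw)
    return raw  # e.g. "Victoria & Albert Museum" is returned unchanged


def normalize_alternate_name_line(line: str) -> str:
    """Decode the alternate-name column (index 3) of one alternateNames.txt row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 4:
        fields[3] = normalize_name(fields[3])
    return "\t".join(fields) + "\n"


print(normalize_name("Lubani&#353;ta"))  # prints: Lubaništa
```

Applying this at export time leaves correctly stored names untouched and fixes escaped ones, which is roughly what re-saving a record through the edit page achieves for a single entry.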

Re: escaped unicode characters as aliases

On Nov 8, 4:51 pm, Marc Wick <m...@...> wrote:
> It is an html encoding and it will therefore display correctly on an
> html page.

But it shouldn't, unless the source data is supposed to be HTML encoded, which it isn't.

Take for example geonameid 6619831, which is for the Victoria & Albert Museum. Go to that page, view source, and you'll see an error in the HTML:

<meta name="description" content="Victoria & Albert Museum England Kensington and Chelsea, United Kingdom, museum" />

This is a bug. The '&' should be &amp;, and you can see that the title field is properly escaped:

<title>Victoria &amp; Albert Museum, United Kingdom</title>

Potentially this opens the geonames server up to various sorts of well-known attacks.

I tried to test this by adding a place named '"> testing' at the stadium in Boden. That gives me the error message

error while saving: Cannot parse ' country:SE names:" names:testing fcode:STDM( fc:S )': Lexical error at line 1, column 54. Encountered: after : "\" names:testing fcode:STDM( fc:S )"

That implies that special characters aren't being properly escaped before going into the database. If I use two double quotes, that is

""> testing

it worked. I'm disconcerted about that. Perhaps there is also a possible injection attack on the database?

I also can't figure out where the record went so I can delete it. (Or, more properly, use the correct name.) Please feel free to delete the record if it was created, or for that matter wipe my account if this was too improper.

> All that needed to be done was to click on the edit link and save again.
> I did it for you.

Thanks. But that shouldn't have been the right solution if data conversion were done correctly all the way through, which is why I didn't consider doing it.

-- Andrew Dalke <dalke@...>
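Both failures reported here have the same shape: a value is inserted into a context (an HTML attribute, a search query) without that context's escaping. A minimal Python sketch of per-context escaping. The parse error looks like one from Apache Lucene's classic QueryParser, so the sketch backslash-escapes the characters that parser treats as syntax; that the GeoNames server uses Lucene, and the one-"names:"-clause-per-token query shape, are assumptions inferred from the error text. lucene_escape, build_name_query and meta_description are hypothetical names.

```python
import html

# Characters the classic Lucene QueryParser treats as syntax (its escape() helper
# prefixes each of these with a backslash).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def lucene_escape(term: str) -> str:
    """Backslash-escape query-syntax characters in a single term."""
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in term)


def build_name_query(name: str) -> str:
    """Hypothetical query shape: one names: clause per whitespace-separated token."""
    return " ".join(f"names:{lucene_escape(tok)}" for tok in name.split())


def meta_description(text: str) -> str:
    """Build a meta tag; quote=True also escapes quotes so the value cannot close the attribute."""
    return f'<meta name="description" content="{html.escape(text, quote=True)}" />'


print(meta_description("Victoria & Albert Museum"))
# prints: <meta name="description" content="Victoria &amp; Albert Museum" />
print(build_name_query('"> testing'))
# prints: names:\"> names:testing
```

The design choice is to escape at the point of use, for the specific output context, rather than storing pre-escaped text; storing HTML-escaped names is what leaked entities into the plain-text dumps. If a value also reaches a relational database, parameterized queries are the usual defence there rather than string escaping.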