comma in name in alternatenames

The geoname schema which describes cities1000.txt is at

  http://download.geonames.org/export/dump/readme.txt

It says

  alternatenames : alternatenames, comma separated varchar(4000) (varchar(5000) for SQL Server)

The entry for geonameid 6292397 is for the city

  Rüti / Dorfzentrum, Südl. Teil

That appears to be the same place as

  http://en.wikipedia.org/wiki/Rüti,_Zürich

The entry contains a ",", which means my parser, which splits on ",", gets messed up. I'm going to change it to split on "," not followed by a space.

I don't know if this is a problem with the name (that it contains a ","), with the documentation (that it doesn't say how names which contain commas are handled), or if it's that I shouldn't be using that field and should instead use the alternateNames.txt file to get these.

-- Andrew Dalke <dalke@...>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "GeoNames" group.
To post to this group, send email to geonames@...
To unsubscribe from this group, send email to geonames+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/geonames?hl=en
-~----------~----~----~----~------~----~------~--~---
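For readers following along, here is a minimal Python sketch of the workaround described in this message: splitting on a comma that is not followed by a space. The helper name `split_alternatenames` and the sample field value are hypothetical (only the "Rüti / Dorfzentrum, Südl. Teil" name comes from the thread), and the rule itself is a heuristic, not a documented GeoNames format.

```python
import re

# Heuristic from the message above: treat "," as a separator only when it
# is NOT followed by a space. This is an assumption, not a GeoNames rule.
_SEPARATOR = re.compile(r",(?! )")


def split_alternatenames(field):
    """Split an alternatenames value on commas not followed by a space."""
    if not field:
        return []
    return _SEPARATOR.split(field)


# Made-up field value for illustration; "Rueti" and "Rüti ZH" are
# placeholders, not taken from the real record for geonameid 6292397.
sample = "Rueti,Rüti / Dorfzentrum, Südl. Teil,Rüti ZH"

naive = sample.split(",")               # breaks the name containing ", "
careful = split_alternatenames(sample)  # keeps that name in one piece

print(naive)
print(careful)
```

The heuristic still fails for a name that contains a comma with no space after it, and for a field in which a separator happens to be followed by a space, so it reduces the problem rather than solving it.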
|
|
Re: comma in name in alternatenames

The alternatenames field is not meant to be parsed. If you want to know the individual alternate names then you should use the alternateNames file.

Marc
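As an illustration of the suggestion above, here is a minimal Python sketch that collects names per geonameid from rows of the alternate names file. It assumes the file is tab-separated with the columns alternateNameId, geonameid, isolanguage, alternate name, isPreferredName, isShortName in that order; the function name `load_alternate_names` is hypothetical, and the three sample rows are taken from the Gothenburg entries quoted elsewhere in this thread.

```python
import io
from collections import defaultdict


def load_alternate_names(lines):
    """Group alternate names by geonameid.

    Assumes tab-separated rows laid out as:
    alternateNameId, geonameid, isolanguage, alternate name, flags...
    """
    names = defaultdict(set)
    for line in lines:
        row = line.rstrip("\n").split("\t")
        if len(row) < 4:
            continue  # skip blank or malformed rows
        names[int(row[1])].add(row[3])
    return names


# In-memory stand-in for the real file, so the sketch is self-contained.
sample = io.StringIO(
    "1235976\t2711537\t\tGothenburg\t\t\n"
    "1601009\t2711537\tsv\tGöteborg\t\t\n"
    "1600985\t2711537\ten\tGothenburg\t1\t1\n"
)

by_id = load_alternate_names(sample)
print(sorted(by_id[2711537]))
```

Because the separator is a tab, a comma inside a name needs no special handling. To read the real file, an open file object can be passed in place of the `io.StringIO` sample.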
|
|
Re: comma in name in alternatenames

On Nov 8, 7:31 am, Marc Wick <m...@...> wrote:
> The alternatenames field is not meant to be parsed.

Then out of curiosity, why is that field present? And may I ask that the documentation somewhere mention that?

Are there other fields which should not be parsed?

> If you want to know the individual alternate names
> then you should use the alternateNames file.

I don't understand why those two sources are different. Here are the alternatenames for Gothenburg, Sweden

  G'oteborg,GOT,Gautaborg,Geteborga,Gjoteborg,Goeteborg,Goteborg,Goteburg,Gotemburgo,Gotenburg,Gothembourg,Gothenburg,Gothoburgum,Gottenborg,Göteborg,Gøteborg,Gēteborga,Γκέτεμποργκ,Гьотеборг,Гётеборг,גוטנבורג,イェーテボリ,哥德堡

There are 23 unique names in that list.

Gothenburg is geonameid 2711537 and it has 33 entries in alternateNames.txt.

  1235974  2711537        Goeteborg
  1235975  2711537        Goteburg
  1235976  2711537        Gothenburg
  1235977  2711537        Gottenborg
  1600989  2711537  da    Göteborg
  1600991  2711537  eo    Göteborg
  1600995  2711537  hu    Göteborg
  1600999  2711537  la    Gothoburgum
  1601003  2711537  nl    Gotenburg
  1601007  2711537  pt    Gotemburgo
  1601009  2711537  sv    Göteborg
  1600984  2711537  de    Göteborg
  1600986  2711537  es    Gotemburgo
  1600988  2711537  ca    Göteborg
  1600992  2711537  fi    Göteborg
  1600996  2711537  ia    Göteborg
  1601002  2711537  nds   Göteborg
  1601006  2711537  pl    Göteborg
  1634095  2711537  it    Göteborg
  2256568  2711537  no    Gøteborg
  1970271  2711537  is    Gautaborg
  2181201  2711537  iata  GOT
  1600987  2711537  bg    Гьотеборг
  1600990  2711537  el    Γκέτεμποργκ
  1600994  2711537  he    גוטנבורג
  1600998  2711537  ja    イェーテボリ
  1601000  2711537  lv    Gēteborga
  1601008  2711537  ru    Гётеборг
  1621129  2711537  zh    哥德堡
  1600993  2711537  fr    Gothembourg
  1601005  2711537  no    Göteborg  1  1
  1600985  2711537  en    Gothenburg  1  1
  1600997  2711537  id    Göteborg

of which 19 are unique. I see that "G'oteborg", which is the first name of the alternatenames field of cities1000.txt, is not in the alternateNames.txt file.

Knowing no better, I decided to use the alternatenames from cities1000.txt because that one record (which is where I live) had more alternate names, and for the project I'm working on I wanted to maximize the likelihood of getting a match.

Cheers,

-- Andrew Dalke <dalke@...>
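The comparison in this message can be reproduced with a short Python sketch using only the Gothenburg data quoted above. The variable names are hypothetical, and the field is split on a plain "," here, which is safe only because none of these particular names contains a comma.

```python
# alternatenames field for geonameid 2711537, as quoted in the message.
field = (
    "G'oteborg,GOT,Gautaborg,Geteborga,Gjoteborg,Goeteborg,Goteborg,"
    "Goteburg,Gotemburgo,Gotenburg,Gothembourg,Gothenburg,Gothoburgum,"
    "Gottenborg,Göteborg,Gøteborg,Gēteborga,Γκέτεμποργκ,Гьотеборг,"
    "Гётеборг,גוטנבורג,イェーテボリ,哥德堡"
)

# Name column of the 33 alternateNames.txt rows, in the order quoted.
file_rows = [
    "Goeteborg", "Goteburg", "Gothenburg", "Gottenborg", "Göteborg",
    "Göteborg", "Göteborg", "Gothoburgum", "Gotenburg", "Gotemburgo",
    "Göteborg", "Göteborg", "Gotemburgo", "Göteborg", "Göteborg",
    "Göteborg", "Göteborg", "Göteborg", "Göteborg", "Gøteborg",
    "Gautaborg", "GOT", "Гьотеборг", "Γκέτεμποργκ", "גוטנבורג",
    "イェーテボリ", "Gēteborga", "Гётеборг", "哥德堡", "Gothembourg",
    "Göteborg", "Gothenburg", "Göteborg",
]

field_names = set(field.split(","))
file_names = set(file_rows)

only_in_field = sorted(field_names - file_names)
only_in_file = sorted(file_names - field_names)

print(len(field_names), len(file_rows), len(file_names))
print("only in the alternatenames field:", only_in_field)
print("only in alternateNames.txt:", only_in_file)
```

For this record the set difference shows which names appear in the field but have no row in the alternate names file, "G'oteborg" among them.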
|
|
Re: comma in name in alternatenames

The field is present because users have asked for it, to search on I suppose. It is redundantly built from the alternate names info. The name "G'oteborg" is an ASCII transliteration of the Bulgarian name.

You shouldn't parse anything that is not designed to be parsed (or parse it at your own risk).

Best

Marc
|
|
Re: comma in name in alternatenames

On Nov 8, 5:03 pm, Marc Wick <m...@...> wrote:
> You shouldn't parse anything that is not designed to be parsed. (or
> parse it at your own risk).

I understand that. I wrote the above to point out that there's nothing which describes which fields are designed to be parsed and which are not. Specifically, http://download.geonames.org/export/dump/readme.txt says:

  Remark : the field 'alternatenames' in the table 'geoname' is a short version of the 'alternatenames' table. You probably don't need both. If you don't need to know the language of a name variant, the field 'alternatenames' will be sufficient. If you need to know the language of a name variant, then you will need to load the table 'alternatenames' and you can drop the column in the geoname table.

I did not need to know the language of the name variant, so I thought I could use this field.

I also did not realize that some of the names are machine-generated transliterations from other languages. I do not see that documented anywhere, so I assumed there was some other data source involved.

Cheers,

-- Andrew Dalke <dalke@...>