comma in name in alternatenames

by Andrew Dalke


The geoname schema which describes cities1000.txt is at

http://download.geonames.org/export/dump/readme.txt

It says

alternatenames    : alternatenames, comma separated varchar(4000)
(varchar(5000) for SQL Server)
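Here is a minimal sketch of pulling that field out of one row and splitting it the way the schema line suggests. It assumes, following the column list in readme.txt, that rows are tab-separated and that alternatenames is the fourth column. The row below is a made-up illustration, not the real record for 6292397. It shows how a name containing ", " gets broken in two.

```python
# Assumption: alternatenames is the fourth tab-separated column, as listed in readme.txt.
ALTERNATENAMES_COLUMN = 3


def alternatenames_field(row: str) -> str:
    """Return the raw alternatenames column from one tab-separated geoname row."""
    columns = row.rstrip("\n").split("\t")
    return columns[ALTERNATENAMES_COLUMN]


def naive_split(field: str) -> list[str]:
    """Split the field on every comma, taking "comma separated" literally."""
    return field.split(",") if field else []


# Hypothetical row (only the first six columns shown; real rows have more).
row = "1\tExample\tExample\tFoo,Rüti / Dorfzentrum, Südl. Teil,Bar\t47.0\t8.0\n"
print(naive_split(alternatenames_field(row)))
# ['Foo', 'Rüti / Dorfzentrum', ' Südl. Teil', 'Bar']
```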

The entry for geonameid 6292397 is for the city

   Rüti / Dorfzentrum, Südl. Teil

That appears to be the same place as

http://en.wikipedia.org/wiki/Rüti,_Zürich

The entry contains a ",", which means my parser, which splits on ",",
gets messed up. I'm going to change it to split only on a "," that is
not followed by a space.
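That workaround can be sketched with a regular expression that splits on a comma only when the next character is not a space (a negative lookahead). It is a heuristic, not a documented format rule. It still fails if a name contains a comma with no following space, or if a name itself begins with a space.

```python
import re

# Split on "," only when it is NOT followed by a space (negative lookahead).
_SEPARATOR = re.compile(r",(?! )")


def split_alternatenames(field: str) -> list[str]:
    """Heuristically split a comma-separated alternatenames field.

    A name such as "Rüti / Dorfzentrum, Südl. Teil" keeps its internal
    ", " because that comma is followed by a space.
    """
    if not field:
        return []
    return _SEPARATOR.split(field)


print(split_alternatenames("Foo,Rüti / Dorfzentrum, Südl. Teil,Bar"))
# ['Foo', 'Rüti / Dorfzentrum, Südl. Teil', 'Bar']
```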

I don't know if this is a problem with the name (that it contains a
","), with the documentation (that it doesn't say how names which
contain commas are handled), or if it's that I shouldn't be using
that field and should instead use the alternateNames.txt file to get
these.

 -- Andrew Dalke <dalke@...>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "GeoNames" group.
To post to this group, send email to geonames@...
To unsubscribe from this group, send email to geonames+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/geonames?hl=en
-~----------~----~----~----~------~----~------~--~---


Re: comma in name in alternatenames

by Marc Wick


The alternatenames field is not meant to be parsed. If you want to know
the individual alternate names then you should use the alternatename file.
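Following that advice, here is a minimal sketch that reads alternateNames.txt and collects the distinct names for each geonameid. It assumes tab-separated rows whose first four columns are alternateNameId, geonameid, isolanguage and the alternate name, matching the layout of the rows quoted later in this thread. Any extra trailing columns are ignored. The file path in the usage comment is a hypothetical local copy.

```python
from collections import defaultdict
from typing import Iterable


def names_by_geonameid(lines: Iterable[str]) -> dict[int, set[str]]:
    """Map each geonameid to the set of distinct alternate names for it.

    Assumed columns: alternateNameId, geonameid, isolanguage, alternate name, ...
    The isolanguage column is ignored here.
    """
    result: dict[int, set[str]] = defaultdict(set)
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 4:
            continue  # skip blank or malformed lines
        result[int(columns[1])].add(columns[3])
    return dict(result)


# Usage with a hypothetical local copy of the dump:
# with open("alternateNames.txt", encoding="utf-8") as f:
#     names = names_by_geonameid(f)
sample = [
    "1235976\t2711537\t\tGothenburg\n",
    "1600985\t2711537\ten\tGothenburg\t1\t1\n",
    "1601009\t2711537\tsv\tGöteborg\n",
]
print(sorted(names_by_geonameid(sample)[2711537]))
# ['Gothenburg', 'Göteborg']
```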

Marc



Re: comma in name in alternatenames

by Andrew Dalke


On Nov 8, 7:31 am, Marc Wick <m...@...> wrote:
> The alternatenames field is not meant to be parsed.

Then out of curiosity, why is that field present? And
may I ask that the documentation somewhere mention that?

Are there other fields which should not be parsed?

> If you want to know the individual alternate names
> then you should use the alternatename file.

I don't understand why those two sources are different.
Here are the alternatenames for Gothenburg, Sweden:

G'oteborg,GOT,Gautaborg,Geteborga,Gjoteborg,Goeteborg,Goteborg,Goteburg,Gotemburgo,Gotenburg,Gothembourg,Gothenburg,Gothoburgum,Gottenborg,Göteborg,Gøteborg,Gēteborga,Γκέτεμποργκ,Гьотеборг,Гётеборг,גוטנבורג,イェーテボリ,哥德堡

There are 23 unique names in that list.

Gothenburg is geonameid 2711537 and it has 33 entries in
alternateNames.txt:

1235974 2711537         Goeteborg
1235975 2711537         Goteburg
1235976 2711537         Gothenburg
1235977 2711537         Gottenborg
1600989 2711537 da      Göteborg
1600991 2711537 eo      Göteborg
1600995 2711537 hu      Göteborg
1600999 2711537 la      Gothoburgum
1601003 2711537 nl      Gotenburg
1601007 2711537 pt      Gotemburgo
1601009 2711537 sv      Göteborg
1600984 2711537 de      Göteborg
1600986 2711537 es      Gotemburgo
1600988 2711537 ca      Göteborg
1600992 2711537 fi      Göteborg
1600996 2711537 ia      Göteborg
1601002 2711537 nds     Göteborg
1601006 2711537 pl      Göteborg
1634095 2711537 it      Göteborg
2256568 2711537 no      Gøteborg
1970271 2711537 is      Gautaborg
2181201 2711537 iata    GOT
1600987 2711537 bg      Гьотеборг
1600990 2711537 el      Γκέτεμποργκ
1600994 2711537 he      גוטנבורג
1600998 2711537 ja      イェーテボリ
1601000 2711537 lv      Gēteborga
1601008 2711537 ru      Гётеборг
1621129 2711537 zh      哥德堡
1600993 2711537 fr      Gothembourg
1601005 2711537 no      Göteborg        1       1
1600985 2711537 en      Gothenburg      1       1
1600997 2711537 id      Göteborg


of which 19 are unique. I see that "G'oteborg", which is
the first name of the alternatenames field of cities1000.txt,
is not in the alternateNames.txt file.
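The comparison can be reproduced with plain sets, using only the names quoted in this message. The first string is the alternatenames field from cities1000.txt. The list is the name column of the 33 alternateNames.txt rows, in the order shown.

```python
# alternatenames field for geonameid 2711537, as quoted above.
FIELD = (
    "G'oteborg,GOT,Gautaborg,Geteborga,Gjoteborg,Goeteborg,Goteborg,Goteburg,"
    "Gotemburgo,Gotenburg,Gothembourg,Gothenburg,Gothoburgum,Gottenborg,"
    "Göteborg,Gøteborg,Gēteborga,Γκέτεμποργκ,Гьотеборг,Гётеборг,גוטנבורג,"
    "イェーテボリ,哥德堡"
)

# Name column of the 33 alternateNames.txt rows quoted above, in the same order.
TABLE_NAMES = [
    "Goeteborg", "Goteburg", "Gothenburg", "Gottenborg", "Göteborg",
    "Göteborg", "Göteborg", "Gothoburgum", "Gotenburg", "Gotemburgo",
    "Göteborg", "Göteborg", "Gotemburgo", "Göteborg", "Göteborg",
    "Göteborg", "Göteborg", "Göteborg", "Göteborg", "Gøteborg",
    "Gautaborg", "GOT", "Гьотеборг", "Γκέτεμποργκ", "גוטנבורג",
    "イェーテボリ", "Gēteborga", "Гётеборг", "哥德堡", "Gothembourg",
    "Göteborg", "Gothenburg", "Göteborg",
]

field_names = FIELD.split(",")
print(len(field_names), len(set(field_names)))  # 23 23
print(len(TABLE_NAMES), len(set(TABLE_NAMES)))  # 33 19
# Names only in the cities1000.txt field:
print(sorted(set(field_names) - set(TABLE_NAMES)))
# ["G'oteborg", 'Geteborga', 'Gjoteborg', 'Goteborg']
# Names only in alternateNames.txt:
print(sorted(set(TABLE_NAMES) - set(field_names)))  # []
```

So the difference is four ASCII-only spellings. Every distinct name in the table also appears in the field.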

Knowing no better, I decided to use the alternatenames
from cities1000.txt because that one record (which is
where I live) had more alternate names, and for the
project I'm working on I wanted to maximize the
likelihood of getting a match.

Cheers,

-- Andrew Dalke <dalke@...>



Re: comma in name in alternatenames

by Marc Wick


The field is present because users have asked for it, for searching, I
suppose. It is built redundantly from the alternate names info. The name
"G'oteborg" is an ASCII transliteration of the Bulgarian name.

You shouldn't parse anything that is not designed to be parsed. (or
parse it at your own risk).
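To illustrate why a field built this way is not safe to parse, here is a hypothetical sketch. It builds a comma-joined field from a list of names by deduplicating, sorting by code point and joining with ",", then splits the field back. The builder is an assumption, not GeoNames' documented method. The Gothenburg list quoted in this thread happens to be in code-point order, but that is only an observation. The round trip works only when no name contains a comma.

```python
def build_alternatenames(names: list[str]) -> str:
    """Hypothetical builder: distinct names, sorted by code point, joined with ','."""
    return ",".join(sorted(set(names)))


def round_trips(names: list[str]) -> bool:
    """True if splitting the built field recovers exactly the distinct names."""
    field = build_alternatenames(names)
    return field.split(",") == sorted(set(names))


print(round_trips(["Göteborg", "Gothenburg", "GOT"]))           # True
print(round_trips(["Rüti", "Rüti / Dorfzentrum, Südl. Teil"]))  # False
```

Joining with a separator that may also occur inside the values loses information. That is the underlying reason a derived convenience field is riskier to parse than the table it came from.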

Best

Marc



Re: comma in name in alternatenames

by Andrew Dalke


On Nov 8, 5:03 pm, Marc Wick <m...@...> wrote:
> You shouldn't parse anything that is not designed to be parsed. (or
> parse it at your own risk).

I understand that. I wrote the above to point out that there's nothing
which describes which fields are designed to be parsed and which are
not.

Specifically, http://download.geonames.org/export/dump/readme.txt
says:

    Remark : the field 'alternatenames' in the table 'geoname' is a
    short version of the 'alternatenames' table. You probably don't
    need both. If you don't need to know the language of a name
    variant, the field 'alternatenames' will be sufficient. If you need
    to know the language of a name variant, then you will need to load
    the table 'alternatenames' and you can drop the column in the
    geoname table.
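When the language does matter, the table has to be loaded with its isolanguage column kept. Here is a minimal sketch. It assumes tab-separated rows with the columns alternateNameId, geonameid, isolanguage, alternate name and isPreferredName, where "1" in the fifth column marks a preferred name; that is the trailing `1` on the no and en Gothenburg rows quoted earlier in this thread. The function name is hypothetical.

```python
from typing import Iterable, Optional


def name_in_language(lines: Iterable[str], geonameid: int, lang: str) -> Optional[str]:
    """Return a name for geonameid in the given language code, or None.

    Assumption: column 5 (isPreferredName) == "1" marks the preferred name.
    A preferred row wins; otherwise the first matching row is returned.
    """
    fallback = None
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 4:
            continue  # skip blank or malformed lines
        if int(columns[1]) != geonameid or columns[2] != lang:
            continue
        if len(columns) > 4 and columns[4] == "1":
            return columns[3]
        if fallback is None:
            fallback = columns[3]
    return fallback


rows = [
    "2256568\t2711537\tno\tGøteborg\n",
    "1601005\t2711537\tno\tGöteborg\t1\t1\n",
    "1601009\t2711537\tsv\tGöteborg\n",
]
print(name_in_language(rows, 2711537, "no"))  # Göteborg (the preferred row)
print(name_in_language(rows, 2711537, "fr"))  # None
```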

I did not need to know the language of the name variant, so I thought
I could use this field.

I also did not realize that some of the names are machine-generated
transliterations from other languages. I do not see that documented
anywhere, so I assumed there was some other data source involved.
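Here is one common technique that produces ASCII variants such as "Goteborg" and "Geteborga": Unicode NFKD decomposition, then dropping the combining marks. It is offered only as an illustration and is not GeoNames' documented method. It cannot produce "G'oteborg" or "Gjoteborg", which come from transliterating Cyrillic. It also cannot fold "Gøteborg", because "ø" has no Unicode decomposition.

```python
import unicodedata
from typing import Optional


def ascii_fold(name: str) -> Optional[str]:
    """Strip diacritics via NFKD; return None if non-ASCII characters remain.

    Illustrative technique only; not necessarily how GeoNames builds its names.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped if stripped.isascii() else None


for name in ["Göteborg", "Gēteborga", "Gøteborg", "Гётеборг"]:
    print(name, "->", ascii_fold(name))
# Göteborg -> Goteborg
# Gēteborga -> Geteborga
# Gøteborg -> None
# Гётеборг -> None
```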

Cheers,

-- Andrew Dalke <dalke@...>
