unexpected aliases in alternateNames.txt

by Andrew Dalke

There are some alternate names I did not expect, in both cities1000.txt
and alternateNames.txt. They are easily found by grepping for " - " (space
hyphen space):


1894665 3165524 gl      Turín - Torino
1972921 2802361 gl      Bélxica - België
1986656 2525764 gl      Agrixento - Agrigento
1977253 223816  gl      Xibutí - Djibouti

It looks like there was a bad import of a set of Galician names.

There are a few problems with other languages, like

2014633 2787387 id      Saint-Josse-ten-Noode - Sint-Joost-ten-Node
2013778 2783474 id      Woluwe-Saint-Pierre - Sint-Pieters-Woluwe

and some with no language code at all, like

2181169 1816670         Beijing - Pekin
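For anyone who wants to repeat the check, here is a minimal Python sketch of the same search. It assumes the tab-separated column order alternateNameId, geonameid, isolanguage, alternateName, ... and only inspects the alternateName column, so ordinary hyphenated names such as Sint-Joost-ten-Node are not flagged unless they contain " - ". The function names and the command-line usage are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumed layout of alternateNames.txt (tab-separated):
# alternateNameId, geonameid, isolanguage, alternateName, ...
Row = Tuple[str, str, str, str]


def suspicious_aliases(lines: Iterable[str]) -> Iterator[Row]:
    """Yield (alternateNameId, geonameid, isolanguage, alternateName) rows
    whose alternateName contains " - " (space hyphen space)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        alt_id, geoname_id, lang, name = fields[:4]
        if " - " in name:
            yield alt_id, geoname_id, lang, name


def count_by_language(rows: Iterable[Row]) -> Counter:
    """Tally rows per isolanguage; an empty string means no language code."""
    return Counter(lang for _, _, lang, _ in rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_dashes.py alternateNames.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        hits = list(suspicious_aliases(f))
    for row in hits:
        print("\t".join(row))
    for lang, n in count_by_language(hits).most_common():
        print(f"{lang or '(no code)'}: {n}", file=sys.stderr)
```

The per-language tally is what separates the large Galician batch from the handful of "id" and no-code cases.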

-- Andrew Dalke <dalke@...>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "GeoNames" group.
To post to this group, send email to geonames@...
To unsubscribe from this group, send email to geonames+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/geonames?hl=en
-~----------~----~----~----~------~----~------~--~---


Re: unexpected aliases in alternateNames.txt

by Marc Wick

Please feel free to correct it.

Marc


Re: unexpected aliases in alternateNames.txt

by Andrew Dalke

On Nov 8, 7:32 am, Marc Wick <m...@...> wrote:
> please feel free to correct it.

There are 305 such names which are obviously wrong for the Galician
language and can be fixed with something like

perl -pe 's/^(\d+\t\d+\tgl\t)([^\t\n]+?) - [^\t\n]*/$1$2/' alternateNames.txt

plus another three which I spotted by hand when scanning for a "-" in
the alternate name.
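The same rewrite in Python, as a sketch. It assumes, as argued later in this thread, that the first name of each pair is the Galician one; it touches only rows whose isolanguage is gl and leaves the other tab-separated columns alone. fix_gl_line and GL_PAIR are hypothetical names.

```python
import re

# Assumed layout: alternateNameId \t geonameid \t isolanguage \t alternateName \t ...
# Group 1: the leading columns of a "gl" row; group 2: the name before " - ".
# Assumption (argued later in this thread): the first name is the Galician one.
GL_PAIR = re.compile(r"^(\d+\t\d+\tgl\t)([^\t\n]+?) - [^\t\n]*")


def fix_gl_line(line: str) -> str:
    """Drop the " - <foreign name>" part of a gl alternateName.

    Rows in other languages, and gl rows without " - ", come back unchanged;
    the columns after alternateName and the trailing newline are kept.
    """
    return GL_PAIR.sub(r"\1\2", line, count=1)
```

Excluding \n from both character classes keeps the newline intact even when alternateName happens to be the last column.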

Shall I send in a diff against alternateNames.txt? While I now know
about the web-based interface for making corrections manually, I don't
really want to do 300+ edits by hand. If each takes 30 seconds, that's
over 2.5 hours of work.
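If a diff is the preferred form, one can be produced without holding the whole file in memory. This is a sketch under an assumption about format, not a known GeoNames submission requirement: because a per-line fix like the one-liner above never adds or deletes lines, each changed line can be emitted as its own single-line hunk, in the style of `diff -U0`. zero_context_diff and the a/ and b/ path prefixes are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def zero_context_diff(lines: Iterable[str], fix: Callable[[str], str],
                      path: str = "alternateNames.txt") -> Iterator[str]:
    """Stream a unified diff, with no context lines, of applying `fix`.

    Assumes `fix` maps each line to exactly one line (keeping its trailing
    newline), so line numbers never shift and every changed line can be its
    own single-line hunk. Nothing is yielded if no line changes.
    """
    wrote_header = False
    for lineno, old in enumerate(lines, start=1):
        new = fix(old)
        if new == old:
            continue
        if not wrote_header:
            yield f"--- a/{path}\n"
            yield f"+++ b/{path}\n"
            wrote_header = True
        # "@@ -N +N @@" is a one-line hunk (the count defaults to 1).
        yield f"@@ -{lineno} +{lineno} @@\n"
        yield "-" + old
        yield "+" + new
```

For example, `sys.stdout.writelines(zero_context_diff(f, fix_line))`, with f an open alternateNames.txt and fix_line any per-line rewrite, writes a patch that can be checked with `patch --dry-run` before it is sent.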

-- Andrew Dalke <dalke@...>


Re: unexpected aliases in alternateNames.txt

by Marc Wick

It will need someone who speaks Galego to determine which of the two
names is Galego and which is in another language.

Marc


Re: unexpected aliases in alternateNames.txt

by Andrew Dalke

On Nov 8, 5:13 pm, Marc Wick <m...@...> wrote:
> It will need someone who speaks Galego to determine which one of the two
> names is Galego and which one an other language.

That is of course your prerogative. I will point out that the relevant
Wikipedia pages (which is where I assume the names came from, since
they are also hyphenated) imply that the first name is Galician and
the second name is the 'foreign' name. For a clear example:

http://gl.wikipedia.org/wiki/Agrixento
>    Agrixento - Agrigento
> Agrixento, Agrigento en italiano. Cidade capital da provincia de
> Agrixento, Sicilia, Italia.. 55.000 habitantes.

This is easily interpreted, from its similarity to Spanish and other
Romance languages, as "Agrixento, Agrigento in Italian. Capital city
of the province of Agrixento, Sicily, Italy. 55,000 inhabitants."

Similarly, http://gl.wikipedia.org/wiki/Bélxica
>   Bélxica - België
> O Reino de Bélxica (Koninkrijk België en neerlandés, Royaume de
> Belgique en francés e Königreich Belgien en alemán) é un país
> da Europa Noroccidental

Which would be "The Kingdom of Bélxica (Koninkrijk België in Dutch,
Royaume de Belgique in French and Königreich Belgien in German) is a
country in Northwest Europe."

http://gl.wikipedia.org/wiki/Tur%C3%ADn
>    Turín - Torino
> Turín é unha comuna italiana, capital da rexión de Piemonte cunha poboación de 900.608 persoas

Translated: "Turín is an Italian municipality, capital of the Piemonte
region, with a population of 900,608 people."


I have looked over the 300+ name pairs, and every one of them appears
to have the Galician name first and the foreign name second.

I assumed this was an import error from whatever the primary data
source was, and I also assumed that the conversion was not done by
someone who knows the language.

Again, feel free to defer this transformation until someone who knows
Galician spots the error and is willing to report it, or until someone
like me is willing to change 300+ values by hand. I did not know that
was a requirement.

Best regards,

-- Andrew Dalke <dalke@...>


Re: unexpected aliases in alternateNames.txt

by Andrew Dalke

On Nov 8, 7:32 am, Marc Wick <m...@...> wrote:
> please feel free to correct it.

Plus, I can't change the "Beijing - Pekin" entry, which holds two
different transliterations of the same name, because

     The record you want to edit is locked for updates for userlevel1

Cheers,

-- Andrew Dalke <dalke@...>

