On Mon, Sep 5, 2011 at 21:48, Bjoern Hoehrmann <derhoermi@...> wrote:
> Well, there are various problems as for your purposes you would need
> many relationships like linking names to geographical regions and to
> time even for seemingly simple things like gender, what may be a dis-
> tinctly female name at a given time and place might well be used for
> males elsewhere. Obviously this is commercially valuable data, so you
> don't get sophisticated republishable databases for free, if at all.
I figure so, and I'm okay with it not being perfect. But I think it'd
be a nice thing to do to make that kind of database/service more
broadly available and, if possible, to extract some algorithm that
doesn't require a giant database but can still approximate things
(e.g. by simple transformations of the full name, maybe also given
browser locale information).
And of course, as you point out, many names are in fact ambiguous as
to traits like gender… but one could still try to get demographic
data, perhaps, from e.g. census figures.
As I said, my goal is only to be able to give a reasonably good guess
(that would then be, one hopes, edited and approved by the user in
question), as a kind of PoC to accompany this article as to how one
can derive alternate forms, etc.
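To make the "reasonably good guess" idea concrete, here is a rough sketch of
what I have in mind. Everything in it is an assumption for illustration: the
function name guess_gender, the COUNTS table, and the numbers in that table
are invented placeholders, not real census figures. It also naively assumes
that the given name comes first and that the last part of the locale tag is
a region.

```python
# Placeholder counts, invented for illustration only; real figures would
# have to come from something like census data.
# Keys are (lowercased given name, region code).
COUNTS = {
    ("andrea", "IT"): {"male": 90, "female": 10},
    ("andrea", "US"): {"male": 5, "female": 95},
}


def guess_gender(full_name, locale=None, threshold=0.8):
    """Return (guess, confidence).

    guess is None when there is no data or the name is too ambiguous,
    in which case the user should simply be asked.
    """
    parts = full_name.split()
    if not parts:
        return (None, 0.0)
    given = parts[0].lower()  # naive: assumes the given name comes first

    # Naive region extraction: "it-IT" or "it_IT" gives "IT".
    region = None
    if locale:
        region = locale.replace("_", "-").split("-")[-1].upper()

    if region is not None and (given, region) in COUNTS:
        counts = COUNTS[(given, region)]
    else:
        # No regional data: pool the counts for this name over all regions.
        counts = {}
        for (name, _), by_gender in COUNTS.items():
            if name == given:
                for gender, n in by_gender.items():
                    counts[gender] = counts.get(gender, 0) + n

    total = sum(counts.values())
    if total == 0:
        return (None, 0.0)

    best = max(counts, key=counts.get)
    confidence = counts[best] / total
    if confidence < threshold:
        return (None, confidence)  # too ambiguous to pre-fill
    return (best, confidence)
```

With the placeholder table, "Andrea Rossi" with locale "it-IT" is guessed as
male, the same given name with "en-US" as female, and with no locale at all
the pooled counts are too close, so no guess is offered. In every case the
result would only pre-fill a field that the user can then edit and approve.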
> * all 5 "Sai" are male (see caveats)
FWIW, I am too (well, at least insofar as we're using non-text-field
genders*). TTBOMK most people named Sai are Indian, and it is in fact
a male name there. (Mine is completely unrelated to that etymology.)
* Actually, that reminds me: should there be an analogous W3C document
for how to treat gender?
I'd be willing to draft one, but I'm completely unfamiliar with your
processes about how such things are decided.