Re: CapitalizationStandardEnglish

by Brian Schweitzer

These are very problematic. 

The old Guess Case basically looked at these, shrugged, and made them all lowercase.

The new Guess Case tries a little harder: it tries to identify words that look like prepositions but are actually being used as adjectives or adverbs, by looking for what I call in the code "sentence/fragment ending punctuation", namely , . : ; / and so on, as well as end of line.  A word is pretty unlikely to be a preposition if it matches "word\s[,\.:;$]"; i.e., in "Come On", "on" is not being used as a preposition.
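To make the heuristic above concrete, here is a minimal Python sketch of the idea. It is not the actual Guess Case code: the word list, the function name `guess_case`, and the rule that the first word of a title is always capitalized are assumptions made for illustration. The quoted pattern is also treated as shorthand, since "$" inside a character class would match a literal dollar sign, so the sketch uses alternation to express "punctuation or end of line".

```python
import re

# Hypothetical, deliberately short list; the real Guess Case list is longer.
PREPOSITION_LIKE = {"in", "on", "off", "out", "up", "down", "over"}

# "Sentence/fragment ending punctuation", or the end of the title.
# Alternation is used because "$" inside [...] is a literal dollar sign.
FRAGMENT_END = re.compile(r"\s*(?:[,.:;/]|$)")

WORD = re.compile(r"[A-Za-z']+")


def guess_case(title: str) -> str:
    """Lower-case preposition-like words unless they start the title
    or are directly followed by fragment-ending punctuation."""

    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() not in PREPOSITION_LIKE:
            return word  # leave every other word untouched
        if match.start() == 0:
            return word.capitalize()  # assumption: first word is capitalized
        if FRAGMENT_END.match(title, match.end()):
            return word.capitalize()  # closes a fragment: not a preposition
        return word.lower()  # otherwise treat it as a preposition

    return WORD.sub(fix, title)


if __name__ == "__main__":
    for t in ["Come on", "Come on, Eileen", "Come On Eileen"]:
        print(t, "->", guess_case(t))
```

The last title shows the limit of the approach: with no punctuation after "On", the sketch lower-cases it, which is exactly the hard case discussed next.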

The "Come On Eileen" and other such cases are much more difficult; the only real way I could come up with to identify these would need a word list of every possible partner word for those prepositional phrases, to differentiate "Come on Down Eileen" from "Come On Eileen".  Now, I think creating such a list, or using it if we actually managed to create it, is pretty much an impossibly large task.  If anyone can think of other rules to try to match some more of these, great...  but I don't think GC can do much more to be intelligent without very large dictionary lists, which would also slow the code down a bit.

As for noting them, that would essentially require a heads-up notice any time any of the words appeared, and the list of words is long enough, and the words common enough, that such notices would be generated very frequently.  I would fear users would begin to ignore them entirely.  (The boy who cried wolf type of problem...)
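As a rough illustration of why such notices would fire so often, here is a small Python sketch of a naive warning check. The word set and the function name `ambiguous_words` are hypothetical, and the check is intentionally crude: it flags any listed word that is neither the first nor the last word of the title.

```python
import re

# Hypothetical subset of the preposition-like words under discussion.
AMBIGUOUS = {"in", "on", "off", "out", "up", "down", "over"}


def ambiguous_words(title: str) -> list[str]:
    """Return listed words in mid-title position, where punctuation
    alone cannot decide the capitalization."""
    words = re.findall(r"[A-Za-z']+", title)
    # Skip the first and last words; everything in between is a candidate.
    return [w for w in words[1:-1] if w.lower() in AMBIGUOUS]


if __name__ == "__main__":
    for t in ["Come On Eileen", "Come On", "Walking on Sunshine"]:
        print(t, "->", ambiguous_words(t))
```

Even this tiny list flags both the ambiguous "Come On Eileen" and the perfectly ordinary "Walking on Sunshine", which is the crying-wolf problem in miniature.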

Brian

On Mon, Apr 20, 2009 at 2:56 PM, Christopher Key <cjk32@...> wrote:
Chris B wrote:
> my personal favourite is "Come On Eileen" vs "Come on Eileen" - two
> rather different meanings :) i've just done a search on that and
> lo-and-behold there's plenty of the latter. i suppose the best
> solution would be to politely warn the editor who added/changed that
> release by adding a note to their edit, and then change it back.
> adding a track annotation about it would probably help, as well.
>
Thanks, that certainly does illustrate the problem well, as well as
pointing out the importance of correct stress in spoken English!

Having a dig through the database, there are quite a few that were
incorrectly capitalised until recently.  There are also multiple
references to [1], which points out that the same problem exists for
similar titles.  It's a difficult problem, and I guess that the best
Guess Case could do would be to warn users when dealing with potentially
ambiguous titles.

For the original track, I'll revert to 'The In Set' (I'm assuming that
this is correct), and add a note to the annotation.  That way, if it
gets 'corrected' again, it may stand some chance of being reverted
again.

Regards,

Chris

_______________________________________________
MusicBrainz-users mailing list
MusicBrainz-users@...
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-users

