Clarify removing data, simply because we have too much
Hi everyone,
This email is due to http://musicbrainz.org/show/edit/?editid=11115778 , and the related 100+ edits. These edits are removing every single Wikipedia URL-artist AR from Wolfgang Amadeus Mozart, except the links to the English and German Wikipedias, simply on the argument that "NOTHING warrants 100 wikilinks for Mozart."

While KRSCuan has 6 editors willing to vote yes, in my opinion there is no justification for these destructive edits. The Wikipedia AR's guideline says only that "This Advanced Relationship Type is used to link an artist or a release to its corresponding Wikipedia page. The full URL should be used, not just the Wiki Name." It says nothing about limitations on which Wikipedias should or should not be linked, so long as they are valid links.

There are benefits to having links for any Wikipedia URL that is valid:

* As mentioned as far back as 2005, in a similar debate (in which adding the additional URLs passed 8:6, http://musicbrainz.org/show/edit/?editid=2836695 ), Wikipedia pages are not simply translations of each other. Each page is independently written, and thus there can be entirely unique information on the page in one language which is not present on similar pages in other languages.

* There are data client benefits: a client who wants only pages in Russian can easily filter the AR type's URLs for pages using whatever foo for foo.wikipedia.org they wish. The BBC uses this AR now to show Wikipedia data in English; a Russian data client wishing to show only pages in Russian, or a Chinese client wishing to show pages only in Chinese, etc, can easily filter the data to get the correct subset of URLs. It is not a simple task, however, to get to that same URL if all you have is some random Wikipedia URL - they then have to load the Wikipedia page on the fly, read its non-URL interlink list, determine if the one they want is there, then construct the URL.
  I.e., it's something they likely won't do, and it's significantly less data-client-friendly than simply filtering existing URLs we could provide. (If MusicBrainz ever decided to add Wikipedia text, and we were in a future server release with i18n support, this would make i18n Wikipedia text inclusion simpler for us, for exactly the same reasons.)

* MusicBrainz users speak languages other than English, and many are not at all fluent in English (if they speak it at all). For the Japanese user, is not a link to a page in Japanese more helpful than a link to a page in English?

The only negatives given in the edit have been:

* Link rot
* The list of Wikipedia URLs can grow incomplete
* Claims of an unwritten rule that the AR should only "link to native language and English, with some exceptions."

Link rot - links that go bad, or become redirects - is not solved in this way. We do indeed need a bot to autocheck URLs for link rot, but without that, I don't see how this is not an argument against our adding *any* URL ARs. Certainly, if we remove a ton of URLs, then we don't care if those URLs change; but is the sane answer to "URLs can change" really to remove all the URLs before they have a chance to change?

As for incomplete lists, no MusicBrainz data is likely ever to be fully complete in a future-proof manner. All data can constantly be updated, with new releases, new REs, new AR interlinkages, new URLs to AR, etc. Sure, the list of Wikipedia URLs for an artist ends up missing some, because new language Wikipedias have grown up. I don't buy this as a rationale for not trying to link to Wikipedia; how does it really make sense to say "Well, we have 100 links, but now there are 103 pages to link to. I (the editor) just noticed the missing three. But, instead of linking the new 3 pages, I think it makes more sense to remove 98 of the existing pages, and not bother to even try to link to any valid page."
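The data-client filtering described above (picking the foo in foo.wikipedia.org) could look roughly like the following. This is only a minimal sketch in Python: the function names and the sample URL list are made up for illustration, and it assumes the client already holds the artist's Wikipedia AR URLs as plain strings.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


def wikipedia_language(url: str) -> Optional[str]:
    """Return the language subdomain of a Wikipedia URL, or None if it is not one."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Expect exactly <lang>.wikipedia.org; "www" is the portal, not a language.
    if len(labels) == 3 and labels[1:] == ["wikipedia", "org"] and labels[0] != "www":
        return labels[0]
    return None


def links_for_language(urls: Iterable[str], lang: str) -> List[str]:
    """Keep only the URLs that point at the requested language's Wikipedia."""
    return [u for u in urls if wikipedia_language(u) == lang]


# Illustrative sample only (hypothetical list, not actual MusicBrainz data).
SAMPLE_URLS = [
    "http://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart",
    "http://de.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart",
    "http://fy.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart",
    "http://www.example.org/mozart",
]

german_links = links_for_language(SAMPLE_URLS, "de")
```

A client wanting Russian or Chinese pages would pass "ru" or "zh" instead; when no link exists for that language the result is simply an empty list, which is the case the debate is about.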
As for this unwritten rule somehow permitting these destructive edits, I don't buy it. The edit history doesn't support that rule. Yes, there have been attempts by some to create such a rule, dating back to the 2005 edit linked above. However, there's a reason that rule is not part of the guideline; namely, it has frequently failed when put to a vote, and it is far from having majority, let alone universal, acceptance. To my mind, such an unwritten rule is not a rule at all; it's merely an idea *some users* put forward. Since it is not part of any guideline, I do not at all accept it as a valid rationale for destructive edits.

In my opinion, creating over a hundred remove AR edits, all at once, and with no consultation of the style or users lists, is an attempt to push them through regardless of validity. These ARs are perfectly OK and valid per every single guideline and style principle. While many of us check the "remove release" edits, how many actually check all the "remove AR" edits - or are willing to cast over 100 no votes all at once?

Brian

_______________________________________________
MusicBrainz-users mailing list
MusicBrainz-users@...
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-users
|
|
Re: Clarify removing data, simply because we have too much
Hello,

On Sun, Aug 30, 2009 at 09:49:51AM -0400, Brian Schweitzer wrote:
> Hi everyone,
>
> This email is due to http://musicbrainz.org/show/edit/?editid=11115778 , and
> the related 100+ edits.

I haven't looked at the other edits, but that particular edit looks fine.

I do think a Wikipedia link should be useful. If an artist is barely notable and has a single stub in Wikipedia I think it's OK to link to that, but I also consider that an exception.

In this case, with a very famous composer, I think we should be more critical of the articles we link to. I assume the English and German articles are informative and well-written, and thus are useful resources to link to. Looking a bit further, the Dutch version is fine too, but for example Afrikaans (af), Frisian (fy) and the edit mentioned above (pam) are stubs or barely more than stubs.

These stubs are not useful links IMO, and those users interested in them can still easily reach them by following any of the existing Wikipedia links and then picking their favourite language on the left-hand side. (And this is even more true for software doing automated processing of these links.)

So, in my opinion, the mass adding of Wikipedia URLs to this composer was not a particularly useful thing to do, regardless of whether it is technically within the rules set out in the guidelines. Similarly, mass removing of them also isn't a great idea.

-- kuno / warp.

ps. I'm not voting on any of these because I prefer to spend my time elsewhere. If I had the time, I'd vote on each of these according to what I wrote in this post.
|
|
Re: Clarify removing data, simply because we have too much
> These stubs are not useful links IMO, and those users interested in them

I do somewhat agree on the "usefulness" factor, though I think that brings an element of "greyness" to something that doesn't need it. IMHO, we shouldn't be trying to act as judge over the worthwhileness of any given link; we should simply be looking at "Does it meet the definition given by the AR", (and "Is there something in/about the page that could potentially cause legal difficulties if we linked to it").

As for software doing automated processing, that's the point - there should be no reason software is needed to do such link processing; if we're saying we provide Wikipedia links, we should provide them, not just provide one and leave it to the data client to figure out themselves if then there is a different Wikipedia page that's actually the one they want.

Brian
|
|
Re: Clarify removing data, simply because we have too much
2009/8/30 Brian Schweitzer <brian.brianschweitzer@...>:
>> These stubs are not useful links IMO, and those users interested in them can still easily reach them by following any of the existing wikipedia links and then picking their favourite language on the left-hand side. (and this is even more true for software doing automated processing of these links).
>
> I do somewhat agree on the "usefulness" factor, though I think that brings an element of "greyness" to something that doesn't need it. IMHO, we shouldn't be trying to act as judge over the worthwhileness of any given link; we should simply be looking at "Does it meet the definition given by the AR", (and "Is there something in/about the page that could potentially cause legal difficulties if we linked to it").
>
> As for software doing automated processing, that's the point - there should be no reason software is needed to do such link processing; if we're saying we provide Wikipedia links, we should provide them, not just provide one and leave it to the data client to figure out themselves if then there is a different Wikipedia page that's actually the one they want.
>
> Brian

I think all ARs on MB have an implicit usefulness criterion, even if none of the guidelines say so at present. It's common sense really, though it should probably become an explicit general requirement if we have to prevent the mechanical addition of masses of links from around the web that add little to no value. Yes, 'value' and 'usefulness' are fuzzy concepts, but that's why we have a voting system to begin with. On that basis, I would prefer it if those adding links were users of that data themselves, as that gives an implicit confidence in the data presented.
One would hope that those developing software which does automated processing based on MB data would respond to difficulties such as only having an English link to a Wikipedia page by adding a link to the known good resource, rather than developing a hack to parse the Wikipedia page. That implies there's another problem altogether. I also think that such data consumers would find it more useful for their software to link to a comprehensive page in a non-ideal language than a three or four line stub in their native tongue.

I don't see any advantage to arbitrary limits on the number of links (of any type) where each link contributes something to the database. That's as true of a Wikipedia link in a different language as it is of a Discogs page that gives evidence for a different release event.

I also think arguments on completeness are fundamentally flawed, as Brian says. Yes, Wikipedia pages (and others) can change over time, but that doesn't mean we should link to them in the hope that in ten years' time a stub has developed into a page with unique information not provided by the existing other links.

In the case of the edit Brian linked to, the Wikipedia page has four lines. Two of them provide birth and death information which is already in MusicBrainz itself, never mind other links. With NGS, his nationality will also be in MB, making even more of the page redundant. The users of that Wikipedia page have already made a value-based judgement in declaring it to be a stub. Exactly what value is there in linking to it?

--
Andrew :-)

Free Java Software Engineer
Red Hat, Inc. (http://www.redhat.com)

Support Free Java!
Contribute to GNU Classpath and the OpenJDK
http://www.gnu.org/software/classpath
http://openjdk.java.net

PGP Key: 94EFD9D8 (http://subkeys.pgp.net)
Fingerprint: F8EF F1EA 401E 2E60 15FA 7927 142C 2591 94EF D9D8
|
|
Re: Clarify removing data, simply because we have too much
On Sun, Aug 30, 2009 at 11:08:23AM -0400, Brian Schweitzer wrote:
> > These stubs are not useful links IMO, and those users interested in them can still easily reach them by following any of the existing wikipedia links and then picking their favourite language on the left-hand side. (and this is even more true for software doing automated processing of these links).
>
> I do somewhat agree on the "usefulness" factor, though I think that brings an element of "greyness" to something that doesn't need it. IMHO, we shouldn't be trying to act as judge over the worthwhileness of any given link; we should simply be looking at "Does it meet the definition given by the AR", (and "Is there something in/about the page that could potentially cause legal difficulties if we linked to it").

I don't see any specific problem with allowing editors to judge usefulness. For most things we are lucky that usefulness or worthiness can be decided on mb-style already, by voting no to proposals which attempt to add new ARs to link to certain sites. That doesn't mean that for those ARs we do have, you should add anything which fits the AR description.

Note, I don't advocate being overly strict on this either. In general if someone adds a particular link, I assume it is useful to them, why else would they add it?

But edits like this: http://musicbrainz.org/show/edit/?editid=8339720 violate that assumption. Wtf were you thinking adding such links? You don't speak Frisian, there is no way this AR is useful to you (and this goes for most of the ARs you added in that period).

In this particular case, it isn't useful to anyone. There is no Frisian alive today who doesn't also have Dutch as a native language. The vast majority of those users would probably even prefer the English article over the current Frisian one.

> As for software doing automated processing, that's the point - there should be no reason software is needed to do such link processing; if we're saying we provide Wikipedia links, we should provide them, not just provide one and leave it to the data client to figure out themselves if then there is a different Wikipedia page that's actually the one they want.

Automated clients have no way to determine which links are useful. If you for example have a media player which shows the content of a Wikipedia article when playing a certain artist, it will likely fetch the article in whatever is the user's preferred language. In my case, that could be Frisian. So, I would be presented with a non-informative stub, whereas I would get a more informative Dutch or English article if we do not link to stubs.

The author of such a media player can always get a list of other versions available even if we just link to a single Wikipedia article. So it's not particularly useful for automated clients to have that full list available in MusicBrainz, especially because it will likely never be complete, and will include broken links because Wikipedia is also a dynamic site where bad quality stub articles are regularly deleted.

Wikipedia is authoritative on their content. I feel linking to every separate translation on Wikipedia just because it doesn't 404 creates an inferior copy of the left hand language selection menu in Wikipedia. Let them manage that list, it's their content, they are good at it.

-- kuno / warp.
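The point that a client can obtain the other language versions from a single Wikipedia link can be illustrated with the MediaWiki Action API, whose `prop=langlinks` query returns a page's interlanguage links. The following is a hedged sketch in Python that only builds the request URL and makes no network call. The helper name is hypothetical, and it assumes the usual /wiki/Title article path and the /w/api.php endpoint that Wikipedia exposes today, which may not match what was available when this thread was written.

```python
from urllib.parse import unquote, urlencode, urlsplit


def langlinks_query_url(article_url: str) -> str:
    """Build a MediaWiki API URL asking for an article's interlanguage links."""
    parts = urlsplit(article_url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a /wiki/ article URL: {article_url}")
    # Everything after /wiki/ is the (percent-encoded) page title.
    title = unquote(parts.path[len(prefix):])
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",  # ask for as many links per request as the API allows
        "format": "json",
    }
    return f"{parts.scheme}://{parts.netloc}/w/api.php?{urlencode(params)}"


query_url = langlinks_query_url("http://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart")
```

A player would fetch that URL, look for its user's preferred language in the response, and fall back to the linked article when the preferred one is missing. Whether that extra round trip is acceptable is exactly what the two sides of this thread disagree on.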
|
|
Re: Clarify removing data, simply because we have too much
2009/8/30 Kuno Woudt <kuno@...>:
> On Sun, Aug 30, 2009 at 11:08:23AM -0400, Brian Schweitzer wrote:
>> > These stubs are not useful links IMO, and those users interested in them can still easily reach them by following any of the existing wikipedia links and then picking their favourite language on the left-hand side. (and this is even more true for software doing automated processing of these links).
>>
>> I do somewhat agree on the "usefulness" factor, though I think that brings an element of "greyness" to something that doesn't need it. IMHO, we shouldn't be trying to act as judge over the worthwhileness of any given link; we should simply be looking at "Does it meet the definition given by the AR", (and "Is there something in/about the page that could potentially cause legal difficulties if we linked to it").
>
> I don't see any specific problem with allowing editors to judge usefulness, for most things we are lucky that usefulness or worthiness can be decided on mb-style already by voting no to proposals which attempt to add new ARs to link to certain sites. That doesn't mean that for those ARs we do have, you should add anything which fits the AR description.
>
> Note, I don't advocate being overly strict on this either. In general if someone adds a particular link, I assume it is useful to them, why else would they add it?
>
> But edits like this: http://musicbrainz.org/show/edit/?editid=8339720 violate that assumption. Wtf were you thinking adding such links? You don't speak frisian, there is no way this AR is useful to you (and this goes for most of the ARs you added in that period).
>
> In this particular case, it isn't useful to anyone. There is no frisian alive today that doesn't also have dutch as a native language. The vast majority of those users would probably even prefer the english article over the current frisian one.
>
>> As for software doing automated processing, that's the point - there should be no reason software is needed to do such link processing; if we're saying we provide Wikipedia links, we should provide them, not just provide one and leave it to the data client to figure out themselves if then there is a different Wikipedia page that's actually the one they want.
>
> Automated clients have no way to determine which links are useful. If you for example have a media player which shows the content of a wikipedia article when playing a certain artist, it will likely fetch the article in whatever is the users preferred language. In my case, that could be frisian. So, I would be presented with a non-informative stub, whereas i would get a more informative dutch or english article if we do not link to stubs.
>
> The author of such a media player can always get a list of other versions available even if we just link to a single wikipedia article. So it's not particularly useful for automated clients to have that full list available in musicbrainz, especially because it will likely never be complete, and include broken links because wikipedia is also a dynamic site where bad quality stub articles are regularly deleted.
>
> Wikipedia is authorative on their content, I feel linking to every seperate translation on wikipedia just because it doesn't 404 creates an inferior copy of the left hand language selection menu in wikipedia. Let them manage that list, it's their content, they are good at it.
>
> -- kuno / warp.

Kuno, I agree wholeheartedly with your sentiments here.
These kinds of mass edits, where the editor is by and large unaware of the value or content of their additions, are dangerous IMHO (the automated lookup example being a fine case in point). So they meet the guidelines - so what? As you point out, the guidelines themselves were generated as a result of due process with the intention of codifying some of the more general usefulness criteria. I don't believe they were introduced to be used as a defence for creating otherwise highly objectionable edits. They are guidelines after all, not rules of law, and come second to cases where a vocal majority chooses to vote contrary to them. This doesn't make the guideline itself wrong, per se, but just further illustrates that a lot of MB additions cannot be captured by simple rules alone. If everything was completely automatable, we'd just let bots compile the entire database. We have real editors and a voting system specifically to obtain a higher quality database than can be achieved by automation alone.

When I add edits to MB, I try to do so within the remit of my own knowledge. I work on releases I either own or, at the very least, know a significant amount about, and I can pretty much tell when other editors have done the same. The best entries on MB are those that are a labour of love. At the opposite extreme, there are releases that are largely nonsense, where in many cases it would have been quicker for me to request a removal and enter the release again (and simpler in terms of the number of edits generated).

Unless Brian is an amazingly skilled human being, I very much doubt he knows the 100+ languages these edits involve, and thus he should probably leave the majority to those who value their addition. If there is no such person, then there is probably little point in their addition to MB.

--
Andrew :-)
|
|
Re: Clarify removing data, simply because we have too much
* Kuno Woudt <kuno@...> [30-08-2009 20:10 EEST]:
[...]
> I don't see any specific problem with allowing editors to judge usefulness, for most things we are lucky that usefulness or worthiness can be decided on mb-style already by voting no to proposals which attempt to add new ARs to link to certain sites. That doesn't mean that for those ARs we do have, you should add anything which fits the AR description.
>
> Note, I don't advocate being overly strict on this either. In general if someone adds a particular link, I assume it is useful to them, why else would they add it?

This I agree with wholeheartedly. Even if there were a page in Klingon, how useful would that be?

"Sometimes, less is more." I think that applies in this case. Would linking to a WP page in Klingon be useful? Sure, there'd be no real 'harm' in linking to it other than 'clutter' (but the 'clutter' aspect would be a UI issue), but would it be *useful*? I don't think it would be...
|
|
Re: Clarify removing data, simply because we have too much
On Sun, Aug 30, 2009 at 5:13 PM, Edward J. Shornock <ed.shornock@...> wrote:
> * Kuno Woudt <kuno@...> [30-08-2009 20:10 EEST]:

The utility argument only goes so far, however. Yes, a page in Uzbek may not be all that generally informative or useful.

"Allow all languages" may be overly broad, but I think "only English and the artist's own native language(s)" is too narrow.

So far, this argument keeps being used for "marginal" languages. (Though, the edit calls a language spoken by 3 million people marginal, so I'm not sure how we're even defining that... :P). Uzbek, Klingon, etc may not be widely spoken, but http://www.photius.com/rankings/languages2.html shows that there is reason to consider at least "useful and common" languages. Mandarin Chinese has more than twice as many native speakers as English. Spanish, Arabic, Bengali, Hindi, Russian, Portuguese, Japanese, German, Javanese, Korean, French, and Turkish - just to take the other top 15 languages - also have millions of speakers. According to those counts, English has 480 million speakers, while the other top 14 languages together account for 3 *billion* people on the planet. I'd be willing to bet, especially considering that many of those languages don't even use Latin script, that some majority of those 3 billion people don't speak a word of English - international language of business or not.

So no, Uzbek or Klingon may not have much utility... but "English and artist's own language(s)" seems rather western- and English-centric. MB may currently be English-only, but it's moving towards i18n, and codifying such a limiting rule seems like a movement in the wrong direction.

Brian
|
|
Re: Clarify removing data, simply because we have too much
On Sun, Aug 30, 2009 at 10:14:22PM -0400, Brian Schweitzer wrote:
> the utility argument only goes so far, however. Yes, a page in Uzbek may
> not be all that generally informative or useful.

I cannot really judge that, and neither can you. Let our users who actually speak Uzbek (if any) add that link if it is useful to them.

> "Allow all languages" may be overly broad, but I think "only English and
> the artist's own native language(s)" is too narrow.

No one has been arguing for "only English + artist's language" here on the list. (I haven't read the entire discussion in the edit notes; perhaps that argument was used there.)

I think it is fine for those links to be added, but not by you, because you cannot judge their usefulness (nor can I).

-- kuno / warp.