Re: Verify data integrity

2008/5/27 Grant <emailgrant@...>:
> Vinyl records are cataloged according to
> pressing, but I haven't heard of CDs being tracked in the same way
> yet.

discogs does (in theory). eg:
http://www.discogs.com/release/1330378
http://www.discogs.com/release/1337304

so do some more elaborate collection sites (eg
http://www.livenirvana.com/digitalnirvana/ )

just FYI :)
_______________________________________________
MusicBrainz-users mailing list
MusicBrainz-users@...
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-users
|
|
Re: Verify data integrity

>>> How big of a problem is this multiple pressing issue? Could an album
>>> have many pressings that are just barely non-identical? If not, maybe
>>> they should be tracked separately with their unique checksums.
>>> Different pressings of vinyl records are "tracked" separately.
>>
>> Different manufacturers will make discs with minute differences. All it
>> takes is a frame in the wrong place and the checksum won't match.
>> Currently in AR there's no protection against this, and like you said it
>> has multiple checksums instead. The problem is that one release might have
>> three ids, while another which has a slight offset difference has 0,
>> resulting in no matches for whoever rips it. It's quite common depending
>> on what you rip, and just as annoying every time.
>
> I understand how that works. I'm wondering if MBz should track
> different pressings separately as far as the checksum is concerned.
> If there are usually only a few pressings per album, it sounds like a
> reasonable thing to do. Vinyl records are cataloged according to
> pressing, but I haven't heard of CDs being tracked in the same way
> yet.
>
> - Grant
>
>> -- Per (Wizzcat)

What do you guys think about all of this? I could put some money into
it. What is the path forward?

- Grant
|
|
Re: Verify data integrity

> I reverse engineered the AR system a year ago or so, there's a perl
> script that performs AR checking available from,
> http://www.srcf.ucam.org/~cjk32/ARCue/
>
> The checksums are the (mod 2^32) sum of each 32bit LR sample multiplied
> by its offset within the track. The first and last five frames of the
> first and last tracks are ignored to prevent problems with drives that
> cannot overread into the lead-in or lead-out.
>
> I do like the way accurate rip works, but there are some limitations,
> and I've been wondering about how an improved system might operate.
>
> AR seems to work around the following principle. There are two kinds of
> errors one can suffer from, systematic errors and random noise.
>
> The only realistic systematic error that will be encountered is a
> constant offset of the samples read (e.g. when asked for sample 0, the
> drive actually returns sample 15), and EAC+AR deals with this by
> establishing the drive's offset, correcting by this amount, and making
> it difficult for the user to change it.
>
> The second kind of error is random noise, caused by a damaged disc,
> failing drive laser etc. These errors are manifested as random changes
> in the data read, and will not be consistent across multiple reads
> (ignoring any caching performed by the drive). Because these errors are
> random and infrequent, if two independent reads of a disc give the same
> data (or almost equivalently, the same checksum), then it is
> overwhelmingly likely that both reads of the disc read the correct
> data. AR collects all checksum submissions for a given discid, and when
> it gets 2 or more the same for a given track / disc id, it considers
> them correct. As it is possible for multiple pressings to have
> different audio data, but the same disc id, it is quite possible to have
> multiple valid checksums for each track on that disc.
>
> There are a few problems with the current system.
>
> Firstly, the measured drive read offsets used by the whole AR+EAC system
> seem incorrect. The offset for one drive was established using an
> ingenious, but flawed mechanism that gave an incorrect value. As this
> drive offset was then used as a reference to determine all others, they
> all share the same error. More recent tests using a different and arguably
> better method have given a different drive offset, which is much more
> likely to be correct.
>
> Secondly, AR doesn't allow any validation of the leading and trailing
> five frames of audio; some drives cannot read this data, and it is hence
> not included in the checksums.
>
> It cannot deal (I believe) with audio hidden in the pregap.
>
> My personal preference would be to use an AR like system, but with MD5
> hashes based upon all the data in the track (i.e. not cutting off leading
> and trailing frames), and using the newly measured 'correct' offset.
> Such hashes would be collected for each track of each discid, and where
> 2 or more match, they would be published as a correct hash for that
> track. The MD5 calculated for any track would be the same as the FLAC
> MD5 checksum.
>
> This system isn't ideal though, given the effort and infrastructure
> already invested into the existing system. One way to take advantage of
> the existing data might be to also calculate AR checksums using the
> current method, and accept submissions of both as a set. The confidence
> level for the AR checksums could then be applied to the MD5 hashes that
> they span. For example, if the AR checksums indicated that tracks 1-3
> were correct with a confidence of 50, you could then be sure that the
> MD5 hash for track 2 was also correct, (because the range over which the
> AR checksums for tracks 1-2 is calculated wholly covers the range over
> which the MD5 hash for track 2 is calculated).

I've been thinking about this and it does sound tough to pull off.
How about a really simple approach?

Is CD pressing info noted on the disc? If so, how about letting a user
submit their CD's pressing info and FLAC checksums to MBz if they have
ripped the CD twice with EAC? The submissions are recorded, and whichever
track+pressing checksum is verified by users the most times is considered
correct by MBz.

- Grant

> Chris
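For anyone who wants to experiment, here is a minimal Python sketch of the checksum as it is described in the quoted message above. It is not the AccurateRip code or the ARCue script. The function name, the 1-based sample position and the exact skip boundaries are assumptions made for illustration; the real implementation may differ in those details. The only figure added here is that one CD frame holds 588 stereo samples (2352 bytes).

```python
import struct

SAMPLES_PER_FRAME = 588   # one CD frame = 2352 bytes = 588 stereo (L+R) samples
SKIP_FRAMES = 5           # frames ignored at the start/end of the disc

def ar_style_checksum(pcm: bytes, is_first: bool = False, is_last: bool = False) -> int:
    """Sum of (32-bit LR sample * position in track), modulo 2^32.

    pcm is raw 16-bit stereo little-endian audio for ONE track, so each
    4-byte group is one combined left/right sample.
    Assumption: positions are counted from 1; the source only says
    "offset within the track".
    """
    if len(pcm) % 4:
        raise ValueError("track data must be a whole number of stereo samples")
    count = len(pcm) // 4
    words = struct.unpack("<%dI" % count, pcm)

    skip = SKIP_FRAMES * SAMPLES_PER_FRAME
    start = skip if is_first else 0           # skip lead-in side of first track
    end = count - skip if is_last else count  # skip lead-out side of last track

    total = 0
    for i in range(start, end):
        total = (total + words[i] * (i + 1)) & 0xFFFFFFFF
    return total
```

Because every sample is weighted by its position, a rip that is shifted by even one sample gives a completely different value, which is why the drive offset and the pressing differences discussed in this thread matter so much.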
|
|
Re: Verify data integrity

2008/5/30 Grant <emailgrant@...>:
> I've been thinking about this and it does sound tough to pull off.
> How about a really simple approach?
>
> Is CD pressing info noted on the disc?

the matrix codes (the small numbers near the hole) often vary between
pressings, but i don't think they always vary with the pressing. also,
they're often unreadable/obscured
|
|
Re: Verify data integrity

>> I've been thinking about this and it does sound tough to pull off.
>> How about a really simple approach?
>>
>> Is CD pressing info noted on the disc?
>
> the matrix codes (the small numbers near the hole) often vary between
> pressings, but i don't think they always vary with the pressing. also,
> they're often unreadable/obscured

Even simpler: If you have special user privileges and you ripped the CD
twice with EAC, you can submit your FLAC checksums. Everyone else can
verify their checksums against those in MBz.

- Grant
|
|
Re: Verify data integrity

> Firstly, the measured drive read offsets used by the whole AR+EAC system
> seem incorrect. The offset for one drive was established using an
> ingenious, but flawed mechanism that gave an incorrect value. As this
> drive offset was then used as a reference to determine all others, they
> all share the same error. More recent tests using a different and arguably
> better method have given a different drive offset, which is much more
> likely to be correct.

Chris, what is this better method?

- Grant
|
|
Re: Verify data integrity

Grant wrote:
>> Firstly, the measured drive read offsets used by the whole AR+EAC system
>> seem incorrect. The offset for one drive was established using an
>> ingenious, but flawed mechanism that gave an incorrect value. As this
>> drive offset was then used as a reference to determine all others, they
>> all share the same error. More recent tests using a different and arguably
>> better method have given a different drive offset, which is much more
>> likely to be correct.
>>
>
> Chris, what is this better method?
>
>

Details of the experimental method on the third page, but the first two
give useful background information.

Chris
|
|
Re: Verify data integrity

>>>> Even so, if the FLAC CRC matches with AR's, the editor can be
>>>> confident in adding it to the MBz DB right? Or maybe the AR CRC is
>>>> against whole discs as opposed to individual tracks?
>>>> - Grant
>>>>
>>> My bad for calling everything CRC I guess, but no they don't use the same
>>> hashing algorithm and will never match.
>>>
>>
>> How does AR do its checksumming? Is it calculated based on a WAV or
>> ISO of the entire disc?
>>
>> If it can be determined that a FLAC rip matches with AR, the embedded
>> FLAC checksum, although different from whatever AR uses, could be
>> added to the MBz DB with certainty right?
>>
>> - Grant
>>
> I reverse engineered the AR system a year ago or so, there's a perl
> script that performs AR checking available from,
> http://www.srcf.ucam.org/~cjk32/ARCue/
>
> The checksums are the (mod 2^32) sum of each 32bit LR sample multiplied
> by its offset within the track. The first and last five frames of the
> first and last tracks are ignored to prevent problems with drives that
> cannot overread into the lead-in or lead-out.
>
> I do like the way accurate rip works, but there are some limitations,
> and I've been wondering about how an improved system might operate.
>
> AR seems to work around the following principle. There are two kinds of
> errors one can suffer from, systematic errors and random noise.
>
> The only realistic systematic error that will be encountered is a
> constant offset of the samples read (e.g. when asked for sample 0, the
> drive actually returns sample 15), and EAC+AR deals with this by
> establishing the drive's offset, correcting by this amount, and making
> it difficult for the user to change it.
>
> The second kind of error is random noise, caused by a damaged disc,
> failing drive laser etc. These errors are manifested as random changes
> in the data read, and will not be consistent across multiple reads
> (ignoring any caching performed by the drive). Because these errors are
> random and infrequent, if two independent reads of a disc give the same
> data (or almost equivalently, the same checksum), then it is
> overwhelmingly likely that both reads of the disc read the correct
> data. AR collects all checksum submissions for a given discid, and when
> it gets 2 or more the same for a given track / disc id, it considers
> them correct. As it is possible for multiple pressings to have
> different audio data, but the same disc id, it is quite possible to have
> multiple valid checksums for each track on that disc.
>
> There are a few problems with the current system.
>
> Firstly, the measured drive read offsets used by the whole AR+EAC system
> seem incorrect. The offset for one drive was established using an
> ingenious, but flawed mechanism that gave an incorrect value. As this
> drive offset was then used as a reference to determine all others, they
> all share the same error. More recent tests using a different and arguably
> better method have given a different drive offset, which is much more
> likely to be correct.
>
> Secondly, AR doesn't allow any validation of the leading and trailing
> five frames of audio; some drives cannot read this data, and it is hence
> not included in the checksums.
>
> It cannot deal (I believe) with audio hidden in the pregap.
>
> My personal preference would be to use an AR like system, but with MD5
> hashes based upon all the data in the track (i.e. not cutting off leading
> and trailing frames), and using the newly measured 'correct' offset.
> Such hashes would be collected for each track of each discid, and where
> 2 or more match, they would be published as a correct hash for that
> track. The MD5 calculated for any track would be the same as the FLAC
> MD5 checksum.
>
> This system isn't ideal though, given the effort and infrastructure
> already invested into the existing system. One way to take advantage of
> the existing data might be to also calculate AR checksums using the
> current method, and accept submissions of both as a set. The confidence
> level for the AR checksums could then be applied to the MD5 hashes that
> they span. For example, if the AR checksums indicated that tracks 1-3
> were correct with a confidence of 50, you could then be sure that the
> MD5 hash for track 2 was also correct, (because the range over which the
> AR checksums for tracks 1-2 is calculated wholly covers the range over
> which the MD5 hash for track 2 is calculated).
>
> Any thoughts?
>
> Chris

There is a new Windows program called tripleflac which checks a flac file
against the AccurateRip database. If the flac file's offset is different
from the one in the AR database, it will tell you. You can then adjust
the offset of the flac file with something like CUETools (Windows) and
then verify the flac file against the AR database via tripleflac or
ARcue.

Also, AccurateRip2 is said to work around the different pressings issue.

MBz?

- Grant
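The "2 or more matching submissions" rule from the quoted proposal can be sketched in a few lines of Python. This is only an illustration of the idea, not anything MBz or AccurateRip actually runs. The `HashRegistry` class and its method names are hypothetical, each submission is assumed to come from an independent rip, and the MD5 is assumed to be taken over the raw interleaved 16-bit little-endian PCM of the whole track, which is what would make it line up with the MD5 that FLAC stores.

```python
import hashlib
from collections import Counter, defaultdict

MIN_MATCHES = 2  # a hash counts as verified once this many submissions agree

class HashRegistry:
    """Hypothetical store of per-track MD5 submissions, keyed by (disc id, track)."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def submit(self, disc_id: str, track: int, pcm: bytes) -> str:
        """Record one rip of a track and return its MD5 hex digest."""
        digest = hashlib.md5(pcm).hexdigest()
        self._counts[(disc_id, track)][digest] += 1
        return digest

    def verified(self, disc_id: str, track: int) -> list:
        """All hashes seen at least MIN_MATCHES times.

        More than one hash can be returned, since different pressings
        may share a disc id but carry slightly different audio.
        """
        counts = self._counts.get((disc_id, track), Counter())
        return sorted(d for d, n in counts.items() if n >= MIN_MATCHES)
```

A rip would then be checked with something like `digest in registry.verified(disc_id, track)`. Keeping every agreed hash rather than only the most popular one is what lets several pressings of the same release validate side by side.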