Re: Verify data integrity

by Grant-4

> I reverse engineered the AR system a year or so ago; there's a Perl
> script that performs AR checking available from
> http://www.srcf.ucam.org/~cjk32/ARCue/
>
> The checksums are the (mod 2^32) sum of each 32-bit LR sample multiplied
> by its offset within the track.  The first and last five frames of the
> first and last tracks are ignored to prevent problems with drives that
> cannot overread into the lead-in or lead-out.
>
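A rough sketch of the checksum as described in the quoted paragraph, not the actual AR code. Assumptions on my part: each stereo sample is read as one little-endian unsigned 32-bit word, a frame is 588 stereo samples (2352 bytes), and the multiplier starts at 1 for the first sample (the message only says "offset within the track"). The function name and the `first_offset` parameter are made up for illustration.

```python
import struct

SAMPLES_PER_FRAME = 588   # one CD frame = 2352 bytes = 588 stereo (LR) samples
SKIP_FRAMES = 5           # frames ignored at the disc edges, per the description

def ar_style_checksum(pcm, is_first_track=False, is_last_track=False,
                      first_offset=1):
    """Sum of (32-bit LR sample * position in track), reduced mod 2^32.

    pcm is raw 16-bit stereo audio for one track. first_offset is an
    assumption: whether positions count from 0 or 1 is not stated above.
    """
    if len(pcm) % 4:
        raise ValueError("expected whole 4-byte stereo samples")
    n = len(pcm) // 4
    words = struct.unpack("<%dI" % n, pcm)

    skip = SKIP_FRAMES * SAMPLES_PER_FRAME
    start = skip if is_first_track else 0      # ignore lead-in side
    end = n - skip if is_last_track else n     # ignore lead-out side

    total = 0
    for i in range(start, end):
        # position is counted within the whole track, skipped part included
        total = (total + words[i] * (i + first_offset)) & 0xFFFFFFFF
    return total
```

For a three-sample track holding the words 1, 2, 3 this gives 1*1 + 2*2 + 3*3 = 14.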
> I do like the way accurate rip works, but there are some limitations,
> and I've been wondering about how an improved system might operate.
>
> AR seems to work around the following principle.  There are two kinds of
> errors one can suffer from, systematic errors and random noise.
>
> The only realistic systematic error that will be encountered is a
> constant offset of the samples read (e.g. when asked for sample 0, the
> drive actually returns sample 15), and EAC+AR deals with this by
> establishing the drive's offset, correcting by this amount, and making
> it difficult for the user to change it.
>
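To make the offset correction concrete, here is a minimal sketch. It assumes the convention from the example in the quote (asking for sample 0 returns sample 15 means an offset of +15) and fills samples the drive never delivered with silence. `correct_offset` is a hypothetical helper, not something EAC exposes.

```python
def correct_offset(read_samples, drive_offset, pad=0):
    """Realign samples read from a drive with a constant read offset.

    Assumed convention: read_samples[i] == true_samples[i + drive_offset].
    Positions that cannot be recovered from this read are filled with pad.
    """
    data = list(read_samples)
    n = len(data)
    if drive_offset >= 0:
        # the first drive_offset true samples were never delivered
        return ([pad] * drive_offset + data)[:n]
    k = -drive_offset
    # the drive started k samples early, so drop those and pad the tail
    return (data[k:] + [pad] * k)[:n]
```

A real ripper would overread or reread neighbouring sectors rather than pad, where the drive allows it.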
> The second kind of error is random noise, caused by a damaged disc,
> failing drive laser, etc.  These errors manifest as random changes
> in the data read, and will not be consistent across multiple reads
> (ignoring any caching performed by the drive).  Because these errors are
> random and infrequent, if two independent reads of a disc give the same
> data (or almost equivalently, the same checksum), then it is
> overwhelmingly likely that both reads of the disc read the correct
> data.  AR collects all checksum submissions for a given discid, and when
> it gets 2 or more the same for a given track / disc id, it considers
> them correct.  As it is possible for multiple pressings to have
> different audio data, but the same disc id, it is quite possible to have
> multiple valid checksums for each track on that disc.
>
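The matching rule in the quoted paragraph could be sketched like this. The tuple layout and function name are assumptions for illustration, and the sketch does not try to check that submissions really come from independent reads.

```python
from collections import Counter, defaultdict

MIN_MATCHES = 2  # "2 or more the same", per the description above

def verified_checksums(submissions, min_matches=MIN_MATCHES):
    """Group submissions and keep checksums seen at least min_matches times.

    submissions: iterable of (disc_id, track_no, checksum) tuples.
    Returns {(disc_id, track_no): {checksum: count}}. A track can end up
    with several accepted checksums, one per pressing.
    """
    counts = defaultdict(Counter)
    for disc_id, track_no, checksum in submissions:
        counts[(disc_id, track_no)][checksum] += 1

    accepted = {}
    for key, counter in counts.items():
        good = {c: n for c, n in counter.items() if n >= min_matches}
        if good:
            accepted[key] = good
    return accepted
```

The count stored per checksum plays the role of the confidence figure.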
>
>
> There are a few problems with the current system.
>
> Firstly, the measured drive read offsets used by the whole AR+EAC system
> seem incorrect.  The offset for one drive was established using an
> ingenious, but flawed mechanism that gave an incorrect value.  As this
> drive offset was then used as a reference to determine all others, they all
> share the same error.  More recent tests using a different and arguably
> better method have given a different drive offset, which is much more
> likely to be correct.
>
> Secondly, AR doesn't allow any validation of the leading and trailing
> five frames of audio; some drives cannot read this data, and it is hence
> not included in the checksums.
>
> It cannot deal (I believe) with audio hidden in the pregap.
>
> My personal preference would be to use an AR-like system, but with MD5
> hashes based upon all the data in the track (i.e. not cutting off leading
> and trailing frames), and using the newly measured 'correct' offset.
> Such hashes would be collected for each track of each discid, and where
> 2 or more match, they would be published as a correct hash for that
> track.  The MD5 calculated for any track would be the same as the FLAC
> MD5 checksum.
>
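A minimal sketch of the per-track MD5, assuming the track is available as raw PCM bytes in the same byte layout FLAC hashes (its MD5 is taken over the unencoded audio). `track_md5` is an illustrative name; nothing is trimmed from either end of the track.

```python
import hashlib

def track_md5(pcm_chunks):
    """MD5 over every byte of a track's raw PCM, fed in chunks.

    pcm_chunks: iterable of bytes objects covering the whole track.
    """
    h = hashlib.md5()
    for chunk in pcm_chunks:
        h.update(chunk)
    return h.hexdigest()
```

Feeding the data in chunks gives the same digest as hashing it in one go, so a ripper can hash while it reads.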
> This system isn't ideal though, given the effort and infrastructure
> already invested into the existing system.  One way to take advantage of
> the existing data might be to also calculate AR checksums using the
> current method, and accept submissions of both as a set.  The confidence
> level for the AR checksums could then be applied to the MD5 hashes that
> they span.  For example, if the AR checksums indicated that tracks 1-3
> were correct with a confidence of 50, you could then be sure that the
> MD5 hash for track 2 was also correct (because the range over which the
> AR checksums for tracks 1-3 are calculated wholly covers the range over
> which the MD5 hash for track 2 is calculated).
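The coverage argument in the quoted example can be checked mechanically. This sketch assumes track boundaries are known as sample positions, that the verified AR tracks form one contiguous run, and that a frame is 588 samples. All names are hypothetical.

```python
SKIP_SAMPLES = 5 * 588  # five frames ignored at each end of the disc

def ar_covered_range(track_bounds, first, last):
    """Sample range spanned by verified AR checksums for tracks first..last.

    track_bounds: list of (start, end) positions, end exclusive, one per
    track on the disc, in order. first and last are 0-based indices.
    """
    start = track_bounds[first][0]
    end = track_bounds[last][1]
    if first == 0:
        start += SKIP_SAMPLES            # AR skips the start of track 1
    if last == len(track_bounds) - 1:
        end -= SKIP_SAMPLES              # AR skips the end of the last track
    return start, end

def md5_is_covered(track_bounds, track, first, last):
    """True if the full-track MD5 range lies inside the AR-verified range."""
    lo, hi = ar_covered_range(track_bounds, first, last)
    start, end = track_bounds[track]
    return lo <= start and end <= hi
```

On a three-track disc with tracks 1-3 verified, only the middle track's MD5 inherits the confidence; the first and last tracks include frames AR never looked at.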

I've been thinking about this and it does sound tough to pull off.
How about a really simple approach?

Is CD pressing info noted on the disc?  If so, how about letting a user
submit their CD's pressing info and FLAC checksums to MBz, provided they
ripped the CD twice with EAC?  The submissions are recorded, and
whichever checksum for a given track+pressing is verified by the most
users is considered correct by MBz.

- Grant

> Chris

_______________________________________________
MusicBrainz-users mailing list
MusicBrainz-users@...
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-users