>>>> Even so, if the FLAC CRC matches with AR's, the editor can be
>>>> confident in adding it to the MBz DB, right? Or maybe the AR CRC is
>>>> against whole discs as opposed to individual tracks?
>>>> - Grant
>>>>
>>> My bad for calling everything a CRC, I guess, but no, they don't use the same
>>> hashing algorithm, so they will never match.
>>>
>>
>> How does AR do its checksumming? Is it calculated based on a WAV or
>> ISO of the entire disc?
>>
>> If it can be determined that a FLAC rip matches with AR, the embedded
>> FLAC checksum, although different from whatever AR uses, could be
>> added to the MBz DB with certainty, right?
>>
>> - Grant
>>
> I reverse engineered the AR system a year or so ago; there's a Perl
> script that performs AR checking, available from
>
> http://www.srcf.ucam.org/~cjk32/ARCue/
> The checksums are the (mod 2^32) sum of each 32-bit LR sample multiplied
> by its offset within the track. The first five frames of the first track
> and the last five frames of the last track are ignored, to prevent
> problems with drives that cannot overread into the lead-in or lead-out.
>
> I do like the way accurate rip works, but there are some limitations,
> and I've been wondering about how an improved system might operate.
>
> AR seems to work on the following principle. There are two kinds of
> errors one can suffer from: systematic errors and random noise.
>
> The only realistic systematic error that will be encountered is a
> constant offset of the samples read (e.g. when asked for sample 0, the
> drive actually returns sample 15), and EAC+AR deals with this by
> establishing the drive's offset, correcting by this amount, and making
> it difficult for the user to change it.
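To make the systematic-error correction concrete, here is a small sketch of undoing a constant read offset. It uses its own sign convention, not necessarily EAC's: `drive_offset = 15` means that when asked for sample `i` the drive returns sample `i + 15`, as in the example above. Samples that would have to come from outside the data actually read (i.e. from the lead-in or lead-out) are filled with digital silence here; `correct_read_offset` is a hypothetical name.

```python
from typing import List


def correct_read_offset(data: List[int], drive_offset: int) -> List[int]:
    """Undo a constant read offset.

    Assumed convention: data[i] holds true sample i + drive_offset,
    so true sample j is data[j - drive_offset]. Positions that would need
    data outside what was read are padded with 0 (silence).
    """
    n = len(data)
    corrected = []
    for j in range(n):
        src = j - drive_offset
        corrected.append(data[src] if 0 <= src < n else 0)
    return corrected
```

For example, `correct_read_offset([10, 11, 12, 13, 14], 2)` gives `[0, 0, 10, 11, 12]`; a real ripper would read a few extra samples past the track boundary instead of padding.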
>
> The second kind of error is random noise, caused by a damaged disc,
> failing drive laser, etc. These errors manifest as random changes
> in the data read, and will not be consistent across multiple reads
> (ignoring any caching performed by the drive). Because these errors are
> random and infrequent, if two independent reads of a disc give the same
> data (or almost equivalently, the same checksum), then it is
> overwhelmingly likely that both reads of the disc read the correct
> data. AR collects all checksum submissions for a given discid, and when
> it gets 2 or more identical ones for a given track / disc id, it considers
> them correct. As it is possible for multiple pressings to have
> different audio data, but the same disc id, it is quite possible to have
> multiple valid checksums for each track on that disc.
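The acceptance rule described above (two or more identical submissions make a checksum trusted, and several checksums can be trusted for the same track) might be sketched like this. The tuple layout, the `min_matches` parameter and the function name are assumptions for illustration, not AR's actual server code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int]  # (disc id, track number)


def verified_checksums(submissions: Iterable[Tuple[str, int, int]],
                       min_matches: int = 2) -> Dict[Key, Dict[int, int]]:
    """Group (disc_id, track, checksum) submissions and keep every checksum
    submitted at least min_matches times, together with its submission count."""
    counts: Dict[Key, Counter] = defaultdict(Counter)
    for disc_id, track, checksum in submissions:
        counts[(disc_id, track)][checksum] += 1
    result: Dict[Key, Dict[int, int]] = {}
    for key, counter in counts.items():
        agreed = {c: n for c, n in counter.items() if n >= min_matches}
        if agreed:
            result[key] = agreed
    return result
```

Because every checksum that reaches the threshold is kept, two pressings that share a disc id but differ in audio each keep their own entry; the stored count plays the role of a confidence level.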
>
>
>
> There are a few problems with the current system.
>
> Firstly, the measured drive read offsets used by the whole AR+EAC system
> seem incorrect. The offset for one drive was established using an
> ingenious but flawed mechanism that gave an incorrect value. As this
> drive offset was then used as a reference to determine all others, they all
> share the same error. More recent tests using a different and arguably
> better method have given a different drive offset, which is much more
> likely to be correct.
>
> Secondly, AR doesn't allow any validation of the leading and trailing
> five frames of audio; some drives cannot read this data, and it is hence
> not included in the checksums.
>
> It cannot deal (I believe) with audio hidden in the pregap.
>
> My personal preference would be to use an AR like system, but with MD5
> hashes based upon all the data in the track (i.e. not cutting off leading
> and trailing frames), and using the newly measured 'correct' offset.
> Such hashes would be collected for each track of each discid, and where
> 2 or more match, they would be published as a correct hash for that
> track. The MD5 calculated for any track would be the same as the FLAC
> MD5 checksum.
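A sketch of the per-track hashing step in this proposal. It assumes the whole disc has already been read (with the corrected offset) into one buffer of raw 16-bit little-endian stereo PCM, and that track boundaries are given as sector numbers relative to the start of that buffer (2352 bytes per sector). For 16-bit stereo audio, FLAC's STREAMINFO MD5 is computed over exactly this little-endian interleaved sample data, which is why the two hashes would agree. `track_md5s` and its parameters are hypothetical.

```python
import hashlib
from typing import List

BYTES_PER_SECTOR = 2352  # 588 stereo samples x 2 channels x 2 bytes


def track_md5s(disc_pcm: bytes, track_starts: List[int], leadout: int) -> List[str]:
    """Return the hex MD5 of every track's complete audio.

    track_starts: first sector of each track; leadout: sector where audio ends.
    Nothing is trimmed from the first or last track.
    """
    bounds = list(track_starts) + [leadout]
    hashes = []
    for start, end in zip(bounds, bounds[1:]):
        chunk = disc_pcm[start * BYTES_PER_SECTOR:end * BYTES_PER_SECTOR]
        hashes.append(hashlib.md5(chunk).hexdigest())
    return hashes
```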
>
> This system isn't ideal though, given the effort and infrastructure
> already invested into the existing system. One way to take advantage of
> the existing data might be to also calculate AR checksums using the
> current method, and accept submissions of both as a set. The confidence
> level for the AR checksums could then be applied to the MD5 hashes that
> they span. For example, if the AR checksums indicated that tracks 1-3
> were correct with a confidence of 50, you could then be sure that the
> MD5 hash for track 2 was also correct (because the range over which the
> AR checksums for tracks 1-2 are calculated wholly covers the range over
> which the MD5 hash for track 2 is calculated).
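The coverage argument can be phrased as an interval check: an MD5 hash inherits the AR confidence if the sample ranges of the AR checksums that reached that confidence, taken together, leave no gap over the range the MD5 was computed on. A sketch, with ranges given as half-open `(start, end)` sample positions on the disc; the function name and the numbers in the example are hypothetical.

```python
from typing import Iterable, Tuple

Range = Tuple[int, int]  # half-open [start, end) in samples


def md5_range_covered(md5_range: Range, verified: Iterable[Range]) -> bool:
    """True if the union of the verified AR ranges covers md5_range with no gaps."""
    start, end = md5_range
    reached = start  # everything in [start, reached) is known to be covered
    for lo, hi in sorted(verified):
        if reached >= end:
            break
        if lo > reached:
            return False  # gap before the next verified range starts
        reached = max(reached, hi)
    return reached >= end
```

With AR-verified tracks at `(0, 100000)`, `(100000, 250000)` and `(250000, 400000)`, an MD5 range shifted by the offset difference, such as `(99970, 249970)`, is covered; with only the middle range verified it is not, which is why the neighbouring tracks' checksums matter.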
>
> Any thoughts?
>
> Chris
file against the AccurateRip database. If the flac file's offset is
different from the one in the AR database, it will tell you. You can