
Extending MusicBrainz to hold audio checksums

by Eric Shattow-3

How can we extend MusicBrainz so that people ripping digital media can verify that their audio content has been correctly media-shifted from audio CD sources?

There is a larger challenge here. Reading CD audio is an inexact science:

- Labels may release multiple pressings of the same content, leading to the same audio at slightly differing sample offsets
- Consumer CD drives are not consistent in the sample offset at which they start reading
- No drive on the market can reliably detect all errors when reading audio from CD media

If we can store checksums of the audio data in a meaningful way, should we?

Possible approach:

1) No "zero" reference. Cover every read offset found among drives on the market today by computing many SHA-1 checksums (3072) per track, plus the same number for the album as a whole. This amounts to 33KiB * (num tracks + 1) of checksum data per release. The benefit is that it works without drive calibration, since most CD pressings contain no useful data in the missing samples (at most about 5 sectors, i.e. roughly 5/75 of a second of audio). The drawbacks are the time required to compute the checksums and the nearly 400KiB of checksum data per release.

2) As above, but with a "zero" reference. Maintain a list of approved "zero offset" drives (I own such a drive, the Plextor PX-712SA). This zero point differs from the one used by the AccurateRip(TM) method by 30 samples. Checksums stored in the database would be moderated and voted on only by people submitting from approved hardware. This reduces the storage requirement to less than 5KiB per release. Client verification software still carries the heavy computational load of generating all possible checksums, as described in method #1.

3) Guess what the actual audio content of the release is, and cover all possible read offsets for that content as a whole. This would also include some kind of "inner checksum" calculated at an offset from the start and finish of the useful audio. Required storage is less than 70KiB per release.
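
To make the first approach concrete, here is a minimal sketch in Python of what "one checksum per read offset" could look like. It is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be raw 16-bit stereo PCM for the whole disc, and samples that fall outside the disc data are assumed to read as digital silence. A real client would also need to decide which offset range to cover.

```python
import hashlib

BYTES_PER_SAMPLE = 4      # one stereo sample: 2 channels x 16 bits
SHA1_DIGEST_BYTES = 20    # size of a raw (binary) SHA-1 digest


def offset_checksums(disc_pcm, track_start, track_len, offsets):
    """Map each read offset to the SHA-1 of the track as read at that offset.

    disc_pcm    -- raw PCM bytes for the whole disc (hypothetical input)
    track_start -- first sample of the track when the offset is 0
    track_len   -- track length in samples
    offsets     -- iterable of signed sample offsets to cover
    """
    offsets = list(offsets)
    # Assumption: anything before or after the disc data reads as silence.
    pad = max((abs(o) for o in offsets), default=0)
    silence = b"\x00" * (pad * BYTES_PER_SAMPLE)
    padded = silence + disc_pcm + silence

    sums = {}
    for off in offsets:
        begin = (pad + track_start + off) * BYTES_PER_SAMPLE
        end = begin + track_len * BYTES_PER_SAMPLE
        sums[off] = hashlib.sha1(padded[begin:end]).hexdigest()
    return sums


def find_offset(rip_pcm, sums):
    """Return the offset whose stored checksum matches the rip, or None."""
    digest = hashlib.sha1(rip_pcm).hexdigest()
    for off, stored in sums.items():
        if stored == digest:
            return off
    return None


def raw_storage_bytes(n_offsets, n_tracks):
    """Bytes for full SHA-1 digests: every track plus one whole-album entry."""
    return n_offsets * SHA1_DIGEST_BYTES * (n_tracks + 1)
```

With this layout a client that ripped at an unknown offset hashes its track once and looks the result up with find_offset(). On storage: 3072 full 20-byte SHA-1 digests come to 60KiB per track, so the 33KiB figure quoted above would correspond to roughly 11 bytes per stored checksum; either number should be treated as an estimate.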

Verifying CD audio rips does walk the line between what does and does not fall within the purpose of the MusicBrainz database.

Thoughts? Comments welcomed.
_______________________________________________
MusicBrainz-users mailing list
MusicBrainz-users@...
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-users
