Verify data integrity

I'd like to rip my CD collection to FLAC files and be 100% sure I've created perfect bit-for-bit copies. Does musicbrainz and/or picard have any functionality that would help with this? I've read that every FLAC file contains a checksum. Maybe that would help?

- Grant

_______________________________________________
MusicBrainz-users mailing list
MusicBrainz-users@...
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-users
|
|
Re: Verify data integrity

On Mon, 26 May 2008 16:16:45 +0200, Grant <emailgrant@...> wrote:
> I'd like to rip my CD collection to FLAC files and be 100% sure I've
> created perfect bit-for-bit copies. Does musicbrainz and/or picard
> have any functionality that would help with this? I've read that
> every FLAC file contains a checksum. Maybe that would help?

The FLAC checksum won't tell you whether the rip was good, since it's created from the WAV file you get after the ripping process. It only tells you whether or not the file has become corrupted at some point after that.

What you want is Exact Audio Copy and AccurateRip. [1] is a reasonable guide for how to set up EAC for the best ripping, though I'm sure you can find some newer ones if you look. The important thing is that you configure AccurateRip: just feed it CDs until it's happy. After that you can rip your collection with Burst mode or whatever else you want. Just watch out for any tracks which don't pass AccurateRip, or which aren't in the database. In that case you might want to use Secure mode and maybe even Test & Copy (copies it twice and compares CRCs). For most albums Burst is fine though, since AR already checks the quality of the rip.

[1] http://www.hydrogenaudio.org/forums/index.php?showtopic=30959

- Per (Wizzcat)
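To make the point about the FLAC checksum concrete, here is a minimal Python sketch that reads the MD5 stored in a FLAC file's STREAMINFO block. It assumes STREAMINFO is the first metadata block and is 34 bytes long with the MD5 in its last 16 bytes; the function names `read_flac_md5` and `md5_of_pcm` are made up for this illustration. As Per says, a match only shows the audio is unchanged since encoding, not that the rip itself was accurate.

```python
import hashlib
import io

FLAC_MARKER = b"fLaC"
STREAMINFO_TYPE = 0
STREAMINFO_LENGTH = 34  # bytes; the last 16 hold the MD5 of the unencoded audio


def read_flac_md5(stream) -> str:
    """Return the hex MD5 stored in the STREAMINFO block of a FLAC stream."""
    if stream.read(4) != FLAC_MARKER:
        raise ValueError("not a FLAC stream")
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated metadata block header")
    block_type = header[0] & 0x7F  # the top bit is the "last block" flag
    length = int.from_bytes(header[1:4], "big")
    if block_type != STREAMINFO_TYPE or length != STREAMINFO_LENGTH:
        raise ValueError("first metadata block is not STREAMINFO")
    body = stream.read(STREAMINFO_LENGTH)
    if len(body) != STREAMINFO_LENGTH:
        raise ValueError("truncated STREAMINFO block")
    return body[18:].hex()  # 18 bytes of stream parameters, then the MD5


def md5_of_pcm(pcm: bytes) -> str:
    """MD5 of raw PCM bytes, for comparison with the stored value.

    Assumption: for 16-bit CD audio the raw little-endian, interleaved
    sample bytes are what the encoder hashed.
    """
    return hashlib.md5(pcm).hexdigest()

# Hypothetical usage (the file name is only an example):
#   with open("track01.flac", "rb") as f:
#       print(read_flac_md5(f))
```

If the stored value and `md5_of_pcm` of the decoded audio disagree, the file was damaged after encoding. If they agree, you still know nothing about whether the drive read the disc correctly.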
|
|
Re: Verify data integrity

> The important thing is that you
> configure AccurateRip, just feed it cds until it's happy. After that you
> can rip your collection with Burst mode or whatever else you want. Just
> watch out for any tracks which don't pass AccurateRip, or which aren't in
> the database.

Thank you very much for the info. That database sounds interesting. Is there a way to access it other than from within the software?

Can musicbrainz be used to aid in this process? I'd like to be able to verify CD rips I didn't make myself as well.

- Grant
|
|
Re: Verify data integrity

On Mon, 26 May 2008 16:55:57 +0200, Grant <emailgrant@...> wrote:
> Thank you very much for the info. That database sounds interesting.
> Is there a way to access it other than from within the software?

Practically speaking, you can only access it when you rip CDs with either Exact Audio Copy or DBPowerAmp.

> Can musicbrainz be used to aid in this process? I'd like to be able
> to verify CD rips I didn't make myself as well.

Technically it could quite easily do so, but someone would have to program the functionality. It's possible that it could be done from a plugin, but that depends on how much data is exposed, seeing as you need to CRC every frame of audio. I'll investigate it. I was planning to write an app to verify FLAC rips anyway, but if I can use Picard that would save me a lot of work.

-- Per (Wizzcat)
|
|
Re: Verify data integrity

I've often thought about this and think MusicBrainz really would be a good place to add checksum information. AccurateRip is proprietary and can only be used in the applications that the author approves of, which is completely b0rked. By adding checksums to discids it should be possible to verify rips both when ripping and after the fact, which I think is of great value for any bit-perfectionist. Technically it doesn't seem much more difficult than storing PUIDs, but obviously it would take someone to implement it on the server side.

Philip
|
|
Re: Verify data integrity

> I've often thought about this and think MusicBrainz really would be a
> good place to add checksum information. AccurateRip is proprietary and
> can only be used in the applications that the author approves of,
> which is completely b0rked. By adding checksums to discids it should
> be possible to verify rips both when ripping and after the fact, which
> I think is of great value for any bit-perfectionist. Technically it
> doesn't seem much more difficult than storing PUIDs, but obviously it
> would take someone to implement it on the server side.

Hallelujah! I agree and I'm very glad to see you guys are interested in this. FLAC apparently calculates and saves a checksum automatically with every file. Maybe that should be used so calculation isn't even necessary. This obviously won't do anything for MP3 or OGG files, but they're lossy anyway and a checksum concept would never work with them.

- Grant
|
|
Re: Verify data integrity

> Technically it could quite easily do so, but someone would have to program
> the functionality. It's possible that it could be done from a plugin, but
> that depends on how much data is exposed seeing as you need to crc every
> frame of audio.

If you use the automatically embedded FLAC fingerprint, you wouldn't need to CRC every frame of audio. We would need a method for determining who has the correct fingerprint, though.

- Grant
|
|
Re: Verify data integrity

On Mon, 26 May 2008 17:34:36 +0200, Philip Jägenstedt <philip@...> wrote:
> I've often thought about this and think MusicBrainz really would be a
> good place to add checksum information. AccurateRip is proprietary and
> can only be used in the applications that the author approves of,
> which is completely b0rked. By adding checksums to discids it should
> be possible to verify rips both when ripping and after the fact, which
> I think is of great value for any bit-perfectionist. Technically it
> doesn't seem much more difficult than storing PUIDs, but obviously it
> would take someone to implement it on the server side.
> Philip

It's an interesting thought, but the problem is that such CRC calculation needs to be integrated with a reliable ripper to be of any value. Picard isn't that, and I doubt it ever will be. I mean, why bother? Sure, you could compete against AR and offer a plugin for EAC, but I think you'd have a big problem getting people to use it, and without users it's quite worthless.

Personally I think it'd be a better idea to see if Spoon (the AR author) would want to cooperate with MBz. It'd be a win-win, with us gaining rip verification and him gaining accurate metadata. It wouldn't be that hard either: it's very similar to disc ids as it is, and just comes with additional CRCs for every file.

-- Per (Wizzcat)
|
|
Re: Verify data integrity

On Mon, 26 May 2008 17:50:41 +0200, Grant <emailgrant@...> wrote:
> If you use the automatically embedded FLAC fingerprint, you wouldn't
> need to CRC every frame of audio. We would need a method for
> determining who has the correct fingerprint, though.
> - Grant

The FLAC CRC is useless for anything other than statistical purposes. It can tell you that 30% of people with that track have that CRC, but that's hardly a guarantee of a bit-perfect rip. AR works very differently in that the db only contains data from unique rips. That way you don't run the risk of getting the same data added multiple times and tainting the db.

-- Per (Wizzcat)
|
|
Re: Verify data integrity

> The Flac CRC is useless for anything other than statistical purposes. It
> can tell you that 30% of people with that track have that crc, but that's
> hardly a guarantee of a bit-perfect rip. AR works very differently in that
> the db only contains data from unique rips. That way you don't run the
> risk of getting the same data added multiple times and tainting the db.

Maybe the FLAC CRCs should be added to the MBz DB like any other piece of info? The editor would be sure they have the correct CRC by checking with EAC. If that sounds OK, it's got to be easy to implement.

- Grant
|
|
Re: Verify data integrity

On Mon, 26 May 2008 18:18:04 +0200, Grant <emailgrant@...> wrote:
> Maybe the FLAC CRCs should be added to the MBz DB like any other piece
> of info? The editor would be sure they have the correct CRC by
> checking with EAC. If that sounds OK, it's got to be easy to
> implement.
> - Grant

Actually, disregard what I said: it's useless for verification. Because of drive offsets and errors you will potentially run into hundreds of different CRC values, removing any reliability it could potentially have.

-- Per (Wizzcat)
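To illustrate why drive offsets scatter the hashes, here is a small Python sketch, assuming a simulated disc held in memory. The function `read_with_offset` and the toy data are invented for this example and are not taken from any ripper. It shows that a constant read offset changes the MD5 of a track even though no sample was read wrongly, and that compensating for the offset restores the match.

```python
import hashlib

BYTES_PER_SAMPLE = 4  # one stereo sample: 2 bytes left + 2 bytes right


def read_with_offset(disc: bytes, start: int, length: int, offset: int) -> bytes:
    """Simulate a drive that returns sample (n + offset) when asked for sample n.

    `start`, `length` and `offset` are counted in stereo samples. Reads that
    would fall outside the disc are padded with silence.
    """
    pad = abs(offset) * BYTES_PER_SAMPLE
    padded = bytes(pad) + disc + bytes(pad)
    begin = pad + (start + offset) * BYTES_PER_SAMPLE
    return padded[begin:begin + length * BYTES_PER_SAMPLE]


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Toy "disc" of 256 stereo samples; the track is samples 64..191.
disc = bytes(range(256)) * 4
reference = read_with_offset(disc, start=64, length=128, offset=0)
shifted = read_with_offset(disc, start=64, length=128, offset=15)
corrected = read_with_offset(disc, start=64 - 15, length=128, offset=15)

print(md5_hex(reference) == md5_hex(shifted))    # False: same audio, shifted window
print(md5_hex(reference) == md5_hex(corrected))  # True: offset compensated
```

Every distinct uncorrected offset produces its own hash for the same disc, which is the spread of values Per is describing.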
|
|
Re: Verify data integrity

> Actually disregard what I said, it's useless for verification. Because of
> drive offsets and errors you will potentially run into hundreds of
> different CRC values, removing any reliability it could potentially have.

Even so, if the FLAC CRC matches with AR's, the editor can be confident in adding it to the MBz DB, right? Or maybe the AR CRC is against whole discs as opposed to individual tracks?

- Grant
|
|
Re: Verify data integrity

Obviously checksums (not necessarily cyclic redundancy check "CRC") would need to be coupled with some other data and not added without some checks and balances. I personally have no faith in AccurateRip or its author, but if some cooperation were possible that would be even better of course.

Philip
|
|
Re: Verify data integrity

On Mon, 26 May 2008 18:31:35 +0200, Grant <emailgrant@...> wrote:
> Even so, if the FLAC CRC matches with AR's, the editor can be
> confident in adding it to the MBz DB right? Or maybe the AR CRC is
> against whole discs as opposed to individual tracks?
> - Grant

My bad for calling everything CRC I guess, but no, they don't use the same hashing algorithm and will never match.

-- Per (Wizzcat)
|
|
Re: Verify data integrity

> My bad for calling everything CRC I guess, but no they don't use the same
> hashing algorithm and will never match.

How does AR do its checksumming? Is it calculated based on a WAV or ISO of the entire disc?

If it can be determined that a FLAC rip matches with AR, the embedded FLAC checksum, although different from whatever AR uses, could be added to the MBz DB with certainty, right?

- Grant
|
|
Re: Verify data integrity

Grant wrote:
> How does AR do its checksumming? Is it calculated based on a WAV or
> ISO of the entire disc?
>
> If it can be determined that a FLAC rip matches with AR, the embedded
> FLAC checksum, although different from whatever AR uses, could be
> added to the MBz DB with certainty right?
>
> - Grant

I reverse engineered the AR system a year ago or so, there's a perl script that performs AR checking available from, http://www.srcf.ucam.org/~cjk32/ARCue/

The checksums are the (mod 2^32) sum of each 32bit LR sample multiplied by its offset within the track. The first and last five frames of the first and last tracks are ignored to prevent problems with drives that cannot overread into the lead-in or lead-out.

I do like the way AccurateRip works, but there are some limitations, and I've been wondering about how an improved system might operate.

AR seems to work around the following principle. There are two kinds of errors one can suffer from: systematic errors and random noise. The only realistic systematic error that will be encountered is a constant offset of the samples read (e.g. when asked for sample 0, the drive actually returns sample 15), and EAC+AR deals with this by establishing the drive's offset, correcting by this amount, and making it difficult for the user to change it. The second kind of error is random noise, caused by a damaged disc, failing drive laser etc. These errors are manifested as random changes in the data read, and will not be consistent across multiple reads (ignoring any caching performed by the drive).

Because these errors are random and infrequent, if two independent reads of a disc give the same data (or almost equivalently, the same checksum), then it is overwhelmingly likely that both reads of the disc read the correct data. AR collects all checksum submissions for a given discid, and when it gets 2 or more the same for a given track / disc id, it considers them correct. As it is possible for multiple pressings to have different audio data, but the same disc id, it is quite possible to have multiple valid checksums for each track on that disc.

There are a few problems with the current system. Firstly, the measured drive read offsets used by the whole AR+EAC system seem incorrect. The offset for one drive was established using an ingenious but flawed mechanism that gave an incorrect value. As this drive offset was then used as a reference to determine all others, they all share the same error. More recent tests using a different and arguably better method have given a different drive offset, which is much more likely to be correct. Secondly, AR doesn't allow any validation of the leading and trailing five frames of audio; some drives cannot read this data, and it is hence not included in the checksums. It cannot deal (I believe) with audio hidden in the pregap.

My personal preference would be to use an AR-like system, but with MD5 hashes based upon all the data in the track (i.e. not cutting off leading and trailing frames), and using the newly measured 'correct' offset. Such hashes would be collected for each track of each discid, and where 2 or more match, they would be published as a correct hash for that track. The MD5 calculated for any track would be the same as the FLAC MD5 checksum.

This system isn't ideal though, given the effort and infrastructure already invested into the existing system. One way to take advantage of the existing data might be to also calculate AR checksums using the current method, and accept submissions of both as a set. The confidence level for the AR checksums could then be applied to the MD5 hashes that they span. For example, if the AR checksums indicated that tracks 1-3 were correct with a confidence of 50, you could then be sure that the MD5 hash for track 2 was also correct (because the range over which the AR checksums for tracks 1-2 is calculated wholly covers the range over which the MD5 hash for track 2 is calculated).

Any thoughts?

Chris
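The checksum Chris describes can be sketched in a few lines of Python. This is only an illustration of the description in his post, not the actual AccurateRip code or his Perl script. It assumes each 32-bit sample is the little-endian packing of one left/right pair, that the "offset within the track" is counted from 1, that the skipped region is the first five frames of the first track and the last five frames of the last track, and that skipped samples still count towards the position. A CD frame is 588 stereo samples (44100 / 75).

```python
import struct

SAMPLES_PER_FRAME = 588  # 44100 samples per second / 75 frames per second
SKIP_FRAMES = 5


def ar_style_checksum(pcm: bytes, is_first: bool = False, is_last: bool = False) -> int:
    """Sum of (sample * position) mod 2^32 over one track of raw CD audio.

    `pcm` is 16-bit stereo little-endian audio, 4 bytes per stereo sample.
    """
    if len(pcm) % 4:
        raise ValueError("track length must be a whole number of stereo samples")
    count = len(pcm) // 4
    skip = SKIP_FRAMES * SAMPLES_PER_FRAME
    lo = skip if is_first else 0            # first counted sample (0-based)
    hi = count - skip if is_last else count  # one past the last counted sample
    total = 0
    for index, (sample,) in enumerate(struct.iter_unpack("<I", pcm)):
        if lo <= index < hi:
            # Position is assumed to be 1-based.
            total = (total + sample * (index + 1)) & 0xFFFFFFFF
    return total
```

Because each sample is weighted by its position, two samples swapping places changes the result, which a plain sum would miss. The same weighting is why an uncorrected drive offset changes the value for every track.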
|
|
Re: Verify data integrity

On Mon, 26 May 2008 20:11:19 +0200, Christopher Key <cjk32@...> wrote:
> I reverse engineered the AR system a year ago or so, there's a perl
> script that performs AR checking available from,
> http://www.srcf.ucam.org/~cjk32/ARCue/

Oh hey, I've made quite a bit of use of the script, so thanks! :)

> My personal preference would be to use an AR like system, but with MD5
> hashes based upon all the data in the track (i.e. not cutting off leading
> and trailing frames), and using the newly measured 'correct' offset.
> Such hashes would be collected for each track of each discid, and where
> 2 or more match, they would be published as a correct hash for that
> track. The MD5 calculated for any track would be the same as the FLAC
> MD5 checksum.

This sounds great, but the problem I have is that it doesn't address different pressings. It's not a showstopper obviously, and it would still be a huge improvement over AR as it currently is. Using the entire track also sounds good, but how would you deal with drives which can't rip lead-out/in? Having two hashes for each track seems somewhat sub-optimal.

I'd applaud any effort to include track hashing in MBz though. Having all the data locked down over at AccurateRip isn't a great situation to be in, especially if you remember CDDB.

-- Per (Wizzcat)
|
|
Re: Verify data integrity

> This sounds great, but the problem I have is that it doesn't address
> different pressings. It's not a showstopper obviously, but it would be a
> huge improvement over AR currently. Using the entire track also sounds
> good, but how would you deal with drives which can't rip lead-out/in?
> Having two hashes for each track seems somewhat sub-optimal.

How big of a problem is this multiple pressing issue? Could an album have many pressings that are just barely non-identical? If not, maybe they should be tracked separately with their unique checksums. Different pressings of vinyl records are "tracked" separately.

> I'd applaud any effort to include track hashing in MBz though. Having all
> the data locked down over at AccurateRip isn't a great situation to be in,
> especially if you remember CDDB.

Let's not let this die. I think this is the next killer app for MBz.

- Grant
|
|
Re: Verify data integrity

On Tue, 27 May 2008 01:51:47 +0200, Grant <emailgrant@...> wrote:
> How big of a problem is this multiple pressing issue? Could an album
> have many pressings that are just barely non-identical? If not, maybe
> they should be tracked separately with their unique checksums.
> Different pressings of vinyl records are "tracked" separately.

Different manufacturers will make discs with minute differences. All it takes is a frame in the wrong place and the checksum won't match. Currently in AR there's no protection against this, and like you said it has multiple checksums instead. The problem is that one release might have three ids, while another which has a slight offset difference has 0, resulting in no matches for whoever rips it. It's quite common depending on what you rip, and just as annoying every time.

-- Per (Wizzcat)
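The "two or more matching submissions" rule from earlier in the thread, together with the multiple-pressings case Per describes here, can be sketched roughly as follows in Python. This is a guess at the general shape only, not how AccurateRip or MusicBrainz actually stores anything. The function name `confirmed_checksums`, the tuple layout and the submitter identifiers are all hypothetical.

```python
from collections import defaultdict


def confirmed_checksums(submissions, min_matches=2):
    """Group submissions and keep checksums seen from enough distinct submitters.

    `submissions` is an iterable of (disc_id, track, submitter, checksum).
    Returns {(disc_id, track): {checksum: number_of_distinct_submitters}}.
    A track may end up with several confirmed checksums, one per pressing.
    """
    submitters = defaultdict(set)
    for disc_id, track, submitter, checksum in submissions:
        # A set ignores repeat submissions from the same person, so one
        # bad rip uploaded many times cannot confirm itself.
        submitters[(disc_id, track, checksum)].add(submitter)

    confirmed = defaultdict(dict)
    for (disc_id, track, checksum), who in submitters.items():
        if len(who) >= min_matches:
            confirmed[(disc_id, track)][checksum] = len(who)
    return dict(confirmed)
```

With this rule, a pressing that only one person has ripped stays unconfirmed, which is the "no matches for whoever rips it" situation described above.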
|
|
Re: Verify data integrity

> Different manufacturers will make discs with minute differences. All it
> takes is a frame in the wrong place and the checksum won't match.
> Currently in AR there's no protection against this, and like you said it
> has multiple checksums instead. The problem is that one release might have
> three ids, while another which has a slight offset difference has 0,
> resulting in no matches for whoever rips it. It's quite common depending
> on what you rip, and just as annoying every time.

I understand how that works. I'm wondering if MBz should track different pressings separately as far as the checksum is concerned. If there are usually only a few pressings per album, it sounds like a reasonable thing to do. Vinyl records are cataloged according to pressing, but I haven't heard of CDs being tracked in the same way yet.

- Grant