> I'm trying to protect against the gzip compression of the tar archive varying from generation to generation. gzip's compressed data stream itself is deterministic, but by default the gzip header embeds the modification time of the input, so if you take two identical tar archives and gzip them at different times with the same settings, the resulting gzip files will not be byte-for-byte identical, and thus they'll have different checksums:
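The variation described above can be demonstrated locally: by default gzip records the input file's mtime in its header, so recompressing the same bytes after the mtime changes yields different archives, while `gzip -n` omits the timestamp and makes the output reproducible. A minimal sketch, assuming GNU gzip is installed:

```shell
# gzip embeds the input's mtime in its header by default, so identical
# data can compress to different bytes; -n omits the name and timestamp.
printf 'identical input\n' > demo.txt

gzip -c demo.txt > a.gz        # header records demo.txt's current mtime
sleep 2
touch demo.txt                 # bump the mtime; the data is unchanged
gzip -c demo.txt > b.gz        # same bytes in, different header out
cmp -s a.gz b.gz || echo "a.gz and b.gz differ"

gzip -nc demo.txt > c.gz       # -n: omit name/timestamp from the header
gzip -nc demo.txt > d.gz
cmp -s c.gz d.gz && echo "c.gz and d.gz are byte-for-byte identical"
```

So a generated distfile's checksum is only stable if the server either caches the archive it produced or strips the timestamp when generating it.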
> It seems unlikely that github has an infinite amount of disk space with which to forever retain a tarball of every revision of every repository that some user may have requested only once and that nobody will ever request again. So I would assume they keep a generated tarball around for a period of time, maybe 24-48 hours, and then delete it if it hasn't been requested again.
On Apr 22, 2012, at 15:55, Ryan Schmidt wrote:
> On Apr 22, 2012, at 12:17, Sean Farley wrote:
>> You can see that they both generate the same checksums. For the above
>> link, the sha1sum reports:
>> $ sha1sum ~/Downloads/AndreaCrotti-yasnippet-snippets-1441728.tar.gz
> Now let's wait 24-48 hours, try again, and see whether we still get the same checksum.
I remembered that I downloaded a .tar.gz of a revision of some project from github in October 2011. I tried downloading the same revision now and, to my surprise, found the old and the new .tar.gz archives to have the same checksums. So either github is using an exorbitant amount of disk space to keep these old archives around, or it ensures by some means that the gzip output of repeated runs is identical, for example by omitting the varying timestamp from the gzip header. Either way, that's good news, so I suppose we can indeed fix the github portgroup to fetch distfiles even when git.branch is specified.
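For what it's worth, a hosting service could serve byte-stable tarballs without caching them at all: `git archive` emits a deterministic tar for a given commit (entry mtimes come from the commit, not the clock), and piping it through `gzip -n` keeps the compressed result reproducible. A sketch, with the repository path as a placeholder:

```shell
# Sketch of deterministic tarball generation (repository path is
# illustrative). git archive is reproducible for a given commit and git
# version; gzip -n drops the header timestamp, so repeated runs are
# byte-identical.
cd /path/to/repo               # placeholder path
git archive --format=tar HEAD | gzip -n > snap1.tar.gz
git archive --format=tar HEAD | gzip -n > snap2.tar.gz
cmp -s snap1.tar.gz snap2.tar.gz && echo "byte-identical"
```

Whether github actually does something like this, or simply keeps the archives, I can't tell from the outside.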
However, we have most definitely observed the effect I described with bitbucket: