non-deterministic compression for CREDITS.gz in libppl9 amd64 & armel

3 messages

non-deterministic compression for CREDITS.gz in libppl9 amd64 & armel

by Neil Williams


$ dget http://ftp.uk.debian.org/debian/pool/main/p/ppl/libppl9_0.11.2-6_armel.deb
$ dpkg -X libppl9_0.11.2-6_armel.deb .
$ cp ./usr/share/doc/libppl9/CREDITS.gz .

$ md5sum CREDITS.gz
0e52e84eebf41588865742edaff7b3c0  CREDITS.gz
$ gunzip CREDITS.gz
$ gzip -9nf CREDITS
$ md5sum CREDITS.gz
99e2b9f8972ce00cfe57e3735881015e  CREDITS.gz

This test was done on abel.debian.org, an armel machine with gzip
1.3.12-9, but the original armel package was built on the buildd
using gzip 1.4-1.

(This md5sum matches that of the same file in the amd64 package.)
http://ftp.uk.debian.org/debian/pool/main/p/ppl/libppl9_0.11.2-6_amd64.deb
99e2b9f8972ce00cfe57e3735881015e  usr/share/doc/libppl9/CREDITS.gz

$ od -tx1 < CREDITS.gz > CREDITS-redone.gz.od
$ od -tx1 < ./usr/share/doc/libppl9/CREDITS.gz > CREDITS-original.gz.od
$ diff -u CREDITS-original.gz.od CREDITS-redone.gz.od
--- CREDITS-original.gz.od 2012-02-06 20:34:43.000000000 +0000
+++ CREDITS-redone.gz.od 2012-02-06 20:34:29.000000000 +0000
@@ -393,6 +393,6 @@
 0014200 16 78 3d a3 79 3d 1c f0 b0 c2 5f e9 f6 0b 5b 4c
 0014220 77 8b 91 89 1d 13 b7 58 16 f3 5b 10 1e 20 d1 f3
 0014240 d3 44 79 f2 05 9a 9c e7 87 42 b9 b5 34 42 56 55
-0014260 95 a1 bb 55 ec 78 cb f2 7f ba 11 41 f7 b3 d4 0f
-0014300 d6 b0 a7 11 7b 4c 00 00
-0014310
+0014260 95 a1 bb 55 ec 78 cb f2 7f ba 11 41 f7 ea 3f d6
+0014300 b0 a7 11 7b 4c 00 00
+0014307

In other words, the armel package contains the anomalous file, but it
can be converted to the same file as in the amd64 package by redoing
the compression.
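The decompress-and-recompress test above can be wrapped in a small shell function so any .gz file can be checked the same way. This is a minimal sketch: the name regz_check is hypothetical, and it assumes gzip, cmp, mktemp and head are on the PATH. It only shows whether the locally installed gzip -9n reproduces the file byte for byte; it says nothing about which build produced the "right" file.

```sh
# Hypothetical helper: regz_check FILE.gz
# Decompresses FILE.gz, recompresses the payload with gzip -9n (the options
# used in this thread) and reports whether the result is byte-identical.
# Returns 0 if identical, 1 if different, 2 on error.
regz_check() {
    tmp=$(mktemp -d) || return 2
    # -d -c: decompress to stdout; -9: best compression; -n: omit name and mtime
    if ! { gzip -dc "$1" > "$tmp/plain" && gzip -9nc "$tmp/plain" > "$tmp/redone.gz"; }; then
        rm -rf "$tmp"
        return 2
    fi
    if cmp -s "$1" "$tmp/redone.gz"; then
        echo "identical: $1"
        rc=0
    else
        echo "differs:   $1"
        # cmp -l: 1-based offset, then the two differing byte values (octal)
        cmp -l "$1" "$tmp/redone.gz" | head -n 5
        rc=1
    fi
    rm -rf "$tmp"
    return "$rc"
}
```

For example, after the dpkg -X step above: regz_check ./usr/share/doc/libppl9/CREDITS.gz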

The bug shows up clearly when trying to install the MultiArch
build-dependencies for cross-compilers:

$ sudo apt-get install libcloog-ppl-dev:armel

Selecting previously unselected package libppl9:armel.
(Reading database ... 167711 files and directories currently installed.)
Unpacking libppl9:armel (from .../libppl9_0.11.2-6_armel.deb) ...
dpkg: error processing /var/cache/apt/archives/libppl9_0.11.2-6_armel.deb (--unpack):
 './usr/share/doc/libppl9/CREDITS.gz' is different from the same file on the system


--
Neil Williams
=============
http://www.linux.codehelp.co.uk/

Re: non-deterministic compression for CREDITS.gz in libppl9 amd64 & armel

by Paul Eggert


I can't reproduce the problem on x86-64 with vanilla
gzip 1.4 and vanilla gzip 1.3.12.  So the problem appears to be
either architecture-dependent, or it's a property of
the Debian patches to gzip, or something like that, and
I expect we'll need more information about how to
reproduce the problem.  It looks like the problem is with
1.3.12-9 on armel so you might want to focus your attention
there.


Re: non-deterministic compression for CREDITS.gz in libppl9 amd64 & kfreebsd-amd64

by Neil Williams


On Mon, 06 Feb 2012 14:21:15 -0800
Paul Eggert <eggert@...> wrote:

> I can't reproduce the problem on x86-64 with vanilla
> gzip 1.4 and vanilla gzip 1.3.12.  So the problem appears to be
> either architecture-dependent, or it's a property of
> the Debian patches to gzip, or something like that, and
> I expect we'll need more information about how to
> reproduce the problem.  It looks like the problem is with
> 1.3.12-9 on armel so you might want to focus your attention
> there.

The broken CREDITS.gz was created with gzip 1.4 from Debian unstable. I
happened to use 1.3.12 for my test on a different armel machine, but the
whole problem with this bug is that it is non-deterministic: simply
repeating the compression can "fix" the apparent problem.
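Since repeating the compression can make the difference disappear, one rough way to look for run-to-run variation on a single machine is to compress the same input many times and count the distinct checksums. A sketch, assuming gzip and md5sum are installed; gz_repeat is a hypothetical name and the default of 100 runs is arbitrary. It can only catch variation that shows up locally; it cannot recreate whatever happened on the buildds.

```sh
# Hypothetical helper: gz_repeat FILE [RUNS]
# Compresses FILE with gzip -9n RUNS times (default 100) and prints each
# distinct MD5 of the output, prefixed by how many runs produced it.
# A deterministic gzip prints exactly one line.
gz_repeat() {
    runs=${2:-100}
    i=0
    while [ "$i" -lt "$runs" ]; do
        gzip -9nc "$1" | md5sum
        i=$((i + 1))
    done | sort | uniq -c
}
```

For example, gz_repeat CREDITS 1000 on the uncompressed file.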

I added the extra information because both versions of CREDITS.gz are
available in the packages specified. Rather than having to rely on my
own debug information, you can view and analyse the actual .gz files
involved: the checksums can be checked and validated, and the build
logs exist, so the version of gzip actually installed can be checked
too.

From the armel build log:
gzip: already installed (1.4-1)
https://buildd.debian.org/status/fetch.php?pkg=ppl&arch=armel&ver=0.11.2-6&stamp=1318428664

For comparison, the i386 build used the same version of gzip on the
same file and gave a different .gz file:
i386:
99e2b9f8972ce00cfe57e3735881015e  usr/share/doc/libppl9/CREDITS.gz
armel:
0e52e84eebf41588865742edaff7b3c0  usr/share/doc/libppl9/CREDITS.gz

i386 log:
https://buildd.debian.org/status/fetch.php?pkg=ppl&arch=i386&ver=0.11.2-6&stamp=1318344010

More examples may well turn up soon as more people install the
MultiArch-aware version of dpkg, which allows packages for different
architectures to be installed alongside each other. This assumes and
requires that a file compressed on one architecture is identical to the
same file compressed on another. It is quite possible that the bug in
gzip is independent of the architecture itself, but cross-architecture
installs are how all of these issues are going to show up.

Indeed, a quick check shows that this is not architecture-specific. The
kfreebsd-amd64 log shows that CREDITS.gz is one byte larger there than
on linux-amd64:

https://buildd.debian.org/status/fetch.php?pkg=ppl&arch=kfreebsd-amd64&ver=0.11.2-6&stamp=1318348840

kfreebsd-amd64:
6344 2011-02-27 09:07 ./usr/share/doc/libppl9/CREDITS.gz

linux-amd64:
6343 2011-02-27 09:07 ./usr/share/doc/libppl9/CREDITS.gz

http://ftp.uk.debian.org/debian/pool/main/p/ppl/libppl9_0.11.2-6_kfreebsd-amd64.deb

0e52e84eebf41588865742edaff7b3c0  usr/share/doc/libppl9/CREDITS.gz

Same as armel, but different from armhf, i386 and amd64.
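od prints its offsets in octal, so the final offsets in the od diff from the first message (0014310 and 0014307) are the byte lengths of the two files. Converted to decimal they give 6344 and 6343, matching the sizes listed above: the 0e52e84eebf41588865742edaff7b3c0 variant (armel, kfreebsd-amd64) is one byte longer than the 99e2b9f8972ce00cfe57e3735881015e one. A minimal sketch; od_offset_to_dec is a hypothetical helper that relies on printf treating an argument with a leading 0 as octal (wc -c < CREDITS.gz gives the same figure directly).

```sh
# od prints offsets in octal; the final offset line of `od -tx1` is the
# file length. printf's %d treats an argument with a leading 0 as octal.
od_offset_to_dec() {
    printf '%d\n' "$1"
}

od_offset_to_dec 0014310   # original armel CREDITS.gz: prints 6344
od_offset_to_dec 0014307   # recompressed CREDITS.gz:   prints 6343
```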

I see no reason why a change of kernel, or of gcc compiler flags, for
the same version of gzip (1.4 in every case) would cause such
non-deterministic results from gzip -9n.

There is something else going on here, something internal to gzip that
changes certain bytes inside the compressed file, and changes them in
the same way each time. It is strange indeed for four separate machines
running the same code to produce two matching pairs of the same
discrepancy.

Ignore my tests with an older version of gzip; these results are all
with gzip 1.4-1. Whether I decompress and recompress on amd64 or armel,
the discrepancy goes away. The problem is that we cannot anticipate
when the discrepancy will occur, so packages fail to install in random,
unpredictable patterns.

This bug is going to be hard to reproduce, but its results are neither
architecture-dependent nor version-dependent. Interestingly, other text
files in the same package, compressed on the same machine with the same
gzip options, do not differ. It is the peculiar requirements of
MultiArch that have brought this to light: in the majority of cases
gzip -9n produces identical output for the same file, but not always.
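The observation that other compressed files in the same package are unaffected can be checked mechanically by running the same recompression test over every .gz in an unpacked package tree, for example one produced with dpkg -X as in the first message. A sketch; scan_gz is a hypothetical name, and like the transcripts in this thread it only compares against whatever gzip is installed locally.

```sh
# Hypothetical helper: scan_gz DIR
# For every *.gz under DIR (for example a tree unpacked with dpkg -X),
# decompresses it, recompresses the payload with gzip -9n and reports
# whether the bytes survive the round trip unchanged.
scan_gz() {
    tmp=$(mktemp -d) || return 2
    find "$1" -type f -name '*.gz' | sort | while IFS= read -r f; do
        if ! { gzip -dc "$f" > "$tmp/plain" && gzip -9nc "$tmp/plain" > "$tmp/redone.gz"; }; then
            echo "error:   $f"
        elif cmp -s "$f" "$tmp/redone.gz"; then
            echo "same:    $f"
        else
            echo "differs: $f"
        fi
    done
    rm -rf "$tmp"
}
```

For example, scan_gz ./usr/share/doc/libppl9 after unpacking the armel .deb.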
