Bug#554689: /usr/bin/dpkg-source: is slow

View: New views
4 Messages — Rating Filter:   Alert me  

Bug#554689: /usr/bin/dpkg-source: is slow

by Mike Hommey-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Package: dpkg-dev
Version: 1.15.4.1
Severity: normal
File: /usr/bin/dpkg-source
Tags: patch

As seen in (but unrelated to) bug #554612, running dpkg-source -b is slow on
big trees, where the main contender is Dpkg::Source::Patch::add_diff_file.

%Time    Sec.     #calls   sec/call  F  name
32.52   69.8110    44702   0.001562     Dpkg::Source::Patch::add_diff_file
29.95   64.2837    44829   0.001434     Dpkg::IPC::fork_and_exec
10.22   21.9370    44829   0.000489     Dpkg::IPC::wait_child
 7.14   15.3309    97591   0.000157     File::Spec::Unix::abs2rel
 4.25    9.1206   585681   0.000016     File::Spec::Unix::canonpath

Here, the main problem is obviously forking 44829 times diff -u, while the
vast majority of files in the orig tarball haven't been touched (which is
mostly true on all packages).

The attached patch (which may have style and correctness issue) implements
a very simple check in perl (so, without a fork) to see if files differ
before running diff. The result is stunning:

>From 3 minutes and 30 seconds on iceape in format 3.0 (quilt), dpkg-source
-b goes down to 35 seconds (where 15 are spent bunzipping).

This is where the time is spent, now:

%Time    Sec.     #calls   sec/call  F  name
24.41   14.1649      128   0.110663     Dpkg::IPC::wait_child
19.46   11.2948    97594   0.000116     File::Spec::Unix::abs2rel
13.77    7.9901    44703   0.000179     Dpkg::Source::Patch::add_diff_file
 9.37    5.4382   585699   0.000009     File::Spec::Unix::canonpath

It looks much better.

My gut feeling is that it should improve run time on most if not all packages.
(and more importantly, will help big packages maintainers)

I'm pretty sure reading by blocks of a multiple of 4k instead of reading line
by line could be faster, too.

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.31-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages dpkg-dev depends on:
ii  binutils                      2.20-2     The GNU assembler, linker and bina
ii  bzip2                         1.0.5-3    high-quality block-sorting file co
ii  dpkg                          1.15.4.1   Debian package management system
ii  libtimedate-perl              1.1900-1   Time and date functions for Perl
ii  lzma                          4.43-14    Compression method of 7z format in
ii  make                          3.81-7     An utility for Directing compilati
ii  patch                         2.5.9-5    Apply a diff file to an original
ii  perl [perl5]                  5.10.1-6   Larry Wall's Practical Extraction
ii  perl-modules                  5.10.1-6   Core Perl modules

Versions of packages dpkg-dev recommends:
ii  build-essential           11.4           Informational list of build-essent
ii  fakeroot                  1.14.3         Gives a fake root environment
ii  gcc [c-compiler]          4:4.3.3-9+nmu1 The GNU C compiler
ii  gcc-4.1 [c-compiler]      4.1.2-27       The GNU C compiler
ii  gcc-4.2 [c-compiler]      4.2.4-6        The GNU C compiler
ii  gcc-4.3 [c-compiler]      4.3.4-6        The GNU C compiler
ii  gcc-4.4 [c-compiler]      4.4.2-2        The GNU C compiler
ii  gnupg                     1.4.10-2       GNU privacy guard - a free PGP rep
ii  gpgv                      1.4.10-2       GNU privacy guard - signature veri

Versions of packages dpkg-dev suggests:
ii  debian-keyring [debian-mainta 2009.08.27 GnuPG (and obsolete PGP) keys of D

-- no debconf information

-- debsums errors found:
debsums: changed file /usr/share/perl5/Dpkg/Source/Patch.pm (from dpkg-dev package)
debsums: changed file /usr/share/perl5/Dpkg/Source/Package/V2.pm (from dpkg-dev package)

--- /usr/share/perl5/Dpkg/Source/Patch.pm
+++ /usr/share/perl5/Dpkg/Source/Patch.pm
@@ -58,6 +58,19 @@
 
 sub add_diff_file {
     my ($self, $old, $new, %opts) = @_;
+    open(OLD, "<", $old);
+    open(NEW, "<", $new);
+    my $match = 1;
+    while (<OLD>) {
+        if ($_ ne <NEW>) {
+            $match = 0;
+            last;
+        }
+    }
+    close OLD;
+    close NEW;
+    return 1 if ($match);
+
     $opts{"include_timestamp"} = 0 unless exists $opts{"include_timestamp"};
     my $handle_binary = $opts{"handle_binary_func"} || sub {
         my ($self, $old, $new) = @_;

Bug#554689: /usr/bin/dpkg-source: is slow

by Raphael Hertzog-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, 06 Nov 2009, Mike Hommey wrote:
> The attached patch (which may have style and correctness issue) implements
> a very simple check in perl (so, without a fork) to see if files differ
> before running diff. The result is stunning:

Thanks for the good idea. The change I made is the following:
--- a/scripts/Dpkg/Source/Patch.pm
+++ b/scripts/Dpkg/Source/Patch.pm
@@ -32,6 +32,7 @@ use File::Find;
 use File::Basename;
 use File::Spec;
 use File::Path;
+use File::Compare;
 use Fcntl ':mode';
 #XXX: Needed for sub-second timestamps, require recent perl
 #use Time::HiRes qw(stat);
@@ -63,6 +64,8 @@ sub add_diff_file {
         my ($self, $old, $new) = @_;
         $self->_fail_with_msg($new, _g("binary file contents changed"));
     };
+    # Optimization to avoid forking diff if unnecessary
+    return 1 if compare($old, $new, 4096) == 0;
     # Default diff options
     my @options;
     if ($opts{"options"}) {

Building iceape, before:
real 1m11.078s
user 0m22.501s
sys 0m36.946s

After:
real 0m14.949s
user 0m9.377s
sys 0m2.948s

Cheers,
--
Raphaël Hertzog



--
To UNSUBSCRIBE, email to debian-bugs-dist-REQUEST@...
with a subject of "unsubscribe". Trouble? Contact listmaster@...


Bug#554689: /usr/bin/dpkg-source: is slow

by Mike Hommey :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Nov 06, 2009 at 08:45:57PM +0100, Raphael Hertzog wrote:
> Building iceape, before:
> real 1m11.078s
> user 0m22.501s
> sys 0m36.946s
>
> After:
> real 0m14.949s
> user 0m9.377s
> sys 0m2.948s

We can go even further, with another optimization.

As appeared in the profiles, abs2rel and canonpath take a lot of time.
It look like abs2rel is bloated for what it is used for.

Replacing:
  my $fn = File::Spec->abs2rel($_, $new);
with
  my $fn = length($_) > length($new) + 1 ? substr($_, length($new) + 1) : '.';

and
  my $fn = File::Spec->abs2rel($_, $old);
with
  my $fn = length($_) > length($old) + 1 ? substr($_, length($old) + 1) : '.';

in add_diff_directory, make another 25% improvement. I'm now down to 25s
on iceape-2.0-1 in 3.0 (quilt) format with a bzip2 tarball, which is much
closer to bunzip2 (15s) + diff (2s), and down to 7s on the
iceape-1.1.17-2 source (where it took 11s before, compared to your 15s).

The canonpath calls disappear from the profile once the abs2rel calls
are modified as stated above.

Cheers,

Mike



--
To UNSUBSCRIBE, email to debian-bugs-dist-REQUEST@...
with a subject of "unsubscribe". Trouble? Contact listmaster@...


Bug#554689: /usr/bin/dpkg-source: is slow

by Raphael Hertzog-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, 07 Nov 2009, Mike Hommey wrote:
> We can go even further, with another optimization.
>
> As appeared in the profiles, abs2rel and canonpath take a lot of time.
> It look like abs2rel is bloated for what it is used for.

I applied the patch as well. Thanks!

Cheers,
--
Raphaël Hertzog



--
To UNSUBSCRIBE, email to debian-bugs-dist-REQUEST@...
with a subject of "unsubscribe". Trouble? Contact listmaster@...