GNU make to consider files checksum

by Giuseppe Scrivano-2

Hello,

I could find only one thread in this mailing list's archives about this
subject: considering a file's checksum instead of its timestamp.
Other systems like scons already support this feature and it would be
great to have it for GNU Make too.

I attached a patch against the current CVS to add --use-checksum to
GNU Make.  It is just a proof of concept, but it shows that adding this
feature can really speed up a remake.

With this change, simply touching a file will not cause it to be
recompiled, as is easy to imagine.  But suppose, for example, that you
modify only a comment in test.c.  With standard make you have to rebuild:

test.c -> test.o -> test

Using a checksum you rebuild only:

test.c -> test.o

because the .o file is unchanged.
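
A minimal sketch of the decision described above, not the code from the attached patch: a prerequisite with a newer timestamp forces a remake only when its content checksum differs from the one recorded earlier. The names `fnv1a` and `needs_remake` are hypothetical, and FNV-1a is used here only as a compact stand-in hash.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit: a simple stand-in hash chosen for this sketch.  */
static uint32_t
fnv1a (const unsigned char *data, size_t len)
{
  uint32_t h = 2166136261u;     /* FNV offset basis */
  size_t i;

  for (i = 0; i < len; i++)
    {
      h ^= data[i];
      h *= 16777619u;           /* FNV prime */
    }
  return h;
}

/* Hypothetical helper: decide whether one prerequisite forces a remake.
   PREREQ_IS_NEWER is the usual mtime test; HAVE_STORED says whether a
   checksum was recorded on a previous run.  */
static int
needs_remake (int prereq_is_newer, int have_stored,
              uint32_t stored, uint32_t current)
{
  if (!prereq_is_newer)
    return 0;                   /* timestamps already say "up to date" */
  if (!have_stored)
    return 1;                   /* nothing to compare against: be safe */
  return stored != current;     /* newer, but only remake if content changed */
}
```

In the test.c example, test.o gets a newer timestamp after recompilation, but its checksum equals the recorded one, so `needs_remake` returns 0 and the link step is skipped.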

This scenario is the one that surprised me most, as it is very common
and can save a lot of time at link time.

The biggest problem is how to store the information.  In the patch the
checksum for file a is saved in the file a.checksum, but I don't think
this is a reasonable solution; hiding these files in a subdirectory is
probably not such a bad idea.

Concurrent accesses are not a problem when using files; they will be
used in almost the same way as the timestamp information is used now.
In the worst case the hash will be different and the file will
be recompiled.

Besides using a better algorithm to hash the file (MD5 is my first
thought) and hopefully finding another way to store the data (though I
still think files are the best choice), do you have other ideas or
suggestions?

Regards,
Giuseppe


? checksum_patch.diff
Index: file.c
===================================================================
RCS file: /sources/make/make/file.c,v
retrieving revision 1.90
diff -u -r1.90 file.c
--- file.c 4 Nov 2007 21:54:01 -0000 1.90
+++ file.c 11 Apr 2008 21:20:54 -0000
@@ -1,6 +1,6 @@
 /* Target file management for GNU Make.
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software
+1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
 Foundation, Inc.
 This file is part of GNU Make.
 
@@ -189,8 +189,80 @@
       f->last = new;
     }
 
+  new->last_checksum = read_checksum (new);
+
+
   return new;
 }
+
+
+/* Compute the checksum for the file.  */
+
+int
+compute_checksum(struct file *new)
+{
+  int checksum = 0;
+  FILE *f;
+  char buffer [4096];
+  
+  f = fopen (new->name, "r");
+  if (f != NULL)
+    {
+      size_t nbr;
+      int i;
+      do
+        {
+          nbr = fread (buffer, 1, sizeof buffer, f);
+          
+          for (i = 0; i < nbr; i++)
+            checksum = 21 * checksum + 23 * buffer[i];
+          
+        }
+      while (nbr);
+      fclose (f);
+    }
+  return checksum;
+}
+
+int
+read_checksum(struct file *new)
+{
+  int checksum = 0;
+  FILE *f;
+  char * checksum_file = (char*) xmalloc (strlen (new->name) + 10);
+  
+  sprintf (checksum_file, "%s.checksum", new->name);
+  
+  f = fopen (checksum_file, "r");
+  if (f != NULL)
+    {
+      fread (&checksum, 4, 1, f);
+      fclose (f);
+    }
+  
+  
+  free (checksum_file);
+  return checksum;
+}
+
+void
+write_checksum(struct file *new)
+{
+  FILE *f;
+  char * checksum_file = (char*) xmalloc (strlen (new->name) + 10);
+  
+  sprintf (checksum_file, "%s.checksum", new->name);
+  
+  f = fopen (checksum_file, "w");
+  if (f != NULL)
+    {
+      fwrite (&new->checksum, 4, 1, f);
+      fclose (f);
+    }
+
+  free (checksum_file);
+}
+
 
 /* Rehash FILE to NAME.  This is not as simple as resetting
    the `hname' member, since it must be put in a new hash bucket,
Index: filedef.h
===================================================================
RCS file: /sources/make/make/filedef.h,v
retrieving revision 2.30
diff -u -r2.30 filedef.h
--- filedef.h 4 Jul 2007 19:35:18 -0000 2.30
+++ filedef.h 11 Apr 2008 21:20:54 -0000
@@ -1,6 +1,6 @@
 /* Definition of target file data structures for GNU Make.
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software
+1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
 Foundation, Inc.
 This file is part of GNU Make.
 
@@ -94,6 +94,8 @@
                                    pattern-specific variables.  */
     unsigned int considered:1;  /* equal to 'considered' if file has been
                                    considered on current scan of goal chain */
+    int checksum; /* Actual checksum of the file.  */
+    int last_checksum; /* Last checksum registered on the file.  */
   };
 
 
@@ -103,6 +105,9 @@
 
 struct file *lookup_file (const char *name);
 struct file *enter_file (const char *name);
+int compute_checksum(struct file *new);
+int read_checksum(struct file *new);
+void write_checksum(struct file *new);
 struct dep *parse_prereqs (char *prereqs);
 void remove_intermediates (int sig);
 void snap_deps (void);
Index: main.c
===================================================================
RCS file: /sources/make/make/main.c,v
retrieving revision 1.227
diff -u -r1.227 main.c
--- main.c 4 Nov 2007 21:54:01 -0000 1.227
+++ main.c 11 Apr 2008 21:20:57 -0000
@@ -1,6 +1,6 @@
 /* Argument parsing and main program of GNU Make.
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software
+1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
 Foundation, Inc.
 This file is part of GNU Make.
 
@@ -226,6 +226,12 @@
 unsigned int default_job_slots = 1;
 static unsigned int master_job_slots = 0;
 
+
+/* Define if the checksum of a file should be considered.  */
+
+int use_checksum = 0;
+
+
 /* Value of job_slots that means no limit.  */
 
 static unsigned int inf_jobs = 0;
@@ -365,6 +371,8 @@
                               Consider FILE to be infinitely new.\n"),
     N_("\
   --warn-undefined-variables  Warn when an undefined variable is referenced.\n"),
+    N_("\
+  --use-checksum  Use the files checksum.\n"),
     NULL
   };
 
@@ -411,6 +419,7 @@
     { 'S', flag_off, &keep_going_flag, 1, 1, 0, 0, &default_keep_going_flag,
       "no-keep-going" },
     { 't', flag, &touch_flag, 1, 1, 1, 0, 0, "touch" },
+    { 'U', flag, &use_checksum, 1, 1, 1, 0, 0, "checksum" },
     { 'v', flag, &print_version_flag, 1, 1, 0, 0, 0, "version" },
     { CHAR_MAX+3, string, &verbosity_flags, 1, 1, 0, 0, 0,
       "verbosity" },
@@ -432,6 +441,7 @@
     { "new-file", required_argument, 0, 'W' },
     { "assume-new", required_argument, 0, 'W' },
     { "assume-old", required_argument, 0, 'o' },
+    { "use-checksum", optional_argument, 0, 'U' },
     { "max-load", optional_argument, 0, 'l' },
     { "dry-run", no_argument, 0, 'n' },
     { "recon", no_argument, 0, 'n' },
Index: make.h
===================================================================
RCS file: /sources/make/make/make.h,v
retrieving revision 1.131
diff -u -r1.131 make.h
--- make.h 4 Nov 2007 21:54:01 -0000 1.131
+++ make.h 11 Apr 2008 21:20:57 -0000
@@ -517,6 +517,7 @@
 
 extern unsigned int commands_started;
 
+extern int use_checksum;
 extern int handling_fatal_signal;
 
 
Index: remake.c
===================================================================
RCS file: /sources/make/make/remake.c,v
retrieving revision 1.137
diff -u -r1.137 remake.c
--- remake.c 5 Nov 2007 14:15:20 -0000 1.137
+++ remake.c 11 Apr 2008 21:20:59 -0000
@@ -1,6 +1,6 @@
 /* Basic dependency engine for GNU Make.
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software
+1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
 Foundation, Inc.
 This file is part of GNU Make.
 
@@ -505,7 +505,6 @@
           d->file->dontcare = file->dontcare;
         }
 
-
       dep_status |= check_dep (d->file, depth, this_mtime, &maybe_make);
 
       /* Restore original dontcare flag. */
@@ -515,6 +514,7 @@
       if (! d->ignore_mtime)
         must_make = maybe_make;
 
+
       check_renamed (d->file);
 
       {
@@ -546,6 +546,7 @@
   /* Now we know whether this target needs updating.
      If it does, update all the intermediate files we depend on.  */
 
+
   if (must_make || always_make_flag)
     {
       for (d = file->deps; d != 0; d = d->next)
@@ -764,6 +765,11 @@
       DBF (DB_VERBOSE, _("Recipe of `%s' is being run.\n"));
       return 0;
     }
+  else if (use_checksum)
+    {
+      file->checksum = compute_checksum (file);
+      write_checksum (file);
+    }
 
   switch (file->update_status)
     {
@@ -946,8 +952,31 @@
       check_renamed (file);
       mtime = file_mtime (file);
       check_renamed (file);
-      if (mtime == NONEXISTENT_MTIME || mtime > this_mtime)
- *must_make_ptr = 1;
+
+      if (mtime == NONEXISTENT_MTIME)
+        {
+          *must_make_ptr = 1;
+        }
+      else if(mtime > this_mtime)
+        {
+          if (use_checksum && file->last_checksum )
+            {
+              file->checksum = compute_checksum (file);
+
+              if (file->checksum != file->last_checksum)
+                *must_make_ptr = 1;
+
+            }
+          else
+            *must_make_ptr = 1;
+
+          if (use_checksum)
+            {
+              if (!file->checksum)
+                file->checksum = compute_checksum (file);
+              write_checksum (file);
+            }
+        }
     }
   else
     {
@@ -972,10 +1001,33 @@
       check_renamed (file);
       mtime = file_mtime (file);
       check_renamed (file);
-      if (mtime != NONEXISTENT_MTIME && mtime > this_mtime)
-        /* If the intermediate file actually exists and is newer, then we
-           should remake from it.  */
- *must_make_ptr = 1;
+      if (mtime != NONEXISTENT_MTIME)
+        {
+          *must_make_ptr = 1;
+        }
+      else if(mtime > this_mtime)
+        {
+          if (use_checksum && file->last_checksum)
+            {
+              file->checksum = compute_checksum (file);
+              if (file->checksum != file->last_checksum)
+                  *must_make_ptr = 1;
+            }
+          else
+            /* If the intermediate file actually exists and is newer, then we
+               should remake from it.  */
+            *must_make_ptr = 1;
+
+
+          if (use_checksum)
+            {
+              if (!file->checksum)
+                file->checksum = compute_checksum (file);
+              write_checksum (file);
+            }
+
+        
+        }
       else
  {
           /* Otherwise, update all non-intermediate files we depend on, if
@@ -1002,20 +1054,20 @@
   if (lastd == 0)
     {
       file->deps = d->next;
-                      free_dep (d);
+          free_dep (d);
       d = file->deps;
     }
   else
     {
       lastd->next = d->next;
-                      free_dep (d);
+          free_dep (d);
       d = lastd->next;
     }
   continue;
  }
 
       d->file->parent = file;
-              maybe_make = *must_make_ptr;
+        maybe_make = *must_make_ptr;
       dep_status |= check_dep (d->file, depth, this_mtime,
                                        &maybe_make);
               if (! d->ignore_mtime)

_______________________________________________
Bug-make mailing list
Bug-make@...
http://lists.gnu.org/mailman/listinfo/bug-make

Re: GNU make to consider files checksum

by Eli Zaretskii

> From: Giuseppe Scrivano <gscrivano@...>
> Date: Fri, 11 Apr 2008 23:45:02 +0200
>
> Other systems like scons already support this feature and it would be
> great to have it for GNU Make too.
>
> I attached a patch against the current CVS to add --use-checksum to
> GNU Make.  It is just a proof of concept, but it shows that adding this
> feature can really speed up a remake.

Thanks.  (I'm not the head maintainer, so please wait for Paul and
others to respond.)

> +int
> +compute_checksum(struct file *new)
> +{
> +  int checksum = 0;
> +  FILE *f;
> +  char buffer [4096];
> +  
> +  f = fopen (new->name, "r");

This needs to use "rb", not "r".

Also, what about directories?  They cannot be fopen'ed and fread, at
least not on all supported systems.
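
Both points can be illustrated with a short sketch. It assumes a POSIX-like system that provides `stat` and `S_ISREG`, which is not true of every platform make supports; `is_regular_file` and `open_for_checksum` are hypothetical names, not functions from make or the patch.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if PATH names a regular file, 0 otherwise (including when
   stat fails).  Assumes POSIX stat() and S_ISREG.  */
static int
is_regular_file (const char *path)
{
  struct stat st;

  if (stat (path, &st) != 0)
    return 0;
  return S_ISREG (st.st_mode) ? 1 : 0;
}

/* Open PATH for checksumming.  Directories and other non-regular files
   are refused; "rb" prevents newline translation on systems that
   distinguish text and binary streams.  */
static FILE *
open_for_checksum (const char *path)
{
  if (!is_regular_file (path))
    return NULL;
  return fopen (path, "rb");
}
```

A caller that gets NULL back could simply fall back to the ordinary timestamp comparison for that prerequisite.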


Re: GNU make to consider files checksum

by Giuseppe Scrivano-2

2008/4/12, Eli Zaretskii <eliz@...>:

> Thanks.  (I'm not the head maintainer, so please wait for Paul and
>  others to respond.)
>
>  > +int
>  > +compute_checksum(struct file *new)
>  > +{
>  > +  int checksum = 0;
>  > +  FILE *f;
>  > +  char buffer [4096];
>  > +
>  > +  f = fopen (new->name, "r");
>
>  This needs to use "rb", not "r".

Thank you for the reply, yes it should be "rb".


>  Also, what about directories? they cannot be fopen'ed and fread, at
>  least not on all supported systems.

IMHO directories should not be considered: while reading the mtime can
be done quickly, computing a checksum for all the files contained in
a directory is very expensive (my proof-of-concept patch does not
include any code to handle this case).

Regards,
Giuseppe


Re: GNU make to consider files checksum

by Tim Murphy-4

Hi,

I am also not a maintainer :-).

One small concern that I have with checksums is that it might take a really long time to check large files or a lot of files.

This would indicate that one needs to be able to switch checksumming on/off for different prerequisites.

I think that there should be a way to indicate the "type" of dependency that is implied:

For example, make already has order-only prerequisites, which are checked for existence rather than mtime:

/tmp/out/fred.o: fred.c | /tmp/out

This means that the directory /tmp/out must exist before fred.o can be written, but if the directory is newer than fred.o there is no need to regenerate fred.o.

So for checksums one might write something like:

/tmp/out/fred.o: |cksum fred.c | /tmp/out

There are other dependency methods that matter, and one should leave room to add them later too, e.g.:
1. Whitespace changes in the file only
2. Comments changed but nothing else
3. Source control system says file has not changed  so even though it was checked out and has a new mtime, don't rebuild it.
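
As a rough illustration of item 1, a checksum can be made insensitive to whitespace-only edits by normalizing the text before hashing. This is only a sketch under stated assumptions: `checksum_ignoring_whitespace` is a hypothetical name, FNV-1a is a stand-in hash, and the normalization (collapse every run of whitespace to one space, drop leading and trailing whitespace) is deliberately crude and knows nothing about string literals or any language's syntax.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hash TEXT so that two inputs differing only in the amount or kind of
   whitespace between tokens produce the same value.  */
static uint32_t
checksum_ignoring_whitespace (const char *text, size_t len)
{
  uint32_t h = 2166136261u;     /* FNV-1a offset basis */
  int pending_space = 0;        /* a separator is owed before the next byte */
  int seen_token = 0;           /* no separator before the first token */
  size_t i;

  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) text[i];

      if (isspace (c))
        {
          pending_space = seen_token;
          continue;
        }
      if (pending_space)
        {
          h = (h ^ (uint32_t) ' ') * 16777619u;
          pending_space = 0;
        }
      h = (h ^ c) * 16777619u;
      seen_token = 1;
    }
  return h;
}
```

Items 2 and 3 would need different filters (a comment stripper, or a query to the version control system), which is an argument for making the comparison method selectable per prerequisite.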

That's just my 2 pence :-)

Regards,

Tim



--
You could help some brave and decent people to have access to uncensored news by making a donation at:

http://www.thezimbabwean.co.uk/
Re: GNU make to consider files checksum

by Brian Dessent

Tim Murphy wrote:

> One small concern that I have with checksums is that it might take a
> really long time to check large files or a lot of files.

True.  However if you have, say, 100MB of objects then the time to
calculate checksums is almost certainly going to be smaller than the
time it takes to unnecessarily re-link those 100MB of objects when one
source file has trivially changed.  In other words, the savings scale
with the cost.  And the savings have a possibility to scale faster than
cost, such as when the checksum proves it unnecessary to rebuild a
library that would otherwise have triggered the re-linking of numerous
other binaries in the tree.

Autoconf for a long time has had a very primitive version of this type
of logic with config.h, since nearly every source file in a tree
typically depends on config.h.
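
The general technique behind that kind of logic, leaving a regenerated file untouched when its contents did not change so that its timestamp stays old, can be sketched as follows. This is a generic illustration and not Autoconf's actual implementation; `file_has_content` and `write_if_changed` are hypothetical names.

```c
#include <stdio.h>

/* Return 1 if PATH exists and holds exactly the LEN bytes at DATA.  */
static int
file_has_content (const char *path, const char *data, size_t len)
{
  FILE *f = fopen (path, "rb");
  int same = 1;
  size_t i;

  if (f == NULL)
    return 0;
  for (i = 0; i < len && same; i++)
    {
      int c = getc (f);
      if (c == EOF || (unsigned char) c != (unsigned char) data[i])
        same = 0;
    }
  if (same && getc (f) != EOF)
    same = 0;                   /* file is longer than DATA */
  fclose (f);
  return same;
}

/* Write DATA to PATH only if that changes the file.  Returns 1 if the
   file was (re)written, 0 if it was left alone, -1 on error.  When 0 is
   returned the mtime is untouched, so dependents are not rebuilt.  */
static int
write_if_changed (const char *path, const char *data, size_t len)
{
  FILE *f;

  if (file_has_content (path, data, len))
    return 0;
  f = fopen (path, "wb");
  if (f == NULL)
    return -1;
  if (fwrite (data, 1, len, f) != len)
    {
      fclose (f);
      return -1;
    }
  return fclose (f) == 0 ? 1 : -1;
}
```

This gives the effect of a content comparison while make itself keeps looking only at timestamps.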

What would be even neater would be if gcc could implement something
analogous to what happened with -MD: generate a md5sum as a side effect
of compilation.  Then make could use that with no overhead, just as it
is currently possible to use the dependency output of -MD.

Brian


Re: GNU make to consider files checksum

by Giuseppe Scrivano-2

Eli Zaretskii wrote:
> Thanks.  (I'm not the head maintainer, so please wait for Paul and
> others to respond.)
I sent a message to this mailing list some months ago but I still haven't
received an answer.  Doesn't GNU Make want to consider file checksums in
addition to mtime?

Giuseppe


Re: GNU make to consider files checksum

by Paul Smith-20

On Thu, 2008-08-28 at 09:06 +0200, Giuseppe Scrivano wrote:
> I sent a message to this mailing list some months ago but I still
> haven't received an answer.  Doesn't GNU Make want to consider file
> checksums in addition to mtime?

There was a Google SOC project for GNU make which added "user-definable
out of date" criteria; these could be defined on a per-target basis and,
as per the name, were defined by the user, not hardcoded (as md5sum
would be).  For example, you can short-circuit an expensive md5sum check
by simply comparing the file sizes: most of the time they will be
different and if so you can skip md5sum altogether.
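
The short-circuit described here can be sketched in a few lines. This is an illustration only, not the interface of the Summer of Code work: `struct saved_state`, `content_changed` and the `hashed` out-parameter are hypothetical, and FNV-1a stands in for md5sum.

```c
#include <stddef.h>
#include <stdint.h>

/* What a stateful make might have recorded about a file on the
   previous run (hypothetical layout).  */
struct saved_state
{
  size_t size;
  uint32_t checksum;
};

/* FNV-1a, 32-bit: stand-in for a real digest such as MD5.  */
static uint32_t
fnv1a (const unsigned char *data, size_t len)
{
  uint32_t h = 2166136261u;
  size_t i;

  for (i = 0; i < len; i++)
    h = (h ^ data[i]) * 16777619u;
  return h;
}

/* Return 1 if the content differs from what was recorded.  *HASHED is
   set to 1 only when the expensive hash actually had to be computed.  */
static int
content_changed (const struct saved_state *saved,
                 const unsigned char *data, size_t len, int *hashed)
{
  *hashed = 0;
  if (saved->size != len)
    return 1;                   /* cheap size test settles it */
  *hashed = 1;
  return fnv1a (data, len) != saved->checksum;
}
```

The hash is computed only when the sizes match, which is the uncommon case when a file has really been edited.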

The major change this implies is that you must have a "stateful make"; a
make that stores state from previous invocations, then reads it the next
time.  Normal make is stateless; or at least it uses only the state
provided by the filesystem and not its own state.


The project was successful in that the changes were delivered; however,
the user interface implementation is, in my opinion, too baroque at the
moment.  Its use model confuses me, anyway.  This is not so much the
fault of the student as my fault: I simply did not have enough time to
be a good mentor for the project and provide enough direction.  I knew
this would be an issue (I didn't solicit anyone to do this work but
someone contacted me and really wanted to do it, and I wanted it done)
but I hoped I would find the time.  And, a lot of really good work was
done... it's just the presentation to the user that I think needs more
effort.

--
-------------------------------------------------------------------------------
 Paul D. Smith <psmith@...>          Find some GNU make tips at:
 http://www.gnu.org                      http://make.mad-scientist.us
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist


Re: GNU make to consider files checksum

by Philip Guenther-2

On Fri, Apr 11, 2008 at 2:45 PM, Giuseppe Scrivano<gscrivano@...> wrote:
> I could find only one thread in this mailing list's archives about this
> subject: considering a file's checksum instead of its timestamp.
> Other systems like scons already support this feature and it would be
> great to have it for GNU Make too.

This is a long dead thread (it's been sitting in my mailbox for a
year, ouch), but I'll throw in my two cents that a makefile can
implement this for itself with pattern rules.  Consider:

----
%.o: %.c
%.o.new: %.c
        $(COMPILE.c) -o $@ $<
%.o: %.o.new
        @{ [ -f $@.md5 ] && md5sum -c --status $@.md5; } ||     \
        { md5sum $< >$@.md5; cp $< $@; }

.SECONDARY:
-----

Poof, if you touch a .c file without making changes that affect the
compiler output, the executable will not be relinked.  Indeed, the
presence of the .SECONDARY target means the only thing that will be
rerun each time is the md5sum.

Yes, this is non-trivial to use, but it's also completely flexible,
letting you use whatever checksum comparison you want (need to strip
comments or RCS/CVS tags from the file before checksumming it?  Sure!)
and can be used Right Now.

Anyway, we now return you to your originally scheduled mailing list.


Philip Guenther


Re: GNU make to consider files checksum

by Giuseppe Scrivano-2

Hi Philip,

it looks like a good idea.  Do you think it is worth discussing with the
automake hackers?


Cheers,
Giuseppe




Re: GNU make to consider files checksum

by Philip Guenther-2

On Mon, Sep 28, 2009 at 11:05 AM, Giuseppe Scrivano <gscrivano@...> wrote:
> it looks like a good idea.  Do you think it is worth discussing with the
> automake hackers?

I'm not actually convinced that this checksumming is a good idea,
mainly because I'm not convinced this is enough of a problem.  The
point of my message was just that this problem *can* be solved at the
makefile level.  Attacking it by changing automake sounds practical
and probably a faster way to a solution, though I would prefer it to
be optional even there.

(Have you measured how often this sort of thing would save
recompilation and/or relinking and how much time it would save then?
What's the comparison to how much time would be spent calculating the
checksums?  If it saves a minute once every 100 compiles but costs a
second in each of those, then it's a net loss...)
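
The arithmetic behind that example can be written down directly. The function below is a hypothetical helper for illustration; the numbers passed to it are the ones from the paragraph above, not measurements.

```c
/* Net time saved, in seconds, over BUILDS builds when checksumming
   costs COST_PER_BUILD seconds on every build and avoids
   SAVED_PER_HIT seconds of work once every BUILDS_PER_HIT builds.
   A negative result means checksumming is a net loss.  */
static double
net_saving_seconds (unsigned int builds, unsigned int builds_per_hit,
                    double saved_per_hit, double cost_per_build)
{
  unsigned int hits = builds_per_hit ? builds / builds_per_hit : 0;

  return hits * saved_per_hit - builds * cost_per_build;
}
```

With one minute saved once per 100 builds and one second spent on each, `net_saving_seconds (100, 100, 60.0, 1.0)` is 60 - 100 = -40, a loss of 40 seconds per 100 builds.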

Philip Guenther


Re: GNU make to consider files checksum

by Giuseppe Scrivano-2

Philip Guenther <guenther@...> writes:

> (Have you measured how often this sort of thing would save
> recompilation and/or relinking and how much time it would save then?
> What's the comparison to how much time would be spent calculating the
> checksums?  If it saves a minute once every 100 compiles but costs a
> second in each of those, then it's a net loss...)

I don't have numbers but I think it can save a lot of time in the
linking phase, that is *really* slow.

Best,
Giuseppe



Re: GNU make to consider files checksum

by Tim Murphy-4

Hi,

I think that checksumming might benefit some targets.  It would be
nice to be able to implement different "methods" for different targets
- because not all methods work well in all circumstances.

I have one example where every single file in a huge build includes 1
particular header file.  The file defines macros which are the
features that are enabled or disabled in the build.

We know which features are used by particular components so in theory
we could work out not to rebuild components that are not influenced by
what's happened to the header file.  e.g. we could switch on a feature
or add a new feature without forcing a rebuild of the entire source
base.

This requires something like md5 but also some kind of "filter" to
determine what kinds of changes are significant to the particular
target that you are testing the dependency for

You can emulate md5 checksum dependencies  in make of course, using
temporary marker files, but it's a bit ugly and complicated..


Regards,

Tim




--
You could help some brave and decent people to have access to
uncensored news by making a donation at:

http://www.thezimbabwean.co.uk/


RE: GNU make to consider files checksum

by lasse.makholm


Tim Murphy wrote:
> I think that checksumming might benefit some targets.  It would be
> nice to be able to implement different "methods" for different targets
> - because not all methods work well in all circumstances.

> I have one example where every single file in a huge build includes 1
> particular header file.  The file defines macros which are the
> features that are enabled or disabled in the build.
>
> We know which features are used by particular components so in theory
> we could work out not to rebuild components that are not influenced by
> what's happened to the header file.  e.g. we could switch on a feature
> or add a new feature without forcing a rebuild of the entire source
> base.

You can do that already today by simply splitting your global feature
header file into smaller pieces and letting targets depend on only
the relevant pieces rather than everything...

Of course that means you have to know which targets need which
pieces of the feature set to define the correct dependencies but
that's the price you pay for properly functioning incremental
rebuilds. You can't have your cake and eat it too...

This sort of dependency generation can usually be automated fairly
easily, but it is highly dependent on your
software architecture, build structure, programming language,
etc...

For that reason it belongs in your makefiles and not in GNU make
itself...

> This requires something like md5 but also some kind of "filter" to
> determine what kinds of changes are significant to the particular
> target that you are testing the dependency for

IMNSHO, this is not a problem that make can (or should even attempt
to) solve for you. This "filter" as you call it would have to know a lot
about the syntax of your header and code files, which makes it a bad
candidate for a core make feature.

This is the classic "global.h" problem of large software builds...

> You can emulate md5 checksum dependencies  in make of course, using
> temporary marker files, but it's a bit ugly and complicated..

This problem is not strictly related to MD5 summing. With MD5 summing
instead of timestamps, your global header file would still change and
cause a full rebuild because this is what you explicitly asked for by saying
that all targets depend on it.

/Lasse


Re: GNU make to consider files checksum

by Tim Murphy-4

Hi :-)

2009/10/5  <lasse.makholm@...>:

>
> Tim Murphy wrote:
>> I think that checksumming might benefit some targets.  It would be
>> nice to be able to implement different "methods" for different targets
>> - because not all methods work well in all circumstances.
>
>> I have one example where every single file in a huge build includes 1
>> particular header file.  The file defines macros which are the
>> features that are enabled or disabled in the build.
>>
>> We know which features are used by particular components so in theory
>> we could work out not to rebuild components that are not influenced by
>> what's happened to the header file.  e.g. we could switch on a feature
>> or add a new feature without forcing a rebuild of the entire source
>> base.
>
> You can do that already today by simply splitting your global feature
> header file into smaller pieces and letting targets depend on only
> the relevant pieces rather than everything...

Yes, we have thought of that.  It's a good answer but it's hard to get
changes like that through into the absolutely enormous thing that it's
all used in.  That's just a human problem but it's more real than any
of the technical problems.  So it's not a solution we can use tomorrow
morning.

On the other hand, what if the header file is stdio.h?  You can't
really do anything about that.  What if the change is just a comment?

So I was searching for an example and that wasn't the greatest one.
What I really wanted to say was that for large files, an md5 checksum
is potentially a slow way to determine how out of date something is
but for small files it might be much more effective than a timestamp
and still be quick.

i.e. if you invent a new dependency mechanism then you need to be able
to balance where it is used so that it doesn't  end up making some
tasks worse.


>> This requires something like md5 but also some kind of "filter" to
>> determine what kinds of changes are significant to the particular
>> target that you are testing the dependency for
>
> IMNSHO, this is not a problem that make can (or should even attempt
> to) solve for you. This "filter" as you call it would have to know a lot
> about the syntax of your header and code files, which makes it a bad
> candidate for a core make feature.

The filter would have to be external (i.e. not part of make) but it
could be much faster if it was a loadable plugin.

>> You can emulate md5 checksum dependencies  in make of course, using
>> temporary marker files, but it's a bit ugly and complicated..
>
> This problem is not strictly related to MD5 summing. With MD5 summing
> instead of timestamps, your global header file would still change and
> cause a full rebuild because this is what you explicitly asked for by saying
> that all targets depend on it.

One would md5 the filtered file (the result of the filter), not the
original.  One's filter would be on the features that affect the
current project.  So the filtered file would be unchanged even if you
changed the original and added new features as long as they weren't
ones that the current project (an exe,dll,lib,whatever) cared about.
This would mean that you need not rebuild any of the object files for
the current project.
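
A rough sketch of such a filter-then-checksum step, under stated assumptions: the filter here keeps only header lines that mention one of the feature names a project cares about (a plain substring match, which a real filter would need to refine), and FNV-1a stands in for md5. `fnv1a_update` and `filtered_checksum` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold LEN bytes at DATA into the running FNV-1a hash H.  */
static uint32_t
fnv1a_update (uint32_t h, const char *data, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    h = (h ^ (unsigned char) data[i]) * 16777619u;
  return h;
}

/* Return 1 if the LEN-byte LINE contains the string NAME.  */
static int
line_mentions (const char *line, size_t len, const char *name)
{
  size_t nlen = strlen (name);
  size_t k;

  for (k = 0; nlen > 0 && k + nlen <= len; k++)
    if (memcmp (line + k, name, nlen) == 0)
      return 1;
  return 0;
}

/* Checksum only the lines of HEADER that mention one of the N names in
   FEATURES; all other lines cannot affect the result.  */
static uint32_t
filtered_checksum (const char *header, const char *const *features, size_t n)
{
  uint32_t h = 2166136261u;     /* FNV-1a offset basis */
  const char *line = header;

  while (*line != '\0')
    {
      const char *end = strchr (line, '\n');
      size_t len = end ? (size_t) (end - line) : strlen (line);
      size_t i;

      for (i = 0; i < n; i++)
        if (line_mentions (line, len, features[i]))
          {
            h = fnv1a_update (h, line, len);
            h = fnv1a_update (h, "\n", 1);
            break;
          }
      line = end ? end + 1 : line + len;
    }
  return h;
}
```

With this, adding or toggling a feature the project does not list leaves the filtered checksum unchanged, while toggling a listed feature changes it.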

So this would require a special-feature-filter-just-for-me.  GNU Make
wouldn't provide it but it might provide a way to load it and thus
make it fast enough to be worth using in a lot of places.

Arranging this in make as it is is complicated and messy because it
would involve creating temporary marker files that contained the md5
in their name.  This would lead to a mess of temporary files which it
would be hard to clean up precisely because by the time you want to
clean them you might not know what their real name is anymore.

We might do this one day without any help from make but I think it's
worthy for make to look beyond timestamps (actually beyond a lot of
stuff) and I am suggesting how.  I am constantly amazed by some of the
great features make has and I just think that a few more amazing
features wouldn't be a bad thing.

Regards,

Tim


--
You could help some brave and decent people to have access to
uncensored news by making a donation at:

http://www.thezimbabwean.co.uk/

