Re: config files substitution with awk

by Paul Eggert

In <http://lists.gnu.org/archive/html/autoconf-patches/2006-12/msg00012.html>
Ralf Wildenhues <Ralf.Wildenhues@...> writes:

> - Have a separate bootstrapping mechanism in place for gawk that does not
> need any awk, similar to sed's bootstrap script.

This seems like way overkill to me.  How about if we merely
add 'awk' to the list of programs that 'configure' can run?
'awk' is present on every GNU and Unix system, and has been
present for three decades.  There should be no real problem
in bootstrapping gawk on any of these hosts, just as there
is no real problem in bootstrapping other basic utility
packages like coreutils.

It's true that there are porting issues among the various
awk implementations, but the proposed Autoconf-generated
'configure' scripts should work on them all, even the
"ancient awk" of Solaris.  And the GNU coding standards
already cover this issue by saying "Stick to the generally
supported options for these programs."

Here's a proposed patch to the GNU coding standards to
implement this suggestion.

2006-12-04  Paul Eggert  <eggert@...>

        * make-stds.texi (Utilities in Makefiles): Add awk to the list.

*** make-stds.texi-1 Mon Dec  4 14:44:23 2006
--- make-stds.texi Mon Dec  4 14:50:18 2006
*************** installation should not use any utilitie
*** 155,161 ****
  @c mkfifo mknod tee uname
 
  @example
! cat cmp cp diff echo egrep expr false grep install-info
  ln ls mkdir mv pwd rm rmdir sed sleep sort tar test touch true
  @end example
 
--- 155,161 ----
  @c mkfifo mknod tee uname
 
  @example
! awk cat cmp cp diff echo egrep expr false grep install-info
  ln ls mkdir mv pwd rm rmdir sed sleep sort tar test touch true
  @end example
 



Re: config files substitution with awk

by Karl Berry

It sounds fine to me.

But for purposes of explaining to rms, can someone tell me (as briefly
as possible :) why it is desirable to switch to awk instead of sticking
with sed?

(I can imagine some reasons, but best to ask, I figure.)

Thanks,
karl



Re: config files substitution with awk

by Ralf Wildenhues

* Karl Berry wrote on Tue, Dec 05, 2006 at 12:14:28AM CET:
>
> But for purposes of explaining to rms, can someone tell me (as briefly
> as possible :) why it is desirable to switch to awk instead of sticking
> with sed?
>
> (I can imagine some reasons, but best to ask, I figure.)

The primary reason for introducing it to Autoconf was that it allows
faster substitution of variables.  Roughly speaking, a sed script like
   s/@var1@/text1/g
   s/@var2@/text2/g
   ...

used on an input file of the form
  var1=@var1@
  var2=@var2@
  ...

has an overhead scaling quadratically in the number of variables.
The original proposal has more details and measurements:
http://lists.gnu.org/archive/html/autoconf-patches/2006-11/msg00035.html
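For illustration, here is a minimal shell sketch of the sed approach just described.  It assumes the values arrive as NAME=VALUE arguments; the helper names make_sed_script and sed_subst are hypothetical and not part of Autoconf.  Because sed tries every 's' command against every input line, the work grows with the number of lines times the number of variables.

```shell
#!/bin/sh
# Hypothetical helper: emit one 's' command per variable, in the style
# of the script above, from NAME=VALUE pairs.
# Assumption: values contain no '|', '&', '\' or newlines.
make_sed_script () {
  for pair in "$@"; do
    name=${pair%%=*}
    value=${pair#*=}
    printf 's|@%s@|%s|g\n' "$name" "$value"
  done
}

# Apply the generated script to stdin.  sed runs every 's' command on
# every input line, so the cost is roughly (lines) x (variables).
sed_subst () {
  script=$(make_sed_script "$@")
  sed "$script"
}

printf 'var1=@var1@\nvar2=@var2@\n' | sed_subst var1=text1 var2=text2
# prints:
# var1=text1
# var2=text2
```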


FWIW, I am working on a larger change to the list of makefile and
installation utilities in standards.texi.  If you want to wait, I
can try to finish it this weekend.

Cheers,
Ralf



Re: config files substitution with awk

by Paul Eggert

karl@... (Karl Berry) writes:

> But for purposes of explaining to rms, can someone tell me (as briefly
> as possible :) why it is desirable to switch to awk instead of sticking
> with sed?

Speed.

For example, Ralf tested the change with OpenMPI, and switching from
'sed' to 'awk' sped up 'configure' by over a factor of 4.  With 'sed',
'configure' took over a minute (user+system CPU time).  With 'awk', it
took less than 15 seconds.

The speed comes from awk's hash tables.  Here's how to substitute
values for V variables using 'sed':

   s/@var1@/val1/
   s/@var2@/val2/
   ...
   s/@varV@/valV/

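This is O(N**2) if the number of input lines and the number of
variables are both about N, since every line is tried against every
's' command.  With awk, you can do something roughly like this: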

   nfields = split(line, field, "@")
   for (i = 2; i < nfields; i++) {
     key = field[i]
     if (key in S)
       field[i] = S[key]
   }

where S[k] gives you the value for key k.  This is O(N log N).
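A runnable version of the loop above might look like the following.  This is a minimal sketch, not the code Autoconf actually generates; the shell function awk_subst and its NAME=VALUE argument interface are hypothetical.  It loads the table S once, then splits each line on '@' and checks each candidate key with a single 'in S' hash lookup.

```shell
#!/bin/sh
# Hypothetical interface: substitution values are passed as NAME=VALUE
# arguments; the template is read from stdin.
awk_subst () {
  awk '
    BEGIN {
      # Load the substitution table S from the NAME=VALUE arguments,
      # then set ARGC = 1 so awk reads the template from stdin.
      for (a = 1; a < ARGC; a++) {
        eq = index(ARGV[a], "=")
        S[substr(ARGV[a], 1, eq - 1)] = substr(ARGV[a], eq + 1)
      }
      ARGC = 1
    }
    {
      n = split($0, field, "@")
      out = field[1]
      i = 2
      while (i <= n) {
        if (i < n && (field[i] in S)) {
          # "@key@" found: emit the value plus the text after the
          # closing "@", then skip past it.
          out = out S[field[i]] field[i + 1]
          i += 2
        } else {
          # Not a known key: put the "@" back and move on.
          out = out "@" field[i]
          i++
        }
      }
      print out
    }
  ' "$@"
}

printf 'var1=@var1@\nvar2=@var2@ @other@\n' | awk_subst var1=text1 var2=text2
# prints:
# var1=text1
# var2=text2 @other@
```

Unlike the rough loop above, this version advances past the closing '@' after a match, so text sitting between two substitutions (the 'y' in '@x@y@x@') is never mistaken for a key.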

The actual code and analysis are more complicated -- among other
things, we're assuming input lines have bounded length, which is the
common practical case -- but here's Ralf Wildenhues's precis:

   If you have F config files, each with L lines in which substitutions
   apply, and S substituted variables, then the overall work for creating
   all config files currently scales roughly as

      F * (c1 * (L + S) + c2 * (L * S)) + c3 * S         (assuming 'sed')

   c1 is larger than c2, but the c2 term causes the most work for large
   packages.  If you switch from 'sed' to 'awk', this changes to:

      F * (c1 * (L + S) + c2 * (L * log (S))) + c3 * S   (assuming 'awk')

A longer version of this analysis can be found in
<http://lists.gnu.org/archive/html/autoconf-patches/2006-11/msg00035.html>.
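To see why the c2 term dominates for large packages, a back-of-the-envelope calculation helps.  The sketch below evaluates only the L * S and L * log (S) parts of the two formulas for a single file.  The sizes L=1000 and S=500 are made up, the constants c1, c2 and c3 are ignored, and base-2 logarithms are an arbitrary choice (the base only rescales c2).  The function name c2_terms is hypothetical.

```shell
#!/bin/sh
# Compare only the c2 terms of the sed and awk cost formulas for one
# file.  Arguments: L (lines with substitutions), S (variables).
c2_terms () {
  awk -v L="$1" -v S="$2" 'BEGIN {
    sed_term = L * S                # sed: every line meets every s command
    awk_term = L * log(S) / log(2)  # awk: about log2(S) work per line
    printf "sed: %d  awk: %d  ratio: %.0f\n", sed_term, awk_term, sed_term / awk_term
  }'
}

c2_terms 1000 500
# prints: sed: 500000  awk: 8965  ratio: 56
```

The ratio covers only one term with the constants dropped, so it should not be compared directly with measured speedups like the OpenMPI figure above.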



Re: config files substitution with awk

by Karl Berry

    faster substitution of variables.

Thanks to both you and Paul for the info.

    FWIW, I am working on a larger change to the list of makefile and
    installation utilities in standards.texi.  If you want to wait, I
    can try to finish it this weekend.

I will wait, no problem.  Minimizing msgs to rms is always desirable.