Question about first scheduling pass

by Bradley Lucier

I find that enabling scheduling before register allocation on x86-64 on
my codes often results in about a 10% increase in performance, so I'd
like to use it more often.

On some of my program-generated C codes, the pre-register-allocation
scheduling pass takes far longer than the post-register-allocation
pass, for example:

 scheduling            :  69.33 (49%) usr   0.07 ( 3%) sys  85.07 (51%) wall    1954 kB ( 1%) ggc
 scheduling 2          :   0.63 ( 0%) usr   0.00 ( 0%) sys   0.63 ( 0%) wall     357 kB ( 0%) ggc
 TOTAL                 : 140.61             2.76           166.98             238286 kB

This code was compiled with

/pkgs/gcc-mainline-mem-stats/bin/gcc -march=core2 -msse4 -O3 -fschedule-insns -fmem-report -ftime-report -Wno-unused -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp   -I"../include" -c -o "_io.o" -I. -DHAVE_CONFIG_H -D___GAMBCDIR="\"/usr/local/Gambit-C/v4.5.3\"" -D___SYS_TYPE_CPU="\"x86_64\"" -D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -D___CONFIGURE_COMMAND="\"./configure CC=/pkgs/gcc-mainline/bin/gcc -march=core2 -msse4 -O3 -fschedule-insns --enable-multiple-versions --enable-single-host\"" -D___OBJ_EXTENSION="\".o\"" -D___EXE_EXTENSION="\"\"" -D___PRIMAL _io.c -D___LIBRARY 2> _io.out

with the compiler:

heine:~/programs/gcc/mainline/gcc> /pkgs/gcc-mainline-mem-stats/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline-mem-stats/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline-mem-stats/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline-mem-stats --enable-languages=c,c++ --enable-gather-detailed-mem-stats -enable-stage1-languages=c,c++
Thread model: posix
gcc version 4.5.0 20091109 (experimental) [trunk revision 154037] (GCC)

So, pre-register-allocation scheduling takes about 1/2 the CPU time of
the entire compile.
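
The "about 1/2" figure can be read off the usr column of the time report: 69.33 s for "scheduling" against 140.61 s for TOTAL. The following is a minimal sketch for extracting that ratio from a saved report. The function name pass_share is hypothetical, and the script assumes the report lines have exactly the "name : usr ..." layout shown above; other GCC versions may print the report differently.

```shell
# pass_share NAME: read a -ftime-report listing on stdin and print the
# usr time of pass NAME as a percentage of the TOTAL usr time.
# Assumes the "name : usr-seconds ..." line layout quoted in this thread.
pass_share() {
  awk -F':' -v pass="$1" '
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the pass name
      split($2, cols, " ")                # cols[1] is the usr time in seconds
    }
    name == pass    { usr = cols[1] }
    name == "TOTAL" { total = cols[1] }
    END { if (total > 0) printf "%.1f\n", 100 * usr / total }
  '
}

# Usage with the three report lines quoted above:
pass_share scheduling <<'EOF'
 scheduling            :  69.33 (49%) usr   0.07 ( 3%) sys  85.07 (51%) wall    1954 kB ( 1%) ggc
 scheduling 2          :   0.63 ( 0%) usr   0.00 ( 0%) sys   0.63 ( 0%) wall     357 kB ( 0%) ggc
 TOTAL                 : 140.61             2.76           166.98             238286 kB
EOF
# prints 49.3
```

The same call with 'scheduling 2' as the argument gives the share of the post-register-allocation pass, which makes the gap between the two passes easy to track across compiler flags.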

I've been trying to figure out why the first scheduling pass takes so
much longer than the second.  (In fact, I've asked this question in one
PR, but I can't find that PR right now.)  In the file sched-rgn.c I
found

        /* This pass implements list scheduling within basic blocks.  It is
           run twice: (1) after flow analysis, but before register allocation,
           and (2) after register allocation.
       
           The first run performs interblock scheduling, moving insns between
           different blocks in the same "region", and the second runs only
           basic block scheduling.
       
I understand from this that the two scheduling passes do different
things, so it makes sense that they take dramatically different
amounts of time.

What I'd like to know is whether there's a way to modify the first
scheduling pass to be more like the second and then see whether I get
similar speedups to what I'm getting now.  Perhaps the interblock
scheduling is really what's giving me the speedup, and perhaps not.

As a hack, could I just change

      NEXT_PASS (pass_sched);

in passes.c to

      NEXT_PASS (pass_sched2);

Or should I change the definitions of pass_sched and pass_sched2 in
sched-rgn.c?

Also, there are a number of sched*.c files; are there types of
scheduling other than basic-block scheduling and inter-block scheduling
that I could try?

I suppose that if simple basic-block scheduling works well in the first
scheduling pass for certain types of codes, perhaps there could be a
compiler option that allows people to choose it.

Brad


Re: Question about first scheduling pass

by Ian Lance Taylor

Bradley Lucier <lucier@...> writes:

> So, pre-register-allocation scheduling takes about 1/2 the CPU time of
> the entire compile.

I believe you, but I find that surprising.

> I've been trying to figure out why the first scheduling pass takes so
> much longer than the second.

It's presumably because the first scheduling pass is done before
register allocation, and therefore has much more flexibility in
rearranging instructions.  The second scheduling pass is done after
register allocation, which means that it can't rearrange instructions
that use the same register, even if the instructions use it for
different values.  So the second scheduling pass has much less
freedom.

> What I'd like to know is whether there's a way to modify the first
> scheduling pass to be more like the second and then see whether I get
> similar speedups to what I'm getting now.

I don't know.

> As a hack, could I just change
>
>       NEXT_PASS (pass_sched);
>
> in passes.c to
>
>       NEXT_PASS (pass_sched2);

No.

> Or should I change the definitions of pass_sched and pass_sched2 in
> sched-rgn.c?

You would have to change the code itself.

> Also, there are a number of sched*.c files; are there types of
> scheduling other than basic-block scheduling and inter-block scheduling
> that I could try?

There is -fselective-scheduling and -fselective-scheduling2.

Ian
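
For anyone wanting to compare the options discussed in this thread, here is a dry-run sketch that prints one compile command per scheduling variant, each with -ftime-report so the "scheduling" line can be compared. -fselective-scheduling comes from the reply above. -fno-sched-interblock is not mentioned in the thread: it is a documented GCC option that disables scheduling across basic blocks in the pre-register-allocation pass, and whether it keeps the speedup is untested here. The function name sched_variants, the compiler name gcc, and the file name _io.c are placeholders. Every variant keeps -fschedule-insns on the assumption that the selective scheduler only replaces the first pass when that pass is enabled.

```shell
# sched_variants CC SRC: print (do not run) one compile command per
# pre-register-allocation scheduling variant.  Pipe the output to sh to
# actually execute the commands.
sched_variants() {
  cc=$1
  src=$2
  for flags in \
    "-fschedule-insns" \
    "-fschedule-insns -fno-sched-interblock" \
    "-fschedule-insns -fselective-scheduling"
  do
    printf '%s -O3 %s -ftime-report -c %s\n' "$cc" "$flags" "$src"
  done
}

# Usage: show the three command lines for a placeholder source file.
sched_variants gcc _io.c
```

Printing the commands instead of running them keeps the sketch independent of any particular GCC installation. The remaining flags from the original command line would need to be added back for a faithful comparison.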