[PATCH] PR rtl-opt/41833 Improve vec_select of a vec_duplicate

Hi,
With altivec we can have two vec_splat in a row, but there is nothing
currently in GCC that is able to optimize the code to a single
vec_splat.
Currently the RTL for the two vec_splat's looks like:

(insn 6 3 7 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:12 (set (reg:V4SI 120 [ D.2899 ])
        (vec_duplicate:V4SI (vec_select:SI (subreg:V4SI (reg/v:V4SF 126 [ a ]) 0)
                (parallel [
                        (const_int 2 [0x2])
                    ])))) 1102 {altivec_vspltw} (expr_list:REG_DEAD (reg/v:V4SF 126 [ a ])
        (nil)))

(insn 7 6 8 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:13 (set (reg:V4SF 127 [ D.2899 ])
        (subreg:V4SF (reg:V4SI 120 [ D.2899 ]) 0)) 925 {*altivec_movv4sf} (expr_list:REG_DEAD (reg:V4SI 120 [ D.2899 ])
        (nil)))

(insn 8 7 13 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:13 (set (reg:V4SI 123 [ D.2903 ])
        (vec_duplicate:V4SI (vec_select:SI (subreg:V4SI (reg:V4SF 127 [ D.2899 ]) 0)
                (parallel [
                        (const_int 0 [0x0])
                    ])))) 1102 {altivec_vspltw} (expr_list:REG_DEAD (reg:V4SF 127 [ D.2899 ])
        (nil)))

So when combine merges these three instructions, we don't optimize
(vec_select:SI (vec_duplicate:V4SI (XYZ:SI) )) into (XYZ:SI). It does
not matter what (XYZ:SI) is or which element the vec_select is
selecting, as every element will be the same (XYZ:SI).

This patch adds this optimization to simplify-rtx.c
(simplify_binary_operation_1) and adds a testcase for the Altivec case.

Note this patch has been in the PS3 toolchain for a long time now.

OK? Bootstrapped and tested on powerpc64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
	* simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of
	a vec_duplicate.

	* gcc.target/powerpc/altivec-33.c: New testcase.
Index: testsuite/gcc.target/powerpc/altivec-33.c
===================================================================
--- testsuite/gcc.target/powerpc/altivec-33.c	(revision 0)
+++ testsuite/gcc.target/powerpc/altivec-33.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O2 -maltivec" } */
+
+/* We should only produce one vspltw as we already splatted the value.  */
+/* { dg-final { scan-assembler-times "vspltw" 1 } } */
+
+#include <altivec.h>
+
+vector float f(vector float a)
+{
+  vector float b = vec_splat (a, 2);
+  return vec_splat (b, 0);
+}
+
+
Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c	(revision 153829)
+++ simplify-rtx.c	(working copy)
@@ -2946,6 +2946,9 @@ simplify_binary_operation_1 (enum rtx_co
 		  tmp_op, gen_rtx_PARALLEL (VOIDmode, vec));
 	      return tmp;
 	    }
+	  if (GET_CODE (trueop0) == VEC_DUPLICATE
+	      && GET_MODE (XEXP (trueop0, 0)) == mode)
+	    return XEXP (trueop0, 0);
 	}
       else
 	{
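
For readers who do not have the RTL data structures in their head, the
rule in the simplify-rtx.c hunk can be pictured with a toy model. This
is only a sketch under stated assumptions: struct toy_expr, the TOY_*
enumerators, toy_simplify and toy_demo are hypothetical names made up
for illustration, not GCC's rtx API, and the model ignores subregs and
everything else the real simplifier has to handle. It shows only that a
scalar-mode select from a duplicate folds to the duplicated operand,
whichever lane is selected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy codes; they only mimic the names of RTL codes.  */
enum toy_code { TOY_REG, TOY_VEC_DUPLICATE, TOY_VEC_SELECT };

/* Hypothetical toy modes: one scalar mode and one 4-element vector mode.  */
enum toy_mode { TOY_SI, TOY_V4SI };

struct toy_expr
{
  enum toy_code code;
  enum toy_mode mode;
  const struct toy_expr *op0;	/* Operand, or NULL for TOY_REG.  */
  int lane;			/* Selected lane; only used by TOY_VEC_SELECT.  */
};

/* If X selects from a duplicate whose operand already has X's mode,
   return that operand; the lane is irrelevant because every lane of a
   duplicate holds the same value.  Otherwise return X unchanged.  */
const struct toy_expr *
toy_simplify (const struct toy_expr *x)
{
  if (x->code == TOY_VEC_SELECT
      && x->op0 != NULL
      && x->op0->code == TOY_VEC_DUPLICATE
      && x->op0->op0 != NULL
      && x->op0->op0->mode == x->mode)
    return x->op0->op0;
  return x;
}

/* Build a few toy expressions and check how they simplify.  */
int
toy_demo (void)
{
  struct toy_expr xyz = { TOY_REG, TOY_SI, NULL, 0 };
  struct toy_expr vreg = { TOY_REG, TOY_V4SI, NULL, 0 };
  struct toy_expr dup = { TOY_VEC_DUPLICATE, TOY_V4SI, &xyz, 0 };
  struct toy_expr sel0 = { TOY_VEC_SELECT, TOY_SI, &dup, 0 };
  struct toy_expr sel3 = { TOY_VEC_SELECT, TOY_SI, &dup, 3 };
  struct toy_expr sel_reg = { TOY_VEC_SELECT, TOY_SI, &vreg, 1 };

  /* Any lane of the duplicate folds to the duplicated scalar.  */
  assert (toy_simplify (&sel0) == &xyz);
  assert (toy_simplify (&sel3) == &xyz);
  /* A select from a plain vector register is left alone.  */
  assert (toy_simplify (&sel_reg) == &sel_reg);
  return 1;
}
```

The mode comparison in toy_simplify plays the role of the
GET_MODE (XEXP (trueop0, 0)) == mode check in the hunk: the fold is
only taken when the duplicated operand already has the mode of the
result.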

Re: [PATCH] PR rtl-opt/41833 Improve vec_select of a vec_duplicate

On Tue, Nov 3, 2009 at 10:18 PM, Andrew Pinski <pinskia@...> wrote:
> Hi,
> With altivec we can have two vec_splat in a row, but there is nothing
> currently in GCC that is able to optimize the code to a single
> vec_splat.
> Currently the RTL for the two vec_splat's looks like:
> (insn 6 3 7 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:12
> (set (reg:V4SI 120 [ D.2899 ])
>        (vec_duplicate:V4SI (vec_select:SI (subreg:V4SI (reg/v:V4SF
> 126 [ a ]) 0)
>                (parallel [
>                        (const_int 2 [0x2])
>                    ])))) 1102 {altivec_vspltw} (expr_list:REG_DEAD
> (reg/v:V4SF 126 [ a ])
>        (nil)))
>
> (insn 7 6 8 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:13
> (set (reg:V4SF 127 [ D.2899 ])
>        (subreg:V4SF (reg:V4SI 120 [ D.2899 ]) 0)) 925
> {*altivec_movv4sf} (expr_list:REG_DEAD (reg:V4SI 120 [ D.2899 ])
>        (nil)))
>
> (insn 8 7 13 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:13
> (set (reg:V4SI 123 [ D.2903 ])
>        (vec_duplicate:V4SI (vec_select:SI (subreg:V4SI (reg:V4SF 127
> [ D.2899 ]) 0)
>                (parallel [
>                        (const_int 0 [0x0])
>                    ])))) 1102 {altivec_vspltw} (expr_list:REG_DEAD
> (reg:V4SF 127 [ D.2899 ])
>        (nil)))
>
> So when combine merges these three instructions, we don't optimize
> (vec_select:SI (vec_duplicate:V4SI (XYZ:SI) )) into (XYZ:SI). It does
> not matter what (XYZ:SI) is or which element the vec_select is
> selecting, as every element will be the same (XYZ:SI).
>
> This patch adds this optimization to simplify-rtx.c
> (simplify_binary_operation_1) and adds a testcase for the Altivec case.
>
> Note this patch has been in the PS3 toolchain for a long time now.
>
> OK? Bootstrapped and tested on powerpc64-linux-gnu with no regressions.

Ok.  Bonus if you add an x86 SSE testcase as well.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
>        * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of
>        a vec_duplicate.
>
>        * gcc.target/powerpc/altivec-33.c: New testcase.
>