[PATCH] PR rtl-opt/41833 Improve vec_select of a vec_duplicate

by Andrew Pinski-2

Hi,
  With Altivec we can have two vec_splat calls in a row, but nothing
currently in GCC is able to optimize them into a single vec_splat.
Currently the RTL for the two vec_splats looks like:
(insn 6 3 7 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:12 (set (reg:V4SI 120 [ D.2899 ])
        (vec_duplicate:V4SI (vec_select:SI (subreg:V4SI (reg/v:V4SF 126 [ a ]) 0)
                (parallel [
                        (const_int 2 [0x2])
                    ])))) 1102 {altivec_vspltw} (expr_list:REG_DEAD (reg/v:V4SF 126 [ a ])
        (nil)))

(insn 7 6 8 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:13 (set (reg:V4SF 127 [ D.2899 ])
        (subreg:V4SF (reg:V4SI 120 [ D.2899 ]) 0)) 925 {*altivec_movv4sf} (expr_list:REG_DEAD (reg:V4SI 120 [ D.2899 ])
        (nil)))

(insn 8 7 13 2 gcc/gcc/testsuite/gcc.target/powerpc/altivec-33.c:13 (set (reg:V4SI 123 [ D.2903 ])
        (vec_duplicate:V4SI (vec_select:SI (subreg:V4SI (reg:V4SF 127 [ D.2899 ]) 0)
                (parallel [
                        (const_int 0 [0x0])
                    ])))) 1102 {altivec_vspltw} (expr_list:REG_DEAD (reg:V4SF 127 [ D.2899 ])
        (nil)))

So when combine merges these three instructions, we don't simplify
(vec_select:SI (vec_duplicate:V4SI (XYZ:SI))) into (XYZ:SI).  It does
not matter what (XYZ:SI) is or which element the vec_select is
selecting, as every element will be the same (XYZ:SI).
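
To illustrate why the selected element does not matter, here is a small
standalone C sketch.  It is not GCC code: splat4, select_lane and
splat_lane are names made up for this illustration, and the lanes are
modelled as a plain array of four 32-bit integers.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical stand-in for vec_duplicate:V4SI: copy X into every lane.  */
static void splat4(int32_t out[LANES], int32_t x)
{
    for (int i = 0; i < LANES; i++)
        out[i] = x;
}

/* Hypothetical stand-in for vec_select:SI with a one-element parallel.  */
static int32_t select_lane(const int32_t v[LANES], int lane)
{
    return v[lane];
}

/* Model of vec_splat (v, lane): select one lane, then duplicate it.  */
static void splat_lane(int32_t out[LANES], const int32_t v[LANES], int lane)
{
    splat4(out, select_lane(v, lane));
}
```

In this model, splat_lane (b, a, 2) followed by splat_lane (c, b, 0)
leaves c equal to b whichever lane the second call picks, which is the
property the simplification relies on.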

This patch adds this optimization to simplify_binary_operation_1 in
simplify-rtx.c and adds a testcase for the Altivec case.
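
As a rough picture of the check the patch performs, here is a toy model
in plain C.  Everything in it (toy_rtx, toy_code, toy_mode,
toy_simplify) is hypothetical and only mimics the shape of the real
data structures; the actual change is the three-line hunk in
simplify-rtx.c further down.  The sketch returns the duplicated operand
only when its mode already equals the mode of the select.

```c
#include <stddef.h>

/* Toy stand-ins for rtx codes and machine modes (hypothetical).  */
enum toy_code { TOY_REG, TOY_VEC_DUPLICATE, TOY_VEC_SELECT };
enum toy_mode { TOY_SI, TOY_SF, TOY_V4SI };

struct toy_rtx {
    enum toy_code code;
    enum toy_mode mode;
    struct toy_rtx *op0;  /* first operand, NULL for a leaf */
    int lane;             /* selected lane, used by TOY_VEC_SELECT only */
};

/* If X selects one element from a duplicate whose scalar operand
   already has X's mode, return that operand; otherwise return X
   unchanged.  The lane is deliberately never inspected.  */
static struct toy_rtx *toy_simplify(struct toy_rtx *x)
{
    if (x->code == TOY_VEC_SELECT
        && x->op0 != NULL
        && x->op0->code == TOY_VEC_DUPLICATE
        && x->op0->op0 != NULL
        && x->op0->op0->mode == x->mode)
        return x->op0->op0;
    return x;
}
```

The mode comparison is what keeps the rewrite from handing back an
operand of a different mode than the caller asked for.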

Note this patch has been in the PS3 toolchain for a long time now.

OK? Bootstrapped and tested on powerpc64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of
a vec_duplicate.

* gcc.target/powerpc/altivec-33.c: New testcase.

Index: testsuite/gcc.target/powerpc/altivec-33.c
===================================================================
--- testsuite/gcc.target/powerpc/altivec-33.c (revision 0)
+++ testsuite/gcc.target/powerpc/altivec-33.c (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O2 -maltivec" } */
+
+/* We should only produce one vspltw as we already splatted the value.  */
+/* { dg-final { scan-assembler-times "vspltw" 1 } } */
+
+#include <altivec.h>
+
+vector float f(vector float a)
+{
+  vector float b = vec_splat (a, 2);
+  return vec_splat (b, 0);
+}
+
+
Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c (revision 153829)
+++ simplify-rtx.c (working copy)
@@ -2946,6 +2946,9 @@ simplify_binary_operation_1 (enum rtx_co
     tmp_op, gen_rtx_PARALLEL (VOIDmode, vec));
       return tmp;
     }
+  if (GET_CODE (trueop0) == VEC_DUPLICATE
+      && GET_MODE (XEXP (trueop0, 0)) == mode)
+    return XEXP (trueop0, 0);
  }
       else
  {

Re: [PATCH] PR rtl-opt/41833 Improve vec_select of a vec_duplicate

by Richard Guenther-2

On Tue, Nov 3, 2009 at 10:18 PM, Andrew Pinski <pinskia@...> wrote:

> OK? Bootstrapped and tested on powerpc64-linux-gnu with no regressions.

Ok.  Bonus if you add an x86 SSE testcase as well.

Thanks,
Richard.
