

Re: RFC: ARM Cortex-A8 and floating point performance

by Andrew Pinski-2

Sent from my iPhone

On Jun 16, 2010, at 6:04 AM, Richard Guenther <richard.guenther@...> wrote:

> On Wed, Jun 16, 2010 at 5:52 PM, Siarhei Siamashka
> <siarhei.siamashka@...> wrote:
>> Hello,
>>
>> Currently gcc (at least version 4.5.0) does a very poor job generating
>> single precision floating point code for ARM Cortex-A8.
>>
>> The source of this problem is the use of VFP instructions which are run
>> on a slow nonpipelined VFP Lite unit in Cortex-A8. Even turning on
>> RunFast mode (flush denormals to zero, disable exceptions) just provides
>> a relatively minor performance gain.
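
For readers unfamiliar with RunFast mode, here is a minimal sketch of what "flush denormals to zero, disable exceptions" amounts to at the register level. It assumes the usual FPSCR layout (FZ at bit 24, DN at bit 25, trap enables at bits 8-12 and 15); the function names are made up for illustration, and the inline assembly is only compiled on 32-bit ARM targets with hardware floating point.

```c
#include <assert.h>
#include <stdint.h>

/* FPSCR bit positions, assumed from the ARM architecture documentation. */
#define FPSCR_FZ           (1u << 24)    /* flush-to-zero mode */
#define FPSCR_DN           (1u << 25)    /* default NaN mode */
#define FPSCR_TRAP_ENABLES 0x00009F00u   /* IOE, DZE, OFE, UFE, IXE (bits 8-12), IDE (bit 15) */

/* Pure helper (hypothetical name): compute the FPSCR value for RunFast mode
 * from the current value, leaving all other fields (e.g. rounding mode) alone. */
static uint32_t fpscr_runfast_value(uint32_t fpscr)
{
    fpscr |= FPSCR_FZ | FPSCR_DN;    /* flush denormals, use default NaNs */
    fpscr &= ~FPSCR_TRAP_ENABLES;    /* disable trapped exceptions */
    return fpscr;
}

/* Apply RunFast mode on the current thread. This is a no-op on targets
 * other than 32-bit ARM with hardware floating point. */
static void enable_runfast(void)
{
#if defined(__arm__) && !defined(__SOFTFP__)
    uint32_t fpscr;
    __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
    fpscr = fpscr_runfast_value(fpscr);
    __asm__ volatile("vmsr fpscr, %0" : : "r"(fpscr));
#endif
}

/* Small self-check of the bit arithmetic; safe to run on any host. */
static void runfast_self_check(void)
{
    assert(fpscr_runfast_value(0u) == 0x03000000u);
    enable_runfast();
}
```

Calling `enable_runfast()` once at program start would be the typical use. As the message says, on Cortex-A8 this alone reportedly yields only a minor gain.
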
>>
>> The right solution seems to be the use of NEON instructions for doing
>> most of the single precision calculations.
>>
>> I wonder if it would be difficult to introduce the following changes to
>> the gcc generated code when optimizing for cortex-a8:
>> 1. Allocate single precision variables only to evenly or oddly numbered
>>    s-registers.
>> 2. Instead of using 'fadds s0, s0, s2' or similar instructions, do
>>    'vadd.f32 d0, d0, d1' instead.
>>
>> The number of single precision floating point registers gets effectively
>> halved this way. Supporting '-mfloat-abi=hard' may be a bit tricky
>> (packing/unpacking of register pairs may be needed to ensure proper
>> parameters passing to functions). Also there may be other problems, like
>> dealing with strict IEEE-754 compliance (maybe a special variable
>> attribute for relaxing compliance requirements could be useful). But this
>> looks like the only solution to fix poor performance on ARM Cortex-A8
>> processor.
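
The proposal above (keep each scalar in one lane of a d-register and use the two-lane NEON add in place of the scalar VFP add) can be sketched in C roughly as follows. This is only an illustration under assumptions: the function name is hypothetical, the NEON path uses the standard `arm_neon.h` intrinsics when the compiler targets NEON, and there is no guarantee the compiler emits exactly 'vadd.f32 d0, d0, d1' for it. On other targets a plain two-element array stands in for the d-register.

```c
#include <assert.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Add two single precision scalars by way of a two-lane vector add,
 * using only lane 0 of the result (hypothetical helper). */
static float add_scalar_via_neon(float a, float b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x2_t da  = vdup_n_f32(a);     /* a in both lanes of a d-register */
    float32x2_t db  = vdup_n_f32(b);     /* b in both lanes of a d-register */
    float32x2_t sum = vadd_f32(da, db);  /* two-lane add, i.e. vadd.f32 dN, dN, dM */
    return vget_lane_f32(sum, 0);        /* the second lane is ignored */
#else
    /* Portable stand-in: model each d-register as two floats. */
    float da[2] = { a, 0.0f };
    float db[2] = { b, 0.0f };
    float sum[2];
    for (int i = 0; i < 2; ++i)
        sum[i] = da[i] + db[i];
    return sum[0];
#endif
}

/* Small self-check with exactly representable values. */
static void add_scalar_self_check(void)
{
    assert(add_scalar_via_neon(1.5f, 2.25f) == 3.75f);
}
```

Since every scalar occupies a whole d-register in this scheme, only half as many single precision values fit in the register file, which is the halving mentioned above.
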
>>
>> Actually clang 2.7 seems to be working exactly this way. And it is
>> outperforming gcc 4.5.0 by up to a factor of 2 or 3 on some single
>> precision floating point tests that I tried on ARM Cortex-A8.
>
> On i?86 we have -mfpmath={sse,x87}, I suppose you could add
> -mfpmath=neon for arm (properly conflicting with -mfloat-abi=hard
> and requiring neon support).

Except that, unlike SSE, NEON does not fully support IEEE 754. So this
should only be done with -ffast-math :). Being slow is not a good enough
reason to change it into something that is fast but wrong.
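
To make the IEEE concern concrete: as far as I know, the main gap is that NEON arithmetic on this class of cores treats denormal (subnormal) values as zero. The sketch below models that with a hypothetical `flush_to_zero` helper so the effect can be seen on any host. It is a model of the behaviour, not NEON code.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: model flush-to-zero by replacing a subnormal
 * value with a zero of the same sign; all other values pass through. */
static float flush_to_zero(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Small self-check: FLT_MIN is the smallest normal float, so half of it
 * is subnormal under IEEE 754 and disappears under flush-to-zero. */
static void ftz_self_check(void)
{
    float half_min = FLT_MIN / 2.0f;
    assert(flush_to_zero(half_min) == 0.0f);   /* subnormal is flushed */
    assert(flush_to_zero(FLT_MIN) == FLT_MIN); /* normal values are untouched */
}
```

Under strict IEEE 754 arithmetic `(FLT_MIN / 2.0f) * 2.0f` gives back `FLT_MIN`, while with flushing the intermediate becomes zero and the result is `0.0f`. A default switch to NEON could silently change results in exactly this way.
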

>
> Richard.
>
>> --
>> Best regards,
>> Siarhei Siamashka
>>
