Faster bilinear scaling


Faster bilinear scaling

by Soeren Sandmann

Hi,

This branch:

        http://cgit.freedesktop.org/~sandmann/pixman/log/?h=bilinear

contains a fast path for fetching of bilinearly filtered, scaled
images. It is basically Andre's work, described here:

        http://lists.cairographics.org/archives/cairo/2008-December/016170.html

What I did was

        - Update scaling-test to also test bilinear scaling

        - Remove bilinear_interpolation_left/right() functions in
          favor of just calling bilinear_interpolation().

        - Fix coding style.

The performance improvement for the swfdec-youtube benchmark on a
3.8GHz P4 is around 17%:

Before:

[ # ]  backend                         test   min(s) median(s)  stddev. count
[  0]    image               swfdec-youtube    8.375    8.431   0.44%   6/6

After:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image               swfdec-youtube    6.942    7.019   0.61%  6/6

Much of the profile of this benchmark is in radial gradients, so other
users of bilinear scaling may see more improvement.

Also, if anyone is interested in adding support for SIMD acceleration
of fetchers, the bilinear_interpolation() function is begging to
be written with SSE2 or NEON.
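
For readers unfamiliar with the operation, here is a minimal scalar
sketch of bilinear interpolation on packed 8-bit-per-channel pixels. It
is an illustration only, not the code in the branch: the names
lerp_channel() and bilinear_sketch(), the 0..256 weight range and the
truncating shift are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Interpolate one 8-bit channel of the four neighbouring pixels.
 * wx and wy are the weights of the right and bottom neighbours,
 * in the range [0, 256] (an assumption of this sketch). */
static uint32_t lerp_channel(uint32_t tl, uint32_t tr,
                             uint32_t bl, uint32_t br,
                             uint32_t wx, uint32_t wy)
{
    uint32_t top = tl * (256 - wx) + tr * wx;    /* scaled by 256 */
    uint32_t bot = bl * (256 - wx) + br * wx;    /* scaled by 256 */

    /* The sum is scaled by 256 * 256; at most 255 << 16, so it fits. */
    return (top * (256 - wy) + bot * wy) >> 16;  /* truncates */
}

/* Hypothetical helper: blend four packed 32-bit pixels
 * (top-left, top-right, bottom-left, bottom-right). */
uint32_t bilinear_sketch(uint32_t tl, uint32_t tr,
                         uint32_t bl, uint32_t br,
                         uint32_t wx, uint32_t wy)
{
    uint32_t result = 0;

    assert(wx <= 256 && wy <= 256);

    /* The four channels are independent of each other, which is
     * why the loop body maps naturally onto SIMD lanes. */
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = lerp_channel((tl >> shift) & 0xff,
                                  (tr >> shift) & 0xff,
                                  (bl >> shift) & 0xff,
                                  (br >> shift) & 0xff,
                                  wx, wy);
        result |= c << shift;
    }
    return result;
}
```

With wx = wy = 0 the sketch returns the top-left pixel unchanged, and
with wx = wy = 256 it returns the bottom-right one.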

Comments welcome.


Thanks,
Soren
_______________________________________________
cairo mailing list
cairo@...
http://lists.cairographics.org/mailman/listinfo/cairo

Re: Faster bilinear scaling

by Siarhei Siamashka

On Tuesday 06 October 2009, Soeren Sandmann wrote:

> Hi,
>
> This branch:
>
>         http://cgit.freedesktop.org/~sandmann/pixman/log/?h=bilinear
>
> contains a fast path for fetching of bilinearly filtered, scaled
> images. It is basically Andre's work, described here:
>
>         http://lists.cairographics.org/archives/cairo/2008-December/016170.html
>
> What I did was
>
>         - Update scaling-test to also test bilinear scaling
>
>         - Remove bilinear_interpolation_left/right() functions in
>           favor of just calling bilinear_interpolation().
>
>         - Fix coding style.

Nice, any improvements in this area are very much welcome.

> The performance improvement for the swfdec-youtube benchmark on a
> 3.8GHz P4 is around 17%:
>
> Before:
>
> [ # ]  backend                         test   min(s) median(s)  stddev. count
> [  0]    image               swfdec-youtube    8.375    8.431   0.44%   6/6
>
> After:
>
> [ # ]  backend                         test   min(s) median(s) stddev. count
> [  0]    image               swfdec-youtube    6.942    7.019   0.61%  6/6
>
> Much of the profile of this benchmark is in radial gradients, so other
> users of bilinear scaling may see more improvement.

More specialized benchmarks would be nice to see too. For example, benchmark
scaling a 99x99 image to 101x101 and compare it with a simple copy of a
100x100 image. That would give an estimate of how far this operation is
limited by memory throughput and how much it could potentially be improved.

> Also, if anyone is interested in adding support for SIMD acceleration
> of fetchers, the bilinear_interpolation() function is begging to
> be written with SSE2 or NEON.

This can be tried indeed.

An alternative option for the bilinear filter is to have two temporary
fetch buffers: the fetcher does no interpolation at all and only puts pairs
of pixels into these buffers, and the interpolation is then done in bulk.
The full width of the SIMD registers may be utilized better in this case.
The interpolation can also be combined with a compositing operation;
OVER is the primary candidate.
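
A rough sketch of this two-stage idea for a single destination scanline,
written as plain C rather than SIMD. Everything here is an assumption
made for illustration: the pixel_pair type, the function names, the
unsigned 16.16 fixed-point coordinates, and the clamping of source
indices at the right edge. One buffer holds pairs from the upper source
row and the other holds pairs from the lower source row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One gathered sample: the two horizontal neighbours (hypothetical type). */
typedef struct {
    uint32_t left, right;
} pixel_pair;

static size_t clamp_index(size_t i, size_t n)
{
    return i < n ? i : n - 1;
}

/* Stage 1: fetch only, no arithmetic on pixel values.
 * x and ux are unsigned 16.16 fixed point (start and step). */
void gather_pairs(const uint32_t *row, size_t width,
                  uint32_t x, uint32_t ux,
                  pixel_pair *pairs, uint32_t *wx, size_t n)
{
    assert(width > 0);
    for (size_t i = 0; i < n; i++, x += ux) {
        size_t xi = x >> 16;

        pairs[i].left = row[clamp_index(xi, width)];
        pairs[i].right = row[clamp_index(xi + 1, width)];
        wx[i] = (x >> 8) & 0xff;  /* top 8 bits of the fraction */
    }
}

/* Stage 2: interpolate the whole scanline in one tight loop.
 * wy in [0, 256] is the weight of the bottom row. */
void interpolate_bulk(const pixel_pair *top, const pixel_pair *bottom,
                      const uint32_t *wx, uint32_t wy,
                      uint32_t *out, size_t n)
{
    assert(wy <= 256);
    for (size_t i = 0; i < n; i++) {
        uint32_t px = 0;

        for (int s = 0; s < 32; s += 8) {
            uint32_t t = ((top[i].left >> s) & 0xff) * (256 - wx[i])
                       + ((top[i].right >> s) & 0xff) * wx[i];
            uint32_t b = ((bottom[i].left >> s) & 0xff) * (256 - wx[i])
                       + ((bottom[i].right >> s) & 0xff) * wx[i];

            px |= ((t * (256 - wy) + b * wy) >> 16) << s;
        }
        out[i] = px;
    }
}
```

A caller would run gather_pairs() once on the upper source row and once
on the lower one, then call interpolate_bulk() on the two buffers. The
second loop touches only contiguous memory, which is what would make it
a candidate for SIMD or for fusing with a compositing step.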

Another variation of this is to do the horizontal interpolation first and
put the partly processed data into two temporary buffers. A possible
advantage of this approach is that the horizontally interpolated data can
quite often be reused multiple times, especially when upscaling.
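
A sketch of how such reuse could look, again in plain C and under
assumptions of my own: the function names, the two-slot row cache, the
unsigned 16.16 coordinates and the edge clamping are all hypothetical.
Note that blending in two truncating steps can differ by one from a
single-step interpolation. The function returns the number of horizontal
passes it ran, so the saving is visible.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t clamp_index(uint32_t i, uint32_t n)
{
    return i < n ? i : n - 1;
}

/* Blend two packed pixels; w in [0, 256] is the weight of b. */
static uint32_t blend(uint32_t a, uint32_t b, uint32_t w)
{
    uint32_t out = 0;

    for (int s = 0; s < 32; s += 8) {
        uint32_t ca = (a >> s) & 0xff, cb = (b >> s) & 0xff;
        out |= ((ca * (256 - w) + cb * w) >> 8) << s;
    }
    return out;
}

/* Horizontally interpolate one source row into dw output pixels. */
static void hpass(const uint32_t *row, uint32_t sw, uint32_t ux,
                  uint32_t *out, uint32_t dw)
{
    uint32_t x = 0;

    for (uint32_t i = 0; i < dw; i++, x += ux) {
        uint32_t xi = x >> 16;
        out[i] = blend(row[clamp_index(xi, sw)],
                       row[clamp_index(xi + 1, sw)], (x >> 8) & 0xff);
    }
}

/* Scale src (sw x sh) into dst (dw x dh). ux and uy are 16.16 steps.
 * scratch must hold 2 * dw pixels. Returns the number of hpass() calls. */
int scale_rows_cached(const uint32_t *src, uint32_t sw, uint32_t sh,
                      uint32_t *dst, uint32_t dw, uint32_t dh,
                      uint32_t ux, uint32_t uy, uint32_t *scratch)
{
    uint32_t *buf[2] = { scratch, scratch + dw };
    int64_t tag[2] = { -1, -1 };  /* source row held in each buffer */
    int passes = 0;
    uint32_t y = 0;

    assert(sw > 0 && sh > 0 && dw > 0);
    for (uint32_t j = 0; j < dh; j++, y += uy) {
        uint32_t need[2] = { clamp_index(y >> 16, sh),
                             clamp_index((y >> 16) + 1, sh) };
        int slot[2];

        for (int k = 0; k < 2; k++) {
            if (tag[0] == need[k]) {
                slot[k] = 0;
            } else if (tag[1] == need[k]) {
                slot[k] = 1;
            } else {
                /* Miss: overwrite the buffer not holding the other row. */
                slot[k] = (tag[0] == need[1 - k]) ? 1 : 0;
                hpass(src + (uint64_t)need[k] * sw, sw, ux, buf[slot[k]], dw);
                tag[slot[k]] = need[k];
                passes++;
            }
        }
        /* Vertical blend of the two cached, already-interpolated rows. */
        for (uint32_t i = 0; i < dw; i++)
            dst[(uint64_t)j * dw + i] =
                blend(buf[slot[0]][i], buf[slot[1]][i], (y >> 8) & 0xff);
    }
    return passes;
}
```

For example, doubling the height of a two-row image produces four output
rows but needs only two horizontal passes here, where a per-row approach
without the cache would run eight.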

There are many things to try. It may also turn out that the optimal
implementation differs between platforms. But as long as the code is well
covered by regression tests, having more than one implementation should not be
a problem.

--
Best regards,
Siarhei Siamashka

Re: Faster bilinear scaling

by Soeren Sandmann

Siarhei Siamashka <siarhei.siamashka@...> writes:

> More specialized benchmarks would be nice to see too. For example, benchmark
> scaling a 99x99 image to 101x101 and compare it with a simple copy of a
> 100x100 image. That would give an estimate of how far this operation is
> limited by memory throughput and how much it could potentially be improved.

Indeed, bilinear scaling may be a case where we would not actually be
memory bound, even with an SSE or NEON implementation.

> > Also, if anyone is interested in adding support for SIMD acceleration
> > of fetchers, the bilinear_interpolation() function is begging to
> > be written with SSE2 or NEON.
>
> This can be tried indeed.

There are a number of cases where support for implementation-defined
fetchers would be useful, including gradients and the two SSE2
fetchers in bugzilla that Steve Snyder wrote.

> But as long as the code is well covered by regression tests, having
> more than one implementation should not be a problem.

Well, sometimes regression tests cause people to get sloppy and assume
that their code works just because no tests failed. Generally, I think
careful code review is desirable along with regression tests.

I went ahead and merged the bilinear optimizations.


Soren