Pixman glyph performance, and beyond!

So I'm reviewing how cairo handles compositing, looking at how we may drive cairo-gl more efficiently. As part of that process, I've had the opportunity to remove some overhead from within cairo-image. However, glyph composition still suffers from substantial overhead since every glyph is composited separately.

firefox-talos-gfx on a slow Celeron 600MHz:

    # Overhead  Symbol
    # ........  ......
      23.76%  [.] _pixman_run_fast_path
      23.34%  [.] sse2_composite_add_n_8888_8888_ca
      11.82%  [.] sse2_composite_over_n_8888_8888_ca
       6.31%  [.] pixman_image_composite
       4.69%  [.] walk_region_internal
       4.44%  [.] pixman_blt_sse2
       3.18%  [.] _pixman_image_validate
       2.30%  [.] sse2_composite_over_n_8_8888
       2.23%  [.] pixman_compute_composite_region32
       2.19%  [.] pixman_fill_sse2
       1.91%  [.] sse2_composite

(And to put it in perspective:

      46.35%  /usr/local/lib/libpixman-1.so.0.17.1
      28.25%  /home/ickle/src/cairo/src/.libs/libcairo.so.2.10905.0
      14.24%  [kernel])

Søren has looked at this problem in the past and begun work on fast-path and faster-fast-path branches, looking to cache prior fast-path resolutions. These are not yet as effective as one would hope.

How insane would it be to push the get_fast_path() to the user and to be able to pass in the implementation + composite function instead of performing the search every time? This would also be useful for spans. And considering how most cairo operations are first performed to a mask, cairo could very effectively cache the fast path for its most frequent operations.

I'm particularly interested in suggestions and experiences from the ARM/NEON guys as they seem to be suffering acutely from similar overheads in pixman - and so I presume are also looking at this issue.

-ickle

--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
cairo mailing list
cairo@...
http://lists.cairographics.org/mailman/listinfo/cairo
Re: Pixman glyph performance, and beyond!

Chris Wilson <chris@...> writes:

> So I'm reviewing how cairo handles compositing, looking at how we may
> drive cairo-gl more efficiently. As part of that process, I've had the
> opportunity to remove some overhead from within cairo-image. However,
> glyph composition still suffers from substantial overhead since every
> glyph is composited separately.

The X server suffers from essentially the same problem, except worse because each glyph is actually a new pixman_image_t, so there is horrible overhead from malloc() etc. for each and every glyph.

> firefox-talos-gfx on a slow Celeron 600MHz:
> # Overhead  Symbol
> # ........  ......
>   23.76%  [.] _pixman_run_fast_path
>   23.34%  [.] sse2_composite_add_n_8888_8888_ca
>   11.82%  [.] sse2_composite_over_n_8888_8888_ca
>    6.31%  [.] pixman_image_composite
>    4.69%  [.] walk_region_internal
>    4.44%  [.] pixman_blt_sse2
>    3.18%  [.] _pixman_image_validate
>    2.30%  [.] sse2_composite_over_n_8_8888
>    2.23%  [.] pixman_compute_composite_region32
>    2.19%  [.] pixman_fill_sse2
>    1.91%  [.] sse2_composite

Are these percentages of the 46.35% below, so it is 23.76% of 46.35% = 11.01% for _pixman_run_fast_path? Either way, it's clearly not good.

> (And to put it in perspective:
>   46.35%  /usr/local/lib/libpixman-1.so.0.17.1
>   28.25%  /home/ickle/src/cairo/src/.libs/libcairo.so.2.10905.0
>   14.24%  [kernel])
>
> Søren has looked at this problem in the past and begun work on
> fast-path and faster-fast-path branches, looking to cache prior
> fast-path resolutions.

The latest incarnation of that work is the 'flags' branch here:

    http://cgit.freedesktop.org/~sandmann/pixman/log/?h=flags

which contains several optimizations in this area. Here is a summary of what's in it:

- It moves the computation of the various image properties out of the get_fast_path() loop and replaces them with bit masks that are much faster to check.

- It turns general_composite() and fast_composite_scale_nearest() into fast paths, so that all compositing goes through that path.

- It eliminates all the composite methods from pixman_implementation_t

- It adds a fast path cache

- It speeds up the operator 'strength reduction' that Antoine added a long time ago, by storing the table more compactly and doing the mapping in O(1) time.

I need to clean it up, break it into smaller bits, and send them to the list for review.

> These are not yet as effective as one would hope.

It might be worthwhile rerunning the benchmark against that branch, though I suspect there is still some overhead. Almost anything will show up when the images are as small as glyphs are.

> How insane would it be to push the get_fast_path() to the user and to be
> able to pass in the implementation + composite function instead of
> performing the search every time? This would also be useful for spans.
> And considering how most cairo operations are first performed to a mask,
> cairo could very effectively cache the fast path for its most frequent
> operations.

I really think the fast paths need to be kept an implementation detail, because exposing them would constrain what information about the images you could rely on to compute the fast path.

For example, right now pixman does not rely on the alignment of the image data when it selects the fast path. This means someone could look up a fast path, then go on to use it with several differently-aligned images, which would mean pixman couldn't later on add alignment optimizations.

However, I do agree that glyph compositing needs to become much faster in both X and cairo, but I think that a better way would be to move the Render glyph management code into pixman and expose a new

    pixman_glyph_set_t

along with something like a pixman_composite_glyphs() similar to how Render works. This would allow both cairo and X to become substantially faster, while sharing glyph caching code.

For spans, I still think that a polygon image type in pixman is the way to go, since again this would benefit both X and cairo. There could certainly be a call to convert it into spans if that is useful to other cairo backends, so that we wouldn't need to have two rasterizers.

Thanks,
Soren
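To make the 'flags' branch summary above a little more concrete, here is a minimal sketch of a flag-based fast path lookup with a one-entry cache. It is not pixman's actual code: every name in it (the FLAG_* bits, the fast_path table, lookup_fast_path(), table_scans) is hypothetical and chosen only for illustration. The idea it shows is that image properties are reduced to bit masks once, so each table entry is checked with a couple of AND/compare operations, and a repeated request skips the table scan entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-image property bits, computed once when an image changes. */
#define FLAG_NO_TRANSFORM (1u << 0)
#define FLAG_NO_ALPHA_MAP (1u << 1)
#define FLAG_IS_SOLID     (1u << 2)

enum { OP_OVER, OP_ADD };                    /* illustrative operator ids */
enum { FMT_SOLID = 1, FMT_A8R8G8B8 = 2 };    /* illustrative format ids   */

typedef void (*composite_fn)(void);

/* Stand-ins for real compositing routines. */
void composite_over_n_8888(void)   {}
void composite_add_8888_8888(void) {}
void composite_general(void)       {}

typedef struct {
    int          op;
    uint32_t     src_format, src_need;   /* flags the source must have */
    uint32_t     dst_format, dst_need;   /* flags the dest must have   */
    composite_fn fn;
} fast_path;

static const fast_path table[] = {
    { OP_OVER, FMT_SOLID, FLAG_IS_SOLID,
      FMT_A8R8G8B8, FLAG_NO_ALPHA_MAP, composite_over_n_8888 },
    { OP_ADD, FMT_A8R8G8B8, FLAG_NO_TRANSFORM | FLAG_NO_ALPHA_MAP,
      FMT_A8R8G8B8, FLAG_NO_ALPHA_MAP, composite_add_8888_8888 },
};

/* One-entry cache keyed on the full request; zero-initialized means empty. */
typedef struct {
    int valid, op;
    uint32_t sf, sfl, df, dfl;
    composite_fn fn;
} cache_entry;

static cache_entry cache;
unsigned table_scans;   /* counts how often the table had to be searched */

composite_fn lookup_fast_path(int op, uint32_t sf, uint32_t sfl,
                              uint32_t df, uint32_t dfl)
{
    assert(op == OP_OVER || op == OP_ADD);

    if (cache.valid && cache.op == op && cache.sf == sf &&
        cache.sfl == sfl && cache.df == df && cache.dfl == dfl)
        return cache.fn;                 /* hit: no table scan */

    composite_fn fn = composite_general; /* fallback is itself a "path" */
    table_scans++;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const fast_path *p = &table[i];
        if (p->op == op && p->src_format == sf && p->dst_format == df &&
            (sfl & p->src_need) == p->src_need &&
            (dfl & p->dst_need) == p->dst_need) {
            fn = p->fn;
            break;
        }
    }
    cache = (cache_entry){ 1, op, sf, sfl, df, dfl, fn };
    return fn;
}
```

Because the cache lives inside the lookup rather than in the caller, the library stays free to change what the key depends on later (for instance, adding data alignment), which is the concern raised above about exposing fast paths directly.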
Re: Pixman glyph performance, and beyond!

Excerpts from Soeren Sandmann's message of Fri Oct 23 02:37:39 +0100 2009:

> The latest incarnation of that work is the 'flags' branch here:
>
> http://cgit.freedesktop.org/~sandmann/pixman/log/?h=flags
>
> which contains several optimizations in this area.

[snip]

> It might be worthwhile rerunning the benchmark against that branch,
> though I suspect there is still some overhead. Almost anything will
> show up when the images are as small as glyphs are.

Very effective, Søren, it eliminated the get_fast_path() overhead entirely:

      32.84%  [.] sse2_composite_add_n_8888_8888_ca
      17.13%  [.] sse2_composite_over_n_8888_8888_ca
      15.98%  [.] pixman_image_composite
       5.78%  [.] pixman_blt_sse2
       5.40%  [.] _pixman_image_validate
       3.98%  [.] pixman_compute_composite_region32
       2.12%  [.] pixman_fill_sse2

It looks like it's been absorbed into pixman_image_composite(), but the runtime improved by over 10% -- indicative that the lookup overhead was eliminated. Though there is still around 25% to be recovered.

> I really think the fast paths need to be kept an implementation
> detail, because exposing them would constrain what information about
> the images you could rely on to compute the fast path.
>
> For example, right now pixman does not rely on the alignment of the
> image data when it selects the fast path. This means someone could
> look up a fast path, then go on to use with several
> differently-aligned images, which would mean pixman couldn't later on
> add alignment optimizations.

I think you've effectively demonstrated that the overhead from selecting the fast path should be negligible. So we should move on to the question of how to push large batches of work to pixman efficiently.

> However, I do agree that glyph compositing needs to become much faster
> in both X and cairo, but I think that a better way would be to move
> the Render glyph management code into pixman and expose a new
>
> pixman_glyph_set_t
>
> along with something like a pixman_composite_glyphs() similar to how
> Render works. This would allow both cairo and X to become
> substantially faster, while sharing glyph caching code.
>
> For spans, I still think that a polygon image type in pixman is the
> way to go, since again this would benefit both X and cairo. There
> could certainly be a call to convert it into spans if that is useful
> to other cairo backends, so that we wouldn't need to have two
> rasterizers.

I'm actually not so convinced that this is the direction that pixman should be going in. From my perspective cairo requires specific path -> backend geometry converters, and a polygon rasteriser with a span line interface has quickly become the default method for pushing masks around. Whereas traps have been relegated to mostly handling boxes, aside from when the most efficient wire request we have available is CompositeTraps. (Has anyone else noticed that the RLE mask for curved geometry is often an order of magnitude smaller than the equivalent set of trapezoids, almost as small as the original path?)

Similarly, I'd rather not add the overhead of an independent layer of glyph management.

With that bias, I'd prefer that pixman retained its focus on pixel manipulation routines and we improve the interfaces for performing large sets of similar operations.

One issue that we will encounter very soon is the pain caused by forcing the user to emit cairo_show_glyphs() early for each change in font. This can be fixed up in the backends that batch requests and use a consolidated glyph atlas (i.e. there is no level state change and so the geometry is just accumulated onto the previous operation). [There is still substantial overhead from cairo doing the analysis on the extra operations.] Similarly we can move away from an immediate mode, direct access, pixman - and treat pixman more like a GPU, if it is performant.

-ickle

--
Chris Wilson, Intel Open Source Technology Centre
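As a rough illustration of the "RLE mask" mentioned above, the sketch below run-length encodes one row of 8-bit coverage into spans. This is only an assumption about what such a mask could look like: span_t and row_to_spans() are hypothetical names, and this is not cairo's actual span representation. It shows why a curved edge can encode compactly: the fully covered interior of a row collapses into a single span, with only the antialiased pixels at each end costing extra entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run of identical coverage on a scanline (hypothetical layout). */
typedef struct {
    int     x;         /* first pixel of the run      */
    int     len;       /* number of pixels in the run */
    uint8_t coverage;  /* 0..255 mask value           */
} span_t;

/*
 * Encode a row of 8-bit coverage values as spans, skipping runs of zero
 * coverage. Writes at most max spans to out and returns how many were
 * written.
 */
size_t row_to_spans(const uint8_t *row, int width, span_t *out, size_t max)
{
    size_t n = 0;
    int x = 0;

    assert(row != NULL && out != NULL && width >= 0);

    while (x < width) {
        uint8_t c = row[x];
        int start = x;

        while (x < width && row[x] == c)
            x++;                       /* extend the run */

        if (c == 0)
            continue;                  /* empty pixels need no span */
        if (n == max)
            break;                     /* output buffer is full */

        out[n].x = start;
        out[n].len = x - start;
        out[n].coverage = c;
        n++;
    }
    return n;
}
```

For a row such as {0, 0, 128, 255, 255, 255, 128, 0}, this yields three spans regardless of how wide the fully covered middle section is.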
Re: Pixman glyph performance, and beyond!

Hi Chris,

> > The latest incarnation of that work is the 'flags' branch here:
> >
> > http://cgit.freedesktop.org/~sandmann/pixman/log/?h=flags
> >
> > which contains several optimizations in this area.
>
> [snip]
>
> > It might be worthwhile rerunning the benchmark against that branch,
> > though I suspect there is still some overhead. Almost anything will
> > show up when the images are as small as glyphs are.
>
> Very effective, Søren, it eliminated the get_fast_path() overhead entirely:
>
>   32.84%  [.] sse2_composite_add_n_8888_8888_ca
>   17.13%  [.] sse2_composite_over_n_8888_8888_ca
>   15.98%  [.] pixman_image_composite
>    5.78%  [.] pixman_blt_sse2
>    5.40%  [.] _pixman_image_validate
>    3.98%  [.] pixman_compute_composite_region32
>    2.12%  [.] pixman_fill_sse2
>
> It looks like it's been absorbed into pixman_image_composite(), but the
> runtime improved by over 10% -- indicative that the lookup overhead was
> eliminated. Though there is still around 25% to be recovered.

Some of those 25% might be recoverable from the _pixman_image_validate() by precomputing the flags, but other than that, I don't think there is all that much more to be gained without a better interface for glyph rendering.

> > However, I do agree that glyph compositing needs to become much faster
> > in both X and cairo, but I think that a better way would be to move
> > the Render glyph management code into pixman and expose a new
> >
> > pixman_glyph_set_t
> >
> > along with something like a pixman_composite_glyphs() similar to how
> > Render works. This would allow both cairo and X to become
> > substantially faster, while sharing glyph caching code.
> >
> > For spans, I still think that a polygon image type in pixman is the
> > way to go, since again this would benefit both X and cairo. There
> > could certainly be a call to convert it into spans if that is useful
> > to other cairo backends, so that we wouldn't need to have two
> > rasterizers.
>
> I'm actually not so convinced that this is the direction that pixman should
> be going in. From my perspective cairo requires specific path -> backend
> geometry converters, and a polygon rasteriser with a span line interface
> has quickly become the default method for pushing masks around. Whereas
> traps have been relegated to mostly handling boxes, aside from when the
> most efficient wire request we have available is CompositeTraps. (Has
> anyone else noticed that the RLE mask for curved geometry is often an
> order of magnitude smaller than the equivalent set of trapezoids,
> almost as small as the original path?)

One of the major reasons for adding a polygon image to pixman is that it would make a PictPolygon render picture possible, thereby finally eliminating the use of trapezoids, while allowing the X server to do one-pass compositing of geometry, at least in software.

I am not proposing to tessellate the polygons in pixman, but to composite them directly scanline by scanline. Using Tor's clever trick [1] of representing scanlines as accumulation buffers, the operation

    (source IN polygon) OVER dest

can be implemented by

    /* Generate polygon scanline accumulation buffer */
    for each subpixel scanline
        for each active edge
            add delta to one pixel in accumulation buffer
            bresenham step

    uint8_t m = 0
    for each pixel:
        m += polygon[i]
        if (m)
        {
            s = load_source();
            d = load_dest()
            d = composite (s, m, d);
        }

which is very SIMD-able and allows us to not allocate and zero-fill a potentially large temporary mask. I think this would be faster than the arrays of spans.

This would benefit at least the image backend too, but in any case, there is a clear need to do better than trapezoids for uploading geometry to X.

It's less convenient for shaders because of the horizontal prefix sum, which is why there could be a

    pixman_poly_image_make_spans (...)

interface to generate spans for the cairo GPU backends, or a callback-based one like the one in current cairo. DDX drivers could use this as well, as could pixman GPU backends.

> Similarly, I'd rather not add the overhead of an independent layer
> of glyph management.

Simply moving the Render glyph code into pixman would be an immediate improvement for non-hardware-accelerated X servers since the remaining overhead in the profile above would essentially disappear.

A little longer term, it would allow storing the glyphs as efficiently as possible for the CPU in question (since pixman already has details about the CPU). For example, the SSE2 backend could benefit greatly if the glyphs were stored in a 16 x n image, possibly sorted by approximate use frequency. This would improve both cache performance and allow aligned loads to be used.

I'm not sure I understand what 'overhead of an independent layer of glyph management' you see for cairo. For at least the image backend, I don't see anything that a pixman_glyph_set_t could not provide, and it seems like a win to share this code with the X server. For other backends, it may not be as useful, but it won't be harmful either.

> With that bias, I'd prefer that pixman retained its focus on pixel
> manipulation routines and we improve the interfaces for performing
> large sets of similar operations.
> One issue that we will encounter very soon is the pain caused by forcing
> the user to emit cairo_show_glyphs() early for each change in font. This
> can be fixed up in the backends that batch requests and use a
> consolidated glyph atlas (i.e. there is no level state change and so the
> geometry is just accumulated onto the previous operation). [There is
> still substantial overhead from cairo doing the analysis on the extra
> operations.] Similarly we can move away from an immediate mode, direct
> access, pixman - and treat pixman more like a GPU, if it is performant.

Even with glyphs and polygons in pixman, I agree that there is potentially a lot of benefit to be had from submitting more work to pixman at a time.

I don't have a clear proposal for what such an interface would look like, but here are some things that are worth thinking about:

- How well does the interface help if pixman is multithreaded?

- How well does it help if pixman gains GPU backends?

- Can we eliminate temporary images? I.e., if someone does

      cairo_push_group()
      <some simple stuff>
      cairo_pop_group_to_source()
      cairo_paint()

  can we do the whole operation in one pass without allocating a temporary mask? One way to do this would be to add a new

      pixman_image_composited_t

  image type, that would contain pointers to two other images, then composite them on the fly when the toplevel image is asked to fetch a scanline. This leads to the idea of an 'expression tree' of images as the way to submit lots of work.

- Can it be JIT compiled? There are two quite different approaches to JIT compilation:

  - Generate code similar to the current fast paths and cache it. This is simpler to get going at first, but also fundamentally requires temporary images.

  - Generate one-shot code for a lot of operations at once. With the expression tree idea, this might make a lot of sense. The compiler could look at the tree and generate the code that would produce the least amount of memory traffic. Shader code could be generated from this as well.

Thanks,
Soren

[1] http://lists.cairographics.org/archives/cairo/2007-August/011136.html
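The accumulation-buffer pseudocode in the message above can be turned into a small runnable sketch. The version below makes several simplifying assumptions that are not in the original: edges are already resolved to whole-pixel spans per subpixel scanline (so there is no Bresenham stepping and no horizontal subpixel coverage), pixels are 8-bit grayscale, the source is a single opaque value, and SUBROWS, add_subspan() and composite_row() are hypothetical names. What it does preserve is the core trick: each subpixel scanline touches only two entries of the buffer, and a running sum across the row recovers per-pixel coverage without ever building a full mask.

```c
#include <assert.h>
#include <stdint.h>

#define SUBROWS 4   /* subpixel scanlines per pixel row (illustrative) */

/*
 * Record that one subpixel scanline lies inside the polygon on [x0, x1).
 * Only the two boundary entries are touched; acc must hold width + 1
 * entries and start out zeroed.
 */
void add_subspan(int16_t *acc, int width, int x0, int x1)
{
    assert(0 <= x0 && x0 <= x1 && x1 <= width);
    acc[x0] += 1;
    acc[x1] -= 1;
}

/*
 * Composite one row: dst = (src IN polygon) OVER dst, for 8-bit gray
 * pixels and an opaque solid source value. The running sum m is the
 * number of subpixel scanlines covering pixel i.
 */
void composite_row(const int16_t *acc, int width, uint8_t src, uint8_t *dst)
{
    int m = 0;

    for (int i = 0; i < width; i++) {
        m += acc[i];
        if (m) {
            int a = m * 255 / SUBROWS;   /* coverage as 0..255 alpha */
            dst[i] = (uint8_t)((src * a + dst[i] * (255 - a) + 127) / 255);
        }
    }
}
```

Pixels where the running sum is zero are skipped entirely, so destination memory outside the polygon is never read or written. The serial dependency in the running sum is the "horizontal prefix sum" that the message notes is awkward for shaders.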
Re: Pixman glyph performance, and beyond!

On Sat, Oct 24, 2009 at 10:24 AM, Chris Wilson <chris@...> wrote:

> The emphasis here, IMO, has to be to maintain cairo's high image quality -

If there are unavoidable quality/performance tradeoffs, then as a consumer of cairo (Firefox) I would really like to be able to choose our tradeoff point, because most of the time for us performance is more important than quality. Or better said, slow animation or rendering is a much more obvious quality problem than slightly degraded antialiasing.

--
"He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all." [Isaiah 53:5-6]
Re: Pixman glyph performance, and beyond!

On Fri, Oct 23, 2009 at 5:37 PM, Robert O'Callahan <robert@...> wrote:

> If there are unavoidable quality/performance tradeoffs, then as a consumer
> of cairo (Firefox) I would really like to be able to choose our tradeoff
> point, because most of the time for us performance is more important than
> quality. Or better said, slow animation or rendering is a much more obvious
> quality problem than slightly degraded antialiasing.

And, indeed, during animations and so forth the quality difference may not be perceptible, so we might well want to slip into "not quite as awesome" mode during transitions and animations and scrolling or whatever, and then do the last frame in "great!" mode. If we are doing a bounded animation and can use the GPU, we could even prepare that visually stunning end frame while the animation was being processed...

Mike