|
View:
New views
5 Messages
—
Rating Filter:
Alert me
|
|
|
FYI: bsxfun optimizedhi all,
I optimized bsxfun (binary singleton expansion function applier) to be faster when some common built-in functions are given. The bsxfun operations are also available from liboctave as bsxfun_add (NDArray, NDArray) etc. http://hg.savannah.gnu.org/hgweb/octave/rev/26abff55f6fe The following benchmark illustrates the speed-up (set n to suitable value) n = 200; i = ones (1, n); a = rand (n, n, 1); b = rand (n, n, n); tic; bsxfun (@plus, a, b); toc a = rand (n, 1, 1); tic; bsxfun (@minus, a, b); toc a = rand (1, n, 1); tic; bsxfun (@times, a, b); toc a = rand (n, n, 1); b = rand (1, n, n); tic; bsxfun (@rdivide, a, b); toc b = rand (1, 1, n); tic; bsxfun (@lt, a, b); toc a = rand (2, n, n/2, n); # heavily multi-dim b = rand (1, 1, n/2, n); tic; bsxfun (@gt, a, b); toc on a Core 2 Duo @ 2.83 GHz, g++ -O3 -march=native, with a recent tip, I get: Elapsed time is 0.697962 seconds. Elapsed time is 0.645682 seconds. Elapsed time is 0.709098 seconds. Elapsed time is 0.720162 seconds. Elapsed time is 0.608029 seconds. Elapsed time is 24.2397 seconds. with the new patch, I got Elapsed time is 0.0434361 seconds. Elapsed time is 0.048116 seconds. Elapsed time is 0.0503559 seconds. Elapsed time is 0.0644929 seconds. Elapsed time is 0.0164821 seconds. Elapsed time is 0.098119 seconds. an interesting third option is a naive bsxfun replacement, which just spreads the arrays to a common size and then applies the operator: function r = bsxfun (op, x, y) nd = max (ndims (x), ndims (y)); xidx = yidx = repmat({':'}, 1, nd); for i = 1:nd sx = size (x, i); sy = size (y, i); if (sx == 1) xidx{i} = ones (1, sy); endif if (sy == 1) yidx{i} = ones (1, sx); endif endfor r = op (x(xidx{:}), y(yidx{:})); endfunction with this replacement, I get: Elapsed time is 0.0920911 seconds. Elapsed time is 0.098748 seconds. Elapsed time is 0.097363 seconds. Elapsed time is 0.137549 seconds. Elapsed time is 0.109806 seconds. Elapsed time is 0.18157 seconds. of course, the memory consumption will be up to 3x as big, but in general these are closer to their optimized counterparts than the original code. Currently, only some common operations are covered, such as the basic operations +,-,*,/, relations =,!=,<,>,<=,>= and pairwise min, max are covered, and the optimizations only work if the operation is homogeneous. More can be added if there is interest. One further question that remains is whether to retain the old code, or replace it with the spread-and-apply approach above. The old code is more memory-efficient (it creates the result array and then updates it), but apparently slower, sometimes significantly. Opinions? enjoy -- RNDr. Jaroslav Hajek computing expert & GNU Octave developer Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
|
|
Re: FYI: bsxfun optimizedtir, 20 10 2009 kl. 11:13 +0200, skrev Jaroslav Hajek:
> One further question that remains is whether to retain the old code, > or replace it with the spread-and-apply approach above. > The old code is more memory-efficient (it creates the result array and > then updates it), but apparently slower, sometimes significantly. > Opinions? Cool! I love these X-has-been-optimised mails :-) Would it be possible to test if the system has enough memory to use the spread-and-apply approach before hand? If so, then you could do spread-and-apply when the memory is available and fall back to the memory-efficient approach otherwise. the down-side is of course that this gives more code to maintain. Søren |
|
|
Re: FYI: bsxfun optimizedOn Tue, Oct 20, 2009 at 11:57 AM, Søren Hauberg <soren@...> wrote:
> tir, 20 10 2009 kl. 11:13 +0200, skrev Jaroslav Hajek: >> One further question that remains is whether to retain the old code, >> or replace it with the spread-and-apply approach above. >> The old code is more memory-efficient (it creates the result array and >> then updates it), but apparently slower, sometimes significantly. >> Opinions? > > Cool! I love these X-has-been-optimised mails :-) > > Would it be possible to test if the system has enough memory to use the > spread-and-apply approach before hand? If so, then you could do > spread-and-apply when the memory is available and fall back to the > memory-efficient approach otherwise. the down-side is of course that > this gives more code to maintain. > > Søren > > That's an interesting idea; however, I have no idea how to check for available memory, and I doubt there's a simple portable way. In fact I think most systems, including Linux, hide the physical memory details pretty well from applications. OTOH, it certainly is possible - cf. GNOME system monitor... In any case, I'll repeat my usual statement that if you're really short of memory, Octave (or Matlab, for that matter) isn't a good language to use at all, like probably any language that lacks a pass-by-reference semantics (even though otherwise it's so cool there's no mutable/immutable mess compared to, say, Python). -- RNDr. Jaroslav Hajek computing expert & GNU Octave developer Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
|
|
Re: FYI: bsxfun optimizedOn Tue, Oct 20, 2009 at 11:13 AM, Jaroslav Hajek <highegg@...> wrote:
> hi all, > I optimized bsxfun (binary singleton expansion function applier) to be > faster when some common built-in functions are given. The bsxfun > operations are also available from liboctave as bsxfun_add (NDArray, > NDArray) etc. > http://hg.savannah.gnu.org/hgweb/octave/rev/26abff55f6fe > > The following benchmark illustrates the speed-up (set n to suitable value) > > n = 200; > i = ones (1, n); > a = rand (n, n, 1); > b = rand (n, n, n); > tic; bsxfun (@plus, a, b); toc > a = rand (n, 1, 1); > tic; bsxfun (@minus, a, b); toc > a = rand (1, n, 1); > tic; bsxfun (@times, a, b); toc > > a = rand (n, n, 1); > b = rand (1, n, n); > tic; bsxfun (@rdivide, a, b); toc > > b = rand (1, 1, n); > tic; bsxfun (@lt, a, b); toc > > a = rand (2, n, n/2, n); # heavily multi-dim > b = rand (1, 1, n/2, n); > tic; bsxfun (@gt, a, b); toc > > on a Core 2 Duo @ 2.83 GHz, g++ -O3 -march=native, > with a recent tip, I get: > > Elapsed time is 0.697962 seconds. > Elapsed time is 0.645682 seconds. > Elapsed time is 0.709098 seconds. > Elapsed time is 0.720162 seconds. > Elapsed time is 0.608029 seconds. > Elapsed time is 24.2397 seconds. > > with the new patch, I got > > Elapsed time is 0.0434361 seconds. > Elapsed time is 0.048116 seconds. > Elapsed time is 0.0503559 seconds. > Elapsed time is 0.0644929 seconds. > Elapsed time is 0.0164821 seconds. > Elapsed time is 0.098119 seconds. > > an interesting third option is a naive bsxfun replacement, which just > spreads the arrays to a common size and then applies the operator: > > function r = bsxfun (op, x, y) > nd = max (ndims (x), ndims (y)); > xidx = yidx = repmat({':'}, 1, nd); > for i = 1:nd > sx = size (x, i); > sy = size (y, i); > if (sx == 1) > xidx{i} = ones (1, sy); > endif > if (sy == 1) > yidx{i} = ones (1, sx); > endif > endfor > > r = op (x(xidx{:}), y(yidx{:})); > endfunction > > with this replacement, I get: > > Elapsed time is 0.0920911 seconds. > Elapsed time is 0.098748 seconds. > Elapsed time is 0.097363 seconds. > Elapsed time is 0.137549 seconds. > Elapsed time is 0.109806 seconds. > Elapsed time is 0.18157 seconds. > > of course, the memory consumption will be up to 3x as big, but in > general these are closer to their optimized counterparts than the > original code. > > Currently, only some common operations are covered, such as the basic > operations +,-,*,/, relations =,!=,<,>,<=,>= and pairwise min, max are > covered, and the optimizations only work if the operation is > homogeneous. More can be added if there is interest. > > One further question that remains is whether to retain the old code, > or replace it with the spread-and-apply approach above. > The old code is more memory-efficient (it creates the result array and > then updates it), but apparently slower, sometimes significantly. > Opinions? > > enjoy > > -- > RNDr. Jaroslav Hajek > computing expert & GNU Octave developer > Aeronautical Research and Test Institute (VZLU) > Prague, Czech Republic > url: www.highegg.matfyz.cz > Btw, as an example of usage, I optimized the "center" function - which removes the means from an array along a specified dimension, and is a typical example of a bsxfun-like operation... http://hg.savannah.gnu.org/hgweb/octave/rev/fb3543975ed9 benchmark: n = 4000; a = rand (n); tic; center (a); toc tic; center (a(:)); toc old: Elapsed time is 0.192539 seconds. Elapsed time is 0.210404 seconds. Elapsed time is 0.114803 seconds. new: Elapsed time is 0.102689 seconds. Elapsed time is 0.116499 seconds. Elapsed time is 0.113716 seconds. the vector case was already optimized before. In the matrix cases, the expected factor ~2 speed-up is seen (and it saves one temporary array). enjoy -- RNDr. Jaroslav Hajek computing expert & GNU Octave developer Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
|
|
Re: FYI: bsxfun optimizedOn 10/20/2009 03:10 AM, Jaroslav Hajek wrote:
> In fact I > think most systems, including Linux, hide the physical memory details > pretty well from applications. OTOH, it certainly is possible - cf. > GNOME system monitor... > On Linux systems cat /proc/meminfo tells you quite a lot. > In any case, I'll repeat my usual statement that if you're really > short of memory, Octave (or Matlab, for that matter) isn't a good > language to use at all, like probably any language that lacks a > pass-by-reference semantics (even though otherwise it's so cool > there's no mutable/immutable mess compared to, say, Python). > Agreed. In any case most systems these days provide, or at least allow large virtual address space so the only problem is excessive swapping due to limited real memory. Real memory is cheap these days... |
| Free embeddable forum powered by Nabble | Forum Help |