Improved triu.m and tril.m

View: New views
4 Messages — Rating Filter:   Alert me  

Improved triu.m and tril.m

by Marco Caliari-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Dear maintainers,

although triu and tril are built-in functions in Matlab and then much
faster, I discovered that it is possible to improve the m files in Octave,
by eliminating the min/max evaluations inside the loops and selecting
whether to replace numbers with zeros or zeros wih numbers.
The results, for my Octave 3.2.3, are:

> A=rand(100);
> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
i = -31
Elapsed time is 0.0221 seconds.
Elapsed time is 0.00397 seconds.
ans = 0
> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
i = -52
Elapsed time is 0.0224 seconds.
Elapsed time is 0.00114 seconds.
ans = 0
> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
i =  22
Elapsed time is 0.0174 seconds.
Elapsed time is 0.00495 seconds.
ans = 0
> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
i =  63
Elapsed time is 0.00873 seconds.
Elapsed time is 0.00261 seconds.
ans = 0
> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
i =  36
Elapsed time is 0.0146 seconds.
Elapsed time is 0.00413 seconds.
ans = 0

For me the speed-up is > 3. Enclosed, the patches wrt 3.2.3.

Best regards,

Marco
[triu.m.diff]

--- triu.m.orig 2009-10-22 17:36:28.000000000 +0200
+++ triu.m 2009-10-22 17:49:10.000000000 +0200
@@ -28,25 +28,70 @@
 
   if (nargin > 0)
     if (isstruct (x))
-       error ("tril: structure arrays not supported");
+       error ("triu: structure arrays not supported");
      endif
     [nr, nc] = size (x);
   endif
   if (nargin == 1)
     k = 0;
   elseif (nargin == 2)
-    if ((k > 0 && k > nc) || (k < 0 && k < -nr))
+    if ((k > 0 && k > nc) || (k < 0 && k < -nr+1))
       error ("triu: requested diagonal out of range");
     endif
   else
     print_usage ();
   endif
-
-  retval = resize (resize (x, 0), nr, nc);
-  for j = max (1, k+1) : nc
-    nr_limit = min (nr, j-k);
-    retval (1:nr_limit, j) = x (1:nr_limit, j);
-  endfor
+  if (nr == nc)
+    if (k <= 0)
+      retval = x;      
+      for j = 1 : nr-1+k
+ retval((j-1)*nr+j+1-k:j*nr) = 0;
+      endfor
+    else
+% k > 0
+      retval = zeros(size(x));
+      for j = nc : -1 : k+1
+ idx = (j-1)*nr+1:(j-1)*nr-k+j;
+ retval(idx) = x(idx);
+      endfor
+    endif
+  elseif (nr > nc)
+    if (k <= floor((1+nc-nr)/2))
+      retval = x;      
+      for j = 1 : min(nc,nr-1+k)
+ retval((j-1)*nr+j+1-k:j*nr) = 0;
+      endfor
+    else
+% k > floor((1+nc-nr)/2)
+      retval = zeros(size(x));
+      for j = nc : -1 : max(1,k+1)
+ idx = (j-1)*nr+1:(j-1)*nr-k+j;
+ retval(idx) = x(idx);
+      endfor
+    endif
+  else
+% nc > nr
+    if (k <= floor((1+nc-nr)/2))
+      retval = x;      
+      for j = 1 : k-1
+ retval((j-1)*nr+1:j*nr) = 0;
+      endfor
+      for j = max(1,k) : nr-1+k
+ retval((j-1)*nr+j+1-k:j*nr) = 0;
+      endfor
+    else
+% k > floor((1+nc-nr)/2)
+      retval = zeros(size(x));
+      for j = nc : -1 : k+nr+1
+ idx = (j-1)*nr+1:j*nr;
+ retval(idx) = x(idx);
+      end
+      for j = min(nc,k+nr) : -1 : k+1
+ idx = (j-1)*nr+1:(j-1)*nr-k+j;
+ retval(idx) = x(idx);
+      endfor
+    endif
+  endif
 
 endfunction
 


[tril.m.diff]

--- tril.m.orig 2009-10-22 17:36:30.000000000 +0200
+++ tril.m 2009-10-22 17:49:14.000000000 +0200
@@ -72,22 +72,66 @@
      endif
     [nr, nc] = size (x);
   endif
-
   if (nargin == 1)
     k = 0;
   elseif (nargin == 2)
-    if ((k > 0 && k > nc) || (k < 0 && k < -nr))
+    if ((k > 0 && k > nc-1) || (k < 0 && k < -nr))
       error ("tril: requested diagonal out of range");
     endif
   else
     print_usage ();
   endif
-
-  retval = resize (resize (x, 0), nr, nc);
-  for j = 1 : min (nc, nr+k)
-    nr_limit = max (1, j-k);
-    retval (nr_limit:nr, j) = x (nr_limit:nr, j);
-  endfor
+  if (nr == nc)
+    if (k < 0)
+      retval = zeros(size(x));      
+      for j = 1 : nr+k
+ idx = (j-1)*nr+j-k:j*nr;
+ retval(idx) = x(idx);
+      endfor
+    else
+% k >= 0
+      retval = x;
+      for j = nc : -1 : k+2
+ retval((j-1)*nr+1:(j-1)*nr-k-1+j) = 0;
+      endfor
+    endif
+  elseif (nr > nc)
+    if (k < floor((1+nc-nr)/2))
+      retval = zeros(size(x));      
+      for j = 1 : min(nc,nr+k)
+ idx = (j-1)*nr+j-k:j*nr;
+ retval(idx) = x(idx);
+      endfor
+    else
+% k >= floor((1+nc-nr)/2)
+      retval = x;
+      for j = nc : -1 : max(1,k+2)
+ retval((j-1)*nr+1:(j-1)*nr-k-1+j) = 0;
+      endfor
+    endif
+  else
+% nc > nr
+    if (k < floor((1+nc-nr)/2))
+      retval = zeros(size(x));      
+      for j = 1 : k
+ idx = (j-1)*nr+1:j*nr;
+ retval(idx) = x(idx);
+      endfor
+      for j = max(1,k+1) : nr+k
+ idx = (j-1)*nr+j-k:j*nr;
+ retval(idx) = x(idx);
+      endfor
+    else
+% k >= floor((1+nc-nr)/2)
+      retval = x;
+      for j = nc : -1 : k+nr+2
+ retval((j-1)*nr+1:j*nr) = 0;
+      end
+      for j = min(nc,k+1+nr) : -1 : k+2
+ retval((j-1)*nr+1:(j-1)*nr-k-1+j) = 0;
+      endfor
+    endif
+  endif
 
 endfunction
 
@@ -109,4 +153,3 @@
 %!error tril ();
 
 %!error tril (1, 2, 3);
-


Re: Improved triu.m and tril.m

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/10/22 Marco Caliari <marco.caliari@...>:

> Dear maintainers,
>
> although triu and tril are built-in functions in Matlab and then much
> faster, I discovered that it is possible to improve the m files in Octave,
> by eliminating the min/max evaluations inside the loops and selecting
> whether to replace numbers with zeros or zeros wih numbers.
> The results, for my Octave 3.2.3, are:
>
>> A=rand(100);
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = -31
> Elapsed time is 0.0221 seconds.
> Elapsed time is 0.00397 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = -52
> Elapsed time is 0.0224 seconds.
> Elapsed time is 0.00114 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i =  22
> Elapsed time is 0.0174 seconds.
> Elapsed time is 0.00495 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i =  63
> Elapsed time is 0.00873 seconds.
> Elapsed time is 0.00261 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i =  36
> Elapsed time is 0.0146 seconds.
> Elapsed time is 0.00413 seconds.
> ans = 0
>
> For me the speed-up is > 3. Enclosed, the patches wrt 3.2.3.
>
> Best regards,
>
> Marco

Interesting. For comparison, here is my benchmark:

n = 500;
A = rand (n);
k = 1;
tril (eye (2)); # to avoid startup penalty
for iter = 1:4
  tic; tril (A, k); toc
endfor

my tip gives:

Elapsed time is 0.0236731 seconds.
Elapsed time is 0.0239391 seconds.
Elapsed time is 0.0221 seconds.
Elapsed time is 0.022387 seconds.

the relevant code snippet is

  retval = resize (resize (x, 0), nr, nc);
  for j = 1 : min (nc, nr+k)
    nr_limit = max (1, j-k);
    retval (nr_limit:nr, j) = x (nr_limit:nr, j);
  endfor

one can save a lot by doing a vectorized max once and eliminating the
repeated range:

  retval = zeros (nr, nc, class (x));
  jj = 1:min (nc, nr+k);
  ii = max (1, jj - k);
  for j = jj
    i = ii(j):nr;
    retval (i, j) = x (i, j);
  endfor

this gives:

Elapsed time is 0.0148931 seconds.
Elapsed time is 0.0148301 seconds.
Elapsed time is 0.0152121 seconds.
Elapsed time is 0.014981 seconds.

using cellslices + vertcat + reshape is faster still:

  m = min (nc, nr+k);
  ii = max (1, (1:m) - k);

  sl(2,:) = cellslices (x(:), ii, nr*ones (1, m));
  sl(1,:) = cellslices (zeros (nr, 1), ones (1, m), ii - 1);

  retval = reshape (vertcat (sl{:}), nr, nc);

gives

Elapsed time is 0.00466204 seconds.
Elapsed time is 0.00521612 seconds.
Elapsed time is 0.00525379 seconds.
Elapsed time is 0.00514317 seconds.

for comparison, your patch:

Elapsed time is 0.00818515 seconds.
Elapsed time is 0.00864601 seconds.
Elapsed time is 0.00886297 seconds.
Elapsed time is 0.00826406 seconds.

(quite good, compared to the loop-free code). Winner is number 3.
triu can be handled similarly. Neither of these approaches works
efficiently for sparse functions, though, so it should get a special
code (probably using find). But anyway, I think implementing
triu/tril/vech as compiled functions should be trivial and likely the
fastest.

Out of interest, do you depend on triu/tril efficiency in an application?

regards

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Re: Improved triu.m and tril.m

by Marco Caliari-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> using cellslices + vertcat + reshape is faster still:
>
>  m = min (nc, nr+k);
>  ii = max (1, (1:m) - k);
>
>  sl(2,:) = cellslices (x(:), ii, nr*ones (1, m));
>  sl(1,:) = cellslices (zeros (nr, 1), ones (1, m), ii - 1);
>
>  retval = reshape (vertcat (sl{:}), nr, nc);

Great, even if, it does not work for me... I think the problem is related
to what I get with 3.2.3

octave:1> cellslices([1,2],1,0)
error: subscript indices must be either positive integers or logicals.

Maybe the result should be

{
   [1,1] = [](1x0)
}

Also this appears strange to me

octave:1> cellslices([1,2,3],[2,3],[1,1])
error: invalid range used as index.

> Out of interest, do you depend on triu/tril efficiency in an application?

Not really. I had a code with fft and matrix manipulations and it was
slower in Octave than in Matlab just because of a final triu.

Cheers,

Marco

Re: Improved triu.m and tril.m

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Oct 23, 2009 at 9:52 AM, Marco Caliari <marco.caliari@...> wrote:

>> using cellslices + vertcat + reshape is faster still:
>>
>>  m = min (nc, nr+k);
>>  ii = max (1, (1:m) - k);
>>
>>  sl(2,:) = cellslices (x(:), ii, nr*ones (1, m));
>>  sl(1,:) = cellslices (zeros (nr, 1), ones (1, m), ii - 1);
>>
>>  retval = reshape (vertcat (sl{:}), nr, nc);
>
> Great, even if, it does not work for me... I think the problem is related to
> what I get with 3.2.3
>
> octave:1> cellslices([1,2],1,0)
> error: subscript indices must be either positive integers or logicals.
>

Yes, this was already fixed in development tip.

> Maybe the result should be
>
> {
>  [1,1] = [](1x0)
> }
>
> Also this appears strange to me
>
> octave:1> cellslices([1,2,3],[2,3],[1,1])
> error: invalid range used as index.
>

Huh. One more bug. A fix is here:
http://hg.savannah.gnu.org/hgweb/octave/rev/95ad9c2a27e2


>> Out of interest, do you depend on triu/tril efficiency in an application?
>
> Not really. I had a code with fft and matrix manipulations and it was slower
> in Octave than in Matlab just because of a final triu.
>
> Cheers,
>
> Marco
>

Based on a code sent today by David Bateman, I implemented a compiled
(and extended) version of triu/tril:
http://hg.savannah.gnu.org/hgweb/octave/rev/b134960cea23

so you may check that out. Btw, If you want to help improve Octave's
performance by measuring speed and benchmarks, I'd strongly recommend
you work with the development sources.

regards

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz