Faster Array transpose

View: New views
14 Messages — Rating Filter:   Alert me  

Faster Array transpose

by dbateman3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The array transpose in Octave uses an extremely simplistic method

template <class T>
Array<T>
Array<T>::transpose (void) const
{
  assert (ndims () == 2);

  octave_idx_type nr = dim1 ();
  octave_idx_type nc = dim2 ();

  if (nr > 1 && nc > 1)
    {
      Array<T> result (dim_vector (nc, nr));

      for (octave_idx_type j = 0; j < nc; j++)
    for (octave_idx_type i = 0; i < nr; i++)
      result.xelem (j, i) = xelem (i, j);

      return result;
    }
  else
    {
      // Fast transpose for vectors and empty matrices
      return Array<T> (*this, dim_vector (nc, nr));
    }
}

This has poor performance due to the number of cache missings. I know
there were discussions about flagging a transposed matrix rather than
actually doing it to use with the BLAS functions, but the overhead was
felt to be too high. If we can't do that can we at least use a method like

template <class T>
Array<T>
Array<T>::transpose (void) const
{
  assert (ndims () == 2);

  octave_idx_type nr = dim1 ();
  octave_idx_type nc = dim2 ();

  if (nr > 1 && nc > 1)
    {
      Array<T> result (dim_vector (nc, nr));

      // Blocked transpose to attempt to avoid cache misses.
#define TRANSN 8

      // Don't use OCTAVE_LOCAL_BUFFER here as it doesn't work with bool
      // on some compilers.
      Array<T> tmp (TRANSN * TRANSN);
      T *buf = tmp.fortran_vec ();

      octave_idx_type ii = 0, jj;
      for (jj = 0; jj < (nc - TRANSN + 1); jj += TRANSN)
    {
      for (ii = 0; ii < (nr - TRANSN + 1); ii += TRANSN)
        {
          // Copy to buffer
          for (octave_idx_type j = jj, k = 0; j < jj + TRANSN; j++)
        for (octave_idx_type i = ii; i < ii + TRANSN; i++)
          buf[k++] = xelem (i, j);

          // Copy from buffer
          for (octave_idx_type i = ii; i < ii + TRANSN; i++)
        for (octave_idx_type j = jj, k = 0; j < jj + TRANSN;
             j++, k+=TRANSN)
          result.xelem (j, i) = buf[i + k];
        }

      if (ii < nr)
        for (octave_idx_type j = jj; j < jj + TRANSN; j++)
          for (octave_idx_type i = ii; i < nr; i++)
        result.xelem (j, i) = xelem (i, j);
    }

      for (octave_idx_type j = jj; j < nc; j++)
    for (octave_idx_type i = ii; i < nr; i++)
      result.xelem (j, i) = xelem (i, j);

#undef TRANSN

      return result;
    }
  else
    {
      // Fast transpose for vectors and empty matrices
      return Array<T> (*this, dim_vector (nc, nr));
    }
}

That copies to a tile and then recopies to the result matrix. This
reduces the cache misses by a factor of 2/TRANSN at the cost of an
additional copy. For TRANSN=8 about and using the script

N = [128, 129, 1024, 1025, 2048, 2049, 4096, 4097];
nruns = 10;

t = zeros (1, length (N));
for i = 1: length (N)
  A = randn (N(i), N(i));

  for j = 1: nruns
    t0 = cputime ();
    B = A .';
    t(i) = t(i) + cputime() - t0;
  endfor
  t (i) = t (i) ./ nruns;

  printf("N = %4d, time = %g sec\n", N(i), t(i));
  fflush (stdout);
endfor
 
I see the times

N =  128, time = 0.0029999 sec
N =  129, time = 0.0003333 sec
N = 1024, time = 0.0279981 sec
N = 1025, time = 0.0219985 sec
N = 2048, time = 0.234985 sec
N = 2049, time = 0.0796615 sec
N = 4096, time = 0.957271 sec
N = 4097, time = 0.41564 sec

before this change and

N =  128, time = 0.0026665 sec
N =  129, time = 0.0003333 sec
N = 1024, time = 0.010666 sec
N = 1025, time = 0.0073329 sec
N = 2048, time = 0.0849944 sec
N = 2049, time = 0.0563296 sec
N = 4096, time = 0.339311 sec
N = 4097, time = 0.301647 sec

after. So at least on my machine such a change seems to be a consistent
win in terms of the performance of the transpose. I made no attempt to
really optimize the above code and perhaps larger values of TRANSN will
given even better results. We might also directory reference the data
rather than use the xelem method to access it for further speed as we
can partly do the index calculation in advance. Something similar might
also be done with the hermitian method as well.

Regards
David


Re: Faster Array transpose

by dbateman3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I think the code below is a good compromise for this function. Sorry I
can't easily create a changeset as thare are other uncomitted changes in
Array.cc in by repository at the moment

D.


template <class T>
Array<T>
Array<T>::transpose (void) const
{
  assert (ndims () == 2);

  octave_idx_type nr = dim1 ();
  octave_idx_type nc = dim2 ();

  if (nr >= 8 && nc >= 8)
    {
      Array<T> result (dim_vector (nc, nr));

      // Blocked transpose to attempt to avoid cache misses.

      // Don't use OCTAVE_LOCAL_BUFFER here as it doesn't work with bool
      // on some compilers.
      Array<T> tmp (64);
      T *buf = tmp.fortran_vec ();

      octave_idx_type ii = 0, jj;
      for (jj = 0; jj < (nc - 8 + 1); jj += 8)
        {
          for (ii = 0; ii < (nr - 8 + 1); ii += 8)
            {
              // Copy to buffer
              for (octave_idx_type j = jj, k = 0, idxj = jj * nr;
                   j < jj + 8; j++, idxj += nr)
                for (octave_idx_type i = ii; i < ii + 8; i++)
                  buf[k++] = xelem (i + idxj);

              // Copy from buffer
              for (octave_idx_type i = ii, idxi = ii * nc; i < ii + 8;
                   i++, idxi += nc)
                for (octave_idx_type j = jj, k = i; j < jj + 8;
                     j++, k+=8)
                  result.xelem (j + idxi) = buf[k];
            }

          if (ii < nr)
            for (octave_idx_type j = jj; j < jj + 8; j++)
              for (octave_idx_type i = ii; i < nr; i++)
                result.xelem (j, i) = xelem (i, j);
        }

      for (octave_idx_type j = jj; j < nc; j++)
        for (octave_idx_type i = ii; i < nr; i++)
          result.xelem (j, i) = xelem (i, j);

      return result;
    }
  else if (nr > 1 && nc > 1)
    {
      Array<T> result (dim_vector (nc, nr));

      for (octave_idx_type j = 0; j < nc; j++)
        for (octave_idx_type i = 0; i < nr; i++)
          result.xelem (j, i) = xelem (i, j);

      return result;
    }
  else
    {
      // Fast transpose for vectors and empty matrices
      return Array<T> (*this, dim_vector (nc, nr));
    }
}


[Changeset] Re: Faster Array transpose

by dbateman3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

David Bateman wrote:
> I think the code below is a good compromise for this function. Sorry I
> can't easily create a changeset as thare are other uncomitted changes in
> Array.cc in by repository at the moment
>
> D.

It wasn't correct.. I've written this up in my attempt to clean up the
Array class for the integration of the single precision type.. The
Changeset is attached, but I suspect it might not apply without the rest
of the single precision changes (which I consider pretty much complete
in themselves). With this changes the transpose and hermitian methods go
into the Array class, with only a one specialization for the DiagArray2
class. With this change "make check" passes correctly and the speed
difference with the bench mark

N = [128, 129, 1024, 1025, 2048, 2049, 4096, 4097];
nruns = 10;

t = zeros (1, length (N));
for i = 1: length (N)
  A = 1i*randn (N(i), N(i));

  for j = 1: nruns
    t0 = cputime ();
    B = A';
    t(i) = t(i) + cputime() - t0;
  endfor
  t (i) = t (i) ./ nruns;

  printf("N = %4d, time = %g sec\n", N(i), t(i));
  fflush (stdout);
endfor

I got previous without the patch

N =  128, time = 0.0016666 sec
N =  129, time = 0.0003333 sec
N = 1024, time = 0.0439971 sec
N = 1025, time = 0.0336645 sec
N = 2048, time = 0.276649 sec
N = 2049, time = 0.172656 sec
N = 4096, time = 1.19492 sec
N = 4097, time = 0.700954 sec

and with it

N =  128, time = 0 sec
N =  129, time = 0 sec
N = 1024, time = 0.0319979 sec
N = 1025, time = 0.0243318 sec
N = 2048, time = 0.170655 sec
N = 2049, time = 0.117326 sec
N = 4096, time = 0.745285 sec
N = 4097, time = 0.558964 sec

and similar improvements for the hermitian operator.

D.





# HG changeset patch
# User David Bateman <dbateman@...>
# Date 1209938758 -7200
# Node ID 5ca24a40998f84f8d693aa8baeb3b85b1b06933c
# Parent  2442bbe5a6932984c8e9ec7378954f4e218e621d
Cache optimized hermitian/transpose methods

diff --git a/liboctave/Array.cc b/liboctave/Array.cc
--- a/liboctave/Array.cc
+++ b/liboctave/Array.cc
@@ -1203,7 +1203,48 @@ Array<T>::transpose (void) const
   octave_idx_type nr = dim1 ();
   octave_idx_type nc = dim2 ();
 
-  if (nr > 1 && nc > 1)
+  if (nr >= 8 && nc >= 8)
+    {
+      Array<T> result (dim_vector (nc, nr));
+
+      // Blocked transpose to attempt to avoid cache misses.
+
+      // Don't use OCTAVE_LOCAL_BUFFER here as it doesn't work with bool
+      // on some compilers.
+      T buf[64];
+
+      octave_idx_type ii = 0, jj;
+      for (jj = 0; jj < (nc - 8 + 1); jj += 8)
+ {
+  for (ii = 0; ii < (nr - 8 + 1); ii += 8)
+    {
+      // Copy to buffer
+      for (octave_idx_type j = jj, k = 0, idxj = jj * nr;
+   j < jj + 8; j++, idxj += nr)
+ for (octave_idx_type i = ii; i < ii + 8; i++)
+  buf[k++] = xelem (i + idxj);
+
+      // Copy from buffer
+      for (octave_idx_type i = ii, idxi = ii * nc; i < ii + 8;
+   i++, idxi += nc)
+ for (octave_idx_type j = jj, k = i - ii; j < jj + 8;
+     j++, k+=8)
+  result.xelem (j + idxi) = buf[k];
+    }
+
+  if (ii < nr)
+    for (octave_idx_type j = jj; j < jj + 8; j++)
+      for (octave_idx_type i = ii; i < nr; i++)
+ result.xelem (j, i) = xelem (i, j);
+ }
+
+      for (octave_idx_type j = jj; j < nc; j++)
+ for (octave_idx_type i = 0; i < nr; i++)
+  result.xelem (j, i) = xelem (i, j);
+
+      return result;
+    }
+  else if (nr > 1 && nc > 1)
     {
       Array<T> result (dim_vector (nc, nr));
 
@@ -1217,6 +1258,68 @@ Array<T>::transpose (void) const
     {
       // Fast transpose for vectors and empty matrices
       return Array<T> (*this, dim_vector (nc, nr));
+    }
+}
+
+template <class T>
+Array<T>
+Array<T>::hermitian (T (*fcn) (const T&)) const
+{
+  assert (ndims () == 2);
+
+  octave_idx_type nr = dim1 ();
+  octave_idx_type nc = dim2 ();
+
+  if (nr >= 8 && nc >= 8)
+    {
+      Array<T> result (dim_vector (nc, nr));
+
+      // Blocked transpose to attempt to avoid cache misses.
+
+      // Don't use OCTAVE_LOCAL_BUFFER here as it doesn't work with bool
+      // on some compilers.
+      T buf[64];
+
+      octave_idx_type ii = 0, jj;
+      for (jj = 0; jj < (nc - 8 + 1); jj += 8)
+ {
+  for (ii = 0; ii < (nr - 8 + 1); ii += 8)
+    {
+      // Copy to buffer
+      for (octave_idx_type j = jj, k = 0, idxj = jj * nr;
+   j < jj + 8; j++, idxj += nr)
+ for (octave_idx_type i = ii; i < ii + 8; i++)
+  buf[k++] = xelem (i + idxj);
+
+      // Copy from buffer
+      for (octave_idx_type i = ii, idxi = ii * nc; i < ii + 8;
+   i++, idxi += nc)
+ for (octave_idx_type j = jj, k = i - ii; j < jj + 8;
+     j++, k+=8)
+  result.xelem (j + idxi) = fcn (buf[k]);
+    }
+
+  if (ii < nr)
+    for (octave_idx_type j = jj; j < jj + 8; j++)
+      for (octave_idx_type i = ii; i < nr; i++)
+ result.xelem (j, i) = fcn (xelem (i, j));
+ }
+
+      for (octave_idx_type j = jj; j < nc; j++)
+ for (octave_idx_type i = 0; i < nr; i++)
+  result.xelem (j, i) = fcn (xelem (i, j));
+
+      return result;
+    }
+  else
+    {
+      Array<T> result (dim_vector (nc, nr));
+
+      for (octave_idx_type j = 0; j < nc; j++)
+ for (octave_idx_type i = 0; i < nr; i++)
+  result.xelem (j, i) = fcn (xelem (i, j));
+
+      return result;
     }
 }
 
diff --git a/liboctave/Array.h b/liboctave/Array.h
--- a/liboctave/Array.h
+++ b/liboctave/Array.h
@@ -461,6 +461,7 @@ public:
   bool is_empty (void) const { return numel () == 0; }
 
   Array<T> transpose (void) const;
+  Array<T> hermitian (T (*fcn) (const T&) = 0) const;
 
   const T *data (void) const { return rep->data; }
 
diff --git a/liboctave/Array2.h b/liboctave/Array2.h
--- a/liboctave/Array2.h
+++ b/liboctave/Array2.h
@@ -109,6 +109,12 @@ public:
       return Array2<T> (tmp, tmp.rows (), tmp.columns ());
     }
 
+  Array2<T> hermitian (T (*fcn) (const T&) = 0) const
+    {
+      Array<T> tmp = Array<T>::hermitian (fcn);
+      return Array2<T> (tmp, tmp.rows (), tmp.columns ());
+    }
+
   Array2<T> index (idx_vector& i, int resize_ok = 0,
    const T& rfv = resize_fill_value (T ())) const
     {
diff --git a/liboctave/ArrayN.h b/liboctave/ArrayN.h
--- a/liboctave/ArrayN.h
+++ b/liboctave/ArrayN.h
@@ -102,6 +102,7 @@ public:
   ArrayN<T> squeeze (void) const { return Array<T>::squeeze (); }
 
   ArrayN<T> transpose (void) const { return Array<T>::transpose (); }
+  ArrayN<T> hermitian (T (*fcn) (const T&) = 0) const { return Array<T>::hermitian (fcn); }
 
   ArrayN<T>& insert (const ArrayN<T>& a, const dim_vector& dv)
     {
diff --git a/liboctave/CColVector.cc b/liboctave/CColVector.cc
--- a/liboctave/CColVector.cc
+++ b/liboctave/CColVector.cc
@@ -221,17 +221,16 @@ ComplexColumnVector::stack (const Comple
   return retval;
 }
 
-ComplexRowVector
+ComplexRowVector
 ComplexColumnVector::hermitian (void) const
-{
-  octave_idx_type len = length ();
-  return ComplexRowVector (mx_inline_conj_dup (data (), len), len);
+{
+  return MArray<Complex>::hermitian (std::conj);
 }
 
 ComplexRowVector
 ComplexColumnVector::transpose (void) const
 {
-  return ComplexRowVector (*this);
+  return MArray<Complex>::transpose ();
 }
 
 ComplexColumnVector
diff --git a/liboctave/CColVector.h b/liboctave/CColVector.h
--- a/liboctave/CColVector.h
+++ b/liboctave/CColVector.h
@@ -72,7 +72,7 @@ public:
   ComplexColumnVector stack (const ColumnVector& a) const;
   ComplexColumnVector stack (const ComplexColumnVector& a) const;
 
-  ComplexRowVector hermitian (void) const;  // complex conjugate transpose.
+  ComplexRowVector hermitian (void) const;
   ComplexRowVector transpose (void) const;
 
   friend ComplexColumnVector conj (const ComplexColumnVector& a);
diff --git a/liboctave/CDiagMatrix.cc b/liboctave/CDiagMatrix.cc
--- a/liboctave/CDiagMatrix.cc
+++ b/liboctave/CDiagMatrix.cc
@@ -230,20 +230,6 @@ ComplexDiagMatrix::fill (const ComplexRo
     elem (i+beg, i+beg) = a.elem (i);
 
   return *this;
-}
-
-ComplexDiagMatrix
-ComplexDiagMatrix::hermitian (void) const
-{
-  return ComplexDiagMatrix (mx_inline_conj_dup (data (), length ()),
-    cols (), rows ());
-}
-
-ComplexDiagMatrix
-ComplexDiagMatrix::transpose (void) const
-{
-  return ComplexDiagMatrix (mx_inline_dup (data (), length ()),
-    cols (), rows ());
 }
 
 ComplexDiagMatrix
diff --git a/liboctave/CDiagMatrix.h b/liboctave/CDiagMatrix.h
--- a/liboctave/CDiagMatrix.h
+++ b/liboctave/CDiagMatrix.h
@@ -87,8 +87,8 @@ public:
   ComplexDiagMatrix& fill (const RowVector& a, octave_idx_type beg);
   ComplexDiagMatrix& fill (const ComplexRowVector& a, octave_idx_type beg);
 
-  ComplexDiagMatrix hermitian (void) const;  // complex conjugate transpose
-  ComplexDiagMatrix transpose (void) const;
+  ComplexDiagMatrix hermitian (void) const { return MDiagArray2<Complex>::hermitian (std::conj); }
+  ComplexDiagMatrix transpose (void) const { return MDiagArray2<Complex>::transpose(); }
 
   friend ComplexDiagMatrix conj (const ComplexDiagMatrix& a);
 
diff --git a/liboctave/CMatrix.cc b/liboctave/CMatrix.cc
--- a/liboctave/CMatrix.cc
+++ b/liboctave/CMatrix.cc
@@ -882,22 +882,6 @@ ComplexMatrix::stack (const ComplexDiagM
   retval.insert (*this, 0, 0);
   retval.insert (a, nr_insert, 0);
   return retval;
-}
-
-ComplexMatrix
-ComplexMatrix::hermitian (void) const
-{
-  octave_idx_type nr = rows ();
-  octave_idx_type nc = cols ();
-  ComplexMatrix result;
-  if (length () > 0)
-    {
-      result.resize (nc, nr);
-      for (octave_idx_type j = 0; j < nc; j++)
- for (octave_idx_type i = 0; i < nr; i++)
-  result.elem (j, i) = conj (elem (i, j));
-    }
-  return result;
 }
 
 ComplexMatrix
diff --git a/liboctave/CMatrix.h b/liboctave/CMatrix.h
--- a/liboctave/CMatrix.h
+++ b/liboctave/CMatrix.h
@@ -126,7 +126,8 @@ public:
   ComplexMatrix stack (const ComplexColumnVector& a) const;
   ComplexMatrix stack (const ComplexDiagMatrix& a) const;
 
-  ComplexMatrix hermitian (void) const;  // complex conjugate transpose
+  ComplexMatrix hermitian (void) const
+    { return MArray2<Complex>::hermitian (std::conj); }
   ComplexMatrix transpose (void) const
     { return MArray2<Complex>::transpose (); }
 
diff --git a/liboctave/CRowVector.cc b/liboctave/CRowVector.cc
--- a/liboctave/CRowVector.cc
+++ b/liboctave/CRowVector.cc
@@ -227,14 +227,13 @@ ComplexColumnVector
 ComplexColumnVector
 ComplexRowVector::hermitian (void) const
 {
-  octave_idx_type len = length ();
-  return ComplexColumnVector (mx_inline_conj_dup (data (), len), len);
+  return MArray<Complex>::hermitian (std::conj);
 }
 
 ComplexColumnVector
 ComplexRowVector::transpose (void) const
 {
-  return ComplexColumnVector (*this);
+  return MArray<Complex>::transpose ();
 }
 
 ComplexRowVector
diff --git a/liboctave/CRowVector.h b/liboctave/CRowVector.h
--- a/liboctave/CRowVector.h
+++ b/liboctave/CRowVector.h
@@ -70,7 +70,7 @@ public:
   ComplexRowVector append (const RowVector& a) const;
   ComplexRowVector append (const ComplexRowVector& a) const;
 
-  ComplexColumnVector hermitian (void) const;  // complex conjugate transpose.
+  ComplexColumnVector hermitian (void) const;
   ComplexColumnVector transpose (void) const;
 
   friend ComplexRowVector conj (const ComplexRowVector& a);
diff --git a/liboctave/DiagArray2.cc b/liboctave/DiagArray2.cc
--- a/liboctave/DiagArray2.cc
+++ b/liboctave/DiagArray2.cc
@@ -33,6 +33,27 @@ along with Octave; see the file COPYING.
 #include "DiagArray2.h"
 
 #include "lo-error.h"
+
+template <class T>
+DiagArray2<T>
+DiagArray2<T>::transpose (void) const
+{
+  DiagArray2<T> retval (*this);
+  retval.dimensions = dim_vector (this->dim2 (), this->dim1 ());
+  return retval;
+}
+
+template <class T>
+DiagArray2<T>
+DiagArray2<T>::hermitian (T (* fcn) (const T&)) const
+{
+  DiagArray2<T> retval (this->dim2 (), this->dim1 ());
+  const T *p = this->data ();
+  T *q = retval.fortran_vec ();
+  for (octave_idx_type i = 0; i < this->length (); i++)
+    q [i] = fcn (p [i]);
+  return retval;
+}
 
 // A two-dimensional array with diagonal elements only.
 
diff --git a/liboctave/DiagArray2.h b/liboctave/DiagArray2.h
--- a/liboctave/DiagArray2.h
+++ b/liboctave/DiagArray2.h
@@ -180,6 +180,9 @@ public:
   void resize (octave_idx_type n, octave_idx_type m, const T& val);
 
   void maybe_delete_elements (idx_vector& i, idx_vector& j);
+
+  DiagArray2<T> transpose (void) const;
+  DiagArray2<T> hermitian (T (*fcn) (const T&) = 0) const;
 };
 
 #endif
diff --git a/liboctave/MArray.h b/liboctave/MArray.h
--- a/liboctave/MArray.h
+++ b/liboctave/MArray.h
@@ -63,6 +63,9 @@ public:
       return *this;
     }
 
+  MArray<T> transpose (void) const { return Array<T>::transpose (); }
+  MArray<T> hermitian (T (*fcn) (const T&) = 0) const { return Array<T>::hermitian (fcn); }
+
   octave_idx_type nnz (void) const
     {
       octave_idx_type retval = 0;
diff --git a/liboctave/MArray2.h b/liboctave/MArray2.h
--- a/liboctave/MArray2.h
+++ b/liboctave/MArray2.h
@@ -80,6 +80,7 @@ public:
   }
 
   MArray2<T> transpose (void) const { return Array2<T>::transpose (); }
+  MArray2<T> hermitian (T (*fcn) (const T&) = 0) const { return Array2<T>::hermitian (fcn); }
 
   MArray2<T> diag (octave_idx_type k) const
   {
diff --git a/liboctave/MDiagArray2.h b/liboctave/MDiagArray2.h
--- a/liboctave/MDiagArray2.h
+++ b/liboctave/MDiagArray2.h
@@ -81,6 +81,9 @@ public:
       return retval;
     }
 
+  MDiagArray2<T> transpose (void) const { return DiagArray2<T>::transpose (); }
+  MDiagArray2<T> hermitian (T (*fcn) (const T&) = 0) const { return DiagArray2<T>::hermitian (fcn); }
+
   static MDiagArray2<T> nil_array;
 
   // Currently, the OPS functions don't need to be friends, but that
diff --git a/liboctave/dColVector.cc b/liboctave/dColVector.cc
--- a/liboctave/dColVector.cc
+++ b/liboctave/dColVector.cc
@@ -142,7 +142,7 @@ RowVector
 RowVector
 ColumnVector::transpose (void) const
 {
-  return RowVector (*this);
+  return MArray<double>::transpose();
 }
 
 ColumnVector
diff --git a/liboctave/dDiagMatrix.cc b/liboctave/dDiagMatrix.cc
--- a/liboctave/dDiagMatrix.cc
+++ b/liboctave/dDiagMatrix.cc
@@ -136,12 +136,6 @@ DiagMatrix::fill (const RowVector& a, oc
     elem (i+beg, i+beg) = a.elem (i);
 
   return *this;
-}
-
-DiagMatrix
-DiagMatrix::transpose (void) const
-{
-  return DiagMatrix (mx_inline_dup (data (), length ()), cols (), rows ());
 }
 
 DiagMatrix
diff --git a/liboctave/dDiagMatrix.h b/liboctave/dDiagMatrix.h
--- a/liboctave/dDiagMatrix.h
+++ b/liboctave/dDiagMatrix.h
@@ -70,7 +70,7 @@ public:
   DiagMatrix& fill (const ColumnVector& a, octave_idx_type beg);
   DiagMatrix& fill (const RowVector& a, octave_idx_type beg);
 
-  DiagMatrix transpose (void) const;
+  DiagMatrix transpose (void) const { return MDiagArray2<double>::transpose(); }
 
   friend OCTAVE_API DiagMatrix real (const ComplexDiagMatrix& a);
   friend OCTAVE_API DiagMatrix imag (const ComplexDiagMatrix& a);
diff --git a/liboctave/dRowVector.cc b/liboctave/dRowVector.cc
--- a/liboctave/dRowVector.cc
+++ b/liboctave/dRowVector.cc
@@ -144,7 +144,7 @@ ColumnVector
 ColumnVector
 RowVector::transpose (void) const
 {
-  return ColumnVector (*this);
+  return MArray<double>::transpose();
 }
 
 RowVector
diff --git a/liboctave/fCColVector.cc b/liboctave/fCColVector.cc
--- a/liboctave/fCColVector.cc
+++ b/liboctave/fCColVector.cc
@@ -221,17 +221,16 @@ FloatComplexColumnVector::stack (const F
   return retval;
 }
 
-FloatComplexRowVector
+FloatComplexRowVector
 FloatComplexColumnVector::hermitian (void) const
 {
-  octave_idx_type len = length ();
-  return FloatComplexRowVector (mx_inline_conj_dup (data (), len), len);
-}
-
-FloatComplexRowVector
+  return MArray<FloatComplex>::hermitian (std::conj);
+}
+
+FloatComplexRowVector
 FloatComplexColumnVector::transpose (void) const
 {
-  return FloatComplexRowVector (*this);
+  return MArray<FloatComplex>::transpose ();
 }
 
 FloatComplexColumnVector
diff --git a/liboctave/fCColVector.h b/liboctave/fCColVector.h
--- a/liboctave/fCColVector.h
+++ b/liboctave/fCColVector.h
@@ -72,7 +72,7 @@ public:
   FloatComplexColumnVector stack (const FloatColumnVector& a) const;
   FloatComplexColumnVector stack (const FloatComplexColumnVector& a) const;
 
-  FloatComplexRowVector hermitian (void) const;  // complex conjugate transpose.
+  FloatComplexRowVector hermitian (void) const;
   FloatComplexRowVector transpose (void) const;
 
   friend FloatComplexColumnVector conj (const FloatComplexColumnVector& a);
diff --git a/liboctave/fCDiagMatrix.cc b/liboctave/fCDiagMatrix.cc
--- a/liboctave/fCDiagMatrix.cc
+++ b/liboctave/fCDiagMatrix.cc
@@ -230,20 +230,6 @@ FloatComplexDiagMatrix::fill (const Floa
     elem (i+beg, i+beg) = a.elem (i);
 
   return *this;
-}
-
-FloatComplexDiagMatrix
-FloatComplexDiagMatrix::hermitian (void) const
-{
-  return FloatComplexDiagMatrix (mx_inline_conj_dup (data (), length ()),
-    cols (), rows ());
-}
-
-FloatComplexDiagMatrix
-FloatComplexDiagMatrix::transpose (void) const
-{
-  return FloatComplexDiagMatrix (mx_inline_dup (data (), length ()),
-    cols (), rows ());
 }
 
 FloatComplexDiagMatrix
diff --git a/liboctave/fCDiagMatrix.h b/liboctave/fCDiagMatrix.h
--- a/liboctave/fCDiagMatrix.h
+++ b/liboctave/fCDiagMatrix.h
@@ -87,8 +87,8 @@ public:
   FloatComplexDiagMatrix& fill (const FloatRowVector& a, octave_idx_type beg);
   FloatComplexDiagMatrix& fill (const FloatComplexRowVector& a, octave_idx_type beg);
 
-  FloatComplexDiagMatrix hermitian (void) const;  // complex conjugate transpose
-  FloatComplexDiagMatrix transpose (void) const;
+  FloatComplexDiagMatrix hermitian (void) const { return MDiagArray2<FloatComplex>::hermitian (std::conj); }
+  FloatComplexDiagMatrix transpose (void) const { return MDiagArray2<FloatComplex>::transpose(); }
 
   friend FloatComplexDiagMatrix conj (const FloatComplexDiagMatrix& a);
 
diff --git a/liboctave/fCMatrix.cc b/liboctave/fCMatrix.cc
--- a/liboctave/fCMatrix.cc
+++ b/liboctave/fCMatrix.cc
@@ -876,22 +876,6 @@ FloatComplexMatrix::stack (const FloatCo
   retval.insert (*this, 0, 0);
   retval.insert (a, nr_insert, 0);
   return retval;
-}
-
-FloatComplexMatrix
-FloatComplexMatrix::hermitian (void) const
-{
-  octave_idx_type nr = rows ();
-  octave_idx_type nc = cols ();
-  FloatComplexMatrix result;
-  if (length () > 0)
-    {
-      result.resize (nc, nr);
-      for (octave_idx_type j = 0; j < nc; j++)
- for (octave_idx_type i = 0; i < nr; i++)
-  result.elem (j, i) = conj (elem (i, j));
-    }
-  return result;
 }
 
 FloatComplexMatrix
diff --git a/liboctave/fCMatrix.h b/liboctave/fCMatrix.h
--- a/liboctave/fCMatrix.h
+++ b/liboctave/fCMatrix.h
@@ -126,7 +126,8 @@ public:
   FloatComplexMatrix stack (const FloatComplexColumnVector& a) const;
   FloatComplexMatrix stack (const FloatComplexDiagMatrix& a) const;
 
-  FloatComplexMatrix hermitian (void) const;  // complex conjugate transpose
+  FloatComplexMatrix hermitian (void) const
+    { return MArray2<FloatComplex>::hermitian (std::conj); }
   FloatComplexMatrix transpose (void) const
     { return MArray2<FloatComplex>::transpose (); }
 
diff --git a/liboctave/fCRowVector.cc b/liboctave/fCRowVector.cc
--- a/liboctave/fCRowVector.cc
+++ b/liboctave/fCRowVector.cc
@@ -224,17 +224,16 @@ FloatComplexRowVector::append (const Flo
   return retval;
 }
 
-FloatComplexColumnVector
+FloatComplexColumnVector
 FloatComplexRowVector::hermitian (void) const
 {
-  octave_idx_type len = length ();
-  return FloatComplexColumnVector (mx_inline_conj_dup (data (), len), len);
-}
-
-FloatComplexColumnVector
+  return MArray<FloatComplex>::hermitian (std::conj);
+}
+
+FloatComplexColumnVector
 FloatComplexRowVector::transpose (void) const
 {
-  return FloatComplexColumnVector (*this);
+  return MArray<FloatComplex>::transpose ();
 }
 
 FloatComplexRowVector
diff --git a/liboctave/fCRowVector.h b/liboctave/fCRowVector.h
--- a/liboctave/fCRowVector.h
+++ b/liboctave/fCRowVector.h
@@ -70,7 +70,7 @@ public:
   FloatComplexRowVector append (const FloatRowVector& a) const;
   FloatComplexRowVector append (const FloatComplexRowVector& a) const;
 
-  FloatComplexColumnVector hermitian (void) const;  // complex conjugate transpose.
+  FloatComplexColumnVector hermitian (void) const;
   FloatComplexColumnVector transpose (void) const;
 
   friend FloatComplexRowVector conj (const FloatComplexRowVector& a);
diff --git a/liboctave/fColVector.cc b/liboctave/fColVector.cc
--- a/liboctave/fColVector.cc
+++ b/liboctave/fColVector.cc
@@ -142,7 +142,7 @@ FloatRowVector
 FloatRowVector
 FloatColumnVector::transpose (void) const
 {
-  return FloatRowVector (*this);
+  return MArray<float>::transpose();
 }
 
 FloatColumnVector
diff --git a/liboctave/fDiagMatrix.cc b/liboctave/fDiagMatrix.cc
--- a/liboctave/fDiagMatrix.cc
+++ b/liboctave/fDiagMatrix.cc
@@ -136,12 +136,6 @@ FloatDiagMatrix::fill (const FloatRowVec
     elem (i+beg, i+beg) = a.elem (i);
 
   return *this;
-}
-
-FloatDiagMatrix
-FloatDiagMatrix::transpose (void) const
-{
-  return FloatDiagMatrix (mx_inline_dup (data (), length ()), cols (), rows ());
 }
 
 FloatDiagMatrix
diff --git a/liboctave/fDiagMatrix.h b/liboctave/fDiagMatrix.h
--- a/liboctave/fDiagMatrix.h
+++ b/liboctave/fDiagMatrix.h
@@ -70,7 +70,7 @@ public:
   FloatDiagMatrix& fill (const FloatColumnVector& a, octave_idx_type beg);
   FloatDiagMatrix& fill (const FloatRowVector& a, octave_idx_type beg);
 
-  FloatDiagMatrix transpose (void) const;
+  FloatDiagMatrix transpose (void) const { return MDiagArray2<float>::transpose(); }
 
   friend OCTAVE_API FloatDiagMatrix real (const FloatComplexDiagMatrix& a);
   friend OCTAVE_API FloatDiagMatrix imag (const FloatComplexDiagMatrix& a);
diff --git a/liboctave/fRowVector.cc b/liboctave/fRowVector.cc
--- a/liboctave/fRowVector.cc
+++ b/liboctave/fRowVector.cc
@@ -144,7 +144,7 @@ FloatColumnVector
 FloatColumnVector
 FloatRowVector::transpose (void) const
 {
-  return FloatColumnVector (*this);
+  return MArray<float>::transpose();
 }
 
 FloatRowVector

[Changeset] Re: Faster Array transpose

by John W. Eaton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On  5-May-2008, David Bateman wrote:

| I got previous without the patch
|
| N =  128, time = 0.0016666 sec
| N =  129, time = 0.0003333 sec
| N = 1024, time = 0.0439971 sec
| N = 1025, time = 0.0336645 sec
| N = 2048, time = 0.276649 sec
| N = 2049, time = 0.172656 sec
| N = 4096, time = 1.19492 sec
| N = 4097, time = 0.700954 sec
|
| and with it
|
| N =  128, time = 0 sec
| N =  129, time = 0 sec
| N = 1024, time = 0.0319979 sec
| N = 1025, time = 0.0243318 sec
| N = 2048, time = 0.170655 sec
| N = 2049, time = 0.117326 sec
| N = 4096, time = 0.745285 sec
| N = 4097, time = 0.558964 sec
|
| and similar improvements for the hermitian operator.

That's a lot of work for a fairly modest improvement...

Is there a ChangeLog entry for this?  With that, I'd apply the change
(but skipping the diffs for the float matrix files I don't have yet).

Also, how about some tests for this since the algorithm is now more
complex?

Thanks,

jwe

Re: [Changeset] Re: Faster Array transpose

by dbateman :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


John W. Eaton wrote:
That's a lot of work for a fairly modest improvement...
Well transpose and hermitian operators are used rather often and so even modest gains are useful. Also the gain is about a factor of 2 for matrix sizes divisible by 8. In any case the reason I started this was to move more methods into the Array class itself such that the float array classes could be simplified, so the speed gain in that sense in secondary. Still need a little work there before I'd suggest you take my code, but its almost there.

Is there a ChangeLog entry for this?  With that, I'd apply the change
(but skipping the diffs for the float matrix files I don't have yet).
Humm, I was sure I wrote one, but it seems not. I;ll send another changeset with one later.

All the float stuff is on my respository at hg.dbateman.org, starting at changeset 8002.

Also, how about some tests for this since the algorithm is now more
complex?
Sure ok, but there are lots of places where make check already uses the transpose and that was useful in debugging the code.. In any case I'll supply some specific tests later as well.

D.

Re: [Changeset] Re: Faster Array transpose

by dbateman3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

dbateman wrote:

>
>
> John W. Eaton wrote:
>> Is there a ChangeLog entry for this?  With that, I'd apply the change
>> (but skipping the diffs for the float matrix files I don't have yet).
>>
>
> Humm, I was sure I wrote one, but it seems not. I;ll send another changeset
> with one later.
>
> All the float stuff is on my respository at hg.dbateman.org, starting at
> changeset 8002.
The changeset is attached, however, I'm not sure how to handle the split
between the patch of this code you'll take and the single precision code
that you won't. So perhaps if you'd take the single precision code
before this stuff it might be easier. Could you examine further the
changesets

8002
8003
8004
8006
8015
8017
8019
8056

from http://hg.dbateman.org/octave and see what further you'd like to be
added to the single precision code before accepting it into 3.1.. I'll
work on the single precision type further in any case, including

* single precision version of quad
* perhaps the same for daspk, glpk, etc, but less sure of that
* Move more methods from teh derived classes to the Array classes to
clean up the code.

Regards
David

# HG changeset patch
# User David Bateman <dbateman@...>
# Date 1210108550 -7200
# Node ID e87f87f45d7a149ca5d02849494670b55b0b046b
# Parent  b385f05462735470b01ecee8da054d986494909b
Add tests for tranpose/hermitian and ChangeLog entry for new transpose code

diff --git a/liboctave/Array.cc b/liboctave/Array.cc
--- a/liboctave/Array.cc
+++ b/liboctave/Array.cc
@@ -1322,6 +1322,41 @@ Array<T>::hermitian (T (*fcn) (const T&)
       return result;
     }
 }
+
+/*
+
+%% Tranpose tests for matrices of the tile size and plus or minus a row
+%% and with four tiles.
+
+%!shared m7, mt7, m8, mt8, m9, mt9
+%! m7 = reshape (1 : 7*8, 8, 7);
+%! mt7 = [1:7; 1:7, 1:7, 1:7, 1:7; 1:7, 1:7, 1:7];
+%! m8 = reshape (1 : 8*8, 8, 8);
+%! mt8 = [1:8; 1:8, 1:8, 1:8, 1:8; 1:8, 1:8, 1:8];
+%! m9 = reshape (1 : 9*8, 8, 9);
+%! mt9 = [1:9; 1:9, 1:9, 1:9, 1:9; 1:9, 1:9, 1:9];
+
+%!assert (m7', mt7)
+%!assert ((1i*m7).', 1i * mt7)
+%!assert ((1i*m7)', conj (1i * mt7))
+%!assert (m8', mt8)
+%!assert ((1i*m8).', 1i * mt8)
+%!assert ((1i*m8)', conj (1i * mt8))
+%!assert (m9', mt9)
+%!assert ((1i*m9).', 1i * mt9)
+%!assert ((1i*m9)', conj (1i * mt9))
+
+%!assert ([m7, m7; m8, m8]', [mt7, mt8; mt7, mt8])
+%!assert ((1i*[m7, m7; m8, m8]).', 1i * [mt7, mt8; mt7, mt8])
+%!assert ((1i*[m7, m7; m8, m8])', conj (1i * [mt7, mt8; mt7, mt8]))
+%!assert ([m8, m8; m8, m8]', [mt8, mt8; mt8, mt8])
+%!assert ((1i*[m8, m8; m8, m8]).', 1i * [mt8, mt8; mt8, mt8])
+%!assert ((1i*[m8, m8; m8, m8])', conj (1i * [mt8, mt8; mt8, mt8]))
+%!assert ([m9, m9; m8, m8]', [mt9, mt8; mt9, mt8])
+%!assert ((1i*[m9, m9; m8, m8]).', 1i * [mt9, mt8; mt9, mt8])
+%!assert ((1i*[m9, m9; m8, m8])', conj (1i * [mt9, mt8; mt9, mt8]))
+
+*/
 
 template <class T>
 T *
diff --git a/liboctave/ChangeLog b/liboctave/ChangeLog
--- a/liboctave/ChangeLog
+++ b/liboctave/ChangeLog
@@ -1,3 +1,48 @@ 2008-05-05  John W. Eaton  <jwe@...
+2008-05-06  David Bateman  <dbateman@...>
+
+ * Array.cc (Array<T> Array<T>::transpose () const): Modify for tiled
+ transpose to limit the number of cache misses.
+ (Array<T> Array<T>::hermitian (T (*)(const&)) const): New method
+ for matrix conjugate transpose.
+ * Array.h (Array<T> hermitian (T (*)(const&)) const): Declare it.
+
+ * DiagArray2.cc (DiagArray2<T> DiagArray2<T>::transpose () const):
+ Specialization for diagonal arrays.
+ (DiagArray2<T> DiagArray2<T>::transpose (T (*) (const&)) const):
+ Ditto.
+
+ * MArray.h (MArray<T> hermitian <T (*) (const&)) const): New method.
+ (MArray<T> transpose () const): Ditto.
+ * MArray2.h (MArray2<T> hermitian <T (*) (const&)) const): Ditto.
+ * Array2.h (Array2<T> hermitian <T (*) (const&)) const): Ditto.
+ * ArrayN.h (ArrayN<T> hermitian <T (*) (const&)) const): Ditto.
+ * MDiagArray2.h (MDiagArray2<T> transpose () const): Ditto.
+ (MDiagArray<T> hermitian <T (*) (const&)) const): Ditto.
+
+ * CColVector.cc (transpose, hermitian): Define in terms of base class.
+ * CRowVector.cc (transpose, hermitian): Ditto.
+ * dColVector.cc (transpose): Ditto.
+ * dRowVector.cc (transpose): Ditto.
+ * CDiagMatrix.h (transpose, hermitian): Ditto.
+ * dDiagMatrix.h (transpose): Ditto.
+
+ * fCColVector.cc (transpose, hermitian): Define in terms of base class.
+ * fCRowVector.cc (transpose, hermitian): Ditto.
+ * fColVector.cc (transpose): Ditto.
+ * fRowVector.cc (transpose): Ditto.
+ * fCDiagMatrix.h (transpose, hermitian): Ditto.
+ * fDiagMatrix.h (transpose): Ditto.
+
+ * CDiagMatrix.cc (ComplexDiagMatrix::transpose,
+ ComplexDiagMatrix::hermitian): Delete.
+ * dDiagMatrix.cc (DiagMatrix::transpose): Ditto.
+ * CMatrix.cc (ComplexMatrix::hermitian): Ditto.
+
+ * fCDiagMatrix.cc (FloatComplexDiagMatrix::transpose,
+ FloatComplexDiagMatrix::hermitian): Delete.
+ * fDiagMatrix.cc (FloatDiagMatrix::transpose): Ditto.
+ * fCMatrix.cc (FloatComplexMatrix::hermitian): Ditto.
+
 2008-05-05  John W. Eaton  <jwe@...>
 
  * cmd-edit.cc (command_editor::re_read_init_file,

Re: [Changeset] Re: Faster Array transpose

by John W. Eaton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On  6-May-2008, David Bateman wrote:

| dbateman wrote:
| >
| >
| > John W. Eaton wrote:
| >> Is there a ChangeLog entry for this?  With that, I'd apply the change
| >> (but skipping the diffs for the float matrix files I don't have yet).
| >>
| >
| > Humm, I was sure I wrote one, but it seems not. I;ll send another changeset
| > with one later.
| >
| > All the float stuff is on my respository at hg.dbateman.org, starting at
| > changeset 8002.
|
| The changeset is attached, however, I'm not sure how to handle the split
| between the patch of this code you'll take and the single precision code
| that you won't. So perhaps if you'd take the single precision code
| before this stuff it might be easier. Could you examine further the
| changesets
|
| 8002
| 8003
| 8004
| 8006
| 8015
| 8017
| 8019
| 8056
|
| from http://hg.dbateman.org/octave and see what further you'd like to be
| added to the single precision code before accepting it into 3.1..

I applied all these changesets in one batch to my archive.

| I'll work on the single precision type further in any case, including
|
| * single precision version of quad

OK.

| * perhaps the same for daspk, glpk, etc, but less sure of that

This is not a high priority for me.

| * Move more methods from teh derived classes to the Array classes to
| clean up the code.

OK.

Thanks,

jwe

Re: [Changeset] Re: Faster Array transpose

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, May 21, 2008 at 2:06 AM, John W. Eaton <jwe@...> wrote:

> On  6-May-2008, David Bateman wrote:
>
> | dbateman wrote:
> | >
> | >
> | > John W. Eaton wrote:
> | >> Is there a ChangeLog entry for this?  With that, I'd apply the change
> | >> (but skipping the diffs for the float matrix files I don't have yet).
> | >>
> | >
> | > Humm, I was sure I wrote one, but it seems not. I;ll send another changeset
> | > with one later.
> | >
> | > All the float stuff is on my respository at hg.dbateman.org, starting at
> | > changeset 8002.
> |
> | The changeset is attached, however, I'm not sure how to handle the split
> | between the patch of this code you'll take and the single precision code
> | that you won't. So perhaps if you'd take the single precision code
> | before this stuff it might be easier. Could you examine further the
> | changesets
> |
> | 8002
> | 8003
> | 8004
> | 8006
> | 8015
> | 8017
> | 8019
> | 8056
> |
> | from http://hg.dbateman.org/octave and see what further you'd like to be
> | added to the single precision code before accepting it into 3.1..
>
> I applied all these changesets in one batch to my archive.
>
> | I'll work on the single precision type further in any case, including
> |
> | * single precision version of quad
>
> OK.
>
> | * perhaps the same for daspk, glpk, etc, but less sure of that
>
> This is not a high priority for me.
>
> | * Move more methods from teh derived classes to the Array classes to
> | clean up the code.
>
> OK.
>
> Thanks,
>
> jwe
>

What a huge update :) I see this is interferring non-trivially with
all my proposed changes (the compound operators repo and two pending
patches.) Shall I merge them? (And re-send the patches, probably).

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: [Changeset] Re: Faster Array transpose

by John W. Eaton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 21-May-2008, Jaroslav Hajek wrote:

| What a huge update :) I see this is interferring non-trivially with
| all my proposed changes (the compound operators repo and two pending
| patches.) Shall I merge them? (And re-send the patches, probably).

I was planning to look at those changes today.  If you can update the
changeset in the next few hours so that it applies cleanly, that would
help.  Otherwise, I will try to do it.

Thanks,

jwe

Re: [Changeset] Re: Faster Array transpose

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, May 21, 2008 at 12:08 PM, John W. Eaton <jwe@...> wrote:

> On 21-May-2008, Jaroslav Hajek wrote:
>
> | What a huge update :) I see this is interferring non-trivially with
> | all my proposed changes (the compound operators repo and two pending
> | patches.) Shall I merge them? (And re-send the patches, probably).
>
> I was planning to look at those changes today.  If you can update the
> changeset in the next few hours so that it applies cleanly, that would
> help.  Otherwise, I will try to do it.
>

I have two pending changesets:
the improved matrix_type SPD check:
http://www.nabble.com/matrix_type-check-tt16883771.html#a16883771
and the (quite old) 2D pchip interpolation
http://www.nabble.com/pchip-method-for-interp2-tt16049957.html#a16049957

The first one actually applies cleanly onto current tip, but obviously
leaves the old code
for FloatMatrix and FoatComplexMatrix. A copy would suffice, but I
think I can easily remake this to use templates to avoid duplicating
code.

The second patch should be more trivial, as most of it is m-file code.

As for the compound operators repo, I intend to import changes from
your main repo, do a merge, then add code to support single precision
where appropriate. I.e., that will be two changesets more. You can
then simply pull from the repo. Is that OK or is there a more
desirable way?

> Thanks,
>
> jwe
>



--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: [Changeset] Re: Faster Array transpose

by dbateman :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


John W. Eaton wrote:
| Could you examine further the changesets
|
| 8002
| 8003
| 8004
| 8006
| 8015
| 8017
| 8019
| 8056
|
| from http://hg.dbateman.org/octave and see what further you'd like to be
| added to the single precision code before accepting it into 3.1..

I applied all these changesets in one batch to my archive.
Ok some other changesets from my repository that relate to the single precision type that should be merged are

8072 8073 8076 8077 8094

| I'll work on the single precision type further in any case, including
|
| * single precision version of quad

OK.
Done in the above changesets.

| * perhaps the same for daspk, glpk, etc, but less sure of that

This is not a high priority for me.
Me neither, but getting the base classes for this templatized for float/double would be useful and is what I was currently doing, though being a new dad for the last five days has more or less stopped me for the moment.

| * Move more methods from teh derived classes to the Array classes to
| clean up the code.

OK.
This and the single precision test code is the major thing still to do for this.

Cheers
David

Re: [Changeset] Re: Faster Array transpose

by John W. Eaton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 21-May-2008, dbateman wrote:

|
|
|
| John W. Eaton wrote:
| >
| > | Could you examine further the changesets
| > |
| > | 8002
| > | 8003
| > | 8004
| > | 8006
| > | 8015
| > | 8017
| > | 8019
| > | 8056
| > |
| > | from http://hg.dbateman.org/octave and see what further you'd like to be
| > | added to the single precision code before accepting it into 3.1..
| >
| > I applied all these changesets in one batch to my archive.
| >
|
| Ok some other changesets from my repository that relate to the single
| precision type that should be merged are
|
| 8072 8073 8076 8077 8094

I updated and the latest changeset in the archive is 8092.  Did you
mean that changeset?  It appears to be related to the single precision
changes.

jwe

Re: [Changeset] Re: Faster Array transpose

by dbateman3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

John W. Eaton wrote:

> On 21-May-2008, dbateman wrote:
>
> |
> |
> |
> | John W. Eaton wrote:
> | >
> | > | Could you examine further the changesets
> | > |
> | > | 8002
> | > | 8003
> | > | 8004
> | > | 8006
> | > | 8015
> | > | 8017
> | > | 8019
> | > | 8056
> | > |
> | > | from http://hg.dbateman.org/octave and see what further you'd like to be
> | > | added to the single precision code before accepting it into 3.1..
> | >
> | > I applied all these changesets in one batch to my archive.
> | >
> |
> | Ok some other changesets from my repository that relate to the single
> | precision type that should be merged are
> |
> | 8072 8073 8076 8077 8094
>
> I updated and the latest changeset in the archive is 8092.  Did you
> mean that changeset?  It appears to be related to the single precision
> changes.
>
> jwe
>

Try again I forgot to push some of my changes..

D.


Re: [Changeset] Re: Faster Array transpose

by John W. Eaton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 21-May-2008, David Bateman wrote:

| Try again I forgot to push some of my changes..

OK, I think I have them all now.  If not, then let me know which
changesets I'm missing.

Thanks,

jwe