|
View:
New views
14 Messages
—
Rating Filter:
Alert me
|
|
|
Faster Array transposeThe array transpose in Octave uses an extremely simplistic method
template <class T> Array<T> Array<T>::transpose (void) const { assert (ndims () == 2); octave_idx_type nr = dim1 (); octave_idx_type nc = dim2 (); if (nr > 1 && nc > 1) { Array<T> result (dim_vector (nc, nr)); for (octave_idx_type j = 0; j < nc; j++) for (octave_idx_type i = 0; i < nr; i++) result.xelem (j, i) = xelem (i, j); return result; } else { // Fast transpose for vectors and empty matrices return Array<T> (*this, dim_vector (nc, nr)); } } This has poor performance due to the number of cache missings. I know there were discussions about flagging a transposed matrix rather than actually doing it to use with the BLAS functions, but the overhead was felt to be too high. If we can't do that can we at least use a method like template <class T> Array<T> Array<T>::transpose (void) const { assert (ndims () == 2); octave_idx_type nr = dim1 (); octave_idx_type nc = dim2 (); if (nr > 1 && nc > 1) { Array<T> result (dim_vector (nc, nr)); // Blocked transpose to attempt to avoid cache misses. #define TRANSN 8 // Don't use OCTAVE_LOCAL_BUFFER here as it doesn't work with bool // on some compilers. Array<T> tmp (TRANSN * TRANSN); T *buf = tmp.fortran_vec (); octave_idx_type ii = 0, jj; for (jj = 0; jj < (nc - TRANSN + 1); jj += TRANSN) { for (ii = 0; ii < (nr - TRANSN + 1); ii += TRANSN) { // Copy to buffer for (octave_idx_type j = jj, k = 0; j < jj + TRANSN; j++) for (octave_idx_type i = ii; i < ii + TRANSN; i++) buf[k++] = xelem (i, j); // Copy from buffer for (octave_idx_type i = ii; i < ii + TRANSN; i++) for (octave_idx_type j = jj, k = 0; j < jj + TRANSN; j++, k+=TRANSN) result.xelem (j, i) = buf[i + k]; } if (ii < nr) for (octave_idx_type j = jj; j < jj + TRANSN; j++) for (octave_idx_type i = ii; i < nr; i++) result.xelem (j, i) = xelem (i, j); } for (octave_idx_type j = jj; j < nc; j++) for (octave_idx_type i = ii; i < nr; i++) result.xelem (j, i) = xelem (i, j); #undef TRANSN return result; } else { // Fast transpose for vectors and empty matrices return Array<T> (*this, dim_vector (nc, nr)); } } That copies to a tile and then recopies to the result matrix. This reduces the cache misses by a factor of 2/TRANSN at the cost of an additional copy. For TRANSN=8 about and using the script N = [128, 129, 1024, 1025, 2048, 2049, 4096, 4097]; nruns = 10; t = zeros (1, length (N)); for i = 1: length (N) A = randn (N(i), N(i)); for j = 1: nruns t0 = cputime (); B = A .'; t(i) = t(i) + cputime() - t0; endfor t (i) = t (i) ./ nruns; printf("N = %4d, time = %g sec\n", N(i), t(i)); fflush (stdout); endfor I see the times N = 128, time = 0.0029999 sec N = 129, time = 0.0003333 sec N = 1024, time = 0.0279981 sec N = 1025, time = 0.0219985 sec N = 2048, time = 0.234985 sec N = 2049, time = 0.0796615 sec N = 4096, time = 0.957271 sec N = 4097, time = 0.41564 sec before this change and N = 128, time = 0.0026665 sec N = 129, time = 0.0003333 sec N = 1024, time = 0.010666 sec N = 1025, time = 0.0073329 sec N = 2048, time = 0.0849944 sec N = 2049, time = 0.0563296 sec N = 4096, time = 0.339311 sec N = 4097, time = 0.301647 sec after. So at least on my machine such a change seems to be a consistent win in terms of the performance of the transpose. I made no attempt to really optimize the above code and perhaps larger values of TRANSN will given even better results. We might also directory reference the data rather than use the xelem method to access it for further speed as we can partly do the index calculation in advance. Something similar might also be done with the hermitian method as well. Regards David |
|
|
Re: Faster Array transposeI think the code below is a good compromise for this function. Sorry I
can't easily create a changeset as thare are other uncomitted changes in Array.cc in by repository at the moment D. template <class T> Array<T> Array<T>::transpose (void) const { assert (ndims () == 2); octave_idx_type nr = dim1 (); octave_idx_type nc = dim2 (); if (nr >= 8 && nc >= 8) { Array<T> result (dim_vector (nc, nr)); // Blocked transpose to attempt to avoid cache misses. // Don't use OCTAVE_LOCAL_BUFFER here as it doesn't work with bool // on some compilers. Array<T> tmp (64); T *buf = tmp.fortran_vec (); octave_idx_type ii = 0, jj; for (jj = 0; jj < (nc - 8 + 1); jj += 8) { for (ii = 0; ii < (nr - 8 + 1); ii += 8) { // Copy to buffer for (octave_idx_type j = jj, k = 0, idxj = jj * nr; j < jj + 8; j++, idxj += nr) for (octave_idx_type i = ii; i < ii + 8; i++) buf[k++] = xelem (i + idxj); // Copy from buffer for (octave_idx_type i = ii, idxi = ii * nc; i < ii + 8; i++, idxi += nc) for (octave_idx_type j = jj, k = i; j < jj + 8; j++, k+=8) result.xelem (j + idxi) = buf[k]; } if (ii < nr) for (octave_idx_type j = jj; j < jj + 8; j++) for (octave_idx_type i = ii; i < nr; i++) result.xelem (j, i) = xelem (i, j); } for (octave_idx_type j = jj; j < nc; j++) for (octave_idx_type i = ii; i < nr; i++) result.xelem (j, i) = xelem (i, j); return result; } else if (nr > 1 && nc > 1) { Array<T> result (dim_vector (nc, nr)); for (octave_idx_type j = 0; j < nc; j++) for (octave_idx_type i = 0; i < nr; i++) result.xelem (j, i) = xelem (i, j); return result; } else { // Fast transpose for vectors and empty matrices return Array<T> (*this, dim_vector (nc, nr)); } } |
|
|
[Changeset] Re: Faster Array transposeDavid Bateman wrote:
> I think the code below is a good compromise for this function. Sorry I > can't easily create a changeset as thare are other uncomitted changes in > Array.cc in by repository at the moment > > D. It wasn't correct.. I've written this up in my attempt to clean up the Array class for the integration of the single precision type.. The Changeset is attached, but I suspect it might not apply without the rest of the single precision changes (which I consider pretty much complete in themselves). With this changes the transpose and hermitian methods go into the Array class, with only a one specialization for the DiagArray2 class. With this change "make check" passes correctly and the speed difference with the bench mark N = [128, 129, 1024, 1025, 2048, 2049, 4096, 4097]; nruns = 10; t = zeros (1, length (N)); for i = 1: length (N) A = 1i*randn (N(i), N(i)); for j = 1: nruns t0 = cputime (); B = A'; t(i) = t(i) + cputime() - t0; endfor t (i) = t (i) ./ nruns; printf("N = %4d, time = %g sec\n", N(i), t(i)); fflush (stdout); endfor I got previous without the patch N = 128, time = 0.0016666 sec N = 129, time = 0.0003333 sec N = 1024, time = 0.0439971 sec N = 1025, time = 0.0336645 sec N = 2048, time = 0.276649 sec N = 2049, time = 0.172656 sec N = 4096, time = 1.19492 sec N = 4097, time = 0.700954 sec and with it N = 128, time = 0 sec N = 129, time = 0 sec N = 1024, time = 0.0319979 sec N = 1025, time = 0.0243318 sec N = 2048, time = 0.170655 sec N = 2049, time = 0.117326 sec N = 4096, time = 0.745285 sec N = 4097, time = 0.558964 sec and similar improvements for the hermitian operator. D. # HG changeset patch # User David Bateman <dbateman@...> # Date 1209938758 -7200 # Node ID 5ca24a40998f84f8d693aa8baeb3b85b1b06933c # Parent 2442bbe5a6932984c8e9ec7378954f4e218e621d Cache optimized hermitian/transpose methods diff --git a/liboctave/Array.cc b/liboctave/Array.cc --- a/liboctave/Array.cc +++ b/liboctave/Array.cc @@ -1203,7 +1203,48 @@ Array<T>::transpose (void) const octave_idx_type nr = dim1 (); octave_idx_type nc = dim2 (); - if (nr > 1 && nc > 1) + if (nr >= 8 && nc >= 8) + { + Array<T> result (dim_vector (nc, nr)); + + // Blocked transpose to attempt to avoid cache misses. + + // Don't use OCTAVE_LOCAL_BUFFER here as it doesn't work with bool + // on some compilers. + T buf[64]; + + octave_idx_type ii = 0, jj; + for (jj = 0; jj < (nc - 8 + 1); jj += 8) + { + for (ii = 0; ii < (nr - 8 + 1); ii += 8) + { + // Copy to buffer + for (octave_idx_type j = jj, k = 0, idxj = jj * nr; + j < jj + 8; j++, idxj += nr) + for (octave_idx_type i = ii; i < ii + 8; i++) + buf[k++] = xelem (i + idxj); + + // Copy from buffer + for (octave_idx_type i = ii, idxi = ii * nc; i < ii + 8; + i++, idxi += nc) + for (octave_idx_type j = jj, k = i - ii; j < jj + 8; + j++, k+=8) + result.xelem (j + idxi) = buf[k]; + } + + if (ii < nr) + for (octave_idx_type j = jj; j < jj + 8; j++) + for (octave_idx_type i = ii; i < nr; i++) + result.xelem (j, i) = xelem (i, j); + } + + for (octave_idx_type j = jj; j < nc; j++) + for (octave_idx_type i = 0; i < nr; i++) + result.xelem (j, i) = xelem (i, j); + + return result; + } + else if (nr > 1 && nc > 1) { Array<T> result (dim_vector (nc, nr)); @@ -1217,6 +1258,68 @@ Array<T>::transpose (void) const { // Fast transpose for vectors and empty matrices return Array<T> (*this, dim_vector (nc, nr)); + } +} + +template <class T> +Array<T> +Array<T>::hermitian (T (*fcn) (const T&)) const +{ + assert (ndims () == 2); + + octave_idx_type nr = dim1 (); + octave_idx_type nc = dim2 (); + + if (nr >= 8 && nc >= 8) + { + Array<T> result (dim_vector (nc, nr)); + + // Blocked transpose to attempt to avoid cache misses. + + // Don't use OCTAVE_LOCAL_BUFFER here as it doesn't work with bool + // on some compilers. + T buf[64]; + + octave_idx_type ii = 0, jj; + for (jj = 0; jj < (nc - 8 + 1); jj += 8) + { + for (ii = 0; ii < (nr - 8 + 1); ii += 8) + { + // Copy to buffer + for (octave_idx_type j = jj, k = 0, idxj = jj * nr; + j < jj + 8; j++, idxj += nr) + for (octave_idx_type i = ii; i < ii + 8; i++) + buf[k++] = xelem (i + idxj); + + // Copy from buffer + for (octave_idx_type i = ii, idxi = ii * nc; i < ii + 8; + i++, idxi += nc) + for (octave_idx_type j = jj, k = i - ii; j < jj + 8; + j++, k+=8) + result.xelem (j + idxi) = fcn (buf[k]); + } + + if (ii < nr) + for (octave_idx_type j = jj; j < jj + 8; j++) + for (octave_idx_type i = ii; i < nr; i++) + result.xelem (j, i) = fcn (xelem (i, j)); + } + + for (octave_idx_type j = jj; j < nc; j++) + for (octave_idx_type i = 0; i < nr; i++) + result.xelem (j, i) = fcn (xelem (i, j)); + + return result; + } + else + { + Array<T> result (dim_vector (nc, nr)); + + for (octave_idx_type j = 0; j < nc; j++) + for (octave_idx_type i = 0; i < nr; i++) + result.xelem (j, i) = fcn (xelem (i, j)); + + return result; } } diff --git a/liboctave/Array.h b/liboctave/Array.h --- a/liboctave/Array.h +++ b/liboctave/Array.h @@ -461,6 +461,7 @@ public: bool is_empty (void) const { return numel () == 0; } Array<T> transpose (void) const; + Array<T> hermitian (T (*fcn) (const T&) = 0) const; const T *data (void) const { return rep->data; } diff --git a/liboctave/Array2.h b/liboctave/Array2.h --- a/liboctave/Array2.h +++ b/liboctave/Array2.h @@ -109,6 +109,12 @@ public: return Array2<T> (tmp, tmp.rows (), tmp.columns ()); } + Array2<T> hermitian (T (*fcn) (const T&) = 0) const + { + Array<T> tmp = Array<T>::hermitian (fcn); + return Array2<T> (tmp, tmp.rows (), tmp.columns ()); + } + Array2<T> index (idx_vector& i, int resize_ok = 0, const T& rfv = resize_fill_value (T ())) const { diff --git a/liboctave/ArrayN.h b/liboctave/ArrayN.h --- a/liboctave/ArrayN.h +++ b/liboctave/ArrayN.h @@ -102,6 +102,7 @@ public: ArrayN<T> squeeze (void) const { return Array<T>::squeeze (); } ArrayN<T> transpose (void) const { return Array<T>::transpose (); } + ArrayN<T> hermitian (T (*fcn) (const T&) = 0) const { return Array<T>::hermitian (fcn); } ArrayN<T>& insert (const ArrayN<T>& a, const dim_vector& dv) { diff --git a/liboctave/CColVector.cc b/liboctave/CColVector.cc --- a/liboctave/CColVector.cc +++ b/liboctave/CColVector.cc @@ -221,17 +221,16 @@ ComplexColumnVector::stack (const Comple return retval; } -ComplexRowVector +ComplexRowVector ComplexColumnVector::hermitian (void) const -{ - octave_idx_type len = length (); - return ComplexRowVector (mx_inline_conj_dup (data (), len), len); +{ + return MArray<Complex>::hermitian (std::conj); } ComplexRowVector ComplexColumnVector::transpose (void) const { - return ComplexRowVector (*this); + return MArray<Complex>::transpose (); } ComplexColumnVector diff --git a/liboctave/CColVector.h b/liboctave/CColVector.h --- a/liboctave/CColVector.h +++ b/liboctave/CColVector.h @@ -72,7 +72,7 @@ public: ComplexColumnVector stack (const ColumnVector& a) const; ComplexColumnVector stack (const ComplexColumnVector& a) const; - ComplexRowVector hermitian (void) const; // complex conjugate transpose. + ComplexRowVector hermitian (void) const; ComplexRowVector transpose (void) const; friend ComplexColumnVector conj (const ComplexColumnVector& a); diff --git a/liboctave/CDiagMatrix.cc b/liboctave/CDiagMatrix.cc --- a/liboctave/CDiagMatrix.cc +++ b/liboctave/CDiagMatrix.cc @@ -230,20 +230,6 @@ ComplexDiagMatrix::fill (const ComplexRo elem (i+beg, i+beg) = a.elem (i); return *this; -} - -ComplexDiagMatrix -ComplexDiagMatrix::hermitian (void) const -{ - return ComplexDiagMatrix (mx_inline_conj_dup (data (), length ()), - cols (), rows ()); -} - -ComplexDiagMatrix -ComplexDiagMatrix::transpose (void) const -{ - return ComplexDiagMatrix (mx_inline_dup (data (), length ()), - cols (), rows ()); } ComplexDiagMatrix diff --git a/liboctave/CDiagMatrix.h b/liboctave/CDiagMatrix.h --- a/liboctave/CDiagMatrix.h +++ b/liboctave/CDiagMatrix.h @@ -87,8 +87,8 @@ public: ComplexDiagMatrix& fill (const RowVector& a, octave_idx_type beg); ComplexDiagMatrix& fill (const ComplexRowVector& a, octave_idx_type beg); - ComplexDiagMatrix hermitian (void) const; // complex conjugate transpose - ComplexDiagMatrix transpose (void) const; + ComplexDiagMatrix hermitian (void) const { return MDiagArray2<Complex>::hermitian (std::conj); } + ComplexDiagMatrix transpose (void) const { return MDiagArray2<Complex>::transpose(); } friend ComplexDiagMatrix conj (const ComplexDiagMatrix& a); diff --git a/liboctave/CMatrix.cc b/liboctave/CMatrix.cc --- a/liboctave/CMatrix.cc +++ b/liboctave/CMatrix.cc @@ -882,22 +882,6 @@ ComplexMatrix::stack (const ComplexDiagM retval.insert (*this, 0, 0); retval.insert (a, nr_insert, 0); return retval; -} - -ComplexMatrix -ComplexMatrix::hermitian (void) const -{ - octave_idx_type nr = rows (); - octave_idx_type nc = cols (); - ComplexMatrix result; - if (length () > 0) - { - result.resize (nc, nr); - for (octave_idx_type j = 0; j < nc; j++) - for (octave_idx_type i = 0; i < nr; i++) - result.elem (j, i) = conj (elem (i, j)); - } - return result; } ComplexMatrix diff --git a/liboctave/CMatrix.h b/liboctave/CMatrix.h --- a/liboctave/CMatrix.h +++ b/liboctave/CMatrix.h @@ -126,7 +126,8 @@ public: ComplexMatrix stack (const ComplexColumnVector& a) const; ComplexMatrix stack (const ComplexDiagMatrix& a) const; - ComplexMatrix hermitian (void) const; // complex conjugate transpose + ComplexMatrix hermitian (void) const + { return MArray2<Complex>::hermitian (std::conj); } ComplexMatrix transpose (void) const { return MArray2<Complex>::transpose (); } diff --git a/liboctave/CRowVector.cc b/liboctave/CRowVector.cc --- a/liboctave/CRowVector.cc +++ b/liboctave/CRowVector.cc @@ -227,14 +227,13 @@ ComplexColumnVector ComplexColumnVector ComplexRowVector::hermitian (void) const { - octave_idx_type len = length (); - return ComplexColumnVector (mx_inline_conj_dup (data (), len), len); + return MArray<Complex>::hermitian (std::conj); } ComplexColumnVector ComplexRowVector::transpose (void) const { - return ComplexColumnVector (*this); + return MArray<Complex>::transpose (); } ComplexRowVector diff --git a/liboctave/CRowVector.h b/liboctave/CRowVector.h --- a/liboctave/CRowVector.h +++ b/liboctave/CRowVector.h @@ -70,7 +70,7 @@ public: ComplexRowVector append (const RowVector& a) const; ComplexRowVector append (const ComplexRowVector& a) const; - ComplexColumnVector hermitian (void) const; // complex conjugate transpose. + ComplexColumnVector hermitian (void) const; ComplexColumnVector transpose (void) const; friend ComplexRowVector conj (const ComplexRowVector& a); diff --git a/liboctave/DiagArray2.cc b/liboctave/DiagArray2.cc --- a/liboctave/DiagArray2.cc +++ b/liboctave/DiagArray2.cc @@ -33,6 +33,27 @@ along with Octave; see the file COPYING. #include "DiagArray2.h" #include "lo-error.h" + +template <class T> +DiagArray2<T> +DiagArray2<T>::transpose (void) const +{ + DiagArray2<T> retval (*this); + retval.dimensions = dim_vector (this->dim2 (), this->dim1 ()); + return retval; +} + +template <class T> +DiagArray2<T> +DiagArray2<T>::hermitian (T (* fcn) (const T&)) const +{ + DiagArray2<T> retval (this->dim2 (), this->dim1 ()); + const T *p = this->data (); + T *q = retval.fortran_vec (); + for (octave_idx_type i = 0; i < this->length (); i++) + q [i] = fcn (p [i]); + return retval; +} // A two-dimensional array with diagonal elements only. diff --git a/liboctave/DiagArray2.h b/liboctave/DiagArray2.h --- a/liboctave/DiagArray2.h +++ b/liboctave/DiagArray2.h @@ -180,6 +180,9 @@ public: void resize (octave_idx_type n, octave_idx_type m, const T& val); void maybe_delete_elements (idx_vector& i, idx_vector& j); + + DiagArray2<T> transpose (void) const; + DiagArray2<T> hermitian (T (*fcn) (const T&) = 0) const; }; #endif diff --git a/liboctave/MArray.h b/liboctave/MArray.h --- a/liboctave/MArray.h +++ b/liboctave/MArray.h @@ -63,6 +63,9 @@ public: return *this; } + MArray<T> transpose (void) const { return Array<T>::transpose (); } + MArray<T> hermitian (T (*fcn) (const T&) = 0) const { return Array<T>::hermitian (fcn); } + octave_idx_type nnz (void) const { octave_idx_type retval = 0; diff --git a/liboctave/MArray2.h b/liboctave/MArray2.h --- a/liboctave/MArray2.h +++ b/liboctave/MArray2.h @@ -80,6 +80,7 @@ public: } MArray2<T> transpose (void) const { return Array2<T>::transpose (); } + MArray2<T> hermitian (T (*fcn) (const T&) = 0) const { return Array2<T>::hermitian (fcn); } MArray2<T> diag (octave_idx_type k) const { diff --git a/liboctave/MDiagArray2.h b/liboctave/MDiagArray2.h --- a/liboctave/MDiagArray2.h +++ b/liboctave/MDiagArray2.h @@ -81,6 +81,9 @@ public: return retval; } + MDiagArray2<T> transpose (void) const { return DiagArray2<T>::transpose (); } + MDiagArray2<T> hermitian (T (*fcn) (const T&) = 0) const { return DiagArray2<T>::hermitian (fcn); } + static MDiagArray2<T> nil_array; // Currently, the OPS functions don't need to be friends, but that diff --git a/liboctave/dColVector.cc b/liboctave/dColVector.cc --- a/liboctave/dColVector.cc +++ b/liboctave/dColVector.cc @@ -142,7 +142,7 @@ RowVector RowVector ColumnVector::transpose (void) const { - return RowVector (*this); + return MArray<double>::transpose(); } ColumnVector diff --git a/liboctave/dDiagMatrix.cc b/liboctave/dDiagMatrix.cc --- a/liboctave/dDiagMatrix.cc +++ b/liboctave/dDiagMatrix.cc @@ -136,12 +136,6 @@ DiagMatrix::fill (const RowVector& a, oc elem (i+beg, i+beg) = a.elem (i); return *this; -} - -DiagMatrix -DiagMatrix::transpose (void) const -{ - return DiagMatrix (mx_inline_dup (data (), length ()), cols (), rows ()); } DiagMatrix diff --git a/liboctave/dDiagMatrix.h b/liboctave/dDiagMatrix.h --- a/liboctave/dDiagMatrix.h +++ b/liboctave/dDiagMatrix.h @@ -70,7 +70,7 @@ public: DiagMatrix& fill (const ColumnVector& a, octave_idx_type beg); DiagMatrix& fill (const RowVector& a, octave_idx_type beg); - DiagMatrix transpose (void) const; + DiagMatrix transpose (void) const { return MDiagArray2<double>::transpose(); } friend OCTAVE_API DiagMatrix real (const ComplexDiagMatrix& a); friend OCTAVE_API DiagMatrix imag (const ComplexDiagMatrix& a); diff --git a/liboctave/dRowVector.cc b/liboctave/dRowVector.cc --- a/liboctave/dRowVector.cc +++ b/liboctave/dRowVector.cc @@ -144,7 +144,7 @@ ColumnVector ColumnVector RowVector::transpose (void) const { - return ColumnVector (*this); + return MArray<double>::transpose(); } RowVector diff --git a/liboctave/fCColVector.cc b/liboctave/fCColVector.cc --- a/liboctave/fCColVector.cc +++ b/liboctave/fCColVector.cc @@ -221,17 +221,16 @@ FloatComplexColumnVector::stack (const F return retval; } -FloatComplexRowVector +FloatComplexRowVector FloatComplexColumnVector::hermitian (void) const { - octave_idx_type len = length (); - return FloatComplexRowVector (mx_inline_conj_dup (data (), len), len); -} - -FloatComplexRowVector + return MArray<FloatComplex>::hermitian (std::conj); +} + +FloatComplexRowVector FloatComplexColumnVector::transpose (void) const { - return FloatComplexRowVector (*this); + return MArray<FloatComplex>::transpose (); } FloatComplexColumnVector diff --git a/liboctave/fCColVector.h b/liboctave/fCColVector.h --- a/liboctave/fCColVector.h +++ b/liboctave/fCColVector.h @@ -72,7 +72,7 @@ public: FloatComplexColumnVector stack (const FloatColumnVector& a) const; FloatComplexColumnVector stack (const FloatComplexColumnVector& a) const; - FloatComplexRowVector hermitian (void) const; // complex conjugate transpose. + FloatComplexRowVector hermitian (void) const; FloatComplexRowVector transpose (void) const; friend FloatComplexColumnVector conj (const FloatComplexColumnVector& a); diff --git a/liboctave/fCDiagMatrix.cc b/liboctave/fCDiagMatrix.cc --- a/liboctave/fCDiagMatrix.cc +++ b/liboctave/fCDiagMatrix.cc @@ -230,20 +230,6 @@ FloatComplexDiagMatrix::fill (const Floa elem (i+beg, i+beg) = a.elem (i); return *this; -} - -FloatComplexDiagMatrix -FloatComplexDiagMatrix::hermitian (void) const -{ - return FloatComplexDiagMatrix (mx_inline_conj_dup (data (), length ()), - cols (), rows ()); -} - -FloatComplexDiagMatrix -FloatComplexDiagMatrix::transpose (void) const -{ - return FloatComplexDiagMatrix (mx_inline_dup (data (), length ()), - cols (), rows ()); } FloatComplexDiagMatrix diff --git a/liboctave/fCDiagMatrix.h b/liboctave/fCDiagMatrix.h --- a/liboctave/fCDiagMatrix.h +++ b/liboctave/fCDiagMatrix.h @@ -87,8 +87,8 @@ public: FloatComplexDiagMatrix& fill (const FloatRowVector& a, octave_idx_type beg); FloatComplexDiagMatrix& fill (const FloatComplexRowVector& a, octave_idx_type beg); - FloatComplexDiagMatrix hermitian (void) const; // complex conjugate transpose - FloatComplexDiagMatrix transpose (void) const; + FloatComplexDiagMatrix hermitian (void) const { return MDiagArray2<FloatComplex>::hermitian (std::conj); } + FloatComplexDiagMatrix transpose (void) const { return MDiagArray2<FloatComplex>::transpose(); } friend FloatComplexDiagMatrix conj (const FloatComplexDiagMatrix& a); diff --git a/liboctave/fCMatrix.cc b/liboctave/fCMatrix.cc --- a/liboctave/fCMatrix.cc +++ b/liboctave/fCMatrix.cc @@ -876,22 +876,6 @@ FloatComplexMatrix::stack (const FloatCo retval.insert (*this, 0, 0); retval.insert (a, nr_insert, 0); return retval; -} - -FloatComplexMatrix -FloatComplexMatrix::hermitian (void) const -{ - octave_idx_type nr = rows (); - octave_idx_type nc = cols (); - FloatComplexMatrix result; - if (length () > 0) - { - result.resize (nc, nr); - for (octave_idx_type j = 0; j < nc; j++) - for (octave_idx_type i = 0; i < nr; i++) - result.elem (j, i) = conj (elem (i, j)); - } - return result; } FloatComplexMatrix diff --git a/liboctave/fCMatrix.h b/liboctave/fCMatrix.h --- a/liboctave/fCMatrix.h +++ b/liboctave/fCMatrix.h @@ -126,7 +126,8 @@ public: FloatComplexMatrix stack (const FloatComplexColumnVector& a) const; FloatComplexMatrix stack (const FloatComplexDiagMatrix& a) const; - FloatComplexMatrix hermitian (void) const; // complex conjugate transpose + FloatComplexMatrix hermitian (void) const + { return MArray2<FloatComplex>::hermitian (std::conj); } FloatComplexMatrix transpose (void) const { return MArray2<FloatComplex>::transpose (); } diff --git a/liboctave/fCRowVector.cc b/liboctave/fCRowVector.cc --- a/liboctave/fCRowVector.cc +++ b/liboctave/fCRowVector.cc @@ -224,17 +224,16 @@ FloatComplexRowVector::append (const Flo return retval; } -FloatComplexColumnVector +FloatComplexColumnVector FloatComplexRowVector::hermitian (void) const { - octave_idx_type len = length (); - return FloatComplexColumnVector (mx_inline_conj_dup (data (), len), len); -} - -FloatComplexColumnVector + return MArray<FloatComplex>::hermitian (std::conj); +} + +FloatComplexColumnVector FloatComplexRowVector::transpose (void) const { - return FloatComplexColumnVector (*this); + return MArray<FloatComplex>::transpose (); } FloatComplexRowVector diff --git a/liboctave/fCRowVector.h b/liboctave/fCRowVector.h --- a/liboctave/fCRowVector.h +++ b/liboctave/fCRowVector.h @@ -70,7 +70,7 @@ public: FloatComplexRowVector append (const FloatRowVector& a) const; FloatComplexRowVector append (const FloatComplexRowVector& a) const; - FloatComplexColumnVector hermitian (void) const; // complex conjugate transpose. + FloatComplexColumnVector hermitian (void) const; FloatComplexColumnVector transpose (void) const; friend FloatComplexRowVector conj (const FloatComplexRowVector& a); diff --git a/liboctave/fColVector.cc b/liboctave/fColVector.cc --- a/liboctave/fColVector.cc +++ b/liboctave/fColVector.cc @@ -142,7 +142,7 @@ FloatRowVector FloatRowVector FloatColumnVector::transpose (void) const { - return FloatRowVector (*this); + return MArray<float>::transpose(); } FloatColumnVector diff --git a/liboctave/fDiagMatrix.cc b/liboctave/fDiagMatrix.cc --- a/liboctave/fDiagMatrix.cc +++ b/liboctave/fDiagMatrix.cc @@ -136,12 +136,6 @@ FloatDiagMatrix::fill (const FloatRowVec elem (i+beg, i+beg) = a.elem (i); return *this; -} - -FloatDiagMatrix -FloatDiagMatrix::transpose (void) const -{ - return FloatDiagMatrix (mx_inline_dup (data (), length ()), cols (), rows ()); } FloatDiagMatrix diff --git a/liboctave/fDiagMatrix.h b/liboctave/fDiagMatrix.h --- a/liboctave/fDiagMatrix.h +++ b/liboctave/fDiagMatrix.h @@ -70,7 +70,7 @@ public: FloatDiagMatrix& fill (const FloatColumnVector& a, octave_idx_type beg); FloatDiagMatrix& fill (const FloatRowVector& a, octave_idx_type beg); - FloatDiagMatrix transpose (void) const; + FloatDiagMatrix transpose (void) const { return MDiagArray2<float>::transpose(); } friend OCTAVE_API FloatDiagMatrix real (const FloatComplexDiagMatrix& a); friend OCTAVE_API FloatDiagMatrix imag (const FloatComplexDiagMatrix& a); diff --git a/liboctave/fRowVector.cc b/liboctave/fRowVector.cc --- a/liboctave/fRowVector.cc +++ b/liboctave/fRowVector.cc @@ -144,7 +144,7 @@ FloatColumnVector FloatColumnVector FloatRowVector::transpose (void) const { - return FloatColumnVector (*this); + return MArray<float>::transpose(); } FloatRowVector |
|
|
[Changeset] Re: Faster Array transposeOn 5-May-2008, David Bateman wrote:
| I got previous without the patch | | N = 128, time = 0.0016666 sec | N = 129, time = 0.0003333 sec | N = 1024, time = 0.0439971 sec | N = 1025, time = 0.0336645 sec | N = 2048, time = 0.276649 sec | N = 2049, time = 0.172656 sec | N = 4096, time = 1.19492 sec | N = 4097, time = 0.700954 sec | | and with it | | N = 128, time = 0 sec | N = 129, time = 0 sec | N = 1024, time = 0.0319979 sec | N = 1025, time = 0.0243318 sec | N = 2048, time = 0.170655 sec | N = 2049, time = 0.117326 sec | N = 4096, time = 0.745285 sec | N = 4097, time = 0.558964 sec | | and similar improvements for the hermitian operator. That's a lot of work for a fairly modest improvement... Is there a ChangeLog entry for this? With that, I'd apply the change (but skipping the diffs for the float matrix files I don't have yet). Also, how about some tests for this since the algorithm is now more complex? Thanks, jwe |
|
|
Re: [Changeset] Re: Faster Array transposeWell transpose and hermitian operators are used rather often and so even modest gains are useful. Also the gain is about a factor of 2 for matrix sizes divisible by 8. In any case the reason I started this was to move more methods into the Array class itself such that the float array classes could be simplified, so the speed gain in that sense in secondary. Still need a little work there before I'd suggest you take my code, but its almost there. Humm, I was sure I wrote one, but it seems not. I;ll send another changeset with one later. All the float stuff is on my respository at hg.dbateman.org, starting at changeset 8002. Sure ok, but there are lots of places where make check already uses the transpose and that was useful in debugging the code.. In any case I'll supply some specific tests later as well. D. |
|
|
Re: [Changeset] Re: Faster Array transposedbateman wrote:
> > > John W. Eaton wrote: >> Is there a ChangeLog entry for this? With that, I'd apply the change >> (but skipping the diffs for the float matrix files I don't have yet). >> > > Humm, I was sure I wrote one, but it seems not. I;ll send another changeset > with one later. > > All the float stuff is on my respository at hg.dbateman.org, starting at > changeset 8002. between the patch of this code you'll take and the single precision code that you won't. So perhaps if you'd take the single precision code before this stuff it might be easier. Could you examine further the changesets 8002 8003 8004 8006 8015 8017 8019 8056 from http://hg.dbateman.org/octave and see what further you'd like to be added to the single precision code before accepting it into 3.1.. I'll work on the single precision type further in any case, including * single precision version of quad * perhaps the same for daspk, glpk, etc, but less sure of that * Move more methods from teh derived classes to the Array classes to clean up the code. Regards David # HG changeset patch # User David Bateman <dbateman@...> # Date 1210108550 -7200 # Node ID e87f87f45d7a149ca5d02849494670b55b0b046b # Parent b385f05462735470b01ecee8da054d986494909b Add tests for tranpose/hermitian and ChangeLog entry for new transpose code diff --git a/liboctave/Array.cc b/liboctave/Array.cc --- a/liboctave/Array.cc +++ b/liboctave/Array.cc @@ -1322,6 +1322,41 @@ Array<T>::hermitian (T (*fcn) (const T&) return result; } } + +/* + +%% Tranpose tests for matrices of the tile size and plus or minus a row +%% and with four tiles. + +%!shared m7, mt7, m8, mt8, m9, mt9 +%! m7 = reshape (1 : 7*8, 8, 7); +%! mt7 = [1:7; 1:7, 1:7, 1:7, 1:7; 1:7, 1:7, 1:7]; +%! m8 = reshape (1 : 8*8, 8, 8); +%! mt8 = [1:8; 1:8, 1:8, 1:8, 1:8; 1:8, 1:8, 1:8]; +%! m9 = reshape (1 : 9*8, 8, 9); +%! mt9 = [1:9; 1:9, 1:9, 1:9, 1:9; 1:9, 1:9, 1:9]; + +%!assert (m7', mt7) +%!assert ((1i*m7).', 1i * mt7) +%!assert ((1i*m7)', conj (1i * mt7)) +%!assert (m8', mt8) +%!assert ((1i*m8).', 1i * mt8) +%!assert ((1i*m8)', conj (1i * mt8)) +%!assert (m9', mt9) +%!assert ((1i*m9).', 1i * mt9) +%!assert ((1i*m9)', conj (1i * mt9)) + +%!assert ([m7, m7; m8, m8]', [mt7, mt8; mt7, mt8]) +%!assert ((1i*[m7, m7; m8, m8]).', 1i * [mt7, mt8; mt7, mt8]) +%!assert ((1i*[m7, m7; m8, m8])', conj (1i * [mt7, mt8; mt7, mt8])) +%!assert ([m8, m8; m8, m8]', [mt8, mt8; mt8, mt8]) +%!assert ((1i*[m8, m8; m8, m8]).', 1i * [mt8, mt8; mt8, mt8]) +%!assert ((1i*[m8, m8; m8, m8])', conj (1i * [mt8, mt8; mt8, mt8])) +%!assert ([m9, m9; m8, m8]', [mt9, mt8; mt9, mt8]) +%!assert ((1i*[m9, m9; m8, m8]).', 1i * [mt9, mt8; mt9, mt8]) +%!assert ((1i*[m9, m9; m8, m8])', conj (1i * [mt9, mt8; mt9, mt8])) + +*/ template <class T> T * diff --git a/liboctave/ChangeLog b/liboctave/ChangeLog --- a/liboctave/ChangeLog +++ b/liboctave/ChangeLog @@ -1,3 +1,48 @@ 2008-05-05 John W. Eaton <jwe@... +2008-05-06 David Bateman <dbateman@...> + + * Array.cc (Array<T> Array<T>::transpose () const): Modify for tiled + transpose to limit the number of cache misses. + (Array<T> Array<T>::hermitian (T (*)(const&)) const): New method + for matrix conjugate transpose. + * Array.h (Array<T> hermitian (T (*)(const&)) const): Declare it. + + * DiagArray2.cc (DiagArray2<T> DiagArray2<T>::transpose () const): + Specialization for diagonal arrays. + (DiagArray2<T> DiagArray2<T>::transpose (T (*) (const&)) const): + Ditto. + + * MArray.h (MArray<T> hermitian <T (*) (const&)) const): New method. + (MArray<T> transpose () const): Ditto. + * MArray2.h (MArray2<T> hermitian <T (*) (const&)) const): Ditto. + * Array2.h (Array2<T> hermitian <T (*) (const&)) const): Ditto. + * ArrayN.h (ArrayN<T> hermitian <T (*) (const&)) const): Ditto. + * MDiagArray2.h (MDiagArray2<T> transpose () const): Ditto. + (MDiagArray<T> hermitian <T (*) (const&)) const): Ditto. + + * CColVector.cc (transpose, hermitian): Define in terms of base class. + * CRowVector.cc (transpose, hermitian): Ditto. + * dColVector.cc (transpose): Ditto. + * dRowVector.cc (transpose): Ditto. + * CDiagMatrix.h (transpose, hermitian): Ditto. + * dDiagMatrix.h (transpose): Ditto. + + * fCColVector.cc (transpose, hermitian): Define in terms of base class. + * fCRowVector.cc (transpose, hermitian): Ditto. + * fColVector.cc (transpose): Ditto. + * fRowVector.cc (transpose): Ditto. + * fCDiagMatrix.h (transpose, hermitian): Ditto. + * fDiagMatrix.h (transpose): Ditto. + + * CDiagMatrix.cc (ComplexDiagMatrix::transpose, + ComplexDiagMatrix::hermitian): Delete. + * dDiagMatrix.cc (DiagMatrix::transpose): Ditto. + * CMatrix.cc (ComplexMatrix::hermitian): Ditto. + + * fCDiagMatrix.cc (FloatComplexDiagMatrix::transpose, + FloatComplexDiagMatrix::hermitian): Delete. + * fDiagMatrix.cc (FloatDiagMatrix::transpose): Ditto. + * fCMatrix.cc (FloatComplexMatrix::hermitian): Ditto. + 2008-05-05 John W. Eaton <jwe@...> * cmd-edit.cc (command_editor::re_read_init_file, |
|
|
Re: [Changeset] Re: Faster Array transposeOn 6-May-2008, David Bateman wrote:
| dbateman wrote: | > | > | > John W. Eaton wrote: | >> Is there a ChangeLog entry for this? With that, I'd apply the change | >> (but skipping the diffs for the float matrix files I don't have yet). | >> | > | > Humm, I was sure I wrote one, but it seems not. I;ll send another changeset | > with one later. | > | > All the float stuff is on my respository at hg.dbateman.org, starting at | > changeset 8002. | | The changeset is attached, however, I'm not sure how to handle the split | between the patch of this code you'll take and the single precision code | that you won't. So perhaps if you'd take the single precision code | before this stuff it might be easier. Could you examine further the | changesets | | 8002 | 8003 | 8004 | 8006 | 8015 | 8017 | 8019 | 8056 | | from http://hg.dbateman.org/octave and see what further you'd like to be | added to the single precision code before accepting it into 3.1.. I applied all these changesets in one batch to my archive. | I'll work on the single precision type further in any case, including | | * single precision version of quad OK. | * perhaps the same for daspk, glpk, etc, but less sure of that This is not a high priority for me. | * Move more methods from teh derived classes to the Array classes to | clean up the code. OK. Thanks, jwe |
|
|
Re: [Changeset] Re: Faster Array transposeOn Wed, May 21, 2008 at 2:06 AM, John W. Eaton <jwe@...> wrote:
> On 6-May-2008, David Bateman wrote: > > | dbateman wrote: > | > > | > > | > John W. Eaton wrote: > | >> Is there a ChangeLog entry for this? With that, I'd apply the change > | >> (but skipping the diffs for the float matrix files I don't have yet). > | >> > | > > | > Humm, I was sure I wrote one, but it seems not. I;ll send another changeset > | > with one later. > | > > | > All the float stuff is on my respository at hg.dbateman.org, starting at > | > changeset 8002. > | > | The changeset is attached, however, I'm not sure how to handle the split > | between the patch of this code you'll take and the single precision code > | that you won't. So perhaps if you'd take the single precision code > | before this stuff it might be easier. Could you examine further the > | changesets > | > | 8002 > | 8003 > | 8004 > | 8006 > | 8015 > | 8017 > | 8019 > | 8056 > | > | from http://hg.dbateman.org/octave and see what further you'd like to be > | added to the single precision code before accepting it into 3.1.. > > I applied all these changesets in one batch to my archive. > > | I'll work on the single precision type further in any case, including > | > | * single precision version of quad > > OK. > > | * perhaps the same for daspk, glpk, etc, but less sure of that > > This is not a high priority for me. > > | * Move more methods from teh derived classes to the Array classes to > | clean up the code. > > OK. > > Thanks, > > jwe > What a huge update :) I see this is interferring non-trivially with all my proposed changes (the compound operators repo and two pending patches.) Shall I merge them? (And re-send the patches, probably). -- RNDr. Jaroslav Hajek computing expert Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
|
|
Re: [Changeset] Re: Faster Array transposeOn 21-May-2008, Jaroslav Hajek wrote:
| What a huge update :) I see this is interferring non-trivially with | all my proposed changes (the compound operators repo and two pending | patches.) Shall I merge them? (And re-send the patches, probably). I was planning to look at those changes today. If you can update the changeset in the next few hours so that it applies cleanly, that would help. Otherwise, I will try to do it. Thanks, jwe |
|
|
Re: [Changeset] Re: Faster Array transposeOn Wed, May 21, 2008 at 12:08 PM, John W. Eaton <jwe@...> wrote:
> On 21-May-2008, Jaroslav Hajek wrote: > > | What a huge update :) I see this is interferring non-trivially with > | all my proposed changes (the compound operators repo and two pending > | patches.) Shall I merge them? (And re-send the patches, probably). > > I was planning to look at those changes today. If you can update the > changeset in the next few hours so that it applies cleanly, that would > help. Otherwise, I will try to do it. > I have two pending changesets: the improved matrix_type SPD check: http://www.nabble.com/matrix_type-check-tt16883771.html#a16883771 and the (quite old) 2D pchip interpolation http://www.nabble.com/pchip-method-for-interp2-tt16049957.html#a16049957 The first one actually applies cleanly onto current tip, but obviously leaves the old code for FloatMatrix and FoatComplexMatrix. A copy would suffice, but I think I can easily remake this to use templates to avoid duplicating code. The second patch should be more trivial, as most of it is m-file code. As for the compound operators repo, I intend to import changes from your main repo, do a merge, then add code to support single precision where appropriate. I.e., that will be two changesets more. You can then simply pull from the repo. Is that OK or is there a more desirable way? > Thanks, > > jwe > -- RNDr. Jaroslav Hajek computing expert Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
|
|
Re: [Changeset] Re: Faster Array transposeOk some other changesets from my repository that relate to the single precision type that should be merged are 8072 8073 8076 8077 8094 Done in the above changesets. Me neither, but getting the base classes for this templatized for float/double would be useful and is what I was currently doing, though being a new dad for the last five days has more or less stopped me for the moment. This and the single precision test code is the major thing still to do for this. Cheers David |
|
|
Re: [Changeset] Re: Faster Array transposeOn 21-May-2008, dbateman wrote:
| | | | John W. Eaton wrote: | > | > | Could you examine further the changesets | > | | > | 8002 | > | 8003 | > | 8004 | > | 8006 | > | 8015 | > | 8017 | > | 8019 | > | 8056 | > | | > | from http://hg.dbateman.org/octave and see what further you'd like to be | > | added to the single precision code before accepting it into 3.1.. | > | > I applied all these changesets in one batch to my archive. | > | | Ok some other changesets from my repository that relate to the single | precision type that should be merged are | | 8072 8073 8076 8077 8094 I updated and the latest changeset in the archive is 8092. Did you mean that changeset? It appears to be related to the single precision changes. jwe |
|
|
Re: [Changeset] Re: Faster Array transposeJohn W. Eaton wrote:
> On 21-May-2008, dbateman wrote: > > | > | > | > | John W. Eaton wrote: > | > > | > | Could you examine further the changesets > | > | > | > | 8002 > | > | 8003 > | > | 8004 > | > | 8006 > | > | 8015 > | > | 8017 > | > | 8019 > | > | 8056 > | > | > | > | from http://hg.dbateman.org/octave and see what further you'd like to be > | > | added to the single precision code before accepting it into 3.1.. > | > > | > I applied all these changesets in one batch to my archive. > | > > | > | Ok some other changesets from my repository that relate to the single > | precision type that should be merged are > | > | 8072 8073 8076 8077 8094 > > I updated and the latest changeset in the archive is 8092. Did you > mean that changeset? It appears to be related to the single precision > changes. > > jwe > Try again I forgot to push some of my changes.. D. |
|
|
Re: [Changeset] Re: Faster Array transposeOn 21-May-2008, David Bateman wrote:
| Try again I forgot to push some of my changes.. OK, I think I have them all now. If not, then let me know which changesets I'm missing. Thanks, jwe |
| Free embeddable forum powered by Nabble | Forum Help |