FYI: nth_element

View: New views
8 Messages — Rating Filter:   Alert me  

FYI: nth_element

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

hi all,

Octave has just got a new function: nth_element:
http://hg.savannah.gnu.org/hgweb/octave/rev/aea3a3a950e1

the docstring:

 -- Built-in Function:  nth_element (X, N)
 -- Built-in Function:  nth_element (X, N, DIM)
     Select the n-th smallest element of a vector, using the ordering
     defined by `sort'.  In other words, the result is equivalent to
     `sort(X)(N)'.  N can also be a contiguous range, either ascending
     `l:u' or descending `u:-1:l', in which case a range of elements is
     returned.  If X is an array, `nth_element' operates along the
     dimension defined by DIM, or the first non-singleton dimension if
     DIM is not given.

     nth_element encapsulates the C++ STL algorithms nth_element and
     partial_sort.  On average, the complexity of the operation is
     O(M*log(K)), where `M = size(X, DIM)' and `K = length (N)'.  This
     function is intended for cases where the ratio K/M is small;
     otherwise, it may be better to use `sort'.

     See also: sort, min, max

In short, it allows extracting a small portion of sort(X) without
actually doing the full sort.
This is sometimes useful for statistics when computing quantiles and
the like... statisticians can surely tell better.

Maybe it can be used to boost some existing functions. As an example,
it can be readily used to speed up the "median" function:
http://hg.savannah.gnu.org/hgweb/octave/rev/b7b89061bd0e

the following benchmark demonstrates the speed-up:

for n = [2000, 2001]
  a = rand (n);
  disp (n);
  tic; median (a(:)); toc
  tic; median (a, 1); toc
  tic; median (a, 2); toc
endfor

using a recent tip, Core 2 Duo @ 2.83 GHz, g++ -O3 -march=native, I get:

 2000
Elapsed time is 0.62825 seconds.
Elapsed time is 0.34634 seconds.
Elapsed time is 0.40085 seconds.
 2001
Elapsed time is 0.62713 seconds.
Elapsed time is 0.34236 seconds.
Elapsed time is 0.39419 seconds.

with the new patch, I get

 2000
Elapsed time is 0.061557 seconds.
Elapsed time is 0.049 seconds.
Elapsed time is 0.074331 seconds.
 2001
Elapsed time is 0.047512 seconds.
Elapsed time is 0.047687 seconds.
Elapsed time is 0.075987 seconds.

Contributing tests would be warmly welcome.
There's a number of corner cases, esp. if NaNs come into play, so I
expect bugs to be still there.

enjoy!

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: FYI: nth_element

by shaiay :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Oct 15, 2009 at 9:15 AM, Jaroslav Hajek <highegg@...> wrote:

> hi all,
>
> Octave has just got a new function: nth_element:
> http://hg.savannah.gnu.org/hgweb/octave/rev/aea3a3a950e1
>
> the docstring:
>
>  -- Built-in Function:  nth_element (X, N)
>  -- Built-in Function:  nth_element (X, N, DIM)
>     Select the n-th smallest element of a vector, using the ordering
>     defined by `sort'.  In other words, the result is equivalent to
>     `sort(X)(N)'.  N can also be a contiguous range, either ascending
>     `l:u' or descending `u:-1:l', in which case a range of elements is
>     returned.  If X is an array, `nth_element' operates along the
>     dimension defined by DIM, or the first non-singleton dimension if
>     DIM is not given.
>
>     nth_element encapsulates the C++ STL algorithms nth_element and
>     partial_sort.  On average, the complexity of the operation is
>     O(M*log(K)), where `M = size(X, DIM)' and `K = length (N)'.  This
>     function is intended for cases where the ratio K/M is small;
>     otherwise, it may be better to use `sort'.
>
>     See also: sort, min, max
>
> In short, it allows extracting a small portion of sort(X) without
> actually doing the full sort.
> This is sometimes useful for statistics when computing quantiles and
> the like... statisticians can surely tell better.
>
> Maybe it can be used to boost some existing functions. As an example,
> it can be readily used to speed up the "median" function:
> http://hg.savannah.gnu.org/hgweb/octave/rev/b7b89061bd0e
>
It can also help hist quite a lot.

Shai


Re: FYI: nth_element

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Oct 15, 2009 at 9:56 AM, Shai Ayal <shaiay@...> wrote:

> On Thu, Oct 15, 2009 at 9:15 AM, Jaroslav Hajek <highegg@...> wrote:
>> hi all,
>>
>> Octave has just got a new function: nth_element:
>> http://hg.savannah.gnu.org/hgweb/octave/rev/aea3a3a950e1
>>
>> the docstring:
>>
>>  -- Built-in Function:  nth_element (X, N)
>>  -- Built-in Function:  nth_element (X, N, DIM)
>>     Select the n-th smallest element of a vector, using the ordering
>>     defined by `sort'.  In other words, the result is equivalent to
>>     `sort(X)(N)'.  N can also be a contiguous range, either ascending
>>     `l:u' or descending `u:-1:l', in which case a range of elements is
>>     returned.  If X is an array, `nth_element' operates along the
>>     dimension defined by DIM, or the first non-singleton dimension if
>>     DIM is not given.
>>
>>     nth_element encapsulates the C++ STL algorithms nth_element and
>>     partial_sort.  On average, the complexity of the operation is
>>     O(M*log(K)), where `M = size(X, DIM)' and `K = length (N)'.  This
>>     function is intended for cases where the ratio K/M is small;
>>     otherwise, it may be better to use `sort'.
>>
>>     See also: sort, min, max
>>
>> In short, it allows extracting a small portion of sort(X) without
>> actually doing the full sort.
>> This is sometimes useful for statistics when computing quantiles and
>> the like... statisticians can surely tell better.
>>
>> Maybe it can be used to boost some existing functions. As an example,
>> it can be readily used to speed up the "median" function:
>> http://hg.savannah.gnu.org/hgweb/octave/rev/b7b89061bd0e
>>
> It can also help hist quite a lot.
>

I don't see how. Can you clarify?


--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Re: FYI: nth_element

by shaiay :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Oct 15, 2009 at 12:36 PM, Jaroslav Hajek <highegg@...> wrote:

> On Thu, Oct 15, 2009 at 9:56 AM, Shai Ayal <shaiay@...> wrote:
>> On Thu, Oct 15, 2009 at 9:15 AM, Jaroslav Hajek <highegg@...> wrote:
>>> hi all,
>>>
>>> Octave has just got a new function: nth_element:
>>> http://hg.savannah.gnu.org/hgweb/octave/rev/aea3a3a950e1
>>>
>>> the docstring:
>>>
>>>  -- Built-in Function:  nth_element (X, N)
>>>  -- Built-in Function:  nth_element (X, N, DIM)
>>>     Select the n-th smallest element of a vector, using the ordering
>>>     defined by `sort'.  In other words, the result is equivalent to
>>>     `sort(X)(N)'.  N can also be a contiguous range, either ascending
>>>     `l:u' or descending `u:-1:l', in which case a range of elements is
>>>     returned.  If X is an array, `nth_element' operates along the
>>>     dimension defined by DIM, or the first non-singleton dimension if
>>>     DIM is not given.
>>>
>>>     nth_element encapsulates the C++ STL algorithms nth_element and
>>>     partial_sort.  On average, the complexity of the operation is
>>>     O(M*log(K)), where `M = size(X, DIM)' and `K = length (N)'.  This
>>>     function is intended for cases where the ratio K/M is small;
>>>     otherwise, it may be better to use `sort'.
>>>
>>>     See also: sort, min, max
>>>
>>> In short, it allows extracting a small portion of sort(X) without
>>> actually doing the full sort.
>>> This is sometimes useful for statistics when computing quantiles and
>>> the like... statisticians can surely tell better.
>>>
>>> Maybe it can be used to boost some existing functions. As an example,
>>> it can be readily used to speed up the "median" function:
>>> http://hg.savannah.gnu.org/hgweb/octave/rev/b7b89061bd0e
>>>
>> It can also help hist quite a lot.
>>
>
> I don't see how. Can you clarify?

You are right -- I was confused. hist needs the number of elements
smaller than a given value (and uses sort to find out), while
nth_element gives the value of the nth element.
Writing hist in c++ is not hard and would really help since it uses
(M+K)log(M+K) sort in order to avoid a M*K loop just because looping
is slow in the scripting language. One of these days I'll get down to
doing that ...

Shai


Re: FYI: nth_element

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Oct 15, 2009 at 2:22 PM, Shai Ayal <shaiay@...> wrote:

> On Thu, Oct 15, 2009 at 12:36 PM, Jaroslav Hajek <highegg@...> wrote:
>> On Thu, Oct 15, 2009 at 9:56 AM, Shai Ayal <shaiay@...> wrote:
>>> On Thu, Oct 15, 2009 at 9:15 AM, Jaroslav Hajek <highegg@...> wrote:
>>>> hi all,
>>>>
>>>> Octave has just got a new function: nth_element:
>>>> http://hg.savannah.gnu.org/hgweb/octave/rev/aea3a3a950e1
>>>>
>>>> the docstring:
>>>>
>>>>  -- Built-in Function:  nth_element (X, N)
>>>>  -- Built-in Function:  nth_element (X, N, DIM)
>>>>     Select the n-th smallest element of a vector, using the ordering
>>>>     defined by `sort'.  In other words, the result is equivalent to
>>>>     `sort(X)(N)'.  N can also be a contiguous range, either ascending
>>>>     `l:u' or descending `u:-1:l', in which case a range of elements is
>>>>     returned.  If X is an array, `nth_element' operates along the
>>>>     dimension defined by DIM, or the first non-singleton dimension if
>>>>     DIM is not given.
>>>>
>>>>     nth_element encapsulates the C++ STL algorithms nth_element and
>>>>     partial_sort.  On average, the complexity of the operation is
>>>>     O(M*log(K)), where `M = size(X, DIM)' and `K = length (N)'.  This
>>>>     function is intended for cases where the ratio K/M is small;
>>>>     otherwise, it may be better to use `sort'.
>>>>
>>>>     See also: sort, min, max
>>>>
>>>> In short, it allows extracting a small portion of sort(X) without
>>>> actually doing the full sort.
>>>> This is sometimes useful for statistics when computing quantiles and
>>>> the like... statisticians can surely tell better.
>>>>
>>>> Maybe it can be used to boost some existing functions. As an example,
>>>> it can be readily used to speed up the "median" function:
>>>> http://hg.savannah.gnu.org/hgweb/octave/rev/b7b89061bd0e
>>>>
>>> It can also help hist quite a lot.
>>>
>>
>> I don't see how. Can you clarify?
>
> You are right -- I was confused. hist needs the number of elements
> smaller than a given value (and uses sort to find out), while
> nth_element gives the value of the nth element.
> Writing hist in c++ is not hard and would really help since it uses
> (M+K)log(M+K) sort in order to avoid a M*K loop just because looping
> is slow in the scripting language. One of these days I'll get down to
> doing that ...
>
> Shai
>

Why not just use a binary search, via `lookup'? That would be
O(M*log(K) + K) (M the number of data and K the number of bins).
Look what histc does... maybe hist can even simply call histc.

hth

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Re: FYI: nth_element

by shaiay :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Oct 15, 2009 at 2:31 PM, Jaroslav Hajek <highegg@...> wrote:

> On Thu, Oct 15, 2009 at 2:22 PM, Shai Ayal <shaiay@...> wrote:
>> On Thu, Oct 15, 2009 at 12:36 PM, Jaroslav Hajek <highegg@...> wrote:
>>> On Thu, Oct 15, 2009 at 9:56 AM, Shai Ayal <shaiay@...> wrote:
>>>> On Thu, Oct 15, 2009 at 9:15 AM, Jaroslav Hajek <highegg@...> wrote:
>>>>> hi all,
>>>>>
>>>>> Octave has just got a new function: nth_element:
>>>>> http://hg.savannah.gnu.org/hgweb/octave/rev/aea3a3a950e1
>>>>>
>>>>> the docstring:
>>>>>
>>>>>  -- Built-in Function:  nth_element (X, N)
>>>>>  -- Built-in Function:  nth_element (X, N, DIM)
>>>>>     Select the n-th smallest element of a vector, using the ordering
>>>>>     defined by `sort'.  In other words, the result is equivalent to
>>>>>     `sort(X)(N)'.  N can also be a contiguous range, either ascending
>>>>>     `l:u' or descending `u:-1:l', in which case a range of elements is
>>>>>     returned.  If X is an array, `nth_element' operates along the
>>>>>     dimension defined by DIM, or the first non-singleton dimension if
>>>>>     DIM is not given.
>>>>>
>>>>>     nth_element encapsulates the C++ STL algorithms nth_element and
>>>>>     partial_sort.  On average, the complexity of the operation is
>>>>>     O(M*log(K)), where `M = size(X, DIM)' and `K = length (N)'.  This
>>>>>     function is intended for cases where the ratio K/M is small;
>>>>>     otherwise, it may be better to use `sort'.
>>>>>
>>>>>     See also: sort, min, max
>>>>>
>>>>> In short, it allows extracting a small portion of sort(X) without
>>>>> actually doing the full sort.
>>>>> This is sometimes useful for statistics when computing quantiles and
>>>>> the like... statisticians can surely tell better.
>>>>>
>>>>> Maybe it can be used to boost some existing functions. As an example,
>>>>> it can be readily used to speed up the "median" function:
>>>>> http://hg.savannah.gnu.org/hgweb/octave/rev/b7b89061bd0e
>>>>>
>>>> It can also help hist quite a lot.
>>>>
>>>
>>> I don't see how. Can you clarify?
>>
>> You are right -- I was confused. hist needs the number of elements
>> smaller than a given value (and uses sort to find out), while
>> nth_element gives the value of the nth element.
>> Writing hist in c++ is not hard and would really help since it uses
>> (M+K)log(M+K) sort in order to avoid a M*K loop just because looping
>> is slow in the scripting language. One of these days I'll get down to
>> doing that ...
>>
>> Shai
>>
>
> Why not just use a binary search, via `lookup'? That would be
> O(M*log(K) + K) (M the number of data and K the number of bins).
> Look what histc does... maybe hist can even simply call histc.
yes, I think hist should call histc -- the main difference between
them is that hist plots the results

Shai


Re: FYI: nth_element

by Jordi Gutiérrez Hermoso :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/10/15 Jaroslav Hajek <highegg@...>:
>     nth_element encapsulates the C++ STL algorithms nth_element and
>     partial_sort.

Silly and pointless nitpickery: would you please refer to it as  the
C++ standard library instead of STL? Although nth_element and
partial_sort are indeed in the STL that SGI contribute to the C++
stdlib and the GNU implementation is based on SGI's contributions, the
term "STL", 11 years after being standardised and very well
implemented, still carries connotations of "strange, bloated, foreign,
and nonstandard part of C++".

For marketing purposes, I think it's better to say that these are
Octave functions that are wrapping standard C++ functions. That's all.
;-)

Re: FYI: nth_element

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/10/17 Jordi Gutiérrez Hermoso <jordigh@...>:

> 2009/10/15 Jaroslav Hajek <highegg@...>:
>>     nth_element encapsulates the C++ STL algorithms nth_element and
>>     partial_sort.
>
> Silly and pointless nitpickery: would you please refer to it as  the
> C++ standard library instead of STL? Although nth_element and
> partial_sort are indeed in the STL that SGI contribute to the C++
> stdlib and the GNU implementation is based on SGI's contributions, the
> term "STL", 11 years after being standardised and very well
> implemented, still carries connotations of "strange, bloated, foreign,
> and nonstandard part of C++".
>

Sorry, I was not aware of the distinction. For me, STL was simply the
containers+algorithms part of C++ standard library.

> For marketing purposes, I think it's better to say that these are
> Octave functions that are wrapping standard C++ functions. That's all.
> ;-)
>

Marketing? I hope not :)


--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz