Jsoftware
High-Performance Development Platform

rank by a key

View: New views
7 Messages — Rating Filter:   Alert me  

rank by a key

by Tirrell, Jordan (Consultant) :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Given a key (in the sense of /.), I'd like to obtain the rank (the
i.~\:~ kind, not the " kind) of each item within the collection
corresponding to its key. My first instinct was to try (i.~\:~)/. but
this is not exactly what I want.

 

In the following example, collection 0 is 10 30 20 20 and has ranks 3 0
1 1, while collection 1 is 200 100 300 and has ranks 1 2 0. The problem
is the extra 0 appended to make them the same size, and the rearranging
that happens. I want the output to be 3 1 2 0 0 1 1, so that it
corresponds to the original right-side input.

 

   0 1 1 1 0 0 0 (i.~\:~)/. 10 200 100 300 30 20 20

3 0 1 1

1 2 0 0

 

I found that the following expression works.

 

rankbykey=: +/@( =@[ ( [ * (i.~\:~"_1)@:( (*"_1 _))) ] )

 

For example:

 

   0 1 1 1 0 0 0 rankbykey 10 200 100 300 30 20 20

3 1 2 0 0 1 1

   (1 1,1 0,:1 0) rankbykey 0 0,0 1,:0 2

0 1 0

 

However,  rankbykey runs out of memory when I apply it to huge arrays
(15M rows, 7 cols). So then I found the following solution, which works
better on huge arrays.

 

rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )

 

This seems needlessly messy because of the boxing, and I imagine others
have run into the same question, although I haven't been able to find
anything in phrases or the the wiki. It looks like something J should do
easily, but I am no J expert. Does anyone know a better solution that
will work on large arrays?

 

Thanks,

Jordan

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: rank by a key

by Raul Miller-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Nov 10, 2009 at 4:23 PM, Tirrell, Jordan (Consultant)
<JTirrell@...> wrote:
> Given a key (in the sense of /.), I'd like to obtain the rank (the
> i.~\:~ kind, not the " kind) of each item within the collection
> corresponding to its key. My first instinct was to try (i.~\:~)/. but
> this is not exactly what I want.

To get the keys for this kind of operation, I would use ~.

--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: rank by a key

by neitzel :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> In the following example, collection 0 is 10 30 20 20 and has ranks 3 0
> 1 1, while collection 1 is 200 100 300 and has ranks 1 2 0. The problem
> is the extra 0 appended to make them the same size, [...]

The usual way to avoid those extra fills is to box the interim values
and to flatten afterwards.  This is what your left tine of your
rankbykey2 is doing, but with some unnecessarily convoluted
"pre-boxing - pre-de-boxing - post-re-boxing" sequence.

> rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )

Developing it directly from your innitial code

>    0 1 1 1 0 0 0 (i.~\:~)/. 10 200 100 300 30 20 20
> 3 0 1 1
> 1 2 0 0

in the naive way I come to:

   0 1 1 1 0 0 0 <@(i.~\:~)/. 10 200 100 300 30 20 20
+-------+-----+
|3 0 1 1|1 2 0|
+-------+-----+
   0 1 1 1 0 0 0 ;@(<@(i.~\:~)/.) 10 200 100 300 30 20 20
3 0 1 1 1 2 0
 
>                                             [...] and the rearranging
> that happens. I want the output to be 3 1 2 0 0 1 1, so that it
> corresponds to the original right-side input.

For that step, I can offer a slight variation, too.  Your right tine in

> rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )

reshuffles the index vector 0 1 2 3 ... N according to the key sets:

   0 1 1 1 0 0 0  ;@([ </. i.@#@])  10 200 100 300 30 20 20
   0 4 5 6 1 2 3

With these values, there is no difference between simple indexing and
re-grading your rank values:

   0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) /: ;@([</.i.@#@])) 10 200 100 300 30 20 20
3 1 2 0 0 1 1
   0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) {~ ;@([</.i.@#@])) 10 200 100 300 30 20 20
3 1 2 0 0 1 1

My idea was to re-grade the rank values a bit more directly on the key:

   0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) /: [) 10 200 100 300 30 20 20
3 1 2 0 0 1 1

works already nicely, but just because the (two) distinct key groups
happen to be introduced in ascending order.  (Perhaps your keys
*ARE* always of this nature?)  In the general case, self-indexing will
help:

   i.~ 8 4 4 4 8 8 8
0 1 1 1 0 0 0
   8 4 4 4 8 8 8  (;@(<@(i.~\:~)/.) /: i.~@[) 10 200 100 300 30 20 20
3 1 2 0 0 1 1

All in all, this is not very different from what you did.  Slighltly
less boxing/unboxing.  Can you measure any difference?

                                                Martin Neitzel
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: rank by a key

by R.E. Boss :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>
>    i.~ 8 4 4 4 8 8 8
> 0 1 1 1 0 0 0
>    8 4 4 4 8 8 8  (;@(<@(i.~\:~)/.) /: i.~@[) 10 200 100 300 30 20 20
> 3 1 2 0 0 1 1
>
> All in all, this is not very different from what you did.  Slighltly
> less boxing/unboxing.  Can you measure any difference?
>

or

   8 4 4 4 8 8 8  (;@(<@(i.~\:~)/.) \: [) 10 200 100 300 30 20 20
3 1 2 0 0 1 1


R.E. Boss



----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: rank by a key

by Tirrell, Jordan (Consultant) :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I originally defined:
rankbykey1=: +/@( =@[ ( [ * ((i.~\:~)"_1)@:( (*"_1 _))) ] )
rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )

key=: 1 1,1 0,1 1,1 0,1 0,0 1,:1 1
data=: 0 1,1 1,0 2,0 2,2 0,0 0,:1 1
   key rankbykey1 data
2 1 1 2 0 0 0
   key rankbykey2 data
2 1 1 2 0 0 0

I cannot figure out how to use ~: to express this function as Raul
Miller suggested, does anyone see a specific solution using it?

Martin Neitzel gave a better solution for the first part by avoiding
unnecessary boxing/unboxing.

rankbykey3=: (;@(<@(i.~\:~)/.) /: ;@([</.i.@#@]) )

   key rankbykey3 data
2 1 1 2 0 0 0

On my large data set, rankbykey3 runs in just under 24 seconds, as
compared to just over 24 seconds for rankbykey2 (rankbykey1 runs out of
memory).

Using {~ or i.~@[ does not give the correct answer in general.

   key (;@(<@(i.~\:~)/.) {~ ;@([</.i.@#@]) ) data
2 0 0 1 1 2 0
   key (;@(<@(i.~\:~)/.) /: i.~@] ) data
2 1 0 0 1 2 0
   key (;@(<@(i.~\:~)/.) {~ i.~@] ) data
2 1 0 0 2 0 1
   
The distinct keys are not ordered, so (;@(<@(i.~\:~)/.) /: [) or
(;@(<@(i.~\:~)/.) \: [) would not work for my situation (I am applying
it to a large table which is actually ordered, but not by the columns
which are used jointly as the key, or the columns which the ranking is
applied to).

Thanks for the responses!
Jordan




-----Original Message-----
From: programming-bounces@...
[mailto:programming-bounces@...] On Behalf Of
neitzel@...
Sent: Tuesday, November 10, 2009 9:25 PM
To: programming@...
Subject: Re: [Jprogramming] rank by a key

> In the following example, collection 0 is 10 30 20 20 and has ranks 3
0
> 1 1, while collection 1 is 200 100 300 and has ranks 1 2 0. The
problem
> is the extra 0 appended to make them the same size, [...]

The usual way to avoid those extra fills is to box the interim values
and to flatten afterwards.  This is what your left tine of your
rankbykey2 is doing, but with some unnecessarily convoluted
"pre-boxing - pre-de-boxing - post-re-boxing" sequence.

> rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )

Developing it directly from your innitial code

>    0 1 1 1 0 0 0 (i.~\:~)/. 10 200 100 300 30 20 20
> 3 0 1 1
> 1 2 0 0

in the naive way I come to:

   0 1 1 1 0 0 0 <@(i.~\:~)/. 10 200 100 300 30 20 20
+-------+-----+
|3 0 1 1|1 2 0|
+-------+-----+
   0 1 1 1 0 0 0 ;@(<@(i.~\:~)/.) 10 200 100 300 30 20 20
3 0 1 1 1 2 0
 
>                                             [...] and the rearranging
> that happens. I want the output to be 3 1 2 0 0 1 1, so that it
> corresponds to the original right-side input.

For that step, I can offer a slight variation, too.  Your right tine in

> rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )

reshuffles the index vector 0 1 2 3 ... N according to the key sets:

   0 1 1 1 0 0 0  ;@([ </. i.@#@])  10 200 100 300 30 20 20
   0 4 5 6 1 2 3

With these values, there is no difference between simple indexing and
re-grading your rank values:

   0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) /: ;@([</.i.@#@])) 10 200 100 300 30
20 20
3 1 2 0 0 1 1
   0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) {~ ;@([</.i.@#@])) 10 200 100 300 30
20 20
3 1 2 0 0 1 1

My idea was to re-grade the rank values a bit more directly on the key:

   0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) /: [) 10 200 100 300 30 20 20
3 1 2 0 0 1 1

works already nicely, but just because the (two) distinct key groups
happen to be introduced in ascending order.  (Perhaps your keys
*ARE* always of this nature?)  In the general case, self-indexing will
help:

   i.~ 8 4 4 4 8 8 8
0 1 1 1 0 0 0
   8 4 4 4 8 8 8  (;@(<@(i.~\:~)/.) /: i.~@[) 10 200 100 300 30 20 20
3 1 2 0 0 1 1

All in all, this is not very different from what you did.  Slighltly
less boxing/unboxing.  Can you measure any difference?

                                                Martin Neitzel
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: rank by a key

by Raul Miller-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Nov 13, 2009 at 1:10 PM, Tirrell, Jordan (Consultant) > key=:
1 1,1 0,1 1,1 0,1 0,0 1,:1 1
> data=: 0 1,1 1,0 2,0 2,2 0,0 0,:1 1
>   key rankbykey1 data
> 2 1 1 2 0 0 0
>   key rankbykey2 data
> 2 1 1 2 0 0 0
>
> I cannot figure out how to use ~: to express this function as Raul
> Miller suggested, does anyone see a specific solution using it?

Perhaps something like this would work for you?

   rankbykey3=: (/:@/:@(i.~ ~.)@[ { [:; <@(i.~ \:~)/.)

Thanks,

--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: rank by a key

by Tirrell, Jordan (Consultant) :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

It works great, and in a few seconds less.

Thanks!
Jordan

-----Original Message-----
From: programming-bounces@... [mailto:programming-bounces@...] On Behalf Of Raul Miller
Sent: Friday, November 13, 2009 2:34 PM
To: Programming forum
Subject: Re: [Jprogramming] rank by a key

On Fri, Nov 13, 2009 at 1:10 PM, Tirrell, Jordan (Consultant) > key=:
1 1,1 0,1 1,1 0,1 0,0 1,:1 1
> data=: 0 1,1 1,0 2,0 2,2 0,0 0,:1 1
>   key rankbykey1 data
> 2 1 1 2 0 0 0
>   key rankbykey2 data
> 2 1 1 2 0 0 0
>
> I cannot figure out how to use ~: to express this function as Raul
> Miller suggested, does anyone see a specific solution using it?

Perhaps something like this would work for you?

   rankbykey3=: (/:@/:@(i.~ ~.)@[ { [:; <@(i.~ \:~)/.)

Thanks,

--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm