matching each row

View: New views
6 Messages — Rating Filter:   Alert me  

matching each row

by tathta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables).  
I would like to figure out how many times each id number appears in each dataframe.  

So far I can use:
length( match (dataframeA$unique.id[1], dataframeB$unique.id) )

but this only works on each row of dataframe A one-at-a-time.  

I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count


I'm new to R, so please be patient with me!



thx

Re: matching each row

by tathta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

From an email suggestion, here are two sample datasets, and my ideal output:

dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2)
dataB <- data.frame(unique.id=c("A","B","A","B","A","C","D","A"),x=27:20,y=22:29)

## mystery operation(s) happen here....

## ideal output would be:
dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2,countA=c(1,2,1,2),countB=c(4,2,1,2))


so my mystery operation(s) would count the number of times the unique id shows up in a given dataset.  
my ideal outputs are as follows:
countA is the "mystery operation" applied to dataA (counting occurrences within the same dataset)
countB is applied to dataB (counting occurrences within a second dataset).  



My best try so far is to do:
tempA <- aggregate(dataA$unique.id,list(dataA$unique.id),length)

which gives me a matrix with ONE instance of each unique.id and the counts...
(and which I thought was kinda cute)
but it only works for within a single dataset!



tathta wrote:
I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables).  
I would like to figure out how many times each id number appears in each dataframe.  

So far I can use:
length( match (dataframeA$unique.id[1], dataframeB$unique.id) )

but this only works on each row of dataframe A one-at-a-time.  

I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count


I'm new to R, so please be patient with me!



thx

Re: matching each row

by HufferD :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Something like this?

  > dataframeA <- data.frame (
  +   unique.id= c(1,1,3,3,3,5,7,7, 9)
  +   , x1=rnorm(9)
  +   , x2=rnorm(9)
  +   , x3=rnorm(9)
  + )
  > dataframeB <- data.frame (
  +   unique.id= c(2,3,4,5,5,5,6,7,9,10,10)
  +   , x4=rnorm(11)
  +   , x5=rnorm(11)
  +   , x6=rnorm(11)
  + )
  > match.counts <- function ( x , y ) {
  +   out <- cbind (
  +     table ( x [ which ( x %in% y ) ] )
  +     , table ( y [ which ( y %in% x ) ] )
  +   )
  +   dimnames ( out ) [[2]] <- c ( "N in x" , "N in y" )
  +   out
  + }
  > match.counts ( dataframeA$unique.id , dataframeB$unique.id )
    N in x N in y
  3      3      1
  5      1      3
  7      2      1
  9      1      1
  >

--
 David
 
 -----------------------------------------------------
 David Huffer, Ph.D.               Senior Statistician
 CSOSA/Washington, DC           david.huffer@...
 -----------------------------------------------------

-----Original Message-----
From: r-help-bounces@... [mailto:r-help-bounces@...] On Behalf Of tathta
Sent: Wednesday, July 08, 2009 11:10 AM
To: r-help@...
Subject: [R] matching each row


I have two dataframes, the first column of each dataframe is a unique id
number (the rest of the columns are data variables).  
I would like to figure out how many times each id number appears in each
dataframe.  

So far I can use:
length( match (dataframeA$unique.id[1], dataframeB$unique.id) )

but this only works on each row of dataframe A one-at-a-time.  

I would like to do this for all of the rows in dataframe A, and then put the
results in a new variable: dataframeA$count


I'm new to R, so please be patient with me!


Sorry if this question has already been answered, my search of the archives
only brought up one relevant post, and I didn't understand the answer to
it....  http://www.nabble.com/match-to20799206.html#a20799206


thx
--
View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24393051.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: matching each row

by tathta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Close...  

The output I'm looking for is more like this:

output <- data.frame(unique.id=c(1,3,5,7,9),N.in.x=c(2,3,1,2,1),N.in.y=c(0,1,3,1,1))

The first column can be gotten using a small change to the first table line:
table ( x [ which ( x %in% x ) ] )   ##the 3rd "x" used to be a "y"

but I can't modify it to make the second "ideal output" column, I just end up with warnings...  




Something like this?

  > dataframeA <- data.frame (
  +   unique.id= c(1,1,3,3,3,5,7,7, 9)
  +   , x1=rnorm(9)
  +   , x2=rnorm(9)
  +   , x3=rnorm(9)
  + )
  > dataframeB <- data.frame (
  +   unique.id= c(2,3,4,5,5,5,6,7,9,10,10)
  +   , x4=rnorm(11)
  +   , x5=rnorm(11)
  +   , x6=rnorm(11)
  + )
  > match.counts <- function ( x , y ) {
  +   out <- cbind (
  +     table ( x [ which ( x %in% y ) ] )
  +     , table ( y [ which ( y %in% x ) ] )
  +   )
  +   dimnames ( out ) [[2]] <- c ( "N in x" , "N in y" )
  +   out
  + }
  > match.counts ( dataframeA$unique.id , dataframeB$unique.id )
    N in x N in y
  3      3      1
  5      1      3
  7      2      1
  9      1      1
  >

--
 David
 

Re: matching each row

by Marc Schwartz-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Jul 8, 2009, at 10:09 AM, tathta wrote:

>
> I have two dataframes, the first column of each dataframe is a  
> unique id
> number (the rest of the columns are data variables).
> I would like to figure out how many times each id number appears in  
> each
> dataframe.
>
> So far I can use:
> length( match (dataframeA$unique.id[1], dataframeB$unique.id) )
>
> but this only works on each row of dataframe A one-at-a-time.
>
> I would like to do this for all of the rows in dataframe A, and then  
> put the
> results in a new variable: dataframeA$count
>
>
> I'm new to R, so please be patient with me!
>
>
> Sorry if this question has already been answered, my search of the  
> archives
> only brought up one relevant post, and I didn't understand the  
> answer to
> it....  http://www.nabble.com/match-to20799206.html#a20799206


If I am correctly understanding what you are looking for, you could do  
something like the following:

# Create some simple data. Note that only a subset of the ID's (3:5)  
will match across the two DF's:
set.seed(1)
DF.A <- data.frame(ID = sample(1:5, 10, replace = TRUE))
DF.B <- data.frame(ID = sample(3:7, 10, replace = TRUE))

 > DF.A
    ID
1   2
2   2
3   3
4   5
5   2
6   5
7   5
8   4
9   4
10  1

 > DF.B
    ID
1   4
2   3
3   6
4   4
5   6
6   5
7   6
8   7
9   4
10  6


Now, create counts of the IDs in each, coercing the results to data  
frames and setting the count column name for each:

TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A")
TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B")

 > TAB.A
   Var1 Count.A
1    1       1
2    2       3
3    3       1
4    4       2
5    5       3

 > TAB.B
   Var1 Count.B
1    3       1
2    4       3
3    5       1
4    6       4
5    7       1


Now, use merge() to join each of the two above. 'all = TRUE' will  
include non-matching keys:

 > merge(TAB.A, TAB.B, by = "Var1", all = TRUE)
   Var1 Count.A Count.B
1    1       1      NA
2    2       3      NA
3    3       1       1
4    4       2       3
5    5       3       1
6    6      NA       4
7    7      NA       1


Note that you will get NAs for any non-matching ID's (Var1).

See ?table, ?as.data.frame and ?merge for more information.

HTH,

Marc Schwartz

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: matching each row

by Marc Schwartz-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Jul 8, 2009, at 12:17 PM, tathta wrote:

>
>> From an email suggestion, here are two sample datasets, and my  
>> ideal output:
>
> dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2)
> dataB <-
> data
> .frame(unique.id=c("A","B","A","B","A","C","D","A"),x=27:20,y=22:29)
>
> ## mystery operation(s) happen here....
>
> ## ideal output would be:
> dataA <-
> data
> .frame
> (unique
> .id
> =c("A","B","C","B"),x=11:14,y=5:2,countA=c(1,2,1,2),countB=c(4,2,1,2))
>
>
> so my mystery operation(s) would count the number of times the  
> unique id
> shows up in a given dataset.
> my ideal outputs are as follows:
> countA is the "mystery operation" applied to dataA (counting  
> occurrences
> within the same dataset)
> countB is applied to dataB (counting occurrences within a second  
> dataset).
>
>
>
> My best try so far is to do:
> tempA <- aggregate(dataA$unique.id,list(dataA$unique.id),length)
>
> which gives me a matrix with ONE instance of each unique.id and the
> counts...
> (and which I thought was kinda cute)
> but it only works for within a single dataset!

<snip>

Modify my initial proposal:

countA <- as.data.frame(table(dataA$unique.id), responseName = "countA")
countB <- as.data.frame(table(dataB$unique.id), responseName = "countB")

 > countA
   Var1 countA
1    A      1
2    B      2
3    C      1

 > countB
   Var1 countB
1    A      4
2    B      2
3    C      1
4    D      1


dataA <- merge(dataA, countA, by.x = "unique.id", by.y = "Var1")
dataA <- merge(dataA, countB, by.x = "unique.id", by.y = "Var1")

 > dataA
   unique.id  x y countA countB
1         A 11 5      1      4
2         B 12 4      2      2
3         B 14 2      2      2
4         C 13 3      1      1


Note that without 'all.x = TRUE' in the merge() calls, only those  
unique.id's that are common to both datasets will be in the result. If  
you want to include unique.id's that are in A, but not in B, using  
'all.x = TRUE'.

Note also that by default, 'unique.id' will be alpha sorted in the  
output.

HTH,

Marc Schwartz

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.