|
View:
New views
6 Messages
—
Rating Filter:
Alert me
|
|
|
matching each rowI have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables).
I would like to figure out how many times each id number appears in each dataframe. So far I can use: length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) but this only works on each row of dataframe A one-at-a-time. I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count I'm new to R, so please be patient with me! thx |
|
|
Re: matching each rowFrom an email suggestion, here are two sample datasets, and my ideal output:
dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2) dataB <- data.frame(unique.id=c("A","B","A","B","A","C","D","A"),x=27:20,y=22:29) ## mystery operation(s) happen here.... ## ideal output would be: dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2,countA=c(1,2,1,2),countB=c(4,2,1,2)) so my mystery operation(s) would count the number of times the unique id shows up in a given dataset. my ideal outputs are as follows: countA is the "mystery operation" applied to dataA (counting occurrences within the same dataset) countB is applied to dataB (counting occurrences within a second dataset). My best try so far is to do: tempA <- aggregate(dataA$unique.id,list(dataA$unique.id),length) which gives me a matrix with ONE instance of each unique.id and the counts... (and which I thought was kinda cute) but it only works for within a single dataset!
|
|
|
Re: matching each rowSomething like this?
> dataframeA <- data.frame ( + unique.id= c(1,1,3,3,3,5,7,7, 9) + , x1=rnorm(9) + , x2=rnorm(9) + , x3=rnorm(9) + ) > dataframeB <- data.frame ( + unique.id= c(2,3,4,5,5,5,6,7,9,10,10) + , x4=rnorm(11) + , x5=rnorm(11) + , x6=rnorm(11) + ) > match.counts <- function ( x , y ) { + out <- cbind ( + table ( x [ which ( x %in% y ) ] ) + , table ( y [ which ( y %in% x ) ] ) + ) + dimnames ( out ) [[2]] <- c ( "N in x" , "N in y" ) + out + } > match.counts ( dataframeA$unique.id , dataframeB$unique.id ) N in x N in y 3 3 1 5 1 3 7 2 1 9 1 1 > -- David ----------------------------------------------------- David Huffer, Ph.D. Senior Statistician CSOSA/Washington, DC david.huffer@... ----------------------------------------------------- -----Original Message----- From: r-help-bounces@... [mailto:r-help-bounces@...] On Behalf Of tathta Sent: Wednesday, July 08, 2009 11:10 AM To: r-help@... Subject: [R] matching each row I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables). I would like to figure out how many times each id number appears in each dataframe. So far I can use: length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) but this only works on each row of dataframe A one-at-a-time. I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count I'm new to R, so please be patient with me! Sorry if this question has already been answered, my search of the archives only brought up one relevant post, and I didn't understand the answer to it.... http://www.nabble.com/match-to20799206.html#a20799206 thx -- View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24393051.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: matching each rowClose...
The output I'm looking for is more like this: output <- data.frame(unique.id=c(1,3,5,7,9),N.in.x=c(2,3,1,2,1),N.in.y=c(0,1,3,1,1)) The first column can be gotten using a small change to the first table line: table ( x [ which ( x %in% x ) ] ) ##the 3rd "x" used to be a "y" but I can't modify it to make the second "ideal output" column, I just end up with warnings...
|
|
|
Re: matching each rowOn Jul 8, 2009, at 10:09 AM, tathta wrote:
> > I have two dataframes, the first column of each dataframe is a > unique id > number (the rest of the columns are data variables). > I would like to figure out how many times each id number appears in > each > dataframe. > > So far I can use: > length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) > > but this only works on each row of dataframe A one-at-a-time. > > I would like to do this for all of the rows in dataframe A, and then > put the > results in a new variable: dataframeA$count > > > I'm new to R, so please be patient with me! > > > Sorry if this question has already been answered, my search of the > archives > only brought up one relevant post, and I didn't understand the > answer to > it.... http://www.nabble.com/match-to20799206.html#a20799206 If I am correctly understanding what you are looking for, you could do something like the following: # Create some simple data. Note that only a subset of the ID's (3:5) will match across the two DF's: set.seed(1) DF.A <- data.frame(ID = sample(1:5, 10, replace = TRUE)) DF.B <- data.frame(ID = sample(3:7, 10, replace = TRUE)) > DF.A ID 1 2 2 2 3 3 4 5 5 2 6 5 7 5 8 4 9 4 10 1 > DF.B ID 1 4 2 3 3 6 4 4 5 6 6 5 7 6 8 7 9 4 10 6 Now, create counts of the IDs in each, coercing the results to data frames and setting the count column name for each: TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A") TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B") > TAB.A Var1 Count.A 1 1 1 2 2 3 3 3 1 4 4 2 5 5 3 > TAB.B Var1 Count.B 1 3 1 2 4 3 3 5 1 4 6 4 5 7 1 Now, use merge() to join each of the two above. 'all = TRUE' will include non-matching keys: > merge(TAB.A, TAB.B, by = "Var1", all = TRUE) Var1 Count.A Count.B 1 1 1 NA 2 2 3 NA 3 3 1 1 4 4 2 3 5 5 3 1 6 6 NA 4 7 7 NA 1 Note that you will get NAs for any non-matching ID's (Var1). See ?table, ?as.data.frame and ?merge for more information. HTH, Marc Schwartz ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: matching each rowOn Jul 8, 2009, at 12:17 PM, tathta wrote:
> >> From an email suggestion, here are two sample datasets, and my >> ideal output: > > dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2) > dataB <- > data > .frame(unique.id=c("A","B","A","B","A","C","D","A"),x=27:20,y=22:29) > > ## mystery operation(s) happen here.... > > ## ideal output would be: > dataA <- > data > .frame > (unique > .id > =c("A","B","C","B"),x=11:14,y=5:2,countA=c(1,2,1,2),countB=c(4,2,1,2)) > > > so my mystery operation(s) would count the number of times the > unique id > shows up in a given dataset. > my ideal outputs are as follows: > countA is the "mystery operation" applied to dataA (counting > occurrences > within the same dataset) > countB is applied to dataB (counting occurrences within a second > dataset). > > > > My best try so far is to do: > tempA <- aggregate(dataA$unique.id,list(dataA$unique.id),length) > > which gives me a matrix with ONE instance of each unique.id and the > counts... > (and which I thought was kinda cute) > but it only works for within a single dataset! <snip> Modify my initial proposal: countA <- as.data.frame(table(dataA$unique.id), responseName = "countA") countB <- as.data.frame(table(dataB$unique.id), responseName = "countB") > countA Var1 countA 1 A 1 2 B 2 3 C 1 > countB Var1 countB 1 A 4 2 B 2 3 C 1 4 D 1 dataA <- merge(dataA, countA, by.x = "unique.id", by.y = "Var1") dataA <- merge(dataA, countB, by.x = "unique.id", by.y = "Var1") > dataA unique.id x y countA countB 1 A 11 5 1 4 2 B 12 4 2 2 3 B 14 2 2 2 4 C 13 3 1 1 Note that without 'all.x = TRUE' in the merge() calls, only those unique.id's that are common to both datasets will be in the result. If you want to include unique.id's that are in A, but not in B, using 'all.x = TRUE'. Note also that by default, 'unique.id' will be alpha sorted in the output. HTH, Marc Schwartz ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
| Free embeddable forum powered by Nabble | Forum Help |