problem on merge data

Hi there,
data1<-matrix(data=c(1,1.2,1.3,"3/23/2004",1,1.5,2.3,"3/22/2004",2,0.2,3.3,"4/23/2004",3,1.5,1.3,"5/22/2004"),nrow=4,ncol=4,byrow=TRUE)
data1<-data.frame(data1)
names(data1)<-c("areaid","x","y","date")
data1

  areaid   x   y      date
1      1 1.2 1.3 3/23/2004
2      1 1.5 2.3 3/22/2004
3      2 0.2 3.3 4/23/2004
4      3 1.5 1.3 5/22/2004

data2<-matrix(data=c(1,1.22,1.32,1, 1.53, 2.34,1, 1.21, 1.37,1, 1.52, 2.35,2, 0.21, 3.33,2, 0.23, 3.35,3, 1.57, 1.31,3, 1.59, 1.33),nrow=8,ncol=3,byrow=TRUE)
data2<-data.frame(data2)
names(data2)<-c("areaid","x1","y1")
data2

  areaid   x1   y1
1      1 1.22 1.32
2      1 1.53 2.34
3      1 1.21 1.37
4      1 1.52 2.35
5      2 0.21 3.33
6      2 0.23 3.35
7      3 1.57 1.31
8      3 1.59 1.33

To explain the two datasets: you can treat data1 as the case dataset and data2 as the control dataset. Note that, for each areaid, data2 has twice as many records as data1, something like a 1:2 matched case-control study design.

I hope to merge data1 and data2. Take areaid=1 as an example. From the two datasets, we can see that data1 has two points (x,y) in areaid=1, and data2 has four points (x1,y1) in areaid=1. Each record in data1 will have two matched records in data2. I want to randomly select half of the points of areaid=1 in data2 to link to one record of areaid=1 in data1, and the other half to link to the other record of areaid=1 in data1. In the actual dataset, the number of records within the same areaid will be more than 2; this is only an example to explain the problem. The cases of areaid=2 or 3 are a little easier than areaid=1 because there is only one record in data1.

The final result should be something like the following dataset.

areaid   x1   y1      date   x   y
     1 1.22 1.32 3/23/2004 1.2 1.3
     1 1.53 2.34 3/22/2004 1.2 1.3
     1 1.21 1.37 3/23/2004 1.5 2.3
     1 1.52 2.35 3/22/2004 1.5 2.3
     2 0.21 3.33 4/23/2004 0.2 3.3
     2 0.23 3.35 4/23/2004 0.2 3.3
     3 1.57 1.31 5/22/2004 1.5 1.3
     3 1.59 1.33 5/22/2004 1.5 1.3

Any suggestions or help are greatly appreciated.
Thanks a lot.
[[alternative HTML version deleted]]

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: problem on merge data

Hi,
So you want to randomly throw away data? Doesn't sound like a good idea to me...

You can get the combined data set using

    data3 <- merge(data2, data1, all=TRUE)

From there it's just a matter of randomly deleting rows in which the combination of areaid, x1 and y1 is duplicated. I'll leave that to you, but I encourage you to think about whether this is really what you want.

-Ista

On Thu, Nov 5, 2009 at 11:34 PM, rusers.sh <rusers.sh@...> wrote:
> [...]

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
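The merge-then-thin idea from this reply can be sketched roughly as below. This is only an illustration under stated assumptions: the data frames are rebuilt with data.frame() rather than via matrix(), the helper column ctrl and the names rows_by_ctrl, keep and data4 are made up for the example, and the seed is arbitrary. Note that picking one merged row per control does not guarantee that every case ends up with the same number of controls.

```r
# Example data, rebuilt directly as data frames (assumption: same values as the post)
data1 <- data.frame(areaid = c(1, 1, 2, 3),
                    x = c(1.2, 1.5, 0.2, 1.5),
                    y = c(1.3, 2.3, 3.3, 1.3),
                    date = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
                    x1 = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
                    y1 = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33))

set.seed(1)                            # arbitrary seed, only for reproducibility
data2$ctrl <- seq_len(nrow(data2))     # hypothetical id for each control record

# Every control paired with every case in the same areaid (12 rows here)
data3 <- merge(data2, data1, by = "areaid")

# For each control, keep exactly one of its merged rows, chosen at random
rows_by_ctrl <- split(seq_len(nrow(data3)), data3$ctrl)
keep <- vapply(rows_by_ctrl,
               function(i) i[sample.int(length(i), 1)],
               integer(1))
data4 <- data3[sort(unname(keep)), ]
```

data4 has one row per control (8 rows), but within areaid=1 the split between the two dates is left to chance.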
Re: problem on merge data

Hi,
Actually no data was thrown away. You can see that from the final results I showed in my previous email. All the data in data1 was added to data2. The problem is only how to match the repeated areaid values between data1 and data2. For a given areaid, data2 has twice as many records as data1, so I want to randomly select, without repetition, two data2 records of that areaid to match each of the data1 records with the same areaid (note these data1 records are different because the date is different, although the areaid is the same).

The merge function may not be enough to solve it. I tried the following code; both calls give the same result, and it is not the result that I want. The final dataset should have 8 records, and for areaid=1 two records should have the date 3/23/2004 and two should have the date 3/22/2004.

> data3 <- merge(data2, data1, all.x=TRUE)
> data3 <- merge(data2, data1, all.x=TRUE,all.y=FALSE)
> data3
   areaid   x1   y1   x   y      date
1       1 1.22 1.32 1.2 1.3 3/23/2004
2       1 1.22 1.32 1.5 2.3 3/22/2004
3       1 1.53 2.34 1.2 1.3 3/23/2004
4       1 1.53 2.34 1.5 2.3 3/22/2004
5       1 1.21 1.37 1.2 1.3 3/23/2004
6       1 1.21 1.37 1.5 2.3 3/22/2004
7       1 1.52 2.35 1.2 1.3 3/23/2004
8       1 1.52 2.35 1.5 2.3 3/22/2004
9       2 0.21 3.33 0.2 3.3 4/23/2004
10      2 0.23 3.35 0.2 3.3 4/23/2004
11      3 1.57 1.31 1.5 1.3 5/22/2004
12      3 1.59 1.33 1.5 1.3 5/22/2004

I think the rough idea may be:
Firstly, divide the datasets into two parts, unique areaid and repeated areaid.
Secondly, from the repeated areaid in data2, randomly select two records without repetition to match one of the repeated areaid records in data1, then randomly select another two records without repetition to match another repeated areaid record in data1, and so on.
Thirdly, match the unique areaid between data1 and data2. This should be easy compared with the repeated areaid.
Finally, combine them into one dataset.

I am not very sure about this, and I hope I have explained the issue clearly.
Thanks a lot.
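The steps outlined above can be sketched in R roughly as follows. This is a minimal sketch under assumptions, not a tested solution for the real dataset: the data frames are rebuilt with data.frame(), the function name match_one_area and the object matched are hypothetical, and the seed is arbitrary. Within each areaid the controls are shuffled and the cases are dealt out in turn, so repeated and unique areaid values need no separate handling.

```r
# Example data, rebuilt directly as data frames (assumption: same values as the post)
data1 <- data.frame(areaid = c(1, 1, 2, 3),
                    x = c(1.2, 1.5, 0.2, 1.5),
                    y = c(1.3, 2.3, 3.3, 1.3),
                    date = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
                    x1 = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
                    y1 = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33))

set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical helper: match controls to cases within a single areaid
match_one_area <- function(id) {
  cases    <- data1[data1$areaid == id, , drop = FALSE]
  controls <- data2[data2$areaid == id, , drop = FALSE]
  # Put the controls of this area in random order
  controls <- controls[sample.int(nrow(controls)), , drop = FALSE]
  # Deal the cases out in turn: 1, 2, ..., 1, 2, ... over the shuffled controls
  case_idx <- rep(seq_len(nrow(cases)), length.out = nrow(controls))
  cbind(controls, cases[case_idx, c("x", "y", "date"), drop = FALSE])
}

# Only areaid values present in both datasets are processed
ids <- intersect(data1$areaid, data2$areaid)
matched <- do.call(rbind, lapply(ids, match_one_area))
rownames(matched) <- NULL
```

With the example data, matched has 8 rows and each of the two cases in areaid=1 receives exactly two controls. If the number of controls in an area is not a multiple of the number of cases, some cases receive one more control than others.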
2009/11/6 Ista Zahn <istazahn@...>
> [...]