Problem merging data

by zhijie zhang-2

Hi there,
# Build the columns directly: going through matrix() with date strings would turn every column into character
data1 <- data.frame(areaid = c(1, 1, 2, 3),
                    x = c(1.2, 1.5, 0.2, 1.5),
                    y = c(1.3, 2.3, 3.3, 1.3),
                    date = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
                    stringsAsFactors = FALSE)
data1

   areaid   x   y      date
1      1 1.2 1.3 3/23/2004
2      1 1.5 2.3 3/22/2004
3      2 0.2 3.3 4/23/2004
4      3 1.5 1.3 5/22/2004
data2 <- matrix(data = c(1, 1.22, 1.32,
                         1, 1.53, 2.34,
                         1, 1.21, 1.37,
                         1, 1.52, 2.35,
                         2, 0.21, 3.33,
                         2, 0.23, 3.35,
                         3, 1.57, 1.31,
                         3, 1.59, 1.33), nrow = 8, ncol = 3, byrow = TRUE)
data2 <- data.frame(data2)
names(data2) <- c("areaid", "x1", "y1")
data2

  areaid   x1   y1
1      1 1.22 1.32
2      1 1.53 2.34
3      1 1.21 1.37
4      1 1.52 2.35
5      2 0.21 3.33
6      2 0.23 3.35
7      3 1.57 1.31
8      3 1.59 1.33
  To explain the two datasets: you can treat data1 as the case dataset and
data2 as the control dataset. Note that, for each areaid, data2 has twice as
many records as data1, something like a 1:2 matched case-control study
design. I want to merge data1 and data2. Take areaid=1 as an example.
From the two datasets, we can see that data1 has two points (x, y) in
areaid=1, and data2 has four points (x1, y1) in areaid=1. Each record in
data1 should have two matched records in data2. I want to randomly select
half of the areaid=1 points in data2 to link to one areaid=1 record in
data1, and the other half to link to the other areaid=1 record in data1.
In the actual dataset, the number of records with the same areaid will be
more than 2; this is only an example to explain the problem.
The cases areaid=2 and areaid=3 are a little easier than areaid=1 because
each has only one record in data1.
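For a single areaid, the random 1:2 assignment described above amounts to shuffling that area's control rows and cutting them into consecutive blocks of two, one block per case. A minimal sketch for areaid=1 only, assuming each area has exactly twice as many controls as cases; the object names (`cases`, `controls`, `shuffled`, `case_row`, `matched`) are hypothetical.

```r
# Hypothetical example: the areaid = 1 rows of data1 (cases) and data2 (controls)
cases <- data.frame(areaid = c(1, 1),
                    x = c(1.2, 1.5), y = c(1.3, 2.3),
                    date = c("3/23/2004", "3/22/2004"),
                    stringsAsFactors = FALSE)
controls <- data.frame(areaid = c(1, 1, 1, 1),
                       x1 = c(1.22, 1.53, 1.21, 1.52),
                       y1 = c(1.32, 2.34, 1.37, 2.35))

set.seed(1)                                   # only to make the draw reproducible
k <- nrow(controls) / nrow(cases)             # controls per case (2 here)
shuffled <- controls[sample.int(nrow(controls)), ]  # random order, no repeats

# Repeat each case k times: shuffled rows 1-2 go to case 1, rows 3-4 to case 2
case_row <- cases[rep(seq_len(nrow(cases)), each = k), c("date", "x", "y")]
matched <- cbind(shuffled, case_row)
rownames(matched) <- NULL
matched
```

Each case keeps its own date, x and y, and every control is used exactly once.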
  The final result should look something like the following dataset:
areaid   x1   y1      date    x    y
     1 1.22 1.32 3/23/2004  1.2  1.3
     1 1.53 2.34 3/23/2004  1.2  1.3
     1 1.21 1.37 3/22/2004  1.5  2.3
     1 1.52 2.35 3/22/2004  1.5  2.3
     2 0.21 3.33 4/23/2004  0.2  3.3
     2 0.23 3.35 4/23/2004  0.2  3.3
     3 1.57 1.31 5/22/2004  1.5  1.3
     3 1.59 1.33 5/22/2004  1.5  1.3

   Any suggestions or help are greatly appreciated.
  Thanks a lot.

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Problem merging data

by Ista Zahn

Hi,
So you want to randomly throw away data? Doesn't sound like a good idea to me...

You can get the combined data set using

data3 <- merge(data2, data1, all=TRUE)

From there it's just a matter of randomly deleting rows in which the
combination of areaid, x1 and y1 is duplicated. I'll leave that to
you, but I encourage you to think about whether this is really what
you want.

-Ista
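The suggestion above (merge everything, then randomly drop duplicated control rows) can be sketched as follows. This is an illustrative sketch under the assumption that "randomly deleting" means keeping one randomly chosen merged row per control; shuffling the rows and then keeping the first occurrence of each (areaid, x1, y1) does exactly that. The names `data3` and `data4` are placeholders.

```r
data1 <- data.frame(areaid = c(1, 1, 2, 3),
                    x = c(1.2, 1.5, 0.2, 1.5),
                    y = c(1.3, 2.3, 3.3, 1.3),
                    date = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
                    x1 = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
                    y1 = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33))

# Every control is paired with every case sharing its areaid (12 rows here)
data3 <- merge(data2, data1, all = TRUE)

set.seed(2)                                   # only for reproducibility
data3 <- data3[sample.int(nrow(data3)), ]     # random row order
# Keep the first (hence a random) pairing for each control
data4 <- data3[!duplicated(data3[, c("areaid", "x1", "y1")]), ]
data4 <- data4[order(data4$areaid), ]
rownames(data4) <- NULL
data4
```

As written, this keeps every control exactly once, but it does not force each case to receive exactly two controls; for areaid=1 one case could end up with three controls and the other with one.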

On Thu, Nov 5, 2009 at 11:34 PM, rusers.sh <rusers.sh@...> wrote:

> Hi there,
> data1<-matrix(data=c(1,1.2,1.3,"3/23/2004",1,1.5,2.3,"3/22/2004",2,0.2,3.3,"4/23/2004",3,1.5,1.3,"5/22/2004"),nrow=4,ncol=4,byrow=TRUE)
> data1<-data.frame(data1)
> names(data1)<-c("areaid","x","y","date")
> data1
>
>   areaid   x   y      date
> 1      1 1.2 1.3 3/23/2004
> 2      1 1.5 2.3 3/22/2004
> 3      2 0.2 3.3 4/23/2004
> 4      3 1.5 1.3 5/22/2004
> data2<-matrix(data=c(1,1.22,1.32,1,  1.53,  2.34,1,  1.21,  1.37,1,  1.52,
> 2.35,2,  0.21,  3.33,2,  0.23,  3.35,3,  1.57, 1.31,3,  1.59,
> 1.33),nrow=8,ncol=3,byrow=TRUE)
> data2<-data.frame(data2)
> names(data2)<-c("areaid","x1","y1")
> data2
>
>   areaid x1   y1
> 1      1 1.22 1.32
> 2      1 1.53 2.34
> 3      1 1.21 1.37
> 4      1 1.52 2.35
> 5      2 0.21 3.33
> 6      2 0.23 3.35
> 7      3 1.57 1.31
> 8      3 1.59 1.33
>  Explains the two data. You can treat data1 as case dataset and data2 as
> control dataset,respectively.Note th number of recodes for data2 are 2 times
> as that of data1 for each records,something like 1:2 matched case-control
> study design. I hope to merge data1 and data2. Take areaid=1 as an example.
> >From the two dataset, we can see that data1 has two points(x,y) in areaid=1,
> and data2 has four points (x1,y1) in areaid=1. Each record in data1 will
> have two matched records in data2.I want to randomly select 1/2 points of
> areaid=1 in data2 to link the one record of areaid=1 in the data1, and the
> other 1/2 points of areaid=1 in data2 to link the other record of areaid=1
> in the data1.Actually,the number of records in the same areaid will be over
> 2 in the actual dataset. This is only an example to explain the problem.
> For the cases of areaid=2 or 3,they are a little easier than areaid=1
> because there are only one value in data1.
>  The final results are something like the following dataset.
> areaid x1 y1    date         x  y
> 1  1.22  1.32  3/23/2004   1.2  1.3
> 1  1.53  2.34  3/22/2004   1.2  1.3
> 1  1.21  1.37  3/23/2004   1.5  2.3
> 1  1.52  2.35  3/22/2004   1.5  2.3
> 2  0.21  3.33  4/23/2004   0.2  3.3
> 2  0.23  3.35  4/23/2004   0.2  3.3
> 3  1.57  1.31  5/22/2004   1.5  1.3
> 3  1.59  1.33  5/22/2004   1.5  1.3
>
>   Any suggestions or help are greatly appreciated.
>  Thanks a lot.
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@... mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

Re: Problem merging data

by zhijie zhang-2

Hi,
  Actually, no data is thrown away. You can see that from the final result
I showed in my previous email.
   All the data in data1 are added to data2. The only problem is how to
match the repeated areaid values between data1 and data2. For each areaid,
data2 has twice as many records as data1, so I want to randomly select two
records of that areaid, without replacement, to match each of the repeated
records of that areaid in data1 (note that these are different records
because their dates differ, although the areaid is the same).
  The merge function may not be enough to solve this. I tried the following
code; both calls give the same result, which is not the result I want. The
final dataset should have 8 records, and for areaid=1 two records should
have the date "3/23/2004" and two should have the date "3/22/2004".
> data3 <- merge(data2, data1, all.x=TRUE)
> data3 <- merge(data2, data1, all.x=TRUE, all.y=FALSE)
> data3
   areaid   x1   y1   x   y      date
1       1 1.22 1.32 1.2 1.3 3/23/2004
2       1 1.22 1.32 1.5 2.3 3/22/2004
3       1 1.53 2.34 1.2 1.3 3/23/2004
4       1 1.53 2.34 1.5 2.3 3/22/2004
5       1 1.21 1.37 1.2 1.3 3/23/2004
6       1 1.21 1.37 1.5 2.3 3/22/2004
7       1 1.52 2.35 1.2 1.3 3/23/2004
8       1 1.52 2.35 1.5 2.3 3/22/2004
9       2 0.21 3.33 0.2 3.3 4/23/2004
10      2 0.23 3.35 0.2 3.3 4/23/2004
11      3 1.57 1.31 1.5 1.3 5/22/2004
12      3 1.59 1.33 1.5 1.3 5/22/2004
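The 12 rows above come from merge() performing a many-to-many join on areaid: each areaid contributes (cases in data1) × (controls in data2) rows, i.e. 2 × 4 = 8 for areaid 1 and 1 × 2 = 2 each for areaids 2 and 3. A quick check, using hypothetical stand-ins `d1` and `d2` that keep only areaid plus a row label:

```r
# Minimal stand-ins for data1 and data2: only areaid plus a row label
d1 <- data.frame(areaid = c(1, 1, 2, 3), case = 1:4)
d2 <- data.frame(areaid = c(1, 1, 1, 1, 2, 2, 3, 3), control = 1:8)

m <- merge(d2, d1, all.x = TRUE)
nrow(m)          # 12
table(m$areaid)  # 8 rows for areaid 1 (4 x 2), 2 each for areaids 2 and 3
```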

  I think the rough idea may be:
Firstly, divide the datasets into two parts: areaids that occur once in
data1 (unique) and areaids that occur more than once (repeated).
Secondly, for each repeated areaid in data2, randomly select two records
without replacement to match one of the records with that areaid in data1,
then randomly select another two records without replacement to match the
next record with that areaid in data1, and so on.
Thirdly, match the unique areaids between data1 and data2. This should be
easy compared with the repeated areaids.
Finally, combine the results into one dataset.
  I am not very sure about this, and I hope I have explained the issue
clearly.
 Thanks a lot.
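The four-step plan above can be sketched without splitting the data into unique and repeated areaids: for every areaid, shuffle that area's controls and deal them out k at a time to its cases, which covers single-case areas such as 2 and 3 as a special case. A minimal sketch, assuming every areaid in data1 has exactly k times as many rows in data2 (k = 2 here); the helper name `match_controls` and its arguments are hypothetical.

```r
# Example data with numeric columns (same values as data1 and data2 above)
data1 <- data.frame(areaid = c(1, 1, 2, 3),
                    x = c(1.2, 1.5, 0.2, 1.5),
                    y = c(1.3, 2.3, 3.3, 1.3),
                    date = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
                    x1 = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
                    y1 = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33))

# Hypothetical helper: within each areaid, shuffle the controls and give
# each case its own block of k of them (sampling without replacement)
match_controls <- function(cases, controls, k = 2, id = "areaid") {
  pieces <- lapply(unique(cases[[id]]), function(a) {
    cs <- cases[cases[[id]] == a, , drop = FALSE]
    ct <- controls[controls[[id]] == a, , drop = FALSE]
    if (nrow(ct) != k * nrow(cs)) {
      stop("areaid ", a, ": expected ", k * nrow(cs),
           " controls but found ", nrow(ct))
    }
    ct <- ct[sample.int(nrow(ct)), , drop = FALSE]  # random order, no repeats
    # case 1 gets shuffled rows 1..k, case 2 gets rows k+1..2k, and so on
    cs_rep <- cs[rep(seq_len(nrow(cs)), each = k),
                 setdiff(names(cs), id), drop = FALSE]
    out <- cbind(ct, cs_rep)
    rownames(out) <- NULL
    out
  })
  do.call(rbind, pieces)
}

set.seed(1)  # only to make the random assignment reproducible
result <- match_controls(data1, data2, k = 2)
result
```

The helper stops with an error when an area does not have exactly k controls per case, rather than silently producing an unbalanced match; controls whose areaid has no case in data1 are simply ignored.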


2009/11/6 Ista Zahn <istazahn@...>
