After PCA, the PC values are so large. Is something wrong?

3 messages


by bbslover


rm(list = ls())
yx.df <- read.csv("c:/MK-2-72.csv", sep = ",", header = TRUE, dec = ".")
dim(yx.df)
# response y (column 1) and predictors X (columns 2 to 643)
y <- yx.df[, 1]
x <- yx.df[, 2:643]
# convert to a matrix
mat <- as.matrix(x)
# number of rows
rownum <- nrow(mat)
# remove constant columns
mat1 <- mat[, apply(mat, 2, function(.col) !all(.col[1] == .col[2:rownum]))]
dim(yx.df)
dim(mat1)
# remove columns in which more than 95% of the values are zero
mat2 <- mat1[, apply(mat1, 2, function(.col) !(sum(.col == 0) / rownum > 0.95))]
dim(yx.df)
dim(mat2)
# remove columns with sd < 0.5
mat3 <- mat2[, apply(mat2, 2, function(.col) !(sd(.col) < 0.5))]
dim(yx.df)
dim(mat3)
# PCA analysis
# note: 'cor' is an argument of princomp(), not prcomp(); prcomp() ignores it,
# so mat3 is centred but not scaled here
mat3.pr <- prcomp(mat3, cor = TRUE)
summary(mat3.pr, loading = TRUE)
pre.cmp <- predict(mat3.pr)
cmp <- pre.cmp[, 1:3]
cmp
DF <- cbind(y, cmp)
DF <- as.data.frame(DF)
names(DF) <- c("y", "p1", "p2", "p3")
DF
summary(lm(y ~ p1 + p2 + p3, data = DF))
# second PCA; note that DF still contains the response column y
mat3.pr <- prcomp(DF, cor = TRUE)
summary(mat3.pr)
pre <- predict(mat3.pr)
pre1 <- pre[, 1:3]
pre1
colnames(pre1) <- c("x1", "x2", "x3")
pre1
pc <- cbind(y, pre1)
pc <- as.data.frame(pc)
lm.pc <- lm(y ~ x1 + x2 + x3, data = pc)
summary(lm.pc)
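For reference (an editorial note, not part of the original post): prcomp() standardizes the columns only when scale. = TRUE, while cor = TRUE is princomp()'s argument for working from the correlation matrix. Without scaling, the leading components, and therefore the scores, are dominated by whichever columns have the largest variance. A minimal sketch on hypothetical toy data:

```r
# Hypothetical toy data: two independent columns on very different scales
set.seed(1)
X <- cbind(small = rnorm(50, sd = 1),
           large = rnorm(50, sd = 1000))

# Default prcomp(): centred but NOT scaled, i.e. PCA of the covariance matrix
pr_cov <- prcomp(X)
# scale. = TRUE standardizes each column first, i.e. PCA of the correlation matrix
pr_cor <- prcomp(X, scale. = TRUE)

pr_cov$sdev            # first value is roughly 1000, set by the 'large' column
pr_cov$rotation[, 1]   # PC1 loads almost entirely on 'large'
pr_cor$sdev            # both values close to 1
# princomp(cor = TRUE) gives the same component standard deviations
princomp(X, cor = TRUE)$sdev
```

With the original data, the analogous call would be prcomp(mat3, scale. = TRUE).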

Above is my PCA code. After running it, the scores on the first three PCs are very large. Why? Also, the R-squared of the fit is poor. Below are my values for the first three PCs:
> pre1
              PC1          PC2          PC3
 [1,] -15181.5190  1944.392700 -1074.326182
 [2,] -32152.4533  1007.113729  3201.361408
 [3,] -15836.5362  2117.988273  -555.799383
 [4,]  -1618.5561  1481.020337   255.530132
 [5,]  -5407.5030  1975.779398   -84.646283
 [6,]  -9662.1949  2611.220928  -417.435782
 [7,] -30488.2102   577.385588  1853.420297
 [8,]  -2135.2563 -4506.112873  1382.413284
 [9,]  -1584.2796 -4645.142062   929.146895
[10,]   -668.7664 -4876.250486   177.691446
[11,]  -2188.5914 -4495.203080  1432.428127
[12,] -19633.9581  2159.000138 -1598.710872
[13,] -26849.1088  -515.574085 -2683.552623
[14,]  -9492.9503 -4868.648205  1236.986097
[15,] -13857.6517 -4810.228193  1296.342199
[16,] -11596.5097 -8181.631403   462.913210
[17,] -25948.6564  -746.442386 -3415.426682
[18,]  15386.4477   709.974524   555.160973
[19,]  21642.7516  1163.456075  -609.437740
[20,]  22236.7094   675.562564  -136.992578
[21,]  14354.9927   611.996274    -4.867054
[22,]  12569.9493  1111.842240   585.540985
[23,]  20739.0219  3078.679745  1662.902248
[24,]   9472.0249   648.769910   381.487034
[25,]  17299.5307  1424.712428  1522.311676
[26,]  13231.2735   587.761915   170.448061
[27,]  10843.5590   705.485396   -79.931518
[28,]   9402.8803 -1978.216853 -1534.244078
[29,]  13094.9525   212.042937  -363.941664
[30,]   9337.3522   537.885230   189.558999
[31,]   7747.1347  -141.004825 -1664.082447
[32,]   4640.1161 -1489.652284 -3584.574135
[33,]  13241.5054   175.630689  -486.250927
[34,]   3867.2204   814.830143  1584.358007
[35,]   8614.5030   708.274447   814.295587
[36,] -18815.6774  -480.311541  1248.369916
[37,]  -1860.0810  1195.557861   269.322703
[38,]   7172.0057     4.216905 -1191.448702
[39,]  -7233.2271 -2361.951658  -235.293358
[40,]   1841.3548  1187.225488   632.116420
[41,]  12465.2336   367.822405   160.751014
[42,] -39021.7259  1972.333778  3167.504098
[43,]  13098.7736  -424.152058  -567.846037
[44,]   9793.7729  -559.084900  -210.696126
[45,]  13111.1861    22.772626  -318.242722
[46,]  13169.0604     7.808885  -363.995563
[47,]   3306.6293  -694.908211  -642.996604
[48,]  10779.8582  -989.175596 -1619.861931
[49,]  10872.6913  -747.979343 -1375.317959
[50,]  -3057.5633  1838.449143  1454.886518
[51,]  -6854.9316  2338.753165  1113.510561
[52,] -15077.1823  1917.776905 -1158.158633
[53,] -45862.8305  1173.157521 -1707.293955
[54,] -14294.1553  1716.708462 -1794.064434
[55,]  24645.0508  2519.904889  1424.233563
[56,]  23303.5998  2250.088386   839.587354
[57,]  18865.5231   897.566446    36.240598
[58,]    227.2659 -6582.661199  -712.892569
[59,]  15336.8371   722.953549   593.903314
[60,]  13030.8715   228.509670  -312.933654
[61,]   5826.0388   331.077814   -53.417878
[62,]  13150.4446  -437.612023  -608.342969
[63,]  11728.3897   -83.151510   569.007995
[64,]  11021.5720  -869.425283 -1216.724017
[65,]   9625.3142   137.388994   138.735249
[66,] -15905.2704  3735.547166   421.846379
[67,] -15539.7628  3331.399648   104.886572
[68,]  -2294.9924  1648.164750   822.075221
[69,] -10120.0153  1558.766306  -333.378256
[70,] -24241.4554  -533.700229  1516.603088
[71,]  -1036.6022 -4782.136067   475.195011
[72,] -24575.2244  2655.599986 -1965.946921
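Large scores are not by themselves a sign of an error. For prcomp(), the standard deviation of each column of scores equals the component's sdev, the square root of the corresponding eigenvalue of the covariance matrix of the (here unscaled) data, so predictors measured in large units give large scores. A quick check on hypothetical data:

```r
# Hypothetical data: 72 rows, columns with deliberately unequal scales
set.seed(2)
X <- matrix(rnorm(72 * 5), nrow = 72) %*% diag(c(1000, 300, 50, 5, 1))
pr <- prcomp(X)                  # unscaled, as in the code above

score_sd <- apply(pr$x, 2, sd)   # spread of each column of scores
score_sd
pr$sdev                          # the same numbers
round(colMeans(pr$x), 10)        # scores are centred at zero
```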

The fit result is below:
Call:
lm(formula = y ~ x1 + x2 + x3, data = pc)

Residuals:
     Min       1Q   Median       3Q      Max
-1.29638 -0.47622  0.01059  0.49268  1.69335

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.613e+00  8.143e-02  68.932  < 2e-16 ***
x1          -3.089e-05  5.150e-06  -5.998 8.58e-08 ***
x2          -4.095e-05  3.448e-05  -1.188    0.239    
x3          -8.106e-05  6.412e-05  -1.264    0.210    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.691 on 68 degrees of freedom
Multiple R-squared: 0.3644,     Adjusted R-squared: 0.3364
F-statistic: 12.99 on 3 and 68 DF,  p-value: 8.368e-07
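The very small slopes in this output (of order 1e-05) mostly reflect the large scale of the scores. Rescaling a predictor changes its coefficient but leaves its t statistic, its p-value and the R-squared unchanged, as this sketch on simulated, hypothetical data shows:

```r
set.seed(3)
n  <- 72
# Hypothetical predictors on a large scale, mimicking unscaled PC scores
p1 <- rnorm(n, sd = 15000)
p2 <- rnorm(n, sd = 2500)
# Hypothetical response
y  <- 5.6 - 3e-05 * p1 + rnorm(n, sd = 0.7)

fit_raw <- lm(y ~ p1 + p2)                 # slopes per unit of raw score
fit_std <- lm(y ~ scale(p1) + scale(p2))   # slopes per SD of each score

t_raw <- summary(fit_raw)$coefficients[-1, "t value"]
t_std <- summary(fit_std)$coefficients[-1, "t value"]
rbind(coef_raw = coef(fit_raw)[-1], coef_std = coef(fit_std)[-1])
rbind(t_raw, t_std)                        # identical t statistics
c(summary(fit_raw)$r.squared, summary(fit_std)$r.squared)  # identical R^2
```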

x2 and x3 are not significant. In principle, the PCs should be significant after PCA, but with my data they are not. Why?
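One way to see why some PCs are not significant: because prcomp() scores are centred and mutually uncorrelated, the R-squared of a regression of y on them (with an intercept) equals the sum of the squared correlations between y and each PC. A component that happens to be weakly correlated with y adds almost nothing. A sketch on hypothetical simulated data:

```r
set.seed(4)
n <- 72
# Hypothetical correlated predictors and response
X <- matrix(rnorm(n * 6), n) %*% matrix(rnorm(36), 6)
y <- drop(X %*% c(1, 0, 0, 0, 0, 0.5)) + rnorm(n)

pcs <- prcomp(X, scale. = TRUE)$x[, 1:3]   # first three PC scores
fit <- lm(y ~ pcs)

r2_fit   <- summary(fit)$r.squared
r2_parts <- cor(y, pcs)^2                  # squared correlation of y with each PC
r2_parts
all.equal(r2_fit, sum(r2_parts))           # the contributions simply add up
```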

Re: After PCA, the PC values are so large. Is something wrong?

by Ben Bolker


bbslover <dluthm <at> yeah.net> writes:

>
[snip]

> the fit result below:
> Call:
> lm(formula = y ~ x1 + x2 + x3, data = pc)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1.29638 -0.47622  0.01059  0.49268  1.69335
>
> Coefficients:
>               Estimate Std. Error t value Pr(>|t|)    
> (Intercept)  5.613e+00  8.143e-02  68.932  < 2e-16 ***
> x1          -3.089e-05  5.150e-06  -5.998 8.58e-08 ***
> x2          -4.095e-05  3.448e-05  -1.188    0.239    
> x3          -8.106e-05  6.412e-05  -1.264    0.210    
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.691 on 68 degrees of freedom
> Multiple R-squared: 0.3644,     Adjusted R-squared: 0.3364
> F-statistic: 12.99 on 3 and 68 DF,  p-value: 8.368e-07
>
> x2,x3 is not significance. by pricipal, after PCA, the pcs should
> significance, but my data is not, why?

  Why is it necessary that the first few principal components
should have significant relationships with some other response
values?  The strength, and weakness, of PCA is that it is
calculated *without regard* to a response variable, so it
does not constitute "data snooping" ...
  I may of course have misinterpreted your question, but at
a quick look, I don't see anything obviously wrong here.
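A small simulation illustrating this point (an editorial sketch on hypothetical data, not code from the thread): y depends only on the lowest-variance direction of X, so the first PC, which PCA selects purely by variance, explains almost none of y.

```r
set.seed(5)
n <- 72
# Hypothetical X: a high-variance column 'a' and a low-variance column 'b'
X <- cbind(a = rnorm(n, sd = 10), b = rnorm(n, sd = 1))
# Hypothetical response that depends only on the low-variance column
y <- 2 * X[, "b"] + rnorm(n, sd = 0.5)

pr  <- prcomp(X)       # computed from X alone; y plays no part
pc1 <- pr$x[, 1]       # largest-variance direction, essentially 'a'
pc2 <- pr$x[, 2]       # smallest-variance direction, essentially 'b'

r2_pc1 <- summary(lm(y ~ pc1))$r.squared
r2_pc2 <- summary(lm(y ~ pc2))$r.squared
c(PC1 = r2_pc1, PC2 = r2_pc2)   # PC1 explains little of y, PC2 most of it
```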

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: After PCA, the PC values are so large. Is something wrong?

by bbslover


OK, I understand what you mean; maybe PLS is better suited to my aim. But I have tried it, and the results were also bad. My main question is how to select a smaller set of independent variables to fit the dependent variable. A genetic algorithm (GA) may be a good approach, but I have not learned it well.