Title: | The FPCdpca Criterion on Distributed Principal Component Analysis |
---|---|
Description: | We consider optimal subset selection in the setting where a single data subset must represent the whole data set with minimum information loss, and devise a novel intersection-based criterion for selecting the optimal subset, called the FPC criterion, to handle the optimal sub-estimator in distributed principal component analysis; that is, FPCdpca. The philosophy of the package is described in Guo G. (2020) <doi:10.1007/s00180-020-00974-4>. |
Authors: | Guangbao Guo [aut, cre, cph], Jiarui Li [ctb] |
Maintainer: | Guangbao Guo <[email protected]> |
License: | Apache License (== 2.0) |
Version: | 0.1.0 |
Built: | 2025-02-22 04:25:40 UTC |
Source: | https://github.com/cran/FPCdpca |
Decentralized PCA is a technique that performs principal component analysis across the nodes of a distributed computing environment, combining locally computed results through an iterative procedure.
Depca(data,K,nk, eps,nit.max)
data | the data matrix to be analyzed.
K | the number of distributed nodes (data subsets).
nk | the size of each subset.
eps | the noise level.
nit.max | the maximum number of iterations.
MSEXrp, MSEvrp, MSESrp, kopt
set.seed(1234)
K = 20; nk = 50; nr = 10; p = 8; k = 4; n = K * nk; d = 6
data = matrix(c(rnorm((n - nr) * p, 0, 1), rpois(nr * p, 100)), ncol = p)
eps = 10^(-1); nit.max = 1000
TXde = TSde = rep(0, 5)
for (j in 1:5) {
  depca = Depca(data = data, K = K, nk = nk, eps = eps, nit.max = nit.max)
  TXde[j] = as.numeric(depca)[1]
  TSde[j] = as.numeric(depca)[2]
}
mean(TXde)
mean(TSde)
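For intuition, the base-R sketch below shows one way a decentralized principal component estimate can be obtained: a shared direction is repeatedly multiplied by per-node covariance matrices, averaged, and normalised until it stabilises. It is only a conceptual illustration under assumed settings (block sizes, tolerance), not the internal algorithm of Depca.

set.seed(1)
n <- 1000; p <- 8; K <- 20; nk <- n / K
X <- matrix(rnorm(n * p), ncol = p)
blocks <- split(seq_len(n), rep(seq_len(K), each = nk))   # one index set per node
local_cov <- lapply(blocks, function(idx) cov(X[idx, , drop = FALSE]))
v <- rnorm(p); v <- v / sqrt(sum(v^2))                    # random starting direction
eps <- 1e-6; nit.max <- 1000                              # assumed tolerance and iteration cap
for (it in seq_len(nit.max)) {
  v_new <- Reduce(`+`, lapply(local_cov, function(S) S %*% v)) / K   # averaged local updates
  v_new <- v_new / sqrt(sum(v_new^2))
  converged <- sum((v_new - v)^2) < eps
  v <- v_new
  if (converged) break
}
v   # approximate leading principal direction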
Distributed PCA is a technique that applies principal component analysis in a distributed computing environment.
Dpca(data,K, nk)
data | the n random vectors that constitute the data matrix.
K | the number of distributed nodes (data subsets).
nk | the size of each subset.
MSEXp, MSEvp, MSESp, kopt
K = 20; nk = 50; nr = 10; p = 8; n = K * nk; d = 6
data = matrix(c(rnorm((n - nr) * p, 0, 1), rpois(nr * p, 100)), ncol = p)
Dpca(data, K, nk)
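As a conceptual illustration of distributed PCA, the following base-R sketch lets each node compute its local top-k eigenvectors and then aggregates the local subspaces by averaging their projection matrices; this is not necessarily the aggregation rule used inside Dpca.

set.seed(1)
n <- 1000; p <- 8; K <- 20; nk <- n / K; k <- 4
X <- matrix(rnorm(n * p), ncol = p)
blocks <- split(seq_len(n), rep(seq_len(K), each = nk))
local_proj <- lapply(blocks, function(idx) {
  V <- eigen(cov(X[idx, , drop = FALSE]))$vectors[, 1:k]   # local top-k eigenvectors
  V %*% t(V)                                               # projection onto the local subspace
})
V_global <- eigen(Reduce(`+`, local_proj) / K)$vectors[, 1:k]   # aggregated directions
head(V_global)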
Distributed random projection is a technique that applies random projection in a distributed computing environment.
Drp(data,K, nk,d)
data | the data matrix to be projected.
K | the number of distributed nodes.
nk | the size of each subset.
d | the target dimension of the random projection.
MSEXrp, MSEvrp, MSESrp, kopt
K = 20; nk = 50; nr = 10; p = 8; d = 5; n = K * nk
data = matrix(c(rnorm((n - nr) * p, 0, 1), rpois(nr * p, 100)), ncol = p)
# Alternative data generation:
# data = matrix(rpois((n - nr) * p, 1), ncol = p); rexp(nr * p, 1); rchisq(10000, df = 5)
Drp(data = data, K = K, nk = nk, d = d)
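The following base-R sketch shows the idea of a sparse random projection applied blockwise: each node's rows are mapped from p to d dimensions with a shared sparse {-1, 0, +1} projection matrix. The sparsity parameter s is an assumption made for illustration; Drp's internal construction may differ.

set.seed(1)
n <- 1000; p <- 8; d <- 5; K <- 20; nk <- n / K
s <- 3                                                    # assumed sparsity parameter
X <- matrix(rnorm(n * p), ncol = p)
R <- matrix(sample(c(-1, 0, 1), p * d, replace = TRUE,
                   prob = c(1 / (2 * s), 1 - 1 / s, 1 / (2 * s))),
            nrow = p) * sqrt(s / d)                       # sparse random projection matrix
blocks <- split(seq_len(n), rep(seq_len(K), each = nk))
projected <- lapply(blocks, function(idx) X[idx, , drop = FALSE] %*% R)
dim(projected[[1]])                                       # each block is now nk x d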
Distributed random PCA is a technique that applies random PCA in a distributed computing environment.
Drpca(data,K, nk,d)
data | the data matrix to be analyzed.
K | the number of distributed nodes.
nk | the size of each subset.
d | the target dimension of the random projection.
MSEXrp, MSEvrp, kSopt, kxopt
K = 20; nk = 50; nr = 50; p = 8; d = 5; n = K * nk
data = matrix(c(rnorm((n - nr) * p, 0, 1), rpois(nr * p, 100)), ncol = p)
Drpca(data, K, nk, d)
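A minimal base-R sketch of combining random projection with distributed PCA: each block is first reduced with a Gaussian random projection, then the covariance matrices of the reduced blocks are averaged and decomposed. This illustrates the concept only and need not match Drpca's internals.

set.seed(1)
n <- 1000; p <- 8; d <- 5; K <- 20; nk <- n / K
X <- matrix(rnorm(n * p), ncol = p)
R <- matrix(rnorm(p * d) / sqrt(d), nrow = p)             # Gaussian random projection
blocks <- split(seq_len(n), rep(seq_len(K), each = nk))
red_cov <- lapply(blocks, function(idx) cov(X[idx, , drop = FALSE] %*% R))
eigen(Reduce(`+`, red_cov) / K)$vectors[, 1:2]            # leading directions in the reduced space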
Distributed random SVD is a technique that applies randomized SVD in a distributed computing environment.
Drsvd(data,K, nk,m,q,k)
data | the data matrix to be decomposed.
K | the number of distributed nodes.
nk | the size of each subset.
m | the dimension of the variables.
q | the number of additional power iterations.
k | the desired target rank.
MSEXrsvd | the MSE value of Xrsvd
MSEvrsvd | the MSE value of vrsvd
MSESrsvd | the MSE value of Srsvd
kopt | the size of the optimal subset
K = 20; nk = 50; nr = 10; p = 8; m = 5; q = 5; k = 4; n = K * nk
data = X = matrix(rexp(n * p, 0.8), ncol = p)
# Alternative data generation:
# data = matrix(c(rnorm((n - nr) * p, 0, 1), rpois(nr * p, 100)), ncol = p)
# data = X = matrix(rpois((n - nr) * p, 1), ncol = p); rexp(nr * p, 1); rchisq(10000, df = 5)
# data = X = matrix(rexp(n * p, 0.8), ncol = p)
Drsvd(data = data, K = K, nk = nk, m = m, q = q, k = k)
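For intuition, the sketch below implements a standard randomized SVD with q power iterations on a single data block in base R; in a distributed setting the same routine would run on each node's block. It is not the internal Drsvd algorithm.

set.seed(1)
n <- 1000; p <- 8; k <- 4; q <- 2
X <- matrix(rnorm(n * p), ncol = p)
Omega <- matrix(rnorm(p * k), nrow = p)                   # random test matrix
Y <- X %*% Omega
for (i in seq_len(q)) Y <- X %*% crossprod(X, Y)          # power iterations sharpen the range estimate
Q <- qr.Q(qr(Y))                                          # orthonormal basis for the range of Y
B <- crossprod(Q, X)                                      # small k x p matrix
s <- svd(B)
U <- Q %*% s$u; D <- s$d; V <- s$v                        # approximate rank-k SVD factors of X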
Distributed SVD is a technique that applies singular value decomposition (SVD) in a distributed computing environment.
Dsvd(data,K, nk,k)
data | the data matrix to be decomposed.
K | the number of distributed nodes.
nk | the size of each block.
k | the desired target rank.
MSEXs | the MSE of Xs
MSEvsvd | the MSE of vsvd
MSESsvd | the MSE of Ssvd
kopt | the size of the optimal subset
# install.packages("matrixcalc")
library(matrixcalc)
K = 20; nk = 50; nr = 10; p = 8; k = 4; n = K * nk
data = matrix(c(rnorm((n - nr) * p, 0, 1), rpois(nr * p, 100)), ncol = p)
Dsvd(data = data, K = K, nk = nk, k = k)
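The base-R sketch below illustrates one common way to distribute an SVD: take the top-k SVD of each block, stack the blocks' scaled right singular vectors, and decompose the stacked summary. The aggregation rule shown is an assumption for illustration, not necessarily what Dsvd does internally.

set.seed(1)
n <- 1000; p <- 8; K <- 20; nk <- n / K; k <- 4
X <- matrix(rnorm(n * p), ncol = p)
blocks <- split(seq_len(n), rep(seq_len(K), each = nk))
local_tops <- lapply(blocks, function(idx) {
  s <- svd(X[idx, , drop = FALSE], nu = 0, nv = k)
  t(s$v %*% diag(s$d[1:k]))                               # k x p summary of the block
})
stacked <- do.call(rbind, local_tops)                     # (K * k) x p stacked summaries
svd(stacked, nu = 0, nv = k)$v                            # aggregated right singular vectors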
FPC applies the intersection-based FPC criterion for optimal subset selection in a distributed computing environment.
FPC(data,K,nk)
data | the data set matrix.
K | the number of distributed nodes (data subsets).
nk | the size of each subset.
MSEv1, MSEv2, MSEvopt, MSESopt1, MSESopt2, MSESopt, MSEShat, MSESba, MSESw
K = 20; nk = 500; p = 8; n = 10000; m = 50
data = matrix(c(rnorm((n - m) * p, 0, 1), rpois(m * p, 100)), ncol = p)
FPC(data = data, K = K, nk = nk)
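The actual intersection-based FPC criterion is defined in Guo (2020); as a heavily simplified stand-in for the idea of optimal-subset selection, the base-R sketch below scores each candidate subset by the sign-invariant distance between its leading eigenvector and the full-data eigenvector and selects the subset with the smallest score.

set.seed(1)
n <- 10000; p <- 8; K <- 20; nk <- n / K
X <- matrix(rnorm(n * p), ncol = p)
v_full <- eigen(cov(X))$vectors[, 1]                      # full-data leading eigenvector (reference)
blocks <- split(seq_len(n), rep(seq_len(K), each = nk))
scores <- sapply(blocks, function(idx) {
  v_k <- eigen(cov(X[idx, , drop = FALSE]))$vectors[, 1]
  min(sum((v_k - v_full)^2), sum((v_k + v_full)^2))       # sign-invariant discrepancy
})
which.min(scores)                                         # index of the selected subset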