Title: | The COR for Optimal Subset Selection in Distributed Estimation |
---|---|
Description: | An algorithm that selects optimal subsets in distributed estimation using Covariance matrices, Observation matrices and Response vectors (COR). The philosophy of the package is described in Guo G. (2024) <doi:10.1007/s11222-024-10471-z>. |
Authors: | Guangbao Guo [aut, cre] |
Maintainer: | Guangbao Guo <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2025-02-14 05:41:36 UTC |
Source: | https://github.com/cran/COR |
Calculate the estimators of beta on the A-opt and D-opt
beta_AD(K = K, nk = nk, alpha = alpha, X = X, y = y)
K |
is the number of subsets |
nk |
is the length of subsets |
alpha |
is the significance level |
X |
is the observation matrix |
y |
is the response vector |
A list containing:
betaA |
The estimator of beta on the A-opt. |
betaD |
The estimator of beta on the D-opt. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
p = 6; n = 1000; K = 2; nk = 200; alpha = 0.05; sigma = 1
e = rnorm(n, 0, sigma)
beta = c(sort(c(runif(p, 0, 1))))
data = c(rnorm(n * p, 5, 10))
X = matrix(data, ncol = p)
y = X %*% beta + e
beta_AD(K = K, nk = nk, alpha = alpha, X = X, y = y)
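The idea behind A-optimal and D-optimal subset selection can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside beta_AD: it assumes subsets are simple random samples of nk rows, takes the A-criterion as the trace of the inverse of X_k'X_k and the D-criterion as the determinant of X_k'X_k, and ignores alpha. The names idx, A_crit, D_crit, ols, betaA_sketch and betaD_sketch are hypothetical.

```r
# Sketch only: base R, not the COR package's internal implementation.
set.seed(1)
p <- 6; n <- 1000; K <- 2; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
y <- X %*% beta + rnorm(n, 0, 1)

# Assumption: each subset is a simple random sample of nk row indices.
idx <- lapply(seq_len(K), function(k) sample(n, nk))

# A-criterion: trace of (X_k' X_k)^{-1}; D-criterion: det(X_k' X_k).
A_crit <- sapply(idx, function(i) sum(diag(solve(crossprod(X[i, , drop = FALSE])))))
D_crit <- sapply(idx, function(i) det(crossprod(X[i, , drop = FALSE])))

# Least-squares fit on the rows in i.
ols <- function(i) drop(solve(crossprod(X[i, ]), crossprod(X[i, ], y[i])))

betaA_sketch <- ols(idx[[which.min(A_crit)]])  # subset with the smallest trace
betaD_sketch <- ols(idx[[which.max(D_crit)]])  # subset with the largest determinant
```

Both sketched estimators are ordinary least-squares fits; only the rule for choosing the subset differs.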
Calculate the estimator of beta on the COR
beta_cor(K = K, nk = nk, alpha = alpha, X = X, y = y)
K |
is the number of subsets |
nk |
is the length of subsets |
alpha |
is the significance level |
X |
is the observation matrix |
y |
is the response vector |
A list containing:
betaC |
The estimator of beta on the COR. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
p = 6; n = 1000; K = 2; nk = 200; alpha = 0.05; sigma = 1
e = rnorm(n, 0, sigma)
beta = c(sort(c(runif(p, 0, 1))))
data = c(rnorm(n * p, 5, 10))
X = matrix(data, ncol = p)
y = X %*% beta + e
beta_cor(K = K, nk = nk, alpha = alpha, X = X, y = y)
Calculate the estimators of beta on the LEV-opt
beta_LW(X, Y, K, nk)
X |
is the observation matrix |
Y |
is the response vector |
K |
is the number of subsets |
nk |
is the length of subsets |
A list containing:
betalev |
The estimator of beta on the LEV-opt subset. |
betam |
The mean of the beta estimators across all K subsets. |
AMSE |
The Average Mean Squared Error (AMSE) for the estimator. |
WMSE |
The Weighted Mean Squared Error (WMSE) for the estimator. |
MSElevb |
The Mean Squared Error (MSE) of the LEV-opt estimator compared to the true beta. |
MSEb |
The Mean Squared Error (MSE) of the mean estimator (betam) compared to the true beta. |
MSEyleva |
The Mean Squared Error (MSE) of the LEV-opt estimator on the subset with the maximum hat value (Xleva). |
MSEyleviy |
The Mean Squared Error (MSE) of the LEV-opt estimator on the subset with the minimum hat value (Xlevi). |
MSEW |
The Mean Squared Error (MSE) of the weighted estimator (Wbeta) compared to the true beta. |
MSEw |
The Mean Squared Error (MSE) of the weighted estimator (wbeta) compared to the true beta. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
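The hat values mentioned in the value list above are the leverages of a subset's design matrix. The base R sketch below shows how they could be computed per subset, together with a mean of the per-subset estimators. It is an illustration under assumptions (random subsets, ordinary least squares on each subset), not the code inside beta_LW, and the names hat_values, max_lev, betas and betam_sketch are hypothetical.

```r
# Sketch only: leverage (hat values) per subset in base R.
set.seed(2)
p <- 6; n <- 1000; K <- 2; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
Y <- X %*% beta + rnorm(n, 0, 1)

# Assumption: each subset is a simple random sample of nk row indices.
idx <- lapply(seq_len(K), function(k) sample(n, nk))

# Hat values: diagonal of X_k (X_k' X_k)^{-1} X_k', computed row by row.
hat_values <- function(Xk) rowSums((Xk %*% solve(crossprod(Xk))) * Xk)

# Largest leverage found in each subset.
max_lev <- sapply(idx, function(i) max(hat_values(X[i, , drop = FALSE])))
which.max(max_lev)  # subset containing the largest hat value
which.min(max_lev)  # subset whose largest hat value is smallest

# Per-subset least-squares estimates (p x K) and their mean across subsets.
betas <- sapply(idx, function(i) {
  Xk <- X[i, , drop = FALSE]
  drop(solve(crossprod(Xk), crossprod(Xk, Y[i])))
})
betam_sketch <- rowMeans(betas)
```

A quick sanity check is that the hat values of any full-rank subset sum to p.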
A data set on communities and crime
data("communities")
A data frame with 1994 observations on the following 128 variables.
V1
a numeric vector
V2
a numeric vector
V3
a numeric vector
V4
a character vector
V5
a numeric vector
V6
a numeric vector
V7
a numeric vector
V8
a numeric vector
V9
a numeric vector
V10
a numeric vector
V11
a numeric vector
V12
a numeric vector
V13
a numeric vector
V14
a numeric vector
V15
a numeric vector
V16
a numeric vector
V17
a numeric vector
V18
a numeric vector
V19
a numeric vector
V20
a numeric vector
V21
a numeric vector
V22
a numeric vector
V23
a numeric vector
V24
a numeric vector
V25
a numeric vector
V26
a numeric vector
V27
a numeric vector
V28
a numeric vector
V29
a numeric vector
V30
a numeric vector
V31
a numeric vector
V32
a numeric vector
V33
a numeric vector
V34
a numeric vector
V35
a numeric vector
V36
a numeric vector
V37
a numeric vector
V38
a numeric vector
V39
a numeric vector
V40
a numeric vector
V41
a numeric vector
V42
a numeric vector
V43
a numeric vector
V44
a numeric vector
V45
a numeric vector
V46
a numeric vector
V47
a numeric vector
V48
a numeric vector
V49
a numeric vector
V50
a numeric vector
V51
a numeric vector
V52
a numeric vector
V53
a numeric vector
V54
a numeric vector
V55
a numeric vector
V56
a numeric vector
V57
a numeric vector
V58
a numeric vector
V59
a numeric vector
V60
a numeric vector
V61
a numeric vector
V62
a numeric vector
V63
a numeric vector
V64
a numeric vector
V65
a numeric vector
V66
a numeric vector
V67
a numeric vector
V68
a numeric vector
V69
a numeric vector
V70
a numeric vector
V71
a numeric vector
V72
a numeric vector
V73
a numeric vector
V74
a numeric vector
V75
a numeric vector
V76
a numeric vector
V77
a numeric vector
V78
a numeric vector
V79
a numeric vector
V80
a numeric vector
V81
a numeric vector
V82
a numeric vector
V83
a numeric vector
V84
a numeric vector
V85
a numeric vector
V86
a numeric vector
V87
a numeric vector
V88
a numeric vector
V89
a numeric vector
V90
a numeric vector
V91
a numeric vector
V92
a numeric vector
V93
a numeric vector
V94
a numeric vector
V95
a numeric vector
V96
a numeric vector
V97
a numeric vector
V98
a numeric vector
V99
a numeric vector
V100
a numeric vector
V101
a numeric vector
V102
a numeric vector
V103
a numeric vector
V104
a numeric vector
V105
a numeric vector
V106
a numeric vector
V107
a numeric vector
V108
a numeric vector
V109
a numeric vector
V110
a numeric vector
V111
a numeric vector
V112
a numeric vector
V113
a numeric vector
V114
a numeric vector
V115
a numeric vector
V116
a numeric vector
V117
a numeric vector
V118
a numeric vector
V119
a numeric vector
V120
a numeric vector
V121
a numeric vector
V122
a numeric vector
V123
a numeric vector
V124
a numeric vector
V125
a numeric vector
V126
a numeric vector
V127
a numeric vector
V128
a numeric vector
UCI repository
Redmond, M. A. and A. Baveja: A Data-Driven Software Tool for Enabling Cooperative Information Sharing Among Police Departments. European Journal of Operational Research 141 (2002) 660-678.
data(communities) ## maybe str(communities) ; plot(communities) ...
Calculate the optimal subset lengths on the COR
COR(K = K, nk = nk, alpha = alpha, X = X, y = y)
K |
is the number of subsets |
nk |
is the length of subsets |
alpha |
is the significance level |
X |
is the observation matrix |
y |
is the response vector |
A list containing:
seqL |
The index of the subset with the minimum L value. |
seqN |
The index of the subset with the minimum N value. |
lWMN |
The optimal subset lengths on the COR. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
p = 6; n = 1000; K = 2; nk = 200; alpha = 0.05; sigma = 1
e = rnorm(n, 0, sigma)
beta = c(sort(c(runif(p, 0, 1))))
data = c(rnorm(n * p, 5, 10))
X = matrix(data, ncol = p)
y = X %*% beta + e
COR(K = K, nk = nk, alpha = alpha, X = X, y = y)
A data set about chemical sensors
data("ethylene_CO")
A data frame with 4001 observations on the following 19 variables.
V1
a character vector
V2
a character vector
V3
a character vector
V4
a character vector
V5
a character vector
V6
a character vector
V7
a character vector
V8
a character vector
V9
a character vector
V10
a character vector
V11
a character vector
V12
a character vector
V13
a character vector
V14
a character vector
V15
a character vector
V16
a character vector
V17
a character vector
V18
a character vector
V19
a character vector
We selected the first 4001 rows of the original data set, which has 1048576 observations on 19 variables.
UCI Repository
Wang, H. Y., Zhu, R., and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522), 829-844.
data(ethylene_CO) ## maybe str(ethylene_CO) ; plot(ethylene_CO) ...
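Because every column is documented as a character vector, the columns would presumably need converting to numeric before they can serve as an observation matrix. The sketch below shows one way to do that in base R. It uses a small hypothetical data frame, toy, rather than ethylene_CO itself, so the values and the "n/a" entry are invented for illustration only.

```r
# Sketch only: convert character columns to a numeric matrix.
# `toy` is a hypothetical stand-in, not the contents of ethylene_CO.
toy <- data.frame(
  V1 = c("0.00", "0.01", "0.02"),
  V2 = c("12.5", "13.1", "n/a"),
  stringsAsFactors = FALSE
)

# Coerce each column; entries that cannot be parsed become NA.
num <- as.data.frame(lapply(toy, function(col) suppressWarnings(as.numeric(col))))

# Count unparseable entries per column, then keep only complete rows.
na_counts <- colSums(is.na(num))
X_toy <- as.matrix(num[complete.cases(num), ])
```

Checking na_counts before dropping rows shows how much data the conversion loses.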
This function estimates the coefficients of a linear regression model using a design matrix 'X' and a response vector 'Y'. It applies A-optimal and D-optimal design criteria to choose optimal subsets of observations.
LICbeta(X, Y, alpha, K, nk)
X |
The observation matrix (n x p) |
Y |
The response vector (n x 1) |
alpha |
The significance level for computing confidence intervals |
K |
The number of subsets |
nk |
The number of observations per subset |
A list containing:
E5 |
The LIC estimator for linear regression. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Calculate the LIC estimator based on A-optimal and D-optimal criteria
LICnew(X, Y, alpha, K, nk)
X |
A matrix of observations (design matrix) with size n x p |
Y |
A vector of responses with length n |
alpha |
The significance level for confidence intervals |
K |
The number of subsets to consider |
nk |
The size of each subset |
A list containing:
E5 |
The LIC estimator based on A-optimal and D-optimal criteria. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
p = 6; n = 1000; K = 2; nk = 200; alpha = 0.05; sigma = 1
e = rnorm(n, 0, sigma)
beta = c(sort(c(runif(p, 0, 1))))
data = c(rnorm(n * p, 5, 10))
X = matrix(data, ncol = p)
Y = X %*% beta + e
LICnew(X = X, Y = Y, alpha = alpha, K = K, nk = nk)
Calculate MSE values for different beta estimation methods
MSEbeta(X, Y, alpha, K, nk)
X |
The design matrix (observations). |
Y |
The response vector. |
alpha |
The significance level. |
K |
The number of subsets. |
nk |
The length of subsets (number of observations in each subset). |
A list containing:
MSECOR |
The MSE of the COR beta estimator. |
MSEAopt |
The MSE of the A-optimal beta estimator. |
MSEDopt |
The MSE of the D-optimal beta estimator. |
MSElic |
The MSE of the LIC beta estimator. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
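In a simulation where the true beta is known, the MSE of any beta estimator can be computed directly. The sketch below is a minimal base R illustration, assuming the MSE is the mean squared difference over the p coefficients (the package may use a different scaling). The helper names mse_beta and ols are hypothetical and are not part of the package.

```r
# Sketch only: MSE of a beta estimator against a known true beta.
set.seed(3)
p <- 6; n <- 1000; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
Y <- X %*% beta + rnorm(n, 0, 1)

# Assumption: MSE is the mean squared difference over the p coefficients.
mse_beta <- function(beta_hat, beta_true) mean((drop(beta_hat) - beta_true)^2)

# Least-squares fit restricted to the given rows.
ols <- function(rows) {
  Xk <- X[rows, , drop = FALSE]
  solve(crossprod(Xk), crossprod(Xk, Y[rows]))
}

mse_full <- mse_beta(ols(seq_len(n)), beta)   # all n observations
mse_sub  <- mse_beta(ols(sample(n, nk)), beta) # one random subset of nk rows
c(full = mse_full, subset = mse_sub)
```

The same helper can be applied to any estimate of length p, which makes it easy to compare selection rules on equal terms.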
Calculate the MSE values of the COR criterion in simulation
MSEcom(K = K, nk = nk, alpha = alpha, X = X, y = y)
K |
is the number of subsets |
nk |
is the length of subsets |
alpha |
is the significance level |
X |
is the observation matrix |
y |
is the response vector |
A list containing:
MSEx |
The Mean Squared Error between the true beta and the estimate betax based on the COR. |
MSEA |
The Mean Squared Error between the true beta and the estimate betaA based on the least squares estimate for subset A. |
MSEc |
The Mean Squared Error between the true beta and the estimate betac based on the COR-selected subset. |
MSEm |
The Mean Squared Error between the true beta and the median estimator betamm across all subsets. |
MSEa |
The Mean Squared Error between the true beta and the mean estimator betaa across all subsets. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
p = 6; n = 1000; K = 2; nk = 500; alpha = 0.05; sigma = 1
e = rnorm(n, 0, sigma)
beta = c(sort(c(runif(p, 0, 1))))
data = c(rnorm(n * p, 5, 10))
X = matrix(data, ncol = p)
y = X %*% beta + e
MSEcom(K = K, nk = nk, alpha = alpha, X = X, y = y)
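The mean and median estimators across subsets can be illustrated in base R. This sketch assumes disjoint subsets obtained by shuffling the row indices, and it uses K = 5 rather than K = 2 so that the coordinate-wise median differs from the mean. It is not the code inside MSEcom, and the names betas, beta_mean, beta_median and mse are hypothetical.

```r
# Sketch only: mean and median of per-subset least-squares estimates.
set.seed(4)
p <- 6; n <- 1000; K <- 5; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
y <- X %*% beta + rnorm(n, 0, 1)

# Assumption: disjoint subsets from a random permutation of the rows.
perm <- sample(n)
idx <- split(perm, rep(seq_len(K), each = nk))

# One least-squares estimate per subset, stored as columns (p x K).
betas <- sapply(idx, function(i) {
  Xk <- X[i, , drop = FALSE]
  drop(solve(crossprod(Xk), crossprod(Xk, y[i])))
})

beta_mean   <- rowMeans(betas)          # coordinate-wise mean
beta_median <- apply(betas, 1, median)  # coordinate-wise median

# Assumption: MSE is the mean squared difference over the p coefficients.
mse <- function(b) mean((b - beta)^2)
c(mean = mse(beta_mean), median = mse(beta_median))
```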
Calculate the MSE values of the COR criterion for redundant data in simulation
MSEver(K = K, nk = nk, alpha = alpha, X = X, y = y)
K |
is the number of subsets |
nk |
is the length of subsets |
alpha |
is the significance level |
X |
is the observation matrix |
y |
is the response vector |
A list containing:
minE |
The minimum value of the error variance estimator. |
Mcor |
The MSE of the COR estimator. |
Mx |
The MSE of the estimator based on the subset with the maximum M. |
MA |
The MSE of the estimator based on the subset with the minimum W. |
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
p = 6; n = 1000; K = 2; nk = 200; alpha = 0.05; sigma = 1
e = rnorm(n, 0, sigma)
beta = c(sort(c(runif(p, 0, 1))))
data = c(rnorm(n * p, 5, 10))
X = matrix(data, ncol = p)
y = X %*% beta + e
MSEver(K = K, nk = nk, alpha = alpha, X = X, y = y)
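The minE value is described as the minimum of an error variance estimator. One common estimator is the residual sum of squares of a subset fit divided by nk - p, and the sketch below computes it per subset in base R. Whether MSEver uses exactly this estimator is an assumption, and the names sigma2_hat and minE_sketch are hypothetical.

```r
# Sketch only: per-subset error variance estimate and its minimum.
set.seed(5)
p <- 6; n <- 1000; K <- 2; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
y <- X %*% beta + rnorm(n, 0, 1)

# Assumption: each subset is a simple random sample of nk row indices.
idx <- lapply(seq_len(K), function(k) sample(n, nk))

# Assumption: sigma^2 is estimated by RSS / (nk - p) on each subset.
sigma2_hat <- sapply(idx, function(i) {
  Xk <- X[i, , drop = FALSE]
  yk <- y[i]
  res <- yk - Xk %*% solve(crossprod(Xk), crossprod(Xk, yk))
  sum(res^2) / (nk - p)
})

minE_sketch <- min(sigma2_hat)
```

With sigma = 1 in the simulation, each estimate should lie close to 1.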