| Title: | Distributed Online Covariance Matrix Tests |
|---|---|
| Description: | Distributed Online Covariance Matrix Tests ('Docovt') is a tool for processing and analyzing distributed datasets. It enables users to perform covariance matrix tests in an online, distributed manner, making it suitable for large-scale data analysis, particularly in scenarios where data is dispersed across multiple nodes or sources. The package is aimed at researchers and practitioners working with high-dimensional data and provides a framework for covariance matrix estimation and hypothesis testing. The philosophy of 'Docovt' is described in Guo, G. (2025) <doi:10.1016/j.physa.2024.130308>. |
| Authors: | Guangbao Guo [aut, cre] (ORCID: <https://orcid.org/0000-0002-4115-6218>), Congfan Zhang [aut] |
| Maintainer: | Guangbao Guo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3 |
| Built: | 2026-05-31 08:46:05 UTC |
| Source: | https://github.com/cran/Docovt |
Given two data matrices X and Y, where X is an n1 by p matrix and Y is an n2 by p matrix, this function tests whether the two samples share the same covariance matrix. The null hypothesis is:
$$H_0: \Sigma_1 = \Sigma_2,$$
where $\Sigma_1$ and $\Sigma_2$ are the covariance matrices of the distributions that generated X and Y, respectively. The test is based on the method proposed by Cai, Liu and Xia (2013). When pval is less than the significance level (generally 0.05), the null hypothesis is rejected.
    CLX(X, Y)

| Argument | Description |
|---|---|
| X | An n1 by p matrix. |
| Y | An n2 by p matrix. |

| Value | Description |
|---|---|
| stat | The test statistic value. |
| pval | The p-value of the test. |
Cai, T. T., Liu, W., and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501):265-277.
    ## generate X and Y.
    p <- 500; n1 <- 100; n2 <- 150
    X <- matrix(rnorm(n1 * p), ncol = p)
    Y <- matrix(rnorm(n2 * p), ncol = p)
    ## run test
    CLX(X, Y)
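To make the idea of a max-type test concrete, the following base-R sketch computes the statistic described in Cai, Liu and Xia (2013): the largest standardized squared difference between corresponding entries of the two sample covariance matrices, compared against its type I extreme value limit. The helper name `clx_sketch` is hypothetical. The code is written from the paper and is not presented as the package's implementation of `CLX`. It assumes the asymptotic null distribution is an adequate approximation.

```r
# Hypothetical helper: a sketch of the max-type statistic of
# Cai, Liu and Xia (2013). Not the Docovt implementation of CLX().
clx_sketch <- function(X, Y) {
  n1 <- nrow(X); n2 <- nrow(Y); p <- ncol(X)
  # Center each column.
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  # Sample covariance matrices (divisor n, as in the paper).
  S1 <- crossprod(Xc) / n1
  S2 <- crossprod(Yc) / n2
  # Entrywise variance estimates of the sample covariances:
  # theta_ij = mean_k (x_ki * x_kj - s_ij)^2 = mean_k (x_ki * x_kj)^2 - s_ij^2
  theta1 <- crossprod(Xc^2) / n1 - S1^2
  theta2 <- crossprod(Yc^2) / n2 - S2^2
  # Largest standardized squared difference over all entries.
  M <- max((S1 - S2)^2 / (theta1 / n1 + theta2 / n2))
  # Asymptotic p-value from the type I extreme value limit.
  t_obs <- M - 4 * log(p) + log(log(p))
  pval <- 1 - exp(-exp(-t_obs / 2) / sqrt(8 * pi))
  list(stat = M, pval = pval)
}

set.seed(1)
p <- 50; n1 <- 100; n2 <- 150
X <- matrix(rnorm(n1 * p), ncol = p)
Y <- matrix(rnorm(n2 * p), ncol = p)
res <- clx_sketch(X, Y)
res$pval < 0.05  # TRUE would mean rejecting H0 at the 5% level
```

Because the statistic is a maximum over entries, a test of this kind is most sensitive when the two covariance matrices differ in only a few entries.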
Given data, this function performs a one-sample test for the covariance matrix, where the null hypothesis is
$$H_0: \Sigma = \Sigma_0,$$
where $\Sigma$ is the covariance of the data model and $\Sigma_0$ is a hypothesized covariance. The test is based on a procedure proposed by Cai and Ma (2013).
    cm13(X, Sigma0, alpha)

| Argument | Description |
|---|---|
| X | An $(n \times p)$ data matrix. |
| Sigma0 | A $(p \times p)$ hypothesized covariance matrix. |
| alpha | Level of significance. |
a named list containing:
a test statistic value.
rejection criterion to be compared against test statistic.
a logical; TRUE to reject null hypothesis, FALSE otherwise.
    ## generate data from multivariate normal with trivial covariance.
    p <- 5; n <- 10
    X <- matrix(rnorm(n * p), ncol = p)
    alpha <- 0.05
    Sigma0 <- diag(ncol(X))
    ## run the test
    cm13(X, Sigma0, alpha)
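The three returned quantities (statistic, rejection threshold, decision) can be illustrated with a base-R sketch of the U-statistic test described in Cai and Ma (2013). The helper name `cm13_sketch` and the list element names are hypothetical, and the code is written from the paper rather than taken from the package. It assumes the rows of `X` have mean zero and that `Sigma0` is positive definite.

```r
# Hypothetical helper: a sketch of the one-sample test of Cai and Ma (2013).
# Assumes mean-zero rows; not the Docovt implementation of cm13().
cm13_sketch <- function(X, Sigma0, alpha = 0.05) {
  n <- nrow(X); p <- ncol(X)
  # Whiten the data so the null hypothesis becomes "covariance = identity".
  Z <- X %*% solve(chol(Sigma0))
  # n x n Gram matrix, G[i, j] = z_i' z_j.
  G <- tcrossprod(Z)
  d <- diag(G)
  # Kernel h(z_i, z_j) = (z_i' z_j)^2 - (z_i' z_i + z_j' z_j) + p.
  H <- G^2 - outer(d, d, "+") + p
  # U-statistic: average of the kernel over all pairs i < j.
  statistic <- 2 * sum(H[upper.tri(H)]) / (n * (n - 1))
  # Normal-approximation rejection threshold at level alpha.
  threshold <- qnorm(1 - alpha) * 2 * sqrt(p * (p + 1) / (n * (n - 1)))
  list(statistic = statistic, threshold = threshold,
       reject = statistic > threshold)
}

set.seed(1)
p <- 5; n <- 10
X <- matrix(rnorm(n * p), ncol = p)
out <- cm13_sketch(X, diag(p), alpha = 0.05)
out$reject
```

The kernel has expectation equal to the squared Frobenius distance between the whitened covariance and the identity, so large values of the statistic are evidence against the null hypothesis.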
Given two sets of data, this function performs a two-sample test for equality of covariance matrices, where the null hypothesis is
$$H_0: \Sigma_1 = \Sigma_2,$$
where $\Sigma_1$ and $\Sigma_2$ represent the true (unknown) covariance of each dataset. The test is based on a procedure proposed by Cai and Ma (2013).
If statistic > threshold, the null hypothesis is rejected.
    cmtwo(X, Y, alpha)

| Argument | Description |
|---|---|
| X | An $(n_1 \times p)$ data matrix. |
| Y | An $(n_2 \times p)$ data matrix. |
| alpha | Level of significance. |
a named list containing
a test statistic value.
rejection criterion to be compared against test statistic.
a logical; TRUE to reject null hypothesis, FALSE otherwise.
    ## generate 2 datasets from multivariate normal with identical covariance.
    p <- 5; n1 <- 100; n2 <- 150; alpha <- 0.05
    X <- matrix(rnorm(n1 * p), ncol = p)
    Y <- matrix(rnorm(n2 * p), ncol = p)
    ## run test
    cmtwo(X, Y, alpha)
This dataset was acquired during a keratoconus study, a collaborative project involving Ms. Nancy Tripoli and Dr. Kenneth L. Cohen of the Department of Ophthalmology at the University of North Carolina, Chapel Hill. The fitted feature vectors for the complete corneal surface dataset are collected into a feature matrix with dimensions 150 × 2000.
    data(corneal)

A data frame with 150 observations in the following 4 groups.

| Rows | Number of rows | Group |
|---|---|---|
| 1 to 43 | 43 | normal group |
| 44 to 57 | 14 | unilateral suspect group |
| 58 to 78 | 21 | suspect map group |
| 79 to 150 | 72 | clinical keratoconus group |
    data(corneal)
    dim(corneal)
    group1 <- as.matrix(corneal[1:43, ])   ## normal group
    dim(group1)
    group2 <- as.matrix(corneal[44:57, ])  ## unilateral suspect group
    dim(group2)
    group3 <- as.matrix(corneal[58:78, ])  ## suspect map group
    dim(group3)
    group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
    dim(group4)
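The row ranges above can also be encoded once as a grouping factor, which avoids repeating hard-coded indices. The sketch below uses a synthetic stand-in matrix with the documented shape so that it runs without the package. The names `fake_corneal`, `sizes`, `labels` and `groups` are illustrative and not part of 'Docovt'.

```r
# Synthetic stand-in with the documented shape (150 x 2000);
# the real data would come from data(corneal).
set.seed(1)
fake_corneal <- matrix(rnorm(150 * 2000), nrow = 150)

# Group sizes in row order, as listed in the format description.
sizes <- c(normal = 43, unilateral_suspect = 14,
           suspect_map = 21, clinical_keratoconus = 72)

# One label per row, with levels kept in the documented order.
labels <- factor(rep(names(sizes), times = sizes), levels = names(sizes))

# Split row indices by label, then extract each sub-matrix.
groups <- lapply(split(seq_len(nrow(fake_corneal)), labels),
                 function(idx) fake_corneal[idx, , drop = FALSE])

sapply(groups, nrow)
```

Any pair of the resulting matrices can then be passed to a two-sample test such as `CLX(X, Y)`.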
A COVID-19 data set from NCBI with ID GSE152641. The data set profiles peripheral blood from 24 healthy controls and 62 prospectively enrolled patients with community-acquired lower respiratory tract infection by SARS-CoV-2, sampled within the first 24 hours of hospital admission, using RNA sequencing.
    data(COVID19)

A data frame with 86 observations in the following 2 groups.

| Rows | Number of rows | Group |
|---|---|---|
| 2 to 19 and 82 to 87 | 24 | healthy controls |
| 20 to 81 | 62 | prospectively enrolled patients |
    data(COVID19)
    dim(COVID19)
    group1 <- as.matrix(COVID19[c(2:19, 82:87), ])   ## healthy group
    dim(group1)
    group2 <- as.matrix(COVID19[-c(1:19, 82:87), ])  ## COVID-19 patients
    dim(group2)
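The index arithmetic in this example can be checked without loading the data. The sketch below assumes the object has 87 rows, which is what the indices suggest but is not stated explicitly here. Row 1 belongs to neither group in the example.

```r
# Assumption: the object has 87 rows, as the largest index in the example implies.
n_rows <- 87

# Rows selected for the healthy group.
healthy <- c(2:19, 82:87)

# Rows kept by the negative index -c(1:19, 82:87).
patients <- setdiff(seq_len(n_rows), c(1:19, 82:87))

length(healthy)   # 24
length(patients)  # 62
range(patients)   # 20 81
```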
Given two data matrices X and Y, where X is an n1 by p matrix and Y is an n2 by p matrix, this function tests whether the two samples share the same covariance matrix. The null hypothesis is:
$$H_0: \Sigma_1 = \Sigma_2,$$
where $\Sigma_1$ and $\Sigma_2$ are the covariance matrices of the distributions that generated X and Y, respectively. The test is based on the method proposed by Li and Chen (2012). When pval is less than the significance level (generally 0.05), the null hypothesis is rejected.
    LC(X, Y)

| Argument | Description |
|---|---|
| X | An n1 by p matrix. |
| Y | An n2 by p matrix. |

| Value | Description |
|---|---|
| stat | The test statistic value. |
| pval | The p-value of the test. |
Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908-940.
    ## generate X and Y.
    p <- 500; n1 <- 100; n2 <- 150
    X <- matrix(rnorm(n1 * p), ncol = p)
    Y <- matrix(rnorm(n2 * p), ncol = p)
    ## run test
    LC(X, Y)
A three-level factor variable corresponding to cancer type.
    data(miRNA)
Dataframe with 21 samples and 537 variables
variables
samples
    data(miRNA)
Given two data matrices X and Y, where X is an n1 by p matrix and Y is an n2 by p matrix, this function tests whether the two samples share the same covariance matrix. The null hypothesis is:
$$H_0: \Sigma_1 = \Sigma_2,$$
where $\Sigma_1$ and $\Sigma_2$ are the covariance matrices of the distributions that generated X and Y, respectively. The test is based on the method proposed by Yu, Li and Xue (2022). When pval is less than the significance level (generally 0.05), the null hypothesis is rejected.
    PEC(X, Y)

| Argument | Description |
|---|---|
| X | An n1 by p matrix. |
| Y | An n2 by p matrix. |

| Value | Description |
|---|---|
| stat | The test statistic value. |
| pval | The p-value of the test. |
Yu, X., Li, D., and Xue, L. (2022). Fisher's combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association, (in press):1-14.
    ## generate X and Y.
    p <- 500; n1 <- 100; n2 <- 150
    X <- matrix(rnorm(n1 * p), ncol = p)
    Y <- matrix(rnorm(n2 * p), ncol = p)
    ## run test
    PEC(X, Y)
Given two data matrices X and Y, where X is an n1 by p matrix and Y is an n2 by p matrix, this function tests whether the two samples share the same covariance matrix. The null hypothesis is:
$$H_0: \Sigma_1 = \Sigma_2,$$
where $\Sigma_1$ and $\Sigma_2$ are the covariance matrices of the distributions that generated X and Y, respectively. The test is based on the method proposed by Yu, Li, Xue and Li (2022). When pval is less than the significance level (generally 0.05), the null hypothesis is rejected.
    PECO(X, Y, delta = NULL)

| Argument | Description |
|---|---|
| X | An n1 by p matrix. |
| Y | An n2 by p matrix. |
| delta | A scalar used as the threshold for building the PE components; the default value is usually used. |

| Value | Description |
|---|---|
| stat | The test statistic value. |
| pval | The p-value of the test. |
Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1-14.
    ## generate X and Y.
    p <- 500; n1 <- 100; n2 <- 150
    X <- matrix(rnorm(n1 * p), ncol = p)
    Y <- matrix(rnorm(n2 * p), ncol = p)
    ## run test
    PECO(X, Y)
Given two data matrices X and Y, where X is an n1 by p matrix and Y is an n2 by p matrix, this function tests whether the two samples share the same covariance matrix. The null hypothesis is:
$$H_0: \Sigma_1 = \Sigma_2,$$
where $\Sigma_1$ and $\Sigma_2$ are the covariance matrices of the distributions that generated X and Y, respectively. The test is based on the method proposed by Yu, Li and Xue (2022). When pval is less than the significance level (generally 0.05), the null hypothesis is rejected.
    PEF(X, Y)

| Argument | Description |
|---|---|
| X | An n1 by p matrix. |
| Y | An n2 by p matrix. |

| Value | Description |
|---|---|
| stat | The test statistic value. |
| pval | The p-value of the test. |
Yu, X., Li, D., and Xue, L. (2022). Fisher's combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association, (in press):1-14.
    ## generate X and Y.
    p <- 500; n1 <- 100; n2 <- 150
    X <- matrix(rnorm(n1 * p), ncol = p)
    Y <- matrix(rnorm(n2 * p), ncol = p)
    ## run test
    PEF(X, Y)
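The cited reference concerns Fisher's combined probability test. As background, the generic Fisher combination of several p-values can be sketched in base R as follows. The helper name `fisher_combine` and the example p-values are hypothetical. This page does not state which component p-values `PEF` combines, so the sketch illustrates the general method only, under the assumption that the component tests are (at least asymptotically) independent.

```r
# Hypothetical helper: generic Fisher combination of k p-values.
# Assumes the component tests are independent under the null hypothesis.
fisher_combine <- function(pvals) {
  # Under H0, -2 * sum(log(p_i)) follows a chi-squared law with 2k degrees of freedom.
  stat <- -2 * sum(log(pvals))
  pval <- pchisq(stat, df = 2 * length(pvals), lower.tail = FALSE)
  list(stat = stat, pval = pval)
}

# Two made-up p-values standing in for two component tests.
combined <- fisher_combine(c(0.04, 0.20))
combined$pval < 0.05  # TRUE would mean rejecting H0 at the 5% level
```

Combining a max-type test with a sum-type test in this way is intended to retain power against both sparse and dense differences between the two covariance matrices.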
Given data, this function performs a one-sample test for the covariance matrix, where the null hypothesis is
$$H_0: \Sigma = \Sigma_0,$$
where $\Sigma$ is the covariance of the data model and $\Sigma_0$ is a hypothesized covariance. The test is based on a procedure proposed by Srivastava, Yanagihara, and Kubokawa (2014).
    syk(data, Sigma0, alpha)

| Argument | Description |
|---|---|
| data | An $(n \times p)$ data matrix. |
| Sigma0 | A $(p \times p)$ hypothesized covariance matrix. |
| alpha | Level of significance. |
a named list containing
a test statistic value.
rejection criterion to be compared against test statistic.
a logical; TRUE to reject null hypothesis, FALSE otherwise.
    ## generate data from multivariate normal with trivial covariance.
    p <- 5; n <- 10
    data <- matrix(rnorm(n * p), ncol = p)
    alpha <- 0.05
    Sigma0 <- diag(ncol(data))
    ## run the test
    syk(data, Sigma0, alpha)