Package 'Domean'

Title: Distributed Online Mean Tests
Description: Tools for performing mean tests on distributed datasets in an online manner, intended for large-scale analyses where data are dispersed across multiple nodes or sources. The package targets high-dimensional data and provides two-sample tests for means and covariance matrices. The philosophy of 'Domean' is described in Guo G. (2025) <doi:10.1016/j.physa.2024.130308>.
Authors: Guangbao Guo [aut, cre], Qianwen Liu [aut]
Maintainer: Guangbao Guo <[email protected]>
License: MIT + file LICENSE
Version: 0.1
Built: 2025-03-05 05:24:58 UTC
Source: https://github.com/cran/Domean

Help Index


Two-Sample CLX Test for High-Dimensional Data

Description

Performs a two-sample CLX test to compare the means of two high-dimensional samples. This test is suitable for situations where the number of variables \( p \) is large relative to the sample sizes.

Usage

CLX(X, Y, alpha)

Arguments

X

A numeric matrix representing the first sample, where rows are variables and columns are observations.

Y

A numeric matrix representing the second sample, where rows are variables and columns are observations.

alpha

The significance level for the test (e.g., 0.05).

Details

The CLX test handles high-dimensional data by estimating the covariance matrix, thresholding the estimate to reduce noise, and transforming the data so that its components are approximately uncorrelated (whitening). The test statistic is the maximum squared difference between the two mean vectors, with each coordinate weighted by the inverse of its variance.
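The max-type statistic described above can be sketched in base R. This is a simplified illustration, not the code used by CLX: it skips the covariance estimation, thresholding and transformation steps and works on the raw coordinates. The helper name max_mean_stat is hypothetical, and the extreme value reference distribution used for the p-value is an assumption taken from the usual theory for max-type mean tests.

```r
# Hypothetical helper, not part of 'Domean'.
# Rows are variables, columns are observations (the layout CLX() documents).
max_mean_stat <- function(X, Y) {
  n1 <- ncol(X); n2 <- ncol(Y); p <- nrow(X)
  d <- rowMeans(X) - rowMeans(Y)                 # coordinate-wise mean difference
  # Pooled per-variable variance
  s <- ((n1 - 1) * apply(X, 1, var) + (n2 - 1) * apply(Y, 1, var)) / (n1 + n2 - 2)
  stat <- (n1 * n2 / (n1 + n2)) * max(d^2 / s)   # maximum weighted squared difference
  # Assumed limiting law: type I extreme value distribution
  x <- stat - 2 * log(p) + log(log(p))
  p.value <- 1 - exp(-exp(-x / 2) / sqrt(pi))
  list(statistics = stat, p.value = p.value)
}

set.seed(123)
X <- matrix(rnorm(20 * 100), nrow = 100, ncol = 20)
Y <- matrix(rnorm(20 * 100, mean = 0.5), nrow = 100, ncol = 20)
res <- max_mean_stat(X, Y)
print(res)
```

Taking the maximum over coordinates makes this kind of statistic sensitive to sparse alternatives, where only a few variables differ in mean.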

Value

A list containing the following components:

statistics

The test statistic.

p.value

The p-value of the test.

alternative

The alternative hypothesis ("two.sided").

method

The method used ("Two-Sample CLX test").
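A typical way to use the returned list is to compare p.value with the chosen significance level. The sketch below builds a mock list with the component names documented above; the numeric values are invented for illustration and are not output from CLX.

```r
# Mock result using the documented component names.
# The numbers are placeholders, not real output from CLX().
result <- list(statistics = 12.3, p.value = 0.012,
               alternative = "two.sided", method = "Two-Sample CLX test")

alpha <- 0.05
reject <- result$p.value < alpha   # reject H0 when the p-value falls below alpha
cat(result$method, ": reject H0 at level", alpha, "?", reject, "\n")
```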

See Also

eigen: Used for eigen-decomposition of the covariance matrix.

solve: Used to compute the inverse of the covariance matrix.

Examples

# Example usage:
  set.seed(123)
  p <- 100  # Number of variables
  n1 <- 20  # Sample size for X
  n2 <- 20  # Sample size for Y
  X <- matrix(rnorm(n1 * p), nrow = p, ncol = n1)
  Y <- matrix(rnorm(n2 * p, mean = 0.5), nrow = p, ncol = n2)
  result <- CLX(X, Y, alpha = 0.05)
  print(result)

Two-Sample Covariance Test for High-Dimensional Data

Description

Performs a test to compare the covariance matrices of two high-dimensional samples. This test is designed for situations where the number of variables \( p \) is large relative to the sample sizes \( n_1 \) and \( n_2 \).

Usage

covclx(X, Y)

Arguments

X

A numeric matrix representing the first sample, where rows are observations and columns are variables.

Y

A numeric matrix representing the second sample, where rows are observations and columns are variables.
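Note that the layout documented here (rows are observations) differs from the one documented for CLX, CQ2 and CZZZ (rows are variables). A matrix stored in one layout can be converted to the other with base R's t(). The variable names below are illustrative only.

```r
set.seed(123)
p <- 50   # number of variables
n <- 20   # number of observations

# Layout documented for CLX(), CQ2() and CZZZ(): variables in rows
X_vars_in_rows <- matrix(rnorm(n * p), nrow = p, ncol = n)

# Layout documented for covclx(), SKK() and zwl(): observations in rows
X_obs_in_rows <- t(X_vars_in_rows)

dim(X_vars_in_rows)
dim(X_obs_in_rows)
```

Checking dim() before calling a test is a cheap way to catch a transposed input, which would otherwise silently swap the roles of the sample size and the dimension.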

Details

This function tests the null hypothesis that the covariance matrices of two samples are equal:

\( H_0: \Sigma_1 = \Sigma_2 \)

against the alternative hypothesis that they are not equal.

The test statistic is based on the maximum normalized squared difference between the two sample covariance matrices. The p-value is computed using an extreme value distribution.
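A statistic of this form can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by covclx: the helper name cov_max_stat is hypothetical, the entry-wise normalization uses a plug-in variance estimate of each sample covariance entry, and the centering constants and extreme value law are assumed from the standard theory for max-type covariance tests.

```r
# Hypothetical helper, not part of 'Domean'.
# Rows are observations, columns are variables (the layout covclx() documents).
cov_max_stat <- function(X, Y) {
  n1 <- nrow(X); n2 <- nrow(Y); p <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  S1 <- crossprod(Xc) / n1                 # sample covariance matrices
  S2 <- crossprod(Yc) / n2
  # Plug-in variance of each entry: mean_k (Xc[k,i] * Xc[k,j] - S[i,j])^2
  theta1 <- crossprod(Xc^2) / n1 - S1^2
  theta2 <- crossprod(Yc^2) / n2 - S2^2
  W <- (S1 - S2)^2 / (theta1 / n1 + theta2 / n2)
  stat <- max(W)                           # maximum normalized squared difference
  # Assumed limiting law: type I extreme value distribution
  x <- stat - 4 * log(p) + log(log(p))
  pval <- 1 - exp(-exp(-x / 2) / sqrt(8 * pi))
  list(stat = stat, pval = pval)
}

set.seed(123)
X <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)
Y <- matrix(rnorm(30 * 50), nrow = 30, ncol = 50)
res <- cov_max_stat(X, Y)
print(res)
```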

Value

A list containing the following components:

stat

The test statistic.

pval

The p-value of the test.

See Also

cov: Used for calculating sample covariance matrices.

Examples

# Example usage:
  set.seed(123)
  n1 <- 20
  n2 <- 30
  p <- 50
  X <- matrix(rnorm(n1 * p), nrow = n1, ncol = p)
  Y <- matrix(rnorm(n2 * p), nrow = n2, ncol = p)
  result <- covclx(X, Y)
  print(result)

Two-Sample CQ Test for High-Dimensional Covariance Matrices

Description

Performs a two-sample test to compare the covariance matrices of two high-dimensional samples. This test is designed for situations where the number of variables \( p \) is large relative to the sample sizes \( n_1 \) and \( n_2 \).

Usage

CQ2(X, Y)

Arguments

X

A numeric matrix representing the first sample, where rows are variables and columns are observations.

Y

A numeric matrix representing the second sample, where rows are variables and columns are observations.

Details

The test statistic is based on the difference between the sample covariance matrices, normalized by their variances. The p-value is computed using a normal approximation.
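For a standardized statistic with an approximately standard normal null distribution, a two-sided p-value can be computed as in the sketch below. Whether CQ2 uses exactly this form is an assumption based on the documented "two.sided" alternative; the helper name two_sided_p is hypothetical.

```r
# Hypothetical helper, not part of 'Domean'.
# z is a standardized test statistic, assumed N(0, 1) under the null hypothesis.
two_sided_p <- function(z) {
  2 * pnorm(-abs(z))
}

p_at_zero <- two_sided_p(0)     # no evidence against the null
p_at_196 <- two_sided_p(1.96)   # close to the conventional 5% boundary
print(c(p_at_zero, p_at_196))
```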

Value

A list containing the following components:

statistics

The test statistic \( Q_n \).

p.value

The p-value of the test.

alternative

The alternative hypothesis ("two.sided").

method

The method used ("Two-Sample CQ test").

Examples

# Example usage:
  set.seed(123)
  p <- 50
  n1 <- 30
  n2 <- 30
  X <- matrix(rnorm(n1 * p), nrow = p, ncol = n1)
  Y <- matrix(rnorm(n2 * p), nrow = p, ncol = n2)
  result <- CQ2(X, Y)
  print(result)

High-Dimensional Two-Sample Mean Test

Description

Conducts a high-dimensional two-sample mean test with optional variable filtering. This function performs both non-studentized and studentized tests to determine whether the means of two groups are significantly different.

Usage

CZZZ(X, Y, m = 2500, filter = TRUE, alpha = 0.05)

Arguments

X

Matrix representing the first group of data (variables in rows, observations in columns).

Y

Matrix representing the second group of data (variables in rows, observations in columns).

m

Number of bootstrap samples used for the test (default is 2500).

filter

Logical parameter indicating whether to filter variables based on mean differences (default is TRUE).

alpha

Significance level for the test (default is 0.05).

Details

This function performs a high-dimensional two-sample mean test, which is useful when the number of variables \( p \) is much larger than the number of observations \( n \). The function includes an optional filtering step to reduce the number of variables based on the difference in means between the two groups.
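To illustrate how a number of bootstrap samples such as m can enter a test of this kind, the sketch below runs a Gaussian multiplier bootstrap for a non-studentized max-type statistic. It is a minimal sketch under assumptions, not the procedure implemented in CZZZ: the helper name boot_max_test is hypothetical, no filtering or studentization is applied, and the resampling scheme is one common choice among several.

```r
# Hypothetical helper, not part of 'Domean'.
# Rows are variables, columns are observations (the layout CZZZ() documents).
boot_max_test <- function(X, Y, m = 500) {
  n1 <- ncol(X); n2 <- ncol(Y)
  Xc <- X - rowMeans(X)   # center each variable within its group
  Yc <- Y - rowMeans(Y)
  stat <- max(abs(rowMeans(X) - rowMeans(Y)))
  # Each draw perturbs the centered data with independent N(0, 1) multipliers
  boot <- replicate(m, {
    e1 <- rnorm(n1)
    e2 <- rnorm(n2)
    max(abs(Xc %*% e1 / n1 - Yc %*% e2 / n2))
  })
  list(statistics = stat, p.value = mean(boot >= stat))
}

set.seed(123)
X <- matrix(rnorm(1000), nrow = 100, ncol = 10)
Y <- matrix(rnorm(1000, mean = 0.5), nrow = 100, ncol = 10)
res <- boot_max_test(X, Y, m = 500)
print(res)
```

A larger m reduces the Monte Carlo error of the bootstrap p-value at the cost of run time, since each draw requires a pass over both data matrices.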

Value

A list containing the results of the non-studentized and studentized tests. Each result includes:

statistics

The test statistic.

p.value

The p-value of the test.

alternative

The alternative hypothesis (two-sided).

method

The method description.

Examples

# Example usage:
library(MASS)
set.seed(123)
X <- matrix(rnorm(1000), nrow = 100, ncol = 10)  # 100 variables, 10 observations
Y <- matrix(rnorm(1000, mean = 0.5), nrow = 100, ncol = 10)  # Different mean
result <- CZZZ(X, Y, m = 1000, filter = TRUE, alpha = 0.05)
print(result)

High-Dimensional Two-Sample Mean Test

Description

Conducts a high-dimensional two-sample mean test using a modified Hotelling's T-squared statistic. This test is suitable for cases where the number of variables \( p \) is larger than the sample size \( n \).

Usage

SKK(X, Y)

Arguments

X

Matrix representing the first sample (rows are observations, columns are variables).

Y

Matrix representing the second sample (rows are observations, columns are variables).

Details

This function implements a high-dimensional two-sample mean test by adjusting the Hotelling's T-squared statistic. It uses diagonal matrices and a correction factor to handle high-dimensional data.
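The motivation for using diagonal matrices can be seen in a short base R sketch. When \( p \) exceeds the total degrees of freedom, the pooled sample covariance matrix is singular, so the classical Hotelling statistic cannot be computed, while a version that uses only the diagonal (the per-variable variances) remains well defined. The code below is an illustration only: it omits the correction factor and the centering and scaling that SKK applies, and all variable names are hypothetical.

```r
set.seed(123)
# Rows are observations, columns are variables (the layout SKK() documents)
X <- matrix(rnorm(200), nrow = 10, ncol = 20)
Y <- matrix(rnorm(200, mean = 0.5), nrow = 10, ncol = 20)
n1 <- nrow(X); n2 <- nrow(Y); p <- ncol(X)

# Pooled covariance: its rank is at most n1 + n2 - 2, which is below p here
S_pooled <- ((n1 - 1) * cov(X) + (n2 - 1) * cov(Y)) / (n1 + n2 - 2)
rank_S <- qr(S_pooled)$rank

# Diagonal version of the quadratic form: only variances are inverted
d <- colMeans(X) - colMeans(Y)
v <- diag(S_pooled) * (1 / n1 + 1 / n2)
T_diag <- sum(d^2 / v)

print(c(rank = rank_S, p = p))
print(T_diag)
```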

Value

A list containing:

TSvalue

The test statistic value.

pvalue

The p-value of the test.

Examples

# Example usage:
set.seed(123)
X <- matrix(rnorm(200), nrow = 10, ncol = 20)  # 10 observations, 20 variables
Y <- matrix(rnorm(200, mean = 0.5), nrow = 10, ncol = 20)  # Different mean
result <- SKK(X, Y)
print(result)
# Output:
# TSvalue: The test statistic value
# pvalue: The p-value indicating the significance of the test

High-Dimensional Two-Sample Mean Test with Centering Adjustment

Description

Conducts a high-dimensional two-sample mean test with centering adjustment. This function is designed for cases where the number of variables \( p \) is larger than the sample sizes \( n \) and \( m \).

Usage

zwl(X, Y, order = 0)

Arguments

X

Matrix representing the first sample (rows are observations, columns are variables).

Y

Matrix representing the second sample (rows are observations, columns are variables).

order

Integer specifying the order of centering adjustment (default is 0).

Details

This function performs a high-dimensional two-sample mean test by adjusting the test statistic for centering. It uses a modified t-statistic and estimates the variance to handle high-dimensional data. The function also includes a custom centering adjustment based on the specified order.

Value

A list containing:

statistic

The test statistic value.

pvalue

The p-value of the test.

Tn

The adjusted test statistic before centering.

var

The estimated variance.

Examples

# Example usage:
set.seed(123)
X <- matrix(rnorm(200), nrow = 10, ncol = 20)  # 10 observations, 20 variables
Y <- matrix(rnorm(200, mean = 0.5), nrow = 10, ncol = 20)  # Different mean
result <- zwl(X, Y, order = 0)
print(result)

# Output:
# $statistic: The test statistic value
# $pvalue: The p-value indicating the significance of the test
# $Tn: The adjusted test statistic before centering
# $var: The estimated variance