| Title: | Copula Factor Models |
|---|---|
| Description: | Provides tools for factor analysis in high-dimensional settings under copula-based factor models. It includes functions to simulate factor-model data with copula-distributed idiosyncratic errors (e.g., Clayton, Gumbel, Frank, Student t and Gaussian copulas) and to perform diagnostic tests such as the Kaiser-Meyer-Olkin measure and Bartlett's test of sphericity. Estimation routines include principal component based factor analysis, projected principal component analysis, and principal orthogonal complement thresholding for large covariance matrix estimation. The philosophy of the package is described in Guo G. (2023) <doi:10.1007/s00180-022-01270-z>. |
| Authors: | Guangbao Guo [aut, cre], Xin Gao [aut] |
| Maintainer: | Guangbao Guo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.4 |
| Built: | 2026-05-28 07:36:42 UTC |
| Source: | https://github.com/cran/CoFM |
Air quality measurements collected from a gas multisensor device deployed in an Italian city.
This dataset contains the responses of a gas multisensor device deployed in the field in an Italian city.
air_quality
A data frame with 9358 rows and 14 variables:

| Variable | Description |
|---|---|
| DateTime | Date and time of the measurement (POSIXct). |
| CO_GT | True hourly averaged CO concentration in mg/m^3 (reference analyzer). |
| PT08_S1_CO | PT08.S1 (tin oxide) hourly averaged sensor response (CO targeted). |
| NMHC_GT | True hourly averaged non-methanic hydrocarbons concentration in µg/m^3. |
| C6H6_GT | True hourly averaged benzene concentration in µg/m^3. |
| PT08_S2_NMHC | PT08.S2 (titania) hourly averaged sensor response (NMHC targeted). |
| NOx_GT | True hourly averaged NOx concentration in ppb. |
| PT08_S3_NOx | PT08.S3 (tungsten oxide) hourly averaged sensor response (NOx targeted). |
| NO2_GT | True hourly averaged NO2 concentration in µg/m^3. |
| PT08_S4_NO2 | PT08.S4 (tungsten oxide) hourly averaged sensor response (NO2 targeted). |
| PT08_S5_O3 | PT08.S5 (indium oxide) hourly averaged sensor response (O3 targeted). |
| T | Temperature in degrees Celsius. |
| RH | Relative humidity (percent). |
| AH | Absolute humidity. |
A data frame with 9357 rows and 16 variables.
The data were collected from March 2004 to February 2005. Missing values are coded as -200 in the raw data and have been converted to NA.
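The sentinel-to-NA conversion described above can be sketched in base R as follows. This is a minimal illustration on a made-up three-row data frame (the values and the `raw` and `cleaned` names are assumptions), not the code used to build the packaged dataset.

```r
# Hypothetical raw extract: -200 marks a missing reading
raw <- data.frame(
  CO_GT = c(2.6, -200, 2.2),
  T     = c(13.6, 13.3, -200)
)

# Replace the sentinel with NA in every numeric column
num_cols <- vapply(raw, is.numeric, logical(1))
cleaned <- raw
cleaned[num_cols] <- lapply(cleaned[num_cols], function(x) {
  x[x %in% -200] <- NA   # %in% never returns NA, so existing NAs are safe
  x
})

# Number of missing values per column after the conversion
colSums(is.na(cleaned))
```
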
UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Air+Quality
This function simulates data based on a Copula Factor Model structure. It generates factor scores, factor loadings, and error terms (using specified Copula distributions), combines them to create the observed data, and then performs KMO and Bartlett's tests on the generated data.
CoFM(n = 1000, p = 10, m = 5, type = "Clayton", param = 2)

| Argument | Description |
|---|---|
| n | Integer. Sample size (number of rows). Default is 1000. |
| p | Integer. Number of observed variables (columns). Default is 10. |
| m | Integer. Number of factors. Default is 5. |
| type | Character. The type of Copula for error terms. Options: "Clayton", "Gumbel", "Frank". |
| param | Numeric. The parameter for the Copula (theta). Default is 2.0. |

A list containing:

| Component | Description |
|---|---|
| data | The generated data matrix (n x p). |
| KMO | The results of the Kaiser-Meyer-Olkin test. |
| Bartlett | The results of Bartlett's test of sphericity. |
| True_Params | A list containing the true parameters used (F, A, D, mu). |
    # Examples should be fast and reproducible for CRAN checks
    set.seed(123)

    # Clayton copula errors (toy size)
    res1 <- CoFM::CoFM(n = 200, p = 6, m = 2, type = "Clayton", param = 2)
    res1$KMO
    res1$Bartlett

    # Gumbel copula errors (toy size)
    res2 <- CoFM::CoFM(n = 150, p = 6, m = 2, type = "Gumbel", param = 2)
    head(res2$data)
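For readers who want to see what the two diagnostics measure, the following base R sketch computes the overall KMO statistic and Bartlett's sphericity statistic from their textbook definitions. It runs on toy data generated here and is not the code CoFM calls internally; the package's output format may differ.

```r
set.seed(1)
n <- 200; p <- 6

# Toy data with one shared component so the variables are correlated
common <- stats::rnorm(n)
X <- sapply(seq_len(p), function(j) 0.7 * common + stats::rnorm(n))

R <- stats::cor(X)

# KMO: squared correlations relative to squared correlations plus
# squared partial correlations (off-diagonal entries only)
R_inv <- solve(R)
partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
off <- row(R) != col(R)
kmo <- sum(R[off]^2) / (sum(R[off]^2) + sum(partial[off]^2))

# Bartlett's test of sphericity: H0 is that R is the identity matrix
chi_sq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df <- p * (p - 1) / 2
p_value <- stats::pchisq(chi_sq, df = df, lower.tail = FALSE)

c(KMO = kmo, chi_sq = chi_sq, df = df, p_value = p_value)
```

A KMO value closer to 1 and a small Bartlett p-value both indicate that the variables share enough correlation for a factor model to be worth fitting.
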
Generate random samples (error terms) from various copula distributions, including Archimedean (Clayton, Gumbel, Frank), Elliptical (t, Normal), Mixed, and Extreme-Value (Galambos) copulas. Useful for simulation studies involving non-normal error structures.
Copula_errors(n, type = "Clayton", dim = 2, param = NULL, extra_params = list())

| Argument | Description |
|---|---|
| n | Integer. The number of samples (rows) to generate. |
| type | Character. The type of Copula to use. Options: "Clayton", "Gumbel", "Frank", "t", "Mixed", "Galambos", "Normal". |
| dim | Integer. The dimension of the copula (number of columns/variables). Used for Archimedean/Elliptical copulas. For "Mixed" the default is 3. |
| param | Numeric or Matrix. The main parameter for the copula (e.g., theta for Archimedean, correlation vector/matrix for Normal). If NULL, default values are used. |
| extra_params | List. Additional parameters for specific copulas (e.g., |

A numeric matrix of dimension (n x dim) containing the generated random samples.
    # Examples should be fast and reproducible for CRAN checks
    set.seed(123)

    # Example 1: Clayton Copula (toy example)
    U_clayton <- Copula_errors(n = 200, type = "Clayton", dim = 2, param = 2)
    head(U_clayton)

    # Example 2: t-Copula with degrees of freedom (toy example)
    U_t <- Copula_errors(
      n = 200, type = "t", dim = 2, param = 0.7,
      extra_params = list(df = 4)
    )
    head(U_t)

    # Example 3: Multivariate Normal Copula (dim = 3)
    # normalCopula() expects the upper-triangular correlations as a vector:
    # (rho_12, rho_13, rho_23) for dim=3
    rho_vec <- c(0.5, 0.3, 0.4)
    U_normal <- Copula_errors(n = 200, type = "Normal", dim = 3, param = rho_vec)
    head(U_normal)
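To make the idea of copula-distributed samples concrete, here is a base R sketch of the Marshall-Olkin frailty sampler for a Clayton copula with theta > 0. The helper `r_clayton()` is hypothetical and written only for illustration; how Copula_errors generates its draws internally is not stated here.

```r
# Hypothetical helper: Marshall-Olkin sampler for a Clayton copula (theta > 0)
r_clayton <- function(n, dim = 2, theta = 2) {
  stopifnot(theta > 0, dim >= 2)
  V <- stats::rgamma(n, shape = 1 / theta, rate = 1)      # shared frailty per row
  E <- matrix(stats::rexp(n * dim), nrow = n, ncol = dim) # independent Exp(1) draws
  (1 + E / V)^(-1 / theta)                                # V recycles down the rows
}

set.seed(123)
U <- r_clayton(n = 2000, dim = 2, theta = 2)

# Kendall's tau for a Clayton copula is theta / (theta + 2), i.e. 0.5 here;
# the sample value should be close to that
stats::cor(U, method = "kendall")[1, 2]
```

Each column of `U` is marginally uniform on (0, 1), so applying a quantile function such as `stats::qnorm()` column by column would turn the draws into error terms with the desired margins.
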
This function performs factor analysis using a principal-component (FanPC) approach.
It estimates the factor loading matrix and uniquenesses from the correlation matrix
of the input data. Unlike FanPC_CoFM, this function does not calculate
error metrics against true parameters, making it suitable for simple estimation tasks.
FanPC_basic(data, m)

| Argument | Description |
|---|---|
| data | A matrix or data frame of input data (n x p). |
| m | Integer. The number of principal components (factors) to extract. |

A list containing:

| Component | Description |
|---|---|
| AF | Estimated factor loadings matrix (p x m). |
| DF | Estimated uniquenesses vector (p). |
| SigmahatF | The correlation matrix of the input data. |
    # Examples should be fast and reproducible for CRAN checks
    set.seed(123)

    # 1. Generate synthetic data using CoFM (toy example)
    sim <- CoFM(n = 200, p = 6, m = 2, type = "Clayton", param = 2.0)
    obs_data <- sim$data

    # 2. Apply FanPC method (extract 2 factors)
    fit <- FanPC_basic(data = obs_data, m = 2)

    # 3. Inspect estimates
    head(fit$AF)  # Estimated loadings
    fit$DF        # Estimated uniquenesses
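The general principal-component route from a correlation matrix to loadings and uniquenesses can be sketched in base R as below. This is the standard textbook construction on toy data, offered as an assumption about the overall approach; the package's FanPC routine may differ in scaling or other details.

```r
set.seed(42)
n <- 300; p <- 6; m <- 2

# Toy factor-model data: X = F A' + noise
A_true <- matrix(stats::runif(p * m, -1, 1), p, m)
F_true <- matrix(stats::rnorm(n * m), n, m)
X <- F_true %*% t(A_true) + matrix(stats::rnorm(n * p, sd = 0.5), n, p)

R <- stats::cor(X)
eig <- eigen(R, symmetric = TRUE)

# Loadings: leading eigenvectors scaled by the square roots of their eigenvalues
A_hat <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(eig$values[1:m]), m)

# Uniquenesses: the part of each unit variance not explained by the m factors
D_hat <- 1 - rowSums(A_hat^2)

round(A_hat, 3)
round(D_hat, 3)
```

Loadings obtained this way are identified only up to sign and rotation, so they should not be compared entry by entry with `A_true` without aligning them first.
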
This function estimates factor loadings and uniquenesses using a principal-component
(FanPC) approach. It then compares these estimates with the true parameters (A and D)
to calculate Mean Squared Errors (MSE) and relative loss metrics. This is designed to
work with data generated by the CoFM function.
FanPC_CoFM(data, m, A, D)

| Argument | Description |
|---|---|
| data | A matrix or data frame of input data (n x p). Usually the |
| m | Integer. The number of principal components (factors) to extract. |
| A | Matrix. The true factor loadings matrix (p x m). Usually |
| D | Matrix. The true uniquenesses matrix (p x p). Usually |

A list containing:

| Component | Description |
|---|---|
| AF | Estimated factor loadings matrix (p x m). |
| DF | Estimated uniquenesses matrix (p x p). |
| MSESigmaA | Mean Squared Error for factor loadings. |
| MSESigmaD | Mean Squared Error for uniquenesses. |
| LSigmaA | Relative loss metric for factor loadings. |
| LSigmaD | Relative loss metric for uniquenesses. |
    # Examples should be fast and reproducible for CRAN checks
    set.seed(123)

    # 1. Generate toy data using CoFM
    sim_result <- CoFM(n = 200, p = 6, m = 2, type = "Clayton", param = 2.0)

    # 2. Extract true parameters and observed data
    true_A <- sim_result$True_Params$A
    true_D <- sim_result$True_Params$D
    obs_data <- sim_result$data

    # 3. Apply FanPC and compute error metrics
    fanpc_result <- FanPC_CoFM(data = obs_data, m = 2, A = true_A, D = true_D)

    # 4. Inspect results
    fanpc_result$MSESigmaA
    fanpc_result$MSESigmaD
    head(fanpc_result$AF)
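The exact formulas behind the MSE and relative loss components are not given on this page. One plausible pair of definitions, stated here purely as an assumption, is the mean squared entrywise error and the squared Frobenius error relative to the squared Frobenius norm of the truth. The helper `loss_metrics()` below is hypothetical and not part of CoFM.

```r
# Hypothetical helper; the definitions are assumptions, not CoFM's formulas
loss_metrics <- function(est, truth) {
  stopifnot(all(dim(est) == dim(truth)))
  sq_err <- sum((est - truth)^2)          # squared Frobenius norm of the error
  list(
    mse      = sq_err / length(truth),    # mean squared entrywise error
    rel_loss = sq_err / sum(truth^2)      # error relative to the size of the truth
  )
}

# Toy check: every entry of the estimate is off by 0.1
A_true <- matrix(c(0.8, 0.7, 0.1, 0.2, 0.1, 0.9), nrow = 3)
A_est  <- A_true + 0.1
loss_metrics(A_est, A_true)
```

Both quantities shrink toward zero as the estimate approaches the truth, which is the property the comparison against true parameters relies on.
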
This function performs Principal Component Analysis (PCA) on the correlation matrix
of the data to estimate factor loadings and uniquenesses. It is designed to work with
data generated by the CoFM function and calculates error metrics (MSE and
relative loss) by comparing estimates against the true parameters.
PC_CoFM(data, m, A, D)

| Argument | Description |
|---|---|
| data | A matrix or data frame of input data (n x p). Usually the |
| m | Integer. The number of principal components (factors) to retain. |
| A | Matrix. The true factor loadings matrix (p x m). Usually |
| D | Matrix. The true uniquenesses matrix (p x p). Usually |

A list containing:

| Component | Description |
|---|---|
| A2 | Estimated factor loadings matrix. |
| D2 | Estimated uniquenesses matrix. |
| MSESigmaA | Mean Squared Error for factor loadings. |
| MSESigmaD | Mean Squared Error for uniquenesses. |
| LSigmaA | Relative loss metric for factor loadings. |
| LSigmaD | Relative loss metric for uniquenesses. |
    # Examples should be fast and reproducible for CRAN checks
    set.seed(123)

    # 1. Generate toy data using CoFM
    sim_result <- CoFM(n = 200, p = 6, m = 2, type = "Clayton", param = 2.0)

    # 2. Extract true parameters and observed data
    true_A <- sim_result$True_Params$A
    true_D <- sim_result$True_Params$D
    obs_data <- sim_result$data

    # 3. Apply PC method to estimate parameters and compute errors
    pc_result <- PC_CoFM(data = obs_data, m = 2, A = true_A, D = true_D)

    # 4. Inspect results
    pc_result$MSESigmaA
    pc_result$MSESigmaD
    head(pc_result$A2)
Implements the POET method for large covariance matrix estimation (Fan, Liao & Mincheva, 2013). The method assumes a factor model structure, estimates the low-rank component via PCA, and applies thresholding to the sparse residual covariance matrix.
poet(X, r = NULL, r.max = 10, thresh = "hard", lambda = NULL, gamma = 3.7, delta = 1e-04, method.r = "IC1")

| Argument | Description |
|---|---|
| X | Numeric matrix (T x N). T is the number of time periods (rows), N is the number of variables (columns). |
| r | Integer or NULL. User-specified number of factors. If NULL, r is estimated automatically using |
| r.max | Integer. Upper bound for the number of factors when estimating r. Default is 10. |
| thresh | Character. Thresholding type for the residual covariance. Options: "hard", "soft", "scad", "adapt". Default is "hard". |
| lambda | Numeric or NULL. Thresholding parameter. If NULL, it defaults to |
| gamma | Numeric. Parameter for SCAD thresholding. Default is 3.7. |
| delta | Numeric. Minimum eigenvalue bump to ensure positive definiteness of the residual covariance. Default is 1e-4. |
| method.r | Character. Method to select the number of factors if r is NULL. Options: "IC1" (Bai & Ng, 2002) or "ER" (Eigenvalue Ratio, Ahn & Horenstein, 2013). |

A list containing:

| Component | Description |
|---|---|
| Sigma.poet | The estimated N x N POET covariance matrix. |
| Sigma.fact | The estimated N x N low-rank (factor) covariance matrix. |
| Sigma.resid | The estimated N x N thresholded residual covariance matrix. |
| F.hat | Estimated factors (T x r). |
| Lambda.hat | Estimated factor loadings (N x r). |
| r.hat | The number of factors used (estimated or specified). |
| R.hat | Same as Sigma.resid (for compatibility). |

Fan, J., Liao, Y., & Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B, 75(4), 603-680.
    # Examples should be fast and reproducible for CRAN checks
    set.seed(2025)
    T_obs <- 40; N_var <- 15; r_true <- 2

    # Generate a simple factor model: X = F * Lambda' + U
    Lambda <- matrix(stats::rnorm(N_var * r_true), N_var, r_true)
    F_scores <- matrix(stats::rnorm(T_obs * r_true), T_obs, r_true)
    U <- matrix(stats::rnorm(T_obs * N_var), T_obs, N_var)
    X_sim <- F_scores %*% t(Lambda) + U  # T x N

    # Apply POET (choose r via IC1; use soft thresholding)
    res <- poet(X_sim, r = NULL, method.r = "IC1", thresh = "soft")
    res$r.hat
    res$Sigma.poet[1:5, 1:5]
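The low-rank plus thresholded-residual decomposition behind POET can be sketched in base R as follows. The helper `poet_sketch()` is hypothetical and deliberately simplified: it takes a fixed number of factors `r` and a single constant threshold `lambda`, and it omits the SCAD and adaptive rules, the automatic choice of r, and the `delta` eigenvalue adjustment. It is not the package's `poet()`.

```r
# Hypothetical helper illustrating the POET steps; not the package's poet()
poet_sketch <- function(X, r, lambda, thresh = c("hard", "soft")) {
  thresh <- match.arg(thresh)
  S <- stats::cov(X)                                 # N x N sample covariance
  eig <- eigen(S, symmetric = TRUE)
  V <- eig$vectors[, seq_len(r), drop = FALSE]
  low_rank <- V %*% diag(eig$values[seq_len(r)], r) %*% t(V)
  resid <- S - low_rank                              # principal orthogonal complement

  # Threshold off-diagonal entries only; the diagonal is kept as is
  off <- row(resid) != col(resid)
  shrunk <- resid
  if (thresh == "hard") {
    shrunk[off & abs(resid) < lambda] <- 0
  } else {
    shrunk[off] <- sign(resid[off]) * pmax(abs(resid[off]) - lambda, 0)
  }
  list(Sigma = low_rank + shrunk, low_rank = low_rank, resid = shrunk)
}

# Toy data: 40 observations of 15 variables driven by 2 factors
set.seed(2025)
X_sim <- matrix(stats::rnorm(40 * 2), 40, 2) %*% matrix(stats::rnorm(2 * 15), 2, 15) +
  matrix(stats::rnorm(40 * 15), 40, 15)

fit <- poet_sketch(X_sim, r = 2, lambda = 0.2, thresh = "soft")
dim(fit$Sigma)
```

Hard thresholding zeroes small residual covariances and leaves the rest untouched, while soft thresholding also shrinks the surviving entries toward zero by `lambda`.
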
This function performs Projected Principal Component Analysis (PPC) to estimate factor loadings
and specific variances. It projects the data onto a specific subspace before performing eigen
decomposition. Unlike PPC_CoFM, this function does not calculate error metrics
against true parameters.
PPC_basic(data, m)

| Argument | Description |
|---|---|
| data | A matrix or data frame of input data (n x p). |
| m | Integer. The number of principal components (factors) to extract. |

A list containing:

| Component | Description |
|---|---|
| Apro | Estimated projected factor loadings matrix (p x m). |
| Dpro | Estimated projected uniquenesses vector (p). |
| Sigmahatpro | The covariance matrix of the projected data. |
    # Examples should be fast and reproducible for CRAN checks
    set.seed(123)

    # 1. Generate toy data using CoFM
    sim <- CoFM(n = 200, p = 6, m = 2, type = "Clayton", param = 2.0)
    obs_data <- sim$data

    # 2. Apply PPC method (extract 2 factors)
    fit <- PPC_basic(data = obs_data, m = 2)

    # 3. Inspect estimates
    head(fit$Apro)
    fit$Dpro
This function performs Projected Principal Component Analysis (PPC) on the input data to
estimate factor loadings and uniquenesses. It is designed to work with data generated by
the CoFM function and calculates error metrics (MSE and relative loss) by
comparing estimates against true parameters. The method projects data onto a subspace
(using a projection operator) before performing PCA.
PPC_CoFM(data, m, A, D)

| Argument | Description |
|---|---|
| data | A matrix or data frame of input data (n x p). Usually the |
| m | Integer. The number of principal components (factors) to retain. |
| A | Matrix. The true factor loadings matrix (p x m). Usually |
| D | Matrix. The true uniquenesses matrix (p x p). Usually |

A list containing:

| Component | Description |
|---|---|
| Ap2 | Estimated factor loadings matrix (Projected). |
| Dp2 | Estimated uniquenesses matrix (Projected). |
| MSESigmaA | Mean Squared Error for factor loadings. |
| MSESigmaD | Mean Squared Error for uniquenesses. |
| LSigmaA | Relative loss metric for factor loadings. |
| LSigmaD | Relative loss metric for uniquenesses. |
    # Examples should be fast and reproducible for CRAN checks
    set.seed(123)

    # 1. Generate toy data using CoFM
    sim_result <- CoFM(n = 200, p = 6, m = 2, type = "Clayton", param = 2.0)

    # 2. Extract true parameters and observed data
    true_A <- sim_result$True_Params$A
    true_D <- sim_result$True_Params$D
    obs_data <- sim_result$data

    # 3. Apply PPC method and compute errors
    ppc_result <- PPC_CoFM(data = obs_data, m = 2, A = true_A, D = true_D)

    # 4. Inspect results
    ppc_result$MSESigmaA
    ppc_result$MSESigmaD
    head(ppc_result$Ap2)
This function performs a specific type of Projected PCA where the data is projected onto
the orthogonal complement of the mean vector. It effectively applies the centering projection
$P = I_n - \frac{1}{n} J_n$ (where $J_n$ is the $n \times n$ all-ones matrix), optionally rescales the columns,
and then performs PCA on the covariance matrix. This allows estimation of factor loadings and
residual variances after removing the mean structure.
PPC_new(data, m)

| Argument | Description |
|---|---|
| data | A matrix or data frame of input data (n x p). |
| m | Integer. The number of principal components (factors) to keep. |

A list containing:

| Component | Description |
|---|---|
| Apro | Estimated factor loading matrix (p x m). |
| Dpro | Estimated residual variances (p x p diagonal matrix). |
| Sigmahatpro | Covariance matrix of the projected data. |
    # Examples should be fast and reproducible for CRAN checks
    set.seed(1)
    dat <- matrix(stats::rnorm(200), ncol = 4)
    ans <- PPC_new(data = dat, m = 2)
    str(ans)
    head(ans$Apro)
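The centering projection can be checked numerically with a few lines of base R. The sketch below builds the projector explicitly, confirms that it amounts to subtracting column means, and then runs a principal-component step on the covariance of the projected data. It is an illustration under those assumptions, skips the optional column rescaling, and is not the body of PPC_new.

```r
set.seed(1)
n <- 50; p <- 4; m <- 2
X <- matrix(stats::rnorm(n * p, mean = 3), n, p)

# Centering projection: P = I - J / n, with J the n x n all-ones matrix
P <- diag(n) - matrix(1 / n, n, n)
X_proj <- P %*% X

# Projecting with P is the same as subtracting column means
max_diff <- max(abs(X_proj - scale(X, center = TRUE, scale = FALSE)))
max_diff

# PCA on the covariance matrix of the projected data
S <- stats::cov(X_proj)
eig <- eigen(S, symmetric = TRUE)
A_hat <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(eig$values[1:m]), m)
D_hat <- diag(diag(S - A_hat %*% t(A_hat)))   # p x p diagonal residual variances
```

Forming the n x n projector is convenient for exposition; for large n, subtracting column means directly gives the same result at much lower cost.
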
Projects the data onto the orthogonal complement of a given vector u, eliminating the
effect of u, and then performs PCA on the projected data. This is useful for removing
specific trends (e.g., time trends, common market factors) before analysis.
PPC_u(data, m, u)

| Argument | Description |
|---|---|
| data | A matrix or data frame of input data (n x p). |
| m | Integer. Number of principal components to retain. |
| u | Numeric vector of length n. The projection direction to be removed from the data. Will be normalized internally. |

A list containing:

| Component | Description |
|---|---|
| Apro | Estimated factor loading matrix (p x m). |
| Dpro | Estimated residual variances (p x p diagonal matrix). |
| Sigmahatpro | Covariance matrix of the projected data. |
| u | The normalized projection vector used. |
    # Examples should be fast and reproducible for CRAN checks
    set.seed(123)
    dat <- matrix(stats::rnorm(200), ncol = 4)
    u0 <- seq_len(nrow(dat))  # e.g., a linear trend to remove
    res <- PPC_u(data = dat, m = 2, u = u0)
    res$u
    head(res$Apro)
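Removing a direction u from the data corresponds to multiplying by the projector I - uu' after u has been scaled to unit length. The base R sketch below demonstrates this on toy data with a linear trend added to every column. The variable names are illustrative and the code is not taken from PPC_u.

```r
set.seed(123)
n <- 50; p <- 4
trend <- seq_len(n)

# Toy data: noise plus a linear trend in every column
X <- matrix(stats::rnorm(n * p), n, p) + outer(trend, rep(0.1, p))

u <- trend / sqrt(sum(trend^2))   # normalize to unit length
P <- diag(n) - tcrossprod(u)      # projector onto the orthogonal complement of u
X_proj <- P %*% X

# The projected columns have no component left along u
residual_along_u <- max(abs(crossprod(u, X_proj)))
residual_along_u
```

After this step, a principal-component analysis of `stats::cov(X_proj)` works on variation that is orthogonal to the removed trend.
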