| Title: | Elliptical Factor Models |
|---|---|
| Description: | The elliptical factor model, as an extension of the traditional factor model, effectively overcomes the limitations of the traditional model when dealing with heavy-tailed characteristic data. This package implements sparse principal component methods (SPC) and bi-sparse online principal component estimation (SPOC) for parameter estimation. Includes functionality for calculating mean squared error, relative error, and loading matrix sparsity.The philosophy of the package is described in Guo G. (2023) <doi:10.1007/s00180-022-01270-z>. |
| Authors: | Guangbao Guo [aut, cre], Yuanyuan Zhou [aut] |
| Maintainer: | Guangbao Guo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.3.0 |
| Built: | 2026-05-21 07:04:02 UTC |
| Source: | https://github.com/cran/EFM |
The function supports various distribution types for generating the data, including: Elliptical-Normal Distribution, Elliptical-t Distribution.
EFM(n, p, m, nu, distribution_type)EFM(n, p, m, nu, distribution_type)
n |
Sample size. |
p |
Sample dimensionality. |
m |
Number of factors. |
nu |
A numerical parameter used exclusively in the "Elliptical-t" distribution, representing the degrees of freedom. |
distribution_type |
The type of distribution ("Elliptical-Normal Distribution" or "Elliptical-t Distribution"). |
A list containing:
data |
A matrix of generated data (n x p). |
A |
A matrix representing the factor loadings (p x m). |
D |
A diagonal matrix representing the unique variances (p x p). |
library(MASS) library(pracma) n <- 2000 p <- 10 m <- 5 nu <- 5 distribution_type <- "Elliptical-Normal Distribution" X <- EFM(n, p, m, nu, distribution_type)library(MASS) library(pracma) n <- 2000 p <- 10 m <- 5 nu <- 5 distribution_type <- "Elliptical-Normal Distribution" X <- EFM(n, p, m, nu, distribution_type)
This function calculates errors for elliptical distributions, including Elliptical-Normal Distribution and Elliptical-t Distribution.
elliptical_errors(data, distribution = "normal", df = NULL)elliptical_errors(data, distribution = "normal", df = NULL)
data |
Matrix of data following an elliptical distribution. |
distribution |
Type of elliptical distribution ("normal" or "t"). |
df |
Degrees of freedom for Elliptical-t Distribution (required if distribution is "t"). |
Calculate errors for elliptical distributions
A list containing error metrics for the specified elliptical distribution.
set.seed(123) n <- 100; p <- 5 data_normal <- MASS::mvrnorm(n, rep(0, p), diag(p)) errors_normal <- elliptical_errors(data_normal, distribution = "normal") data_t <- matrix(stats::rt(n * p, df = 5), nrow = n, ncol = p) errors_t <- elliptical_errors(data_t, distribution = "t", df = 5) print(errors_normal) print(errors_t)set.seed(123) n <- 100; p <- 5 data_normal <- MASS::mvrnorm(n, rep(0, p), diag(p)) errors_normal <- elliptical_errors(data_normal, distribution = "normal") data_t <- matrix(stats::rt(n * p, df = 5), nrow = n, ncol = p) errors_t <- elliptical_errors(data_t, distribution = "t", df = 5) print(errors_normal) print(errors_t)
This function performs Factor Analysis via Principal Component (FanPC) on a given data set. It calculates the estimated factor loading matrix (AF), specific variance matrix (DF), and the mean squared errors.
FanPC.EFM(data, m, A, D, p)FanPC.EFM(data, m, A, D, p)
data |
A matrix of input data. |
m |
The number of principal components. |
A |
The true factor loadings matrix. |
D |
The true uniquenesses matrix. |
p |
The number of variables. |
A list containing:
AF |
Estimated factor loadings. |
DF |
Estimated uniquenesses. |
MSESigmaA |
Mean squared error for factor loadings. |
MSESigmaD |
Mean squared error for uniquenesses. |
LSigmaA |
Loss metric for factor loadings. |
LSigmaD |
Loss metric for uniquenesses. |
library(matrixcalc) library(MASS) n <- 100 p <- 10 m <- 5 mu <- t(matrix(rep(runif(p, 0, 1000), n), p, n)) mu0 <- as.matrix(runif(m, 0)) sigma0 <- diag(runif(m, 1)) F_matrix <- matrix(mvrnorm(n, mu0, sigma0), nrow = n) A <- matrix(runif(p * m, -1, 1), nrow = p) r <- rnorm(n * p, 0, 1) epsilon <- matrix(r, nrow = n) D <- diag(as.vector(apply(epsilon, 2, function(x) sum(x^2)))) data <- mu + F_matrix %*% t(A) + epsilon results <- FanPC.EFM(data, m, A, D, p) print(results)library(matrixcalc) library(MASS) n <- 100 p <- 10 m <- 5 mu <- t(matrix(rep(runif(p, 0, 1000), n), p, n)) mu0 <- as.matrix(runif(m, 0)) sigma0 <- diag(runif(m, 1)) F_matrix <- matrix(mvrnorm(n, mu0, sigma0), nrow = n) A <- matrix(runif(p * m, -1, 1), nrow = p) r <- rnorm(n * p, 0, 1) epsilon <- matrix(r, nrow = n) D <- diag(as.vector(apply(epsilon, 2, function(x) sum(x^2)))) data <- mu + F_matrix %*% t(A) + epsilon results <- FanPC.EFM(data, m, A, D, p) print(results)
Batch-updated PCA with residual-projection perturbation.
incremental_perturb_pca(batch_data, m, eta = 0.1)incremental_perturb_pca(batch_data, m, eta = 0.1)
batch_data |
List of matrices. |
m |
Components. |
eta |
Step size. |
List with Ap, Dp, V.
## Not run: set.seed(789) N <- 50; m.true <- 2; n_batch <- 5 batches <- lapply(1:n_batch, function(i) matrix(rnorm(60 * N), 60, N)) inc <- EFM::incremental_perturb_pca(batches, m = m.true) print(round(inc$Ap[1:5, 1:2], 3)) ## End(Not run)## Not run: set.seed(789) N <- 50; m.true <- 2; n_batch <- 5 batches <- lapply(1:n_batch, function(i) matrix(rnorm(60 * N), 60, N)) inc <- EFM::incremental_perturb_pca(batches, m = m.true) print(round(inc$Ap[1:5, 1:2], 3)) ## End(Not run)
This function performs Principal Component Analysis (PCA) on a given data set to reduce dimensionality. It calculates the estimated values for the loadings, specific variances, and the covariance matrix.
PC1.EFM(data, m, A, D)PC1.EFM(data, m, A, D)
data |
The total data set to be analyzed. |
m |
The number of principal components to retain in the analysis. |
A |
The true factor loadings matrix. |
D |
The true uniquenesses matrix. |
A list containing:
A1 |
Estimated factor loadings. |
D1 |
Estimated uniquenesses. |
MSESigmaA |
Mean squared error for factor loadings. |
MSESigmaD |
Mean squared error for uniquenesses. |
LSigmaA |
Loss metric for factor loadings. |
LSigmaD |
Loss metric for uniquenesses. |
library(matrixcalc) library(MASS) n <- 100 p <- 10 m <- 5 mu <- t(matrix(rep(runif(p, 0, 1000), n), p, n)) mu0 <- as.matrix(runif(m, 0)) sigma0 <- diag(runif(m, 1)) F_matrix <- matrix(mvrnorm(n, mu0, sigma0), nrow = n) A <- matrix(runif(p * m, -1, 1), nrow = p) r <- rnorm(n * p, 0, 1) epsilon <- matrix(r, nrow = n) D <- diag(as.vector(apply(epsilon, 2, function(x) sum(x^2)))) data <- mu + F_matrix %*% t(A) + epsilon results <- PC1.EFM(data, m, A, D) print(results)library(matrixcalc) library(MASS) n <- 100 p <- 10 m <- 5 mu <- t(matrix(rep(runif(p, 0, 1000), n), p, n)) mu0 <- as.matrix(runif(m, 0)) sigma0 <- diag(runif(m, 1)) F_matrix <- matrix(mvrnorm(n, mu0, sigma0), nrow = n) A <- matrix(runif(p * m, -1, 1), nrow = p) r <- rnorm(n * p, 0, 1) epsilon <- matrix(r, nrow = n) D <- diag(as.vector(apply(epsilon, 2, function(x) sum(x^2)))) data <- mu + F_matrix %*% t(A) + epsilon results <- PC1.EFM(data, m, A, D) print(results)
Principal component projection method that extracts factor loadings and specific variances through a projection matrix.
PPC(data, m)PPC(data, m)
data |
T x N data matrix. |
m |
Number of components. |
List with Apro (loadings), Dpro (specific variances), Sigmahatpro.
## Not run: set.seed(123) N <- 100; T <- 150; m.true <- 3 dat <- matrix(rnorm(T * N), T, N) ppc.out <- EFM::PPC(dat, m = m.true) print(round(ppc.out$Apro[1:5, 1:3], 3)) ## End(Not run)## Not run: set.seed(123) N <- 100; T <- 150; m.true <- 3 dat <- matrix(rnorm(T * N), T, N) ppc.out <- EFM::PPC(dat, m = m.true) print(round(ppc.out$Apro[1:5, 1:3], 3)) ## End(Not run)
This function computes Perturbation Principal Component Analysis (PPC) for the provided input data, estimating factor loadings and uniquenesses. It calculates mean squared errors and loss metrics for the estimated values compared to true values.
PPC1.EFM(data, m, A, D, p)PPC1.EFM(data, m, A, D, p)
data |
A matrix of input data. |
m |
The number of principal components. |
A |
The true factor loadings matrix. |
D |
The true uniquenesses matrix. |
p |
The number of variables. |
A list containing:
Ap |
Estimated factor loadings. |
Dp |
Estimated uniquenesses. |
MSESigmaA |
Mean squared error for factor loadings. |
MSESigmaD |
Mean squared error for uniquenesses. |
LSigmaA |
Loss metric for factor loadings. |
LSigmaD |
Loss metric for uniquenesses. |
library(matrixcalc) library(MASS) n <- 100 p <- 10 m <- 5 mu <- t(matrix(rep(runif(p, 0, 1000), n), p, n)) mu0 <- as.matrix(runif(m, 0)) sigma0 <- diag(runif(m, 1)) F_matrix <- matrix(mvrnorm(n, mu0, sigma0), nrow = n) A <- matrix(runif(p * m, -1, 1), nrow = p) r <- rnorm(n * p, 0, 1) epsilon <- matrix(r, nrow = n) D <- diag(as.vector(apply(epsilon, 2, function(x) sum(x^2)))) data <- mu + F_matrix %*% t(A) + epsilon results <- PPC1.EFM(data, m, A, D, p) print(results)library(matrixcalc) library(MASS) n <- 100 p <- 10 m <- 5 mu <- t(matrix(rep(runif(p, 0, 1000), n), p, n)) mu0 <- as.matrix(runif(m, 0)) sigma0 <- diag(runif(m, 1)) F_matrix <- matrix(mvrnorm(n, mu0, sigma0), nrow = n) A <- matrix(runif(p * m, -1, 1), nrow = p) r <- rnorm(n * p, 0, 1) epsilon <- matrix(r, nrow = n) D <- diag(as.vector(apply(epsilon, 2, function(x) sum(x^2)))) data <- mu + F_matrix %*% t(A) + epsilon results <- PPC1.EFM(data, m, A, D, p) print(results)
Robust PCA by adding random symmetric perturbation to covariance matrix.
robust_perturb_pca(data, m, epsilon = 0.001, sigma = 0.01)robust_perturb_pca(data, m, epsilon = 0.001, sigma = 0.01)
data |
T x N matrix. |
m |
Components. |
epsilon |
Perturbation scale. |
sigma |
Noise sd. |
List with Ap, Dp, lambda.
## Not run: set.seed(456) N <- 80; T <- 100; m.true <- 2 X <- matrix(rnorm(T * N), T, N) rpca.out <- EFM::robust_perturb_pca(X, m = m.true) print(round(rpca.out$Ap[1:5, 1:2], 3)) ## End(Not run)## Not run: set.seed(456) N <- 80; T <- 100; m.true <- 2 X <- matrix(rnorm(T * N), T, N) rpca.out <- EFM::robust_perturb_pca(X, m = m.true) print(round(rpca.out$Ap[1:5, 1:2], 3)) ## End(Not run)
This function processes Elliptical Factor Model (EFM) data using the Sparse Online Principal Component (SOPC) method.
SOPC.EFM(data, m, p, A, D)SOPC.EFM(data, m, p, A, D)
data |
A numeric matrix containing the data used in the SOPC analysis. |
m |
An integer specifying the number of subsets or common factors. |
p |
An integer specifying the number of variables in the data. |
A |
A numeric matrix representing the true factor loadings. |
D |
A numeric matrix representing the true uniquenesses. |
A list containing the following metrics:
Aso |
Estimated factor loadings matrix. |
Dso |
Estimated uniquenesses matrix. |
MSEA |
Mean squared error of the estimated factor loadings (Aso) compared to the true loadings (A). |
MSED |
Mean squared error of the estimated uniquenesses (Dso) compared to the true uniquenesses (D). |
LSA |
Loss metric for the estimated factor loadings (Aso), indicating the relative error compared to the true loadings (A). |
LSD |
Loss metric for the estimated uniquenesses (Dso), indicating the relative error compared to the true uniquenesses (D). |
tauA |
Proportion of zero factor loadings in the estimated loadings matrix (Aso), representing the sparsity. |
library(matrixcalc) library(MASS) n <- 100 p <- 10 m <- 5 mu <- t(matrix(rep(runif(p, 0, 1000), n), p, n)) mu0 <- as.matrix(runif(m, 0)) sigma0 <- diag(runif(m, 1)) F_matrix <- matrix(mvrnorm(n, mu0, sigma0), nrow = n) A <- matrix(runif(p * m, -1, 1), nrow = p) r <- rnorm(n * p, 0, 1) epsilon <- matrix(r, nrow = n) D <- diag(as.vector(apply(epsilon, 2, function(x) sum(x^2)))) data <- mu + F_matrix %*% t(A) + epsilon results <- SOPC.EFM(data, m, p, A, D) print(results)library(matrixcalc) library(MASS) n <- 100 p <- 10 m <- 5 mu <- t(matrix(rep(runif(p, 0, 1000), n), p, n)) mu0 <- as.matrix(runif(m, 0)) sigma0 <- diag(runif(m, 1)) F_matrix <- matrix(mvrnorm(n, mu0, sigma0), nrow = n) A <- matrix(runif(p * m, -1, 1), nrow = p) r <- rnorm(n * p, 0, 1) epsilon <- matrix(r, nrow = n) D <- diag(as.vector(apply(epsilon, 2, function(x) sum(x^2)))) data <- mu + F_matrix %*% t(A) + epsilon results <- SOPC.EFM(data, m, p, A, D) print(results)