| Title: | General Unilateral Load Estimator for Two-Layer Latent Factor Models |
|---|---|
| Description: | Implements general unilateral loading estimator for two-layer latent factor models with smooth, element-wise factor transformations. We provide data simulation, loading estimation,finite-sample error bounds, and diagnostic tools for zero-mean and sub-Gaussian assumptions. A unified interface is given for evaluating estimation accuracy and cosine similarity. The philosophy of the package is described in Guo G. (2026) <doi:10.1016/j.apm.2025.116280>. |
| Authors: | Guangbao Guo [aut, cre] |
| Maintainer: | Guangbao Guo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.5.0 |
| Built: | 2026-06-07 08:09:18 UTC |
| Source: | https://github.com/cran/GulFM |
General unilateral load Estimator
estimate_gul_loadings(X, m)estimate_gul_loadings(X, m)
X |
n *p data matrix (already centred and scaled if desired). |
m |
number of latent factors (both layers). |
Step 1: PCA on X to get hat_A1 Step 2: Regress X on hat_A1 to get hat_gF1 Step 3: PCA on hat_gF1 to get hat_A2 Step 4: hat_Ag = hat_A1
A list with hat_A1 : p * m 1st-layer loadings hat_A2 : m * m 2nd-layer loadings hat_Ag : p * m overall loadings Sigma1 : p * p sample cov(X) (for diagnostics) Sigma2 : m * m sample cov(hat_gF1) hat_gF1 : n * m estimated transformed latent factors eig1 : eigen-values of Sigma1 eig2 : eigen-values of Sigma2
dat <- generate_gfm_data(500, 50, 5, tanh, seed = 1) est <- estimate_gul_loadings(dat$X, m = 5) err <- sqrt(mean((est$hat_Ag - dat$Ag)^2)) # overall RMSEdat <- generate_gfm_data(500, 50, 5, tanh, seed = 1) est <- estimate_gul_loadings(dat$X, m = 5) err <- sqrt(mean((est$hat_Ag - dat$Ag)^2)) # overall RMSE
Returns a vectorised map and its exact Lipschitz constant
for three increasingly nonlinear choices.
g_fun(type = c("linear", "weak_nonlinear", "strong_nonlinear"))g_fun(type = c("linear", "weak_nonlinear", "strong_nonlinear"))
type |
Character string selecting the map:
|
Named list with components
g_fun |
vectorised function |
L_g |
scalar Lipschitz constant of |
## pick a link with L_g = 1 tmp <- g_fun("linear") dat <- generate_gfm_data(n = 500, p = 200, m = 5, g_fun = tmp$g_fun) est <- estimate_gul_loadings(dat$X, m = 5) err <- norm(est$hat_Ag - dat$Ag, "F") sprintf("F-error (L_g = %d) = %.3f", tmp$L_g, err)## pick a link with L_g = 1 tmp <- g_fun("linear") dat <- generate_gfm_data(n = 500, p = 200, m = 5, g_fun = tmp$g_fun) est <- estimate_gul_loadings(dat$X, m = 5) err <- norm(est$hat_Ag - dat$Ag, "F") sprintf("F-error (L_g = %d) = %.3f", tmp$L_g, err)
One Monte-Carlo replicate; returns empirical error, exceedance indicator, theoretical bounds, and assumption-check flags.
g_theorem(n, p, m, g_type, epsilon, zero_tol = 0.02)g_theorem(n, p, m, g_type, epsilon, zero_tol = 0.02)
n |
sample size |
p |
number of observed variables |
m |
number of latent factors |
g_type |
character: "linear", "weak_nonlinear", "strong_nonlinear" |
epsilon |
error threshold |
zero_tol |
zero-mean tolerance (default 0.02) |
one-row data-frame
df <- g_theorem(500, 200, 5, "linear", 0.6)df <- g_theorem(500, 200, 5, "linear", 0.6)
Generate general factor model with smooth latent transformation
generate_gfm_data(n, p, m, g_fun, seed = 1, sigma_V = 0.1)generate_gfm_data(n, p, m, g_fun, seed = 1, sigma_V = 0.1)
n |
Integer: sample size. |
p |
Integer: number of observed variables. |
m |
Integer: number of latent factors (both layers). |
g_fun |
Function: smooth, element-wise transformation applied to latent factors. Must be vectorised, e.g. 'sin', 'tanh', 'scale'. |
seed |
1. |
sigma_V |
Numeric: standard deviation of the idiosyncratic noise (default 0.1 => Var = 0.01). |
List with components X : n * p matrix of standardised observations. A1 : p * m first-layer loading matrix. A2 : m * m second-layer loading matrix. Ag : p * m overall loading matrix (Ag = A1 F1 : n * m latent factors (before transformation). gF1: n * m latent factors (after transformation). V1 : n * p noise matrix (for diagnostics).
dat <- generate_gfm_data(200, 50, 5, g_fun = tanh)dat <- generate_gfm_data(200, 50, 5, g_fun = tanh)
Generates one synthetic data set, estimates loadings with the GUL, and evaluates estimation accuracy.
gul_simulation(n, p, m, g_fun)gul_simulation(n, p, m, g_fun)
n |
Integer: sample size. |
p |
Integer: number of observed variables. |
m |
Integer: number of latent factors (both layers). |
g_fun |
Function: element-wise, smooth transformation applied to the latent factors (e.g. 'tanh', 'sin'). |
Named numeric vector with components error_F : Frobenius norm ||hat(Ag) - Ag||_F
gul_simulation(200, 50, 5, g_fun = tanh)gul_simulation(200, 50, 5, g_fun = tanh)
Multi-metric evaluation of factor loading matrix estimation error
loading_metrics(A_true, A_hat)loading_metrics(A_true, A_hat)
A_true |
True loading matrix (p x m) |
A_hat |
Estimated loading matrix (p x m) |
data.frame with MSE, RMSE, MAE, MaxDev, and Cosine similarity
## simulated example p <- 100; m <- 5 Ag_true <- matrix(rnorm(p*m), p, m) Ag_hat <- Ag_true + matrix(rnorm(p*m, 0, 0.1), p, m) metrics <- loading_metrics(Ag_true, Ag_hat) print(metrics)## simulated example p <- 100; m <- 5 Ag_true <- matrix(rnorm(p*m), p, m) Ag_hat <- Ag_true + matrix(rnorm(p*m, 0, 0.1), p, m) metrics <- loading_metrics(Ag_true, Ag_hat) print(metrics)
Draws n i.i.d. N(0, I_m) latent factors, applies g component-wise, and checks whether |E[g(x)]| < tol on every coordinate.
verify_mean(g_fun, m = 5, n = 10000, tol = 0.001)verify_mean(g_fun, m = 5, n = 10000, tol = 0.001)
g_fun |
vectorised map g: R -> R |
m |
latent dimension |
n |
Monte-Carlo sample size |
tol |
numerical tolerance (default 1e-3) |
logical TRUE if |mean| < tol on all coords
tmp <- g_fun("weak_nonlinear") verify_mean(tmp$g_fun, m = 5)tmp <- g_fun("weak_nonlinear") verify_mean(tmp$g_fun, m = 5)
Draws n i.i.d. N(0, I_m) latent factors, applies g component-wise, and checks whether E[exp(g(x))] remains below an empirical cut-off. This is a quick proxy for finite sub-Gaussian norm.
verify_subgaussian(g_fun, m = 5, n = 1000, cut = exp(2))verify_subgaussian(g_fun, m = 5, n = 1000, cut = exp(2))
g_fun |
vectorised map g: R -> R |
m |
latent dimension |
n |
Monte-Carlo sample size |
cut |
empirical threshold (default exp(2) & 7.389) |
logical TRUE if E[exp(g)] < cut on all coords
tmp <- g_fun("strong_nonlinear") verify_subgaussian(tmp$g_fun, m = 5)tmp <- g_fun("strong_nonlinear") verify_subgaussian(tmp$g_fun, m = 5)