| Title: | Analyzing Censored Factor Models |
|---|---|
| Description: | Provides generation and estimation of censored factor models for high-dimensional data with censored errors (normal, t, logistic). Includes Sparse Orthogonal Principal Components (SOPC), and evaluation metrics. Based on Guo G. (2023) <doi:10.1007/s00180-022-01270-z>. |
| Authors: | Guangbao Guo [aut, cre], Tong Meng [aut] |
| Maintainer: | Guangbao Guo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.8.0 |
| Built: | 2026-06-05 07:47:44 UTC |
| Source: | https://github.com/cran/CFM |
Australian AIDS survival data containing information on patients diagnosed with AIDS in Australia.
data(Aids2)
A data frame with 2843 observations and 7 variables:
State in Australia: NSW, Other, QLD, VIC
Sex of the patient
Date of diagnosis
Date of death
Status at the end of the study: A (alive), D (dead)
Transmission category
Age at diagnosis
Australian National AIDS Registry
data(Aids2)
head(Aids2)
summary(Aids2)
A dataset containing boundary condition data for computational fluid mechanics simulations.
data(bcdata)
A data frame with variables:
Time variable (seconds)
Pressure values (Pa)
Temperature values (K)
Velocity components (m/s)
Experimental measurements or numerical simulations
data(bcdata)
head(bcdata)
summary(bcdata)
Implementation of robust censored factor model using robust PCA initialization for handling left-censored data.
censored_factor_model(X, m, max_iter = 100, tol = 1e-04, nugget = 1e-06, alpha = 0.75)

| Argument | Description |
|---|---|
| X | Data matrix (n x p) |
| m | Number of factors |
| max_iter | Maximum number of ECM iterations (default: 100) |
| tol | Convergence tolerance (default: 1e-4) |
| nugget | Numerical stability term (default: 1e-6) |
| alpha | Robustness parameter for MCD (default: 0.75) |

A list containing model results
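The source does not say how left-censored values should be encoded in X. The following base-R sketch assumes one common convention: values below a detection limit are recorded at the limit itself. The variable names and the limit of -1 are illustrative and not part of the package.

```r
# Illustrative only: simulate factor-structured data and left-censor it.
set.seed(1)
n <- 200; p <- 6; m <- 2
F_scores <- matrix(rnorm(n * m), n, m)      # latent factor scores (n x m)
A <- matrix(runif(p * m, -1, 1), p, m)      # factor loadings (p x m)
E <- matrix(rnorm(n * p, sd = 0.5), n, p)   # idiosyncratic errors (n x p)
X_full <- F_scores %*% t(A) + E             # uncensored data

limit <- -1                                 # assumed detection limit
censored <- X_full < limit                  # TRUE where a cell is censored
X <- pmax(X_full, limit)                    # record censored cells at the limit

mean(censored)                              # share of censored cells
```

A matrix built this way has the n x p shape that the X argument describes, so it could be tried as input, for example censored_factor_model(X, m = 2).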
Generates multivariate data that follow a latent factor structure with censored errors (Normal, Student-t or Logistic).
censored_factor_models(n, p, m, distribution = c("normal", "t", "logistic"), df = NULL, seed = NULL)

| Argument | Description |
|---|---|
| n | Sample size (> 0). |
| p | Number of observed variables (> 0). |
| m | Number of latent factors (< p). |
| distribution | Error distribution: "normal" (default), "t", "logistic". |
| df | Degrees of freedom when distribution = "t". |
| seed | Optional random seed. |

A list with components:
| Component | Description |
|---|---|
| data | numeric n × p matrix of observations |
| loadings | p × m factor loadings matrix |
| uniqueness | p × p diagonal uniqueness matrix |
| KMO | KMO measure of sampling adequacy |
| Bartlett_p | p-value of Bartlett's test |
| distribution | error distribution used |
| seed | random seed |

set.seed(2025)
obj <- censored_factor_models(200, 6, 2)
psych::KMO(obj$data)
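To make the latent factor structure concrete, here is a minimal base-R sketch of data of the form X = F A' + e with the three error distributions named above. The helper draw_errors is hypothetical and not part of CFM. Requiring df for the "t" case is an assumption, and the package's own generator may scale or censor the errors differently.

```r
# Hypothetical helper (not in CFM): draw an n x p matrix of errors.
draw_errors <- function(n, p, distribution = c("normal", "t", "logistic"),
                        df = NULL) {
  distribution <- match.arg(distribution)
  if (distribution == "t" && is.null(df)) {
    stop("df must be supplied when distribution = 't'")
  }
  e <- switch(distribution,
    normal   = rnorm(n * p),
    t        = rt(n * p, df = df),
    logistic = rlogis(n * p)
  )
  matrix(e, nrow = n, ncol = p)
}

set.seed(2025)
n <- 200; p <- 6; m <- 2
A  <- matrix(runif(p * m, -1, 1), p, m)   # loadings (p x m)
Fs <- matrix(rnorm(n * m), n, m)          # factor scores (n x m)
X  <- Fs %*% t(A) + draw_errors(n, p, "t", df = 5)
dim(X)
```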
Implementation of kernel-based censored factor model using kernel PCA initialization for nonlinear factor analysis with censored data.
censored_kernel_factor_model(X, m, kernel_type = "rbf", gamma = NULL, max_iter = 100, tol = 1e-04, nugget = 1e-06)

| Argument | Description |
|---|---|
| X | Data matrix (n x p) |
| m | Number of factors |
| kernel_type | Kernel type: "rbf" or "linear" (default: "rbf") |
| gamma | Gamma parameter for RBF kernel (default: 1/p) |
| max_iter | Maximum number of ECM iterations (default: 100) |
| tol | Convergence tolerance (default: 1e-4) |
| nugget | Numerical stability term (default: 1e-6) |

A list containing model results
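For readers unfamiliar with kernel PCA, the sketch below shows the standard construction in base R: an RBF kernel matrix with gamma = 1/p (the documented default), double centring, and an eigendecomposition. It illustrates the general technique only; how the package turns kernel components into initial loadings is not stated in the source.

```r
# Textbook kernel PCA with an RBF kernel; illustrative only.
set.seed(1)
n <- 50; p <- 4; m <- 2
X <- matrix(rnorm(n * p), n, p)
gamma <- 1 / p                              # documented default for gamma

D2 <- as.matrix(dist(X))^2                  # squared Euclidean distances (n x n)
K  <- exp(-gamma * D2)                      # RBF kernel matrix

H  <- diag(n) - matrix(1 / n, n, n)         # centring matrix
Kc <- H %*% K %*% H                         # double-centred kernel

eig <- eigen(Kc, symmetric = TRUE)
# Scores on the first m kernel principal components (n x m)
scores <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]), nrow = m)
dim(scores)
```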
Generate multivariate data that follow a latent factor structure with censoring errors drawn from Normal, Student-t or Logistic distributions. Convenience wrapper around rcnorm, rct, and rclogis.
CFM(n, p, m, cens.dist = c("normal", "t", "logistic"), df = 5, seed = NULL)

| Argument | Description |
|---|---|
| n | sample size |
| p | number of manifest variables. |
| m | number of latent factors. |
| cens.dist | censoring error distribution: "normal", "t" or "logistic" |
| df | degrees of freedom when cens.dist = "t" |
| seed | optional random seed for reproducibility. |

A named list with components:
| Component | Description |
|---|---|
| data | numeric |
| F | factor scores matrix |
| A | factor loadings matrix |
| D | unique variances diagonal matrix |

set.seed(2025)
# Normal censoring
obj <- CFM(n = 200, p = 10, m = 3, cens.dist = "normal")
head(obj$data)
# t-censoring with 6 d.f.
obj <- CFM(n = 300, p = 12, m = 4, cens.dist = "t", df = 6)
psych::KMO(obj$data)
Censored Factor Analysis via Principal Component (FanPC, pure R)
FanPC.CFM(data, m, A = NULL, D = NULL, p = NULL, cens.dist = c("normal", "t", "logistic"), df = NULL, cens.method = c("winsorise", "em"), cens_prop = 0.01, surv.obj = NULL, ctrl = NULL, verbose = NULL)

| Argument | Description |
|---|---|
| data | Numeric matrix or data frame of dimension |
| m | Number of factors (< p). |
| A | Optional true loading matrix, used only for error calculation. |
| D | Optional true unique-variance diagonal matrix, used only for error calculation. |
| p | Number of variables (deprecated; detected automatically). |
| cens.dist | Error distribution, reserved for future use. |
| df | Degrees of freedom, reserved for future use. |
| cens.method | Censoring handling method; currently only |
| cens_prop | Winsorisation proportion, default 0.01. |
| surv.obj | Reserved for future use. |
| ctrl | Reserved for future use. |
| verbose | Reserved for future use. |

Estimated loading matrix, p × m.
Estimated unique-variance diagonal matrix, p × p.
Mean squared error of loadings (if A is provided).
Mean squared error of unique variances (if D is provided).
Relative error of loadings (if A is provided).
Relative error of unique variances (if D is provided).
library(CFM)
obj <- CFM(n = 500, p = 10, m = 2, cens.dist = "normal")
res <- FanPC.CFM(obj$data, m = 2, A = obj$A, D = obj$D, cens.method = "winsorise")
print(res$MSESigmaA)
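The winsorisation proportion cens_prop can be read as clipping each variable at its lower and upper cens_prop quantiles before estimation. The base-R sketch below shows that idea, followed by a principal-component estimate of loadings and unique variances. The helper winsorise, the symmetric clipping, and the use of the correlation matrix are all assumptions made for illustration; the package's internal steps may differ.

```r
# Hypothetical helper (not in CFM): clip each column at the
# cens_prop and 1 - cens_prop sample quantiles.
winsorise <- function(X, cens_prop = 0.01) {
  apply(X, 2, function(x) {
    q <- quantile(x, c(cens_prop, 1 - cens_prop), names = FALSE)
    pmin(pmax(x, q[1]), q[2])
  })
}

set.seed(1)
n <- 500; p <- 10; m <- 2
X  <- matrix(rnorm(n * p), n, p)
Xw <- winsorise(X, cens_prop = 0.01)

# Principal-component estimates from the correlation matrix
R   <- cor(Xw)
eig <- eigen(R, symmetric = TRUE)
A_hat <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]), nrow = m)  # p x m
D_hat <- diag(diag(R - A_hat %*% t(A_hat)))                            # p x p diagonal
```

Here A_hat has the p × m shape of the estimated loading matrix and D_hat is a p × p diagonal matrix, matching the dimensions listed for the returned estimates.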
Implementation of incremental censored factor model for streaming data with left-censored observations. Processes data in batches.
incremental_censored_factor_model(batch_data, m, max_iter = 100, tol = 1e-04, nugget = 1e-06)

| Argument | Description |
|---|---|
| batch_data | List of data matrices (batches) |
| m | Number of factors |
| max_iter | Maximum number of ECM iterations (default: 100) |
| tol | Convergence tolerance (default: 1e-4) |
| nugget | Numerical stability term (default: 1e-6) |

A list containing model results
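Batch processing of this kind usually relies on updating summary statistics one batch at a time. As a minimal sketch, the code below pools the mean and covariance across a list of batches without stacking all rows, using the standard pairwise update for scatter matrices. The helper update_moments is hypothetical; the source does not describe which quantities the package actually updates.

```r
# Hypothetical helper (not in CFM): merge one batch into running moments.
update_moments <- function(state, batch) {
  nb <- nrow(batch)
  mb <- colMeans(batch)
  Sb <- crossprod(sweep(batch, 2, mb))      # scatter matrix of this batch
  if (is.null(state)) return(list(n = nb, mean = mb, S = Sb))
  n_new <- state$n + nb
  delta <- mb - state$mean
  list(
    n    = n_new,
    mean = state$mean + delta * nb / n_new,
    S    = state$S + Sb + tcrossprod(delta) * state$n * nb / n_new
  )
}

set.seed(1)
# A list of data matrices, the shape batch_data is documented to take
batch_data <- replicate(5, matrix(rnorm(40 * 4), 40, 4), simplify = FALSE)

state <- NULL
for (b in batch_data) state <- update_moments(state, b)
cov_inc <- state$S / (state$n - 1)          # pooled sample covariance
```

The result agrees with cov() computed on the stacked data, which is a quick way to check an incremental update.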
PC2 for censored factor models (Top-2 principal components, pure R)
PC2.CFM(data, m, A = NULL, D = NULL, p = NULL, cens.dist = c("normal", "t", "logistic"), df = NULL, cens.method = c("winsorise", "em"), cens_prop = 0.01, surv.obj = NULL, ctrl = NULL, verbose = NULL)

| Argument | Description |
|---|---|
| data | Numeric matrix or data frame of dimension |
| m | Number of factors (< p). |
| A | Optional true loading matrix, used only for error calculation. |
| D | Optional true unique-variance diagonal matrix, used only for error calculation. |
| p | Number of variables (deprecated; detected automatically). |
| cens.dist | Error distribution, reserved for future use. |
| df | Degrees of freedom, reserved for future use. |
| cens.method | Censoring handling method; currently only |
| cens_prop | Winsorisation proportion, default 0.01. |
| surv.obj | Reserved for future use. |
| ctrl | Reserved for future use. |
| verbose | Reserved for future use. |

Estimated loading matrix, p × 2.
Estimated unique-variance diagonal matrix, p × p.
Mean squared error of loadings (if A is provided).
Mean squared error of unique variances (if D is provided).
Relative error of loadings (if A is provided).
Relative error of unique variances (if D is provided).
library(CFM)
obj <- CFM(n = 500, p = 12, m = 2, cens.dist = "normal")
res <- PC2.CFM(obj$data, A = obj$A, D = obj$D)
print(res$MSESigmaA)
PPC2 for censored factor models (Top-2 principal components, pure R)
PPC2.CFM(data, m, A = NULL, D = NULL, p = NULL, cens.dist = c("normal", "t", "logistic"), df = NULL, cens.method = c("winsorise", "em"), cens_prop = 0.01, surv.obj = NULL, ctrl = NULL, verbose = NULL)

| Argument | Description |
|---|---|
| data | Numeric matrix or data frame of dimension |
| m | Number of factors (< p). |
| A | Optional true loading matrix, used only for error calculation. |
| D | Optional true unique-variance diagonal matrix, used only for error calculation. |
| p | Number of variables (deprecated; detected automatically). |
| cens.dist | Error distribution, reserved for future use. |
| df | Degrees of freedom, reserved for future use. |
| cens.method | Censoring handling method; currently only |
| cens_prop | Winsorisation proportion, default 0.01. |
| surv.obj | Reserved for future use. |
| ctrl | Reserved for future use. |
| verbose | Reserved for future use. |

Estimated loading matrix, p × 2.
Estimated unique-variance diagonal matrix, p × p.
Mean squared error of loadings (if A is provided).
Mean squared error of unique variances (if D is provided).
Relative error of loadings (if A is provided).
Relative error of unique variances (if D is provided).
library(CFM)
obj <- CFM(n = 500, p = 12, m = 2, cens.dist = "normal")
res <- PPC2.CFM(obj$data, A = obj$A, D = obj$D, cens.method = "winsorise")
print(res$MSESigmaA)
Implementation of weighted censored factor model with weighted PCA initialization for handling left-censored data with observation weights.
weighted_censored_factor_model(X, m, weights = NULL, max_iter = 100, tol = 1e-04, nugget = 1e-06)

| Argument | Description |
|---|---|
| X | Data matrix (n x p) |
| m | Number of factors |
| weights | Observation weights vector of length n (optional) |
| max_iter | Maximum number of ECM iterations (default: 100) |
| tol | Convergence tolerance (default: 1e-4) |
| nugget | Numerical stability term (default: 1e-6) |

A list containing model results
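One plausible reading of weighted PCA initialization is an eigendecomposition of a weighted correlation matrix. The sketch below uses stats::cov.wt for the weighted moments. The weights are simulated for illustration, and treating the scaled leading eigenvectors as initial loadings is an assumption, not a description of the package's code.

```r
# Illustrative weighted PCA initialization; not the package's implementation.
set.seed(1)
n <- 100; p <- 5; m <- 2
X <- matrix(rnorm(n * p), n, p)
w <- runif(n)                               # hypothetical observation weights
w <- w / sum(w)                             # normalised here; cov.wt also rescales

wc  <- cov.wt(X, wt = w, cor = TRUE)        # weighted centre, covariance, correlation
eig <- eigen(wc$cor, symmetric = TRUE)
# Candidate initial loadings from the first m weighted components (p x m)
A_init <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]), nrow = m)
dim(A_init)
```

With equal weights, cov.wt reproduces the ordinary sample covariance, so the weighted and unweighted initializations coincide in that case.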
A dataset containing experimental or survey data related to yogurt.
data(yoghurt)
A data frame with the following columns:
Description of adult variable
Description of fadult variable
Description of left variable
Description of right variable
This dataset contains experimental data collected from yogurt-related studies. The specific meaning of each variable should be documented based on the original study.
Original data source should be specified here.
# Load the data
data(yoghurt)
# Examine data structure
str(yoghurt)
# View first few rows
head(yoghurt)
# Basic summary statistics
summary(yoghurt)