Confirmatory Factor Analysis (CFA) in R with lavaan: Tricks

Written by Andres Ponce Villamar
Kansas City Royals Logo and symbol, meaning, history, PNG, brand
Kansas City Royals Logo and symbol, meaning, history, PNG, brand
Table of Contents

Confirmatory Factor Analysis CFA in R with lavaan: Tricks and Best Practices

In short: CFA in R with lavaan tests a hypothesized factor structure against observed data. The primary aim is to confirm whether the proposed latent variables (factors) adequately reproduce the observed covariance structure. This article walks through a practical, structured workflow (model specification, estimation, fit assessment, and common refinements) with concrete examples you can adapt. As a rule of thumb, a good-fitting model typically shows CFI and TLI of at least 0.95, RMSEA of 0.06 or below, and SRMR of 0.08 or below; these thresholds are common benchmarks rather than universal absolutes.

Core CFA workflow in lavaan

lavaan separates model definition from estimation: you first describe the model in a concise text syntax, then fit it to your data. The core steps are model specification, data preparation, model fitting, and interpretation of both the parameter estimates and the fit indices. Model specification defines which observed variables load on which latent factors and whether cross-loadings are allowed. A typical CFA specifies three or more indicators per factor; by default, cfa() fixes the first loading of each factor to 1 for identification, estimates the remaining loadings freely, and fixes all cross-loadings to zero.

  • Model syntax is a string that declares latent factors and their indicators, plus optional residuals and correlations.
  • Estimation defaults to maximum likelihood (ML). When continuous data are non-normal, robust ML (MLR) is a common alternative, and WLSMV is usually preferable for ordinal items.
  • Output essentials include factor loadings, residual variances, factor correlations, and fit measures (CFI, TLI, RMSEA, SRMR).
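  1. Prepare data: check the amount and pattern of missing values (lavaan's default, listwise deletion, drops incomplete cases, while full-information ML via missing = "ml" uses them), inspect for outliers, and consider rescaling items if their scales differ widely.
  2. Run the model: fit <- cfa(model, data = your_data, estimator = "MLR", missing = "ml")
  3. Assess fit: examine the chi-square test, CFI, TLI, RMSEA, SRMR, and standardized residuals, for example with summary(fit, fit.measures = TRUE, standardized = TRUE).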
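The three steps can be sketched end to end as follows. This is a minimal example, not a prescribed pipeline: it assumes the HolzingerSwineford1939 data that ship with lavaan (nine ability tests, x1 to x9, for 301 schoolchildren) and the three-factor model used in lavaan's own tutorial; substitute your own data frame and item names.

```r
library(lavaan)

hs <- HolzingerSwineford1939   # example data bundled with lavaan
items <- paste0("x", 1:9)

# Step 1: prepare data
colSums(is.na(hs[, items]))    # missing values per indicator
summary(hs[, items])           # ranges and quartiles; look for implausible values

# Step 2: specify and fit the model
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
# MLR: robust standard errors and a scaled test statistic.
# missing = "ml": full-information ML keeps incomplete cases instead of dropping them.
fit <- cfa(model, data = hs, estimator = "MLR", missing = "ml")

# Step 3: assess fit (chi-square, CFI, TLI, RMSEA, SRMR) and standardized estimates
summary(fit, fit.measures = TRUE, standardized = TRUE)
```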
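Metric | Interpretation | Target
CFI | Comparative fit relative to a baseline (independence) model | ≥ 0.95
TLI | Comparative fit with a penalty for model complexity | ≥ 0.95
RMSEA | Approximation error per degree of freedom | ≤ 0.06
SRMR | Average standardized residual (observed minus model-implied correlations) | ≤ 0.08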
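A small helper can line a fitted model up against the benchmarks in the table above. check_fit() is a hypothetical function written for this sketch, not part of lavaan; it only wraps lavaan's fitMeasures(), and its default cutoffs simply mirror the table. For MLR or WLSMV fits you may prefer the robust or scaled versions of the indices (for example cfi.robust or cfi.scaled) that fitMeasures() also reports.

```r
library(lavaan)

# Hypothetical helper: compare CFI, TLI, RMSEA and SRMR with conventional cutoffs.
# The cutoffs are rules of thumb, so treat a FALSE as a prompt to look closer.
check_fit <- function(fit, min_cfi = 0.95, min_tli = 0.95,
                      max_rmsea = 0.06, max_srmr = 0.08) {
  fm <- unclass(fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr")))
  value <- c(fm[["cfi"]], fm[["tli"]], fm[["rmsea"]], fm[["srmr"]])
  data.frame(
    index  = c("CFI", "TLI", "RMSEA", "SRMR"),
    value  = round(value, 3),
    target = c(paste(">=", min_cfi), paste(">=", min_tli),
               paste("<=", max_rmsea), paste("<=", max_srmr)),
    meets  = c(value[1] >= min_cfi, value[2] >= min_tli,
               value[3] <= max_rmsea, value[4] <= max_srmr)
  )
}

# Usage with the three-factor model for the bundled HolzingerSwineford1939 data
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)
fit_check <- check_fit(fit)
fit_check
```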

Model specification examples

Two common CFA scenarios are: (a) one-factor CFA with multiple indicators, and (b) a multi-factor CFA with correlated factors. In lavaan, you usually start with a baseline model that aligns with theory and then test potential refinements via theory-driven constraints or modification indices. A canonical one-factor model could look like Factor1 =~ y1 + y2 + y3 + y4, with residual variances freely estimated and residual covariances fixed at zero unless theory dictates otherwise.

  • One-factor CFA: simple, interpretable; useful for scale validation like a newly developed questionnaire subscale.
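  • Two-factor CFA with correlated factors: tests whether two latent constructs share variance. cfa() estimates the covariance between latent factors by default; writing Factor1 ~~ Factor2 in the model syntax makes it explicit, while Factor1 ~~ 0*Factor2 forces the factors to be uncorrelated.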
  • Second-order CFA: when first-order factors load onto a higher-order latent construct, allowing a hierarchical interpretation.
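The three scenarios translate into lavaan syntax roughly as follows. The sketch borrows factor and item names from the HolzingerSwineford1939 data bundled with lavaan (visual, textual, speed; x1 to x9); the higher-order factor name ability is made up for illustration. Comparing the degrees of freedom shows how much each specification constrains the data.

```r
library(lavaan)
hs <- HolzingerSwineford1939

# (a) One-factor CFA. With exactly three indicators the model is just-identified
#     (df = 0), so its fit cannot be tested; a fourth indicator would make it testable.
one_factor <- 'textual =~ x4 + x5 + x6'

# (b) Two correlated factors. cfa() adds the factor covariance automatically;
#     the explicit ~~ line only documents it.
two_factor <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  visual ~~ textual
'

# (c) Second-order CFA: three first-order factors load on one higher-order factor
second_order <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
  ability =~ visual + textual + speed
'

fits <- lapply(list(one = one_factor, two = two_factor, second = second_order),
               cfa, data = hs)
model_df <- sapply(fits, function(f) as.numeric(fitMeasures(f, "df")))
model_df
```

With only three first-order factors, the second-order model has the same degrees of freedom as three freely correlated factors and fits identically; the hierarchical structure becomes testable against the correlated-factors model only with four or more first-order factors.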

Model identification and constraints

Identification is essential. Each latent variable needs a scale, which classic CFA provides either by fixing one loading per factor (the marker method) or by fixing the factor variance to 1 (variance standardization). lavaan fixes the first loading of each factor to 1 by default and typically warns when a model appears underidentified, for example when standard errors cannot be computed. To use a different marker, free the first loading with NA* and fix another with 1*; to standardize the latent variance instead, set std.lv = TRUE. The two approaches are equivalent parameterizations and yield identical model fit.
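The equivalence of the identification choices is easy to check. This sketch assumes the bundled HolzingerSwineford1939 data and fits the same three-factor model three ways: lavaan's default marker (first loading fixed to 1), a different marker (x2 instead of x1 for the visual factor, chosen arbitrarily), and std.lv = TRUE. The chi-square and degrees of freedom should match; only the scale of the loadings and factor variances changes.

```r
library(lavaan)
hs <- HolzingerSwineford1939

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
# Marker method (default): first loading of each factor fixed to 1
fit_marker <- cfa(model, data = hs)

# Alternative marker: NA* frees the x1 loading, 1* fixes the x2 loading instead
model_alt <- '
  visual  =~ NA*x1 + 1*x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_alt <- cfa(model_alt, data = hs)

# Variance standardization: all loadings free, latent variances fixed to 1
fit_std <- cfa(model, data = hs, std.lv = TRUE)

# Equivalent parameterizations: same chi-square and df
sapply(list(marker = fit_marker, alt = fit_alt, std = fit_std),
       function(f) unclass(fitMeasures(f, c("chisq", "df"))))
```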

Handling non-normal data and robust options

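When item distributions depart from normality, robust estimators such as MLR produce more trustworthy standard errors and test statistics. For ordinal indicators such as Likert-type items, especially with few response categories, WLSMV is generally recommended; in lavaan you declare such items with the ordered argument, and the model is then fitted to polychoric correlations rather than treating the categories as continuous. lavaan's documentation and tutorials illustrate choosing an estimator that matches the data type and sample size.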
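A sketch of both estimator choices, assuming the bundled HolzingerSwineford1939 data. Because that dataset has no Likert items, the ordinal part manufactures four ordered categories per item with a hypothetical helper, to_quartile(), that ranks each score; this is purely a stand-in so the code runs, not a recommended way to treat continuous data.

```r
library(lavaan)
hs <- HolzingerSwineford1939
items <- paste0("x", 1:9)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Continuous, possibly non-normal indicators: robust ML
fit_mlr <- cfa(model, data = hs, estimator = "MLR")

# Hypothetical stand-in for ordinal data: split each item into four
# equally sized ordered categories (1 to 4) based on its rank
to_quartile <- function(v) {
  as.integer(ceiling(4 * rank(v, ties.method = "first") / length(v)))
}
hs_ord <- hs
hs_ord[items] <- lapply(hs[items], to_quartile)

# Ordinal indicators: declare them with `ordered`; WLSMV then works from
# polychoric correlations and estimated thresholds
fit_wlsmv <- cfa(model, data = hs_ord, ordered = items, estimator = "WLSMV")

summary(fit_wlsmv, fit.measures = TRUE)
```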

Interpreting outputs: beyond the fit indices

Beyond global fit, CFA outputs item-level loadings, residual variances, and factor correlations. Loadings indicate how strongly each indicator reflects its latent factor; residual variances capture the part of each item's variance that the factor does not explain (measurement error plus item-specific variance). Correlations among latent factors show how distinct or overlapping the constructs are, which guides theoretical interpretation.
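The three kinds of output can be pulled out of the standardized solution as data frames. This sketch assumes the bundled HolzingerSwineford1939 data and the usual three-factor model; the filters rely on lavaan's parameter-table operators (=~ for loadings, ~~ for variances and covariances).

```r
library(lavaan)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)
std <- standardizedSolution(fit)

# Standardized loadings: how strongly each indicator reflects its factor
loadings <- subset(std, op == "=~", select = c(lhs, rhs, est.std, se, pvalue))

# Standardized residual variances of the observed items:
# the share of each item's variance that its factor leaves unexplained
resid_var <- subset(std, op == "~~" & lhs == rhs & lhs %in% lavNames(fit, "ov"),
                    select = c(lhs, est.std))

# Factor correlations (this model has no residual covariances,
# so every off-diagonal ~~ row is a factor pair)
factor_cor <- subset(std, op == "~~" & lhs != rhs, select = c(lhs, rhs, est.std))

loadings
resid_var
factor_cor
```

Under simple structure (each item loads on one factor), each standardized residual variance equals 1 minus the squared standardized loading, so the two tables tell a consistent story.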

Common pitfalls and remedies

Common CFA pitfalls include overfitting via excessive free parameters, ignoring potential cross-loadings, and relying on a single fit index. A robust practice is to report a suite of indices (CFI, TLI, RMSEA, SRMR) along with a careful inspection of modification indices and standardized residuals. If misfit arises, consider theoretical justification for freeing parameters or re-specifying indicators.
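Modification indices and residual correlations can be inspected with a few lines. The sketch assumes the bundled HolzingerSwineford1939 data; the 0.10 flag for residual correlations is a common rule of thumb, not a lavaan default. Any parameter you free on this basis still needs a substantive justification.

```r
library(lavaan)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Ten largest modification indices: each approximates the chi-square drop if that
# parameter were freed; epc is the expected value of the freed parameter
mi <- modindices(fit, sort. = TRUE, maximum.number = 10)
mi[, c("lhs", "op", "rhs", "mi", "epc")]

# Correlation residuals: observed minus model-implied correlations
res <- unclass(residuals(fit, type = "cor")$cov)

# Flag item pairs whose residual correlation exceeds 0.10 in absolute value
big <- which(abs(res) > 0.10 & upper.tri(res), arr.ind = TRUE)
data.frame(item1 = rownames(res)[big[, 1]],
           item2 = colnames(res)[big[, 2]],
           resid = round(res[big], 3))
```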

Practical example: a three-factor CFA

Imagine you validate a three-factor instrument measuring cognitive, affective, and behavioral engagement with four indicators per factor. Step-by-step: define the model, prepare data, run lavaan with ML or MLR, examine fit, and iterate. In this example, you would specify: FactorC =~ c1 + c2 + c3 + c4, FactorA =~ a1 + a2 + a3 + a4, FactorB =~ b1 + b2 + b3 + b4, with FactorC ~~ FactorA, FactorA ~~ FactorB, and FactorC ~~ FactorB to allow correlations.
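Because the engagement instrument's data are not available here, the sketch below simulates 400 hypothetical respondents with lavaan's simulateData() from an assumed population model (all loadings 0.7, factor correlations between 0.3 and 0.5; these values are arbitrary) and then fits the three-factor model exactly as specified. The explicit ~~ lines are optional, since cfa() correlates the factors by default.

```r
library(lavaan)

# Assumed population model, used only to generate illustrative data
pop_model <- '
  FactorC =~ 0.7*c1 + 0.7*c2 + 0.7*c3 + 0.7*c4
  FactorA =~ 0.7*a1 + 0.7*a2 + 0.7*a3 + 0.7*a4
  FactorB =~ 0.7*b1 + 0.7*b2 + 0.7*b3 + 0.7*b4
  FactorC ~~ 0.4*FactorA
  FactorA ~~ 0.5*FactorB
  FactorC ~~ 0.3*FactorB
'
set.seed(123)
engagement <- simulateData(pop_model, sample.nobs = 400)

# Analysis model: loadings and factor covariances are now free parameters
engagement_model <- '
  FactorC =~ c1 + c2 + c3 + c4
  FactorA =~ a1 + a2 + a3 + a4
  FactorB =~ b1 + b2 + b3 + b4
  FactorC ~~ FactorA
  FactorA ~~ FactorB
  FactorC ~~ FactorB
'
fit_eng <- cfa(engagement_model, data = engagement, estimator = "MLR")
summary(fit_eng, fit.measures = TRUE, standardized = TRUE)
```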

Model comparison and invariance testing

Comparing nested models via likelihood ratio tests or information criteria (AIC, BIC) helps decide on parsimony. Invariance testing across groups (e.g., gender, age) requires multi-group CFA, testing configural, metric, scalar, and strict invariance sequentially. This is central for cross-group comparisons of latent means and variances.
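Both ideas can be sketched with the bundled HolzingerSwineford1939 data, whose school variable (Pasteur vs. Grant-White) serves as the grouping factor. The first part tests a nested constraint (visual and textual forced to be uncorrelated) with a likelihood ratio test and reports AIC and BIC; the second fits the four invariance levels through lavaan's group.equal argument and compares them with lavTestLRT(). The particular constraint in the first part is chosen only for illustration.

```r
library(lavaan)
hs <- HolzingerSwineford1939
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Nested comparison: freely correlated factors vs. visual and textual uncorrelated
model_orth <- paste(model, "visual ~~ 0*textual", sep = "\n")
fit_free <- cfa(model, data = hs)
fit_orth <- cfa(model_orth, data = hs)
lavTestLRT(fit_free, fit_orth)                      # chi-square difference test
sapply(list(free = fit_free, orth = fit_orth),
       function(f) unclass(fitMeasures(f, c("aic", "bic"))))

# Measurement invariance across schools, from least to most constrained
configural <- cfa(model, data = hs, group = "school")
metric     <- cfa(model, data = hs, group = "school", group.equal = "loadings")
scalar     <- cfa(model, data = hs, group = "school",
                  group.equal = c("loadings", "intercepts"))
strict     <- cfa(model, data = hs, group = "school",
                  group.equal = c("loadings", "intercepts", "residuals"))
lavTestLRT(configural, metric, scalar, strict)

# Change in CFI across invariance levels
sapply(list(configural = configural, metric = metric, scalar = scalar, strict = strict),
       function(f) as.numeric(fitMeasures(f, "cfi")))
```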


Model reporting: transparent and reproducible

Report both parameter estimates and fit statistics with confidence intervals where available. Include model syntax as an appendix, data summary, sample size, handling of missing data, and software version (R, lavaan). Transparent reporting enhances replicability and credibility in utility-focused journalism and research settings.
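Most of this reporting checklist can be pulled from the fitted object. The sketch assumes the bundled HolzingerSwineford1939 data and default ML estimation; standardizedSolution() includes 95% confidence intervals by default, and fitMeasures() supplies the 90% RMSEA interval.

```r
library(lavaan)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Software versions for the methods section
versions <- c(R = R.version.string, lavaan = as.character(packageVersion("lavaan")))

# Sample size actually used and estimator
n_used    <- lavInspect(fit, "nobs")
estimator <- lavInspect(fit, "options")$estimator

# Global fit, including the RMSEA confidence interval
fit_table <- round(unclass(fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli",
                                              "rmsea", "rmsea.ci.lower",
                                              "rmsea.ci.upper", "srmr"))), 3)

# Standardized loadings with 95% confidence intervals
std <- standardizedSolution(fit)
loading_table <- subset(std, op == "=~",
                        select = c(lhs, rhs, est.std, ci.lower, ci.upper))

print(versions)
print(n_used)
print(estimator)
print(fit_table)
print(loading_table)
```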


Illustrative example: fictional dataset snapshot

Consider a synthetic dataset with 500 observations on nine indicators arranged into three factors. A CFA write-up for such data might report factor loadings ranging from 0.60 to 0.92, residual variances from 0.08 to 0.40, factor correlations between 0.20 and 0.65, and fit indices of CFI = 0.97, TLI = 0.96, RMSEA = 0.04 (90% CI 0.03 to 0.05), and SRMR = 0.05. These fictional numbers illustrate plausible CFA results for demonstration only.

Advanced tips and tricks for CFA in R with lavaan

Once you have a stable CFA model, you can extend the analysis by testing measurement invariance, computing composite reliability (an alternative to Cronbach's alpha that is based on the CFA loadings), and estimating factor scores for downstream analyses. lavaan's output includes standard errors and z-values for the loadings, enabling inference about each item's contribution.
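Composite reliability can be computed from the standardized solution as (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances), using each factor's standardized loadings and its items' standardized residual variances. The sketch implements that formula in a hypothetical helper, composite_reliability(), assuming simple structure and uncorrelated residuals and using the bundled HolzingerSwineford1939 data; add-on packages such as semTools provide more complete reliability functions. Factor scores come from lavaan's lavPredict().

```r
library(lavaan)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)
std <- standardizedSolution(fit)

# Hypothetical helper: composite reliability per factor from standardized estimates
# (assumes each item loads on one factor and residuals are uncorrelated)
composite_reliability <- function(std) {
  lam <- std[std$op == "=~", ]
  sapply(split(lam, lam$lhs), function(f) {
    # standardized residual variances of this factor's items
    theta <- std$est.std[std$op == "~~" & std$lhs == std$rhs & std$lhs %in% f$rhs]
    sum(f$est.std)^2 / (sum(f$est.std)^2 + sum(theta))
  })
}
cr <- composite_reliability(std)
round(cr, 3)

# Factor scores for downstream analyses (one column per factor)
scores <- lavPredict(fit)
head(scores)
```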

Practical considerations for journalists and practitioners

When reporting CFA results for utility readers, emphasize the model's theoretical basis, the robustness of fit across alternative specifications, and the transparency of data handling. Clear visuals, such as factor loading charts and fit index tables, help readers quickly gauge credibility. The CFA workflow should be repeatable, with code and data made available where permissible to enable independent verification.

Frequently used code fragments

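Basic CFA specification in lavaan often resembles the following structure, which you can adapt to your data: model <- 'Factor1 =~ y1 + y2 + y3 + y4'. The factor variance is estimated automatically, so an extra Factor1 ~~ Factor1 line is optional. After library(lavaan), run: fit <- cfa(model, data = df, estimator = "MLR", missing = "ml"); summary(fit, fit.measures = TRUE, standardized = TRUE). With ML-family estimators such as MLR, missing = "ml" requests full-information ML; missing = "pairwise" is intended mainly for WLS-type estimators such as WLSMV.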

Historical context and milestones

CFA methodology crystallized in the 1960s and 1970s, with lavaan emerging as a leading R package in the early 2010s to democratize SEM-based techniques for applied researchers. The rise of open-source CFA tools paralleled the growth of large-scale survey research in psychology, education, and health sciences, enabling rapid, reproducible testing of latent constructs.

Closing notes

For readers aiming to publish or present CFA findings in utility journalism or applied analytics, the strongest practice combines rigorous theory, transparent reporting, robust estimation methods, and accessible visuals that communicate both global fit and item-level evidence. The lavaan ecosystem remains one of the most practical, flexible options for CFA in R, with a rich set of functions to explore more complex models when necessary.

Additional resources

To deepen understanding, consult the official lavaan tutorials and example vignettes, which illustrate common CFA patterns and provide ready-to-run code snippets aligned with the guidance above. These resources are widely used in university seminars and practitioner workshops alike.

Expert answers to common questions about CFA in R with lavaan

[Question] What is CFA in lavaan?

[Answer] CFA in lavaan is a structured approach to test whether a hypothesized set of latent factors explains the covariation among observed variables, using lavaan's syntax, estimation, and fit statistics.

[Question] Which estimator should I use with Likert-type data?

[Answer] For ordinal data, especially with few categories, use the WLSMV estimator; for continuous data with modest skew, ML or MLR are common choices.

[Question] How can I improve model fit in CFA?

[Answer] Start with a theory-driven model specification, choose an appropriate identification constraint, check modification indices for theoretically justified improvements, and consider alternative models or higher-order structures. Always corroborate improvements with theory rather than data-driven fishing.

[Question] How do I report CFA results?

[Answer] Report the model specification, sample size, missing data handling, estimator, fit indices (CFI, TLI, RMSEA, SRMR), standardized loadings with confidence intervals, and a discussion of theoretical implications and limitations.

[Question] Can CFA be used for measurement invariance testing?

[Answer] Yes. Start with configural invariance, then test metric invariance (equal loadings), scalar invariance (equal intercepts), and strict invariance (equal residual variances) across groups, using multi-group CFA and likelihood ratio tests or changes in fit indices.

Heritage Curator

Andres Ponce Villamar

Andres Ponce Villamar is a distinguished heritage curator with expertise in Ecuadorian national identity, public monuments, and cultural institutions.
