A comprehensive, rigorous guide to modern causal inference methods powered by machine learning.
By Victor Chernozhukov¹ • Christian Hansen² • Nathan Kallus³ • Martin Spindler⁴ • Vasilis Syrgkanis⁵

¹Massachusetts Institute of Technology • ²University of Chicago • ³Cornell University • ⁴Universität Hamburg • ⁵Stanford University
This book bridges the gap between machine learning and causal inference, providing rigorous methods for answering causal questions using modern ML tools. Topics span predictive inference, causal identification, double/debiased machine learning, heterogeneous treatment effects, instrumental variables, difference-in-differences, regression discontinuity, and more.
All chapters are available for free at causalml-book.org, where you can read the full book online or download individual chapters.
| Chapter | Title |
|---|---|
| P | Preface |
| 0 | Powering Causal Inference with ML and AI |
| Chapter | Title | Topics |
|---|---|---|
| 1 | Predictive Inference with Linear Regression in Moderately High Dimensions | Prediction, Inference |
| 2 | Causal Inference via Randomized Experiments | Causality, Inference |
| 3 | Predictive Inference via Modern High-Dimensional Linear Regression | Prediction |
| 4 | Statistical Inference on Predictive Effects in High-Dimensional Linear Regression Models | Causality, Inference |
| 5 | Causal Inference via Conditional Ignorability | Causality |
| 6 | Causal Inference via Linear Structural Equations | Causality |
| 7 | Causal Inference via DAGs and Nonlinear Structural Equation Models | Causality |
| 8 | Predictive Inference via Modern Nonlinear Regression | Prediction |
| 9 | Statistical Inference on Predictive and Causal Effects in Modern Nonlinear Regression Models | Causality, Inference |
| 10 | Feature Engineering for Causal and Predictive Inference | Causality, Inference |
| Chapter | Title |
|---|---|
| 11 | Deeper Dive into DAGs, Good and Bad Controls |
| 12 | Unobserved Confounders, Instrumental Variables, and Proxy Controls |
| 13 | DML for IV and Proxy Controls Models and Robust DML Inference under Weak Identification |
| 14 | Statistical Inference on Heterogeneous Treatment Effects |
| 15 | Estimation and Validation of Heterogeneous Treatment Effects |
| 16 | Difference-in-Differences |
| 17 | Regression Discontinuity Designs |
Ch 1 — Prediction & Linear Regression
| Lab | Python | R |
|---|---|---|
| OLS and Lasso for Wage Prediction | | |
| The Gender Wage Gap | | |
| Exercise on Overfitting | | |
Ch 2 — Randomized Experiments
| Lab | Python | R |
|---|---|---|
| Vaccination RCT (Polio 1954) | | |
| Covariates in RCT: Precision Adjustment | | |
| Reemployment Bonus RCT | | |
Ch 3 — High-Dimensional Linear Regression
| Lab | Python | R |
|---|---|---|
| Penalized Linear Regressions: Simulation | | |
| Case Study: Wage Prediction with ML | | |
Ch 4 — Inference in High-Dimensional Models
| Lab | Python | R |
|---|---|---|
| Simulation on Orthogonal Estimation | | |
| Comparing Orthogonal vs Non-Orthogonal Methods | | |
| Testing the Convergence Hypothesis | | |
| Heterogeneous Effect of Sex on Wage | | |
Ch 6–7 — DAGs & Structural Equations
| Lab | Python | R |
|---|---|---|
| Collider Bias (Hollywood) | | |
| Causal Identification in DAGs | | |
| DoSearch for Causal Identification | | |
Ch 8 — Nonlinear Prediction
| Lab | Python | R |
|---|---|---|
| ML Estimators for Wage Prediction | | |
| Functional Approximations by Trees and Neural Nets | | |
Ch 9 — DML for Causal & Predictive Effects
| Lab | Python | R |
|---|---|---|
| Effect of Gun Ownership on Homicide | | |
| DAG Analysis of 401(k) Impact | | |
| DML Inference on 401(k) Wealth Effects | | |
| DML for Partially Linear Model (Growth) | | |
Ch 10 — Feature Engineering
| Lab | Python | R |
|---|---|---|
| Variational Autoencoders and PCA | | |
| DoubleML Feature Engineering with BERT | | — |
Ch 12–13 — IV, Proxy Controls & Weak Identification
| Lab | Python | R |
|---|---|---|
| Sensitivity Analysis with Sensmakr | | |
| Negative (Proxy) Controls | | |
| DML for 401(k) with IV | | |
| Weak IV Experiments | | |
| DML for Partially Linear IV Model | | |
Ch 14–16 — Heterogeneous Effects & Diff-in-Diff
| Lab | Python | R |
|---|---|---|
| CATE Estimation with Causal Forests | — | |
| CATE Inference: Best Linear Predictors | — | |
| Conditional Average Treatment Effects | — | |
| Difference-in-Differences: Minimum Wage | | |
Companion open-source implementations for the methods covered in the book.

| Package | Description |
|---|---|
| DoubleML | Double/debiased ML in Python and R. Implements the partially linear regression (PLR), interactive regression (IRM), partially linear IV (PLIV), and interactive IV (IIVM) models; builds on scikit-learn (Python) and mlr3 (R). |
| EconML | Heterogeneous treatment effects: double ML, causal forests, meta-learners, and IV methods. Part of PyWhy. |
| Stata ML & ddml | Regularized regression and DML for Stata. |
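DoubleML's PLR model is the partially linear regression Y = θD + g(X) + ε with D = m(X) + V. The idea behind the DML estimate of θ can be sketched in a few lines: residualize Y and D on X using nuisance fits trained on other folds (cross-fitting), then regress the Y-residuals on the D-residuals. The sketch below is a minimal illustration assuming only NumPy. It is not the DoubleML or EconML API. The helper names `dml_plr` and `ols_fit_predict` are hypothetical. OLS stands in for the flexible ML learners (lasso, forests, boosting) one would normally plug in, and the simulated coefficients are illustrative.

```python
import numpy as np


def ols_fit_predict(x_train, y_train, x_test):
    """Stand-in learner (hypothetical helper): OLS with intercept, fit on train, predict on test."""
    xt = np.column_stack([np.ones(len(x_train)), x_train])
    xs = np.column_stack([np.ones(len(x_test)), x_test])
    beta, *_ = np.linalg.lstsq(xt, y_train, rcond=None)
    return xs @ beta


def dml_plr(y, d, x, n_folds=5, learner=ols_fit_predict, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + eps."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels 0..n_folds-1
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisances E[Y|X] and E[D|X] are fit only on the other folds.
        y_res[test] = y[test] - learner(x[train], y[train], x[test])
        d_res[test] = d[test] - learner(x[train], d[train], x[test])
    # Residual-on-residual regression solves the orthogonal moment for theta.
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Heteroskedasticity-robust standard error from the orthogonal score.
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


# Illustrative simulated data with true theta = 0.5 and confounding through x.
rng = np.random.default_rng(42)
n, p = 2000, 5
x = rng.normal(size=(n, p))
d = x @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)
y = 0.5 * d + x @ np.array([1.0, 0.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

theta, se = dml_plr(y, d, x)
print(f"theta = {theta:.3f} (se {se:.3f})")
```

Here the nuisance functions are linear, so OLS is adequate. In the nonlinear, high-dimensional settings the book targets, one would swap `ols_fit_predict` for a flexible learner, and cross-fitting is what keeps overfitting in those nuisance fits from biasing the estimate of θ.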
```bibtex
@article{chernozhukov2024applied,
  title   = {Applied Causal Inference Powered by ML and AI},
  author  = {Chernozhukov, Victor and Hansen, Christian and Kallus, Nathan
             and Spindler, Martin and Syrgkanis, Vasilis},
  journal = {arXiv preprint arXiv:2403.02467},
  year    = {2024},
  doi     = {10.48550/arXiv.2403.02467}
}
```
