Targeted Learning

Causal Inference for Observational and Experimental Data

Targeted learning is a framework for causal and statistical inference that incorporates machine learning. This site contains research updates and resources. The book Targeted Learning: Causal Inference for Observational and Experimental Data, by Mark J. van der Laan and Sherri Rose, was published in 2011.

COMING 2017: Targeted Learning in Data Science by van der Laan and Rose, published by Springer.


Tutorial Papers: Super Learning for Prediction & TMLE for Causal Effects


L. Balzer, M. van der Laan, M. Petersen, SEARCH Collaboration. Adaptive pre-specification in randomized trials with and without pair-matching. Statistics in Medicine. [Link]

M. Schuler, S. Rose. Targeted maximum likelihood estimation for causal inference in observational studies. American Journal of Epidemiology. [PDF]

M. Schnitzer, J. Lok, S. Gruber. Variable selection for confounder control, flexible modeling and collaborative targeted minimum loss-based estimation in causal inference. IJB. [Link]

S. Rose. A machine learning framework for plan payment risk adjustment. Health Services Research. [Link]

L. Cain, M. Saag, M. Petersen, et al. Using observational data to emulate a randomized trial of dynamic treatment-switching strategies: An application to antiretroviral therapy. Int J Epidemiology. [Link]

M. Schnitzer, J. Lok, R. Bosch. Double robust and efficient estimation of a prognostic model for events in the presence of dependent censoring. Biostatistics. [Link]

A. Mirelman, S. Rose, J. Khan, S. Ahmed, D. Peters, L. Niessen, A. Trujillo. The relationship between non-communicable disease occurrence and poverty: Evidence from demographic surveillance in Matlab, Bangladesh. Health Policy & Planning. [Link]


A. Chambaz, P. Neuvial. tmle.npvi: Targeted, integrative search of associations between DNA copy number and gene expression, accounting for DNA methylation. Bioinformatics, 31(18):3054-6. [Link]

J. Ahern, L. Balzer, S. Galea. The roles of outlet density and norms in alcohol use disorder. Drug Alcohol Depend, 151:144-50. [Link]

D. Brown, M. Petersen, S. Costello, et al. Occupational exposure to PM2.5 and incidence of ischemic heart disease: Longitudinal targeted minimum loss-based estimation. Epidemiology, 26(6):806-14. [Link]

A. Weber, M.J. van der Laan, M. Petersen. Assumption trade-offs when choosing identification strategies for pre-post treatment effect estimation: An illustration of a community-based intervention in Madagascar. JCI, 3(1):109-130. [Link]

S. Gruber. Targeted learning in healthcare research. Big Data, 3(4):211-18. [PDF]

M.J. van der Laan, A. Luedtke. Targeted learning of the mean outcome under an optimal dynamic treatment rule. JCI, 3(1):61-95. [Link]

E. LeDell, M. Petersen, M.J. van der Laan. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electron J Stat, 9(1):1583-1607. [Link]

S. Lendle, B. Fireman, M.J. van der Laan. Balancing score adjustment targeted minimum loss-based estimation. JCI, 3(2):139-55. [Link]

S. Rose. Targeted learning for pre-analysis plans in public health and health policy research. Observational Studies, 1:294-306. [PDF]

I. Díaz, A. Hubbard, A. Decker, M. Cohen. Variable importance and prediction methods for longitudinal problems with missing variables. PLoS One, 10(3):e0120031. [Link]

M. Petersen, E. LeDell, J. Schwab, et al. Super learner analysis of electronic adherence data improves viral prediction and may provide strategies for selective HIV RNA monitoring. J Acquir Immune Defic Syndr, 69(1):109-18. [Link]

S. Gruber. A causal perspective on OSIM2 data generation, with implications for simulation study design and interpretation. JCI, 3(2):177-87. [Link]

R. Pirracchio, M. Petersen, M. Carone, M. Rigon, S. Chevret, M.J. van der Laan. Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): A population-based study. Lancet Respir Med, 3(1):42-52. [Link]

R. Pirracchio, M. Petersen, M. van der Laan. Improving propensity score estimators' robustness to model misspecification using super learner. Am J Epidemiol, 181(2):108-19. [Link]

L. Balzer, M. Petersen, M.J. van der Laan and the SEARCH Consortium. Adaptive pair-matching in randomized trials with unbiased and efficient effect estimation. Stat Med, e-pub before print doi: 10.1002/sim.6380. [Link]

R. Kessler, C. Warner, C. Ivany, et al. Predicting suicides after psychiatric hospitalization in US Army soldiers. JAMA Psychiatry, 72(1):49-57. [Link]


H. Wang, Z. Zhang, S. Rose, M.J. van der Laan. A novel targeted learning method for quantitative trait loci mapping. Genetics, 198(4):1369-76. [PDF]

M. Petersen, J. Schwab, S. Gruber, N. Blaser, M. Schomaker, M. van der Laan. Targeted maximum likelihood estimation for dynamic and static longitudinal marginal structural working models. J Causal Inference, 2(2):147-185. [PDF]

H. Leslie, D. Karasek, L. Harris, E. Chang, N. Abdulrahim, M. Maloba, M. Huchko. Cervical cancer precursors and hormonal contraceptive use in HIV-positive women: An application of a causal model and semi-parametric methods. PLoS ONE, 9(6): e101090. [PDF]

M.J. van der Laan, R. Starmans. Entering the era of data science: Targeted learning and the integration of statistics and computational data analysis. Advances in Statistics, 2014: 502678. [PDF]

K. Rudolph, I. Díaz, M. Rosenblum, E. Stuart. Estimating population treatment effects from a survey subsample. Am J Epidemiol, 180(7):737-48. [Link]

R. Thomas, A. Hubbard, C. McHale, L. Zhang, S. Rappaport, et al. Characterization of changes in gene expression and biochemical pathways at low levels of benzene exposure. PLoS ONE, 9(5): e91828. [PDF]

P. Kotwani, L. Balzer, D. Kwarisiima, T. Clark, K. Kabami, et al. Evaluating linkage to care for hypertension after community-based screening in rural Uganda. Trop Med Int Health, 19(4):459-68. [Link]

M. Schnitzer, E. Moodie, M.J. van der Laan, R. Platt, M. Klein. Modeling the impact of hepatitis C viral clearance and end-stage liver disease in an HIV co-infected cohort with targeted maximum likelihood estimation. Biometrics, 70(1):144-52. [PDF]

R. Neugebauer, J. Schmittdiel, M.J. van der Laan. Targeted learning in real-world comparative effectiveness research with time-varying interventions. Stat Med, 33(14):2480-2520. [Link]

R. Kessler, S. Rose, K. Koenen, E. Karam, et al. How well can post-traumatic stress disorder be predicted from pre-trauma risk factors? An exploratory study in the WHO World Mental Health Surveys. World Psychiatry, 13(3):265-74. [PDF]

M. Schnitzer, M.J. van der Laan, E. Moodie, R. Platt. Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data. Ann Appl Stat, 8(2):703-25. [PDF]

M.J. van der Laan. Targeted estimation of nuisance parameters to obtain valid statistical inference. Int J Biostat, 10(1):29-57. [Link]

S. Sapp, M.J. van der Laan. Subsemble: An ensemble method for combining subset-specific algorithm fits. J Appl Statist, 41(6). [Link]

M. Petersen, M.J. van der Laan. Causal models and learning from data: Integrating causal modeling and statistical estimation. Epidemiology, 25(3):418-426. [Link]

M.J. van der Laan. Causal inference for a population of causally connected units. JCI, 2(1). [Link]

S. Sapp, M.J. van der Laan. Targeted estimation of binary variable importance measures with interval-censored outcomes. Int J Biostat, 10(1). [Link]

S. Rose, M.J. van der Laan. Rose and van der Laan respond to "Some advantages of RERI." Am J Epidemiol, 179(6):172-6. [PDF]

S. Rose, M.J. van der Laan. A double robust approach to causal effects in case-control studies. Am J Epidemiol, 179(6):663-9. [PDF]

A. Chambaz, D. Choudat, C. Huber, J-C. Pairon, M.J. van der Laan. Analysis of the effect of occupational exposure to asbestos based on threshold regression modeling of case-control data. Biostatistics, 15(2):327-340. [Link]


I. Díaz, M.J. van der Laan. Targeted data adaptive estimation of the causal dose-response curve. JCI, 2(1):171-192. [Link]

M. Legrand, R. Pirracchio, A. Rosa, M. Petersen, M.J. van der Laan, J. Fabiani et al. Incidence, risk factors and prediction of post-operative acute kidney injury following cardiac surgery for active infective endocarditis: An observational study. Crit Care, 17(5):R220. [PDF]

M. Subbaraman, S. Lendle, M.J. van der Laan, L. Kaskutas, J. Ahern. Cravings as a mediator and moderator of drinking outcomes in the COMBINE study. Addiction, 108(10):1737-44. [Link]

S. Lendle, B. Fireman, M.J. van der Laan. Targeted maximum likelihood estimation in safety analysis. J Clin Epidemiol, 66(8):S91-8. [PDF]

S. Lendle, M. Subbaraman, M.J. van der Laan. Identification and efficient estimation of the natural direct effect among the untreated. Biometrics, 69(2):310-17. [Link]

S. Rose. Mortality risk score prediction in an elderly population using machine learning. Am J Epidemiol, 177(5):443-452. [PDF] ["Editor's Choice" article]

J. Brooks, M.J. van der Laan, D. Singer, A. Go. Targeted minimum loss-based estimation of causal effect in right-censored survival data with time-dependent covariates: Warfarin, stroke, and death in atrial fibrillation. JCI, 1(2):235-54. [Link]

T. Haight, M.J. van der Laan, T. Manini, I. Tager. Direct effects of leisure-time physical activity on walking speed. J Nutr Health Aging, 17(8):666-73. [Link]

I. Díaz, M.J. van der Laan. Assessing the causal effect of policies: An example using stochastic interventions. Int J Biostat, 9(2):161-74. [Link]

S. Gruber, M.J. van der Laan. An application of targeted maximum likelihood estimation to the meta-analysis of safety data. Biometrics, 69(1):254-62. [Link]

M.J. van der Laan, M. Petersen, W. Zheng. Estimating the effect of a community-based intervention with two communities. JCI, 1(1):83-106. [Link]

I. Díaz, M.J. van der Laan. Sensitivity analysis for causal inference under unmeasured confounding and measurement error problems. Int J Biostat, 9(2):149-60. [Link]


K. Moore, R. Neugebauer, M.J. van der Laan, I. Tager. Causal inference in epidemiological studies with strong confounding. Stat Med, 31(13):1380-1404. [PDF]

P. Chaffee, M.J. van der Laan. Targeted maximum likelihood estimation for dynamic treatment regimes in sequentially randomized controlled trials. Int J Biostat, 8(1). [Link]

S. Gruber, M.J. van der Laan. Consistent causal effect estimation under dual misspecification and implications for confounder selection procedures. Stat Methods Med Res. [PDF]

A. Chambaz, P. Neuvial, M.J. van der Laan. Estimation of a nonparametric variable importance measure of a continuous exposure. Electron J Stat, 6:1059-99. [PDF]

D. Rubin, M.J. van der Laan. Statistical issues and limitations in personalized medicine research with clinical trials. Int J Biostat, 8(1). [Link]

P. Chaffee, M.J. van der Laan. Discussion of "Evaluation of viable dynamic treatment regimes in a sequentially randomized trial of advanced prostate cancer," by Wang et al. JASA, 107(498):513-517. [PDF]

C.W. Wester, O. Stitelman, V. deGruttola, H. Bussmann, R. Marlink, M.J. van der Laan. Effect modification by sex and baseline CD4+ cell count among adults receiving combination antiretroviral therapy in Botswana: Results from a clinical trial. AIDS Res Hum Retroviruses, 28(9):981-8. [PDF]

M.J. van der Laan, S. Gruber. Targeted minimum loss based estimation of causal effects of multiple time point interventions. Int J Biostat, 8(1). [Link]

I. Díaz Muñoz, M.J. van der Laan. Population intervention causal effects based on stochastic interventions. Biometrics, 68(2):541-549. [PDF]

G. Geeven, M.J. van der Laan, M. de Gunst. Comparison of targeted maximum likelihood and shrinkage estimators of parameters in gene networks. SAGMB, 11(5):Article 2. [Link]

M.J. van der Laan, S. Gruber. Targeted minimum loss based estimator that outperforms a given estimator. Int J Biostat, 8(1). [Link]

A. Malani, O. Bembom, M.J. van der Laan. Accounting for heterogeneous treatment effects in the FDA approval process. Food Drug Law J, 67(1):23-50. [Link]

W. Zheng, M.J. van der Laan. Targeted maximum likelihood estimation of natural direct effects. Int J Biostat, 8(1):1-40. [Link]


M. McCulloch, M. Broffman, M. van der Laan, A. Hubbard, et al. Colon cancer survival with herbal medicine and vitamins combined with standard therapy in a whole-systems approach: Ten-year follow-up data analyzed with marginal structural models and propensity score methods. Integr Cancer Ther, 10(3):240-59. [PDF]

K. Moore, R. Neugebauer, T. Valappil, M.J. van der Laan. Robust extraction of covariate information to improve estimation efficiency in randomized trials. Stat Med, 30(19):2389-2408. [PDF]

H. Wang, M.J. van der Laan. Dimension reduction with gene expression data using targeted variable importance measurement. BMC Bioinformatics, 12:312. [PDF]

M. Odden, I. Tager, M.J. van der Laan, J. Delaney, C. Peralta, R. Katz, M. Sarnak, B. Psaty, M. Shlipak. Antihypertensive medication use and change in kidney function in elderly adults: A marginal structural model analysis. Int J Biostat, 7(1). [Link]

C. Tuglus, M.J. van der Laan. Repeated measures semiparametric regression using targeted maximum likelihood methodology with application to transcription factor activity discovery. SAGMB, 10(1):2. [PDF]

I. Díaz, M.J. van der Laan. Super learner based conditional density estimation with application to marginal structural models. Int J Biostat, 7(1). [Link]

H. Wang, S. Rose, M.J. van der Laan. Finding quantitative trait loci genes with collaborative targeted maximum likelihood learning. Stat Probabil Lett, 81(7):792–796. [PDF] [Featured in issue editorial]

O. Stitelman, C.W. Wester, V. De Gruttola, M.J. van der Laan. Targeted maximum likelihood estimation of effect modification parameters in survival analysis. Int J Biostat, 7(1). [Link]

S. Rose, J. Snowden, K.M. Mortimer. Rose et al. respond to “G-computation and standardization in epidemiology.” Am J Epidemiol, 173(7):743–744. [PDF]

J. Snowden, S. Rose, K.M. Mortimer. Implementation of G-Computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol, 173(7):731–738. [PDF] [Evaluated by Faculty of 1000]

A. Chambaz, M.J. van der Laan. Targeting the optimal design in randomized clinical trials with binary outcomes and no covariate. Int J Biostat, 7(1):10. [PDF]

S. Rose, M.J. van der Laan. A targeted maximum likelihood estimator for two-stage designs. Int J Biostat, 7(1):17. [PDF]

K. Porter, S. Gruber, M.J. van der Laan, J. Sekhon. The relative performance of targeted maximum likelihood estimators. Int J Biostat, 7(1):31. [PDF]


S. Gruber, M.J. van der Laan. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int J Biostat, 6(1):26. [PDF]

M.J. van der Laan. Targeted maximum likelihood based causal inference: Part II. Int J Biostat, 6(2):3. [PDF]

M.J. van der Laan. Targeted maximum likelihood based causal inference: Part I. Int J Biostat, 6(2):2. [PDF]

M. Rosenblum, M.J. van der Laan. Targeted maximum likelihood estimation of the parameter of a marginal structural model. Int J Biostat, 6(2):19. [PDF]

O. Stitelman, M.J. van der Laan. Collaborative targeted maximum likelihood for time to event data. Int J Biostat, 6(1). [Link]

S. Gruber, M.J. van der Laan. An application of collaborative targeted maximum likelihood estimation in causal inference and genomics. Int J Biostat, 6(1):18. [PDF]

M.J. van der Laan, S. Gruber. Collaborative double robust targeted maximum likelihood estimation. Int J Biostat, 6(1):17. [PDF]


K. Moore, M.J. van der Laan. Increasing power in randomized trials with right censored outcomes through covariate adjustment. J Biopharm Stat, 19(6):1099-1131. [PDF]

K. Moore, M.J. van der Laan. Covariate adjustment in randomized trials with binary outcomes: Targeted maximum likelihood estimation. Stat Med, 28(1):39-64. [PDF]

S. Rose, M.J. van der Laan. Why match? Investigating matched case-control study designs with causal effect estimation. Int J Biostat, 5(1):1. [PDF]

O. Bembom, M. Petersen, S-Y. Rhee, W. Fessel, S. Sinisi, R. Shafer, M.J. van der Laan. Biomarker discovery using targeted maximum likelihood estimation: Application to the treatment of antiviral-resistant HIV infection. Stat Med, 28(1):152-172. [Link]


S. Rose, M.J. van der Laan. Simple optimal weighting of cases and controls in case-control studies. Int J Biostat, 4(1):19. [PDF]

M.J. van der Laan. Estimation based on case-control designs with known prevalence probability. Int J Biostat, 4(1):18. [Link]


S. Sinisi, E. Polley, M. Petersen, S-Y. Rhee, M.J. van der Laan. Super learning: An application to the prediction of HIV-1 drug resistance. SAGMB, 6(1):7. [PDF]

O. Bembom, M.J. van der Laan. A practical illustration of the importance of realistic individualized treatment rules in causal inference. Electron J Stat, 1:574-596. [PDF]

M.J. van der Laan, E. Polley, A. Hubbard. Super learner. SAGMB, 6(1). [Link]


M.J. van der Laan. Targeted maximum likelihood learning. Int J Biostat, 2(1). [Link]

M.J. van der Laan. Statistical inference for variable importance. Int J Biostat, 2(1). [Link]


C. Rudin, D. Dunson, R. Irizarry, H. Ji, E. Laber, J. Leek, T. McCormick, S. Rose, C. Schafer, M.J. van der Laan, L. Wasserman, L. Xue; A Working Group of the American Statistical Association (2014). Discovery with Data: Leveraging Statistics and Computer Science to Transform Science and Society.  [PDF] [Press Release] [Amstat News Article]


For the most up-to-date list of technical reports (i.e., preprints) please visit the bepress site.