Session 2: Causal Inference for Big Health Data  

<<  Back to 2nd CANSSI-NISS Health Data Science Workshop Event Page

Location:  DC 1302  

Session 2 Chair: Richard Cook, University of Waterloo

2:00 pm - 2:30 pm:

An Encoding Approach to Causal Inference

Speaker: Debashis Ghosh, Colorado School of Public Health

Abstract: With the increasing availability of large-scale observational databases and electronic health record datasets, it is of paramount importance to have methods for causal inference on large-scale data that are computationally feasible yet rest on a sound inferential framework. In this talk, we propose the use of encoding methods for confounder adjustment. These ideas are inspired by deep learning and relational databases. The approach leads to a simple estimation procedure as well as an attendant theoretical framework for establishing consistency and asymptotic normality of causal effect estimators using martingale theory. Further, the approach unifies many existing strategies for confounder adjustment and introduces new ones. In addition, this framework allows for complex forms of confounding. Some applications using real-data examples from the Veterans Administration (VA) will be provided. This is joint work with Ted Warsavage.


2:30 pm - 3:00 pm:

Robust estimation of the effect of intervening variable in settings with unmeasured confounding

Speaker: Lan Wen, University of Waterloo

Abstract: This work is motivated by two major threats to valid causal inference in the age of big data: unmeasured confounding and ill-defined interventions. We present new results on average causal effects in settings with unmeasured exposure-outcome confounding. Our proposed causal estimands are motivated by practical questions corresponding to an average causal effect of an intervening variable, a manipulable descendant of the exposure that precedes the outcome. We argue that effects of intervening variables are replicable and have clear policy implications, and furthermore are non-parametrically identified by the classical frontdoor formula without assuming cross-world conditions or ill-defined exposure interventions. Semiparametric estimators that guarantee sample bounds are constructed, with extensions to incorporate machine learning algorithms. New theoretical results are illustrated using data from the National Health and Nutrition Examination Survey.
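
For readers unfamiliar with the classical frontdoor formula mentioned in the abstract, the following is a minimal sketch of its plug-in form for discrete variables: sum over m of P(M=m | A=a) times the sum over a' of E[Y | M=m, A=a'] P(A=a'). It is not the semiparametric estimator described in the talk. The function name `frontdoor_mean`, the simulated data-generating process, and all coefficients are illustrative assumptions, chosen so that the unmeasured confounder U biases the naive contrast while the frontdoor contrast recovers the effect transmitted through M.

```python
import numpy as np

def frontdoor_mean(a, m, y, a_star):
    """Plug-in frontdoor estimate for exposure level a_star (discrete A and M).

    Computes sum_m P(M=m | A=a_star) * sum_a' E[Y | M=m, A=a'] * P(A=a').
    Empty (a', m) cells are skipped, which is only harmless when they are rare.
    """
    a, m, y = np.asarray(a), np.asarray(m), np.asarray(y)
    total = 0.0
    for m_val in np.unique(m):
        p_m = np.mean(m[a == a_star] == m_val)        # P(M=m | A=a_star)
        inner = 0.0
        for a_val in np.unique(a):
            cell = (a == a_val) & (m == m_val)
            if cell.any():
                inner += y[cell].mean() * np.mean(a == a_val)
        total += p_m * inner
    return total

# Hypothetical simulation: U is unmeasured, A affects Y only through M.
rng = np.random.default_rng(0)
n = 200_000
u = rng.binomial(1, 0.5, n)                  # unmeasured confounder
a = rng.binomial(1, 0.2 + 0.6 * u)           # exposure depends on U
m = rng.binomial(1, 0.2 + 0.6 * a)           # mediator depends on A only
y = 2.0 * m + 3.0 * u + rng.normal(size=n)   # outcome depends on M and U

effect = frontdoor_mean(a, m, y, 1) - frontdoor_mean(a, m, y, 0)
naive = y[a == 1].mean() - y[a == 0].mean()
print(f"frontdoor contrast: {effect:.3f}, naive contrast: {naive:.3f}")
```

Under this assumed design the true contrast is 2 × (0.8 − 0.2) = 1.2, while the naive difference in means is inflated by confounding through U.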


3:00 pm - 3:30 pm:

Test-Then-Pool: A uniformly valid inferential framework for data integration 

Speaker: Shu Yang, North Carolina State University

Abstract: Parallel randomized clinical trial (RCT) and real-world data (RWD) are becoming increasingly available for treatment evaluation. Test-Then-Pool (TTP) analysis of RCT and RWD is a natural idea for accurate and robust estimation of the treatment effect. When the RWD are not subject to bias, e.g., due to hidden confounding, our approach combines the RCT and RWD for optimal estimation. Utilizing the design advantage of RCTs, we construct a built-in test procedure to gauge the reliability of the RWD and decide whether or not to use RWD in an integrative analysis. The TTP estimator belongs to pre-testing estimators and is non-regular. Consequently, standard fixed-parameter asymptotics provides a poor approximation to the finite sample distribution. We resort to local-parameter asymptotics to faithfully capture non-regularity as the sample size grows large. Finally, we construct an adaptive confidence interval that has a good finite-sample coverage property. We apply the proposed method to characterize who can benefit from adjuvant chemotherapy in patients with stage IB non-small cell lung cancer based on RCT and RWD cohorts. 

Paper #1:  S. Yang, C. Gao, X. Wang, and D. Zeng (2022). Elastic integrative analysis of randomized trial and real-world data for treatment heterogeneity estimation. Journal of the Royal Statistical Society: Series B.
Paper #2:   C. Gao and S. Yang (2023). Pretest estimation in combining probability and non-probability samples. Electronic Journal of Statistics.

3:30 pm - 4:00 pm:

Vaccine effectiveness estimation under the test-negative design: identifiability and efficiency theory for causal inference under conditional exchangeability

Speakers: Mireille Schnitzer and Cong Jiang, Université de Montréal 

Abstract: The test-negative design (TND) is routinely used for monitoring seasonal flu vaccine effectiveness and has recently become integral to COVID-19 vaccine surveillance. Distinct from the case-control study, it recruits participants with a common symptom presentation and tests them for the target infection. Participants who test positive are considered "cases," while those who test negative are "controls." Logistic regression has traditionally been used to adjust for confounders when estimating vaccine effectiveness in the TND. However, it may be biased if effect modification by a confounder exists. We first review an inverse probability of treatment weighting estimator for the marginal risk ratio that is valid under effect modification but requires parametric modeling of the vaccination probability. To address this limitation, we propose a novel doubly robust and efficient estimator of the marginal risk ratio. We theoretically and empirically demonstrate the parametric convergence rates achieved through machine learning of the nuisance functions.

About the Speakers

Debashis Ghosh, Colorado School of Public Health
Chair (BIOS Dept), Professor; Department of Biostatistics & Informatics

Debashis Ghosh is the chair of the Department of Biostatistics and Informatics, one of the five departments within the Colorado School of Public Health. Ghosh is responsible for planning, managing and implementing the academic, service and research initiatives of the department, including collaborations between biostatistics faculty and students, and with the school’s institutional, public and community health research partners in the region. He is also an associate director of the Center for Biomedical Informatics and Personalized Medicine on the Anschutz Medical Campus. Prior to arriving at CU Anschutz, Ghosh was professor of Statistics and Public Health Sciences at Penn State University, where he worked with the Biostatistics and Epidemiology Research Design Group, and he was an affiliate investigator at the Methodology Center and co-director of the Computation, Bioinformatics and Statistics (CBIOS) Training Grant. Ghosh’s research interests involve two tracks: modeling of genomic data and research methodology in biostatistics. The former has dealt primarily with applications to oncology, and Ghosh was involved with the development of the statistical methods for ONCOMINE, an online data-mining platform used in cancer research and genetics. He has published more than 150 peer-reviewed articles, commentaries and book chapters in the statistical and scientific literature. He is a Fellow of the American Statistical Association and was honored with the 2013 Mortimer Spiegelman Award for outstanding early career statistical contributions to public health. Currently, he also serves as chair of the Biostatistical Methods and Research Design (BMRD) Study Section for the National Institutes of Health. Before he joined Penn State, Ghosh was an assistant and associate professor in the Department of Biostatistics at the University of Michigan, and he received his Ph.D. in biostatistics from the University of Washington in 2000.


Lan Wen, University of Waterloo
Assistant Professor in the Department of Statistics and Actuarial Science

Lan Wen is an Assistant Professor in the Department of Statistics and Actuarial Science at the University of Waterloo. She completed her PhD in Biostatistics at the University of Cambridge and a postdoctoral fellowship at Harvard University. Her research focuses on the development of statistical methods for causal inference using observational studies where complications arise due to model misspecification, time-varying confounding, and censoring/missing data.


Shu Yang, North Carolina State University
Associate Professor, Department of Statistics

Shu Yang is an Associate Professor in the Department of Statistics at North Carolina State University. She graduated from Iowa State University in 2014 with a major in Mathematics and a co-major in Statistics, working with J.K. Kim and Z. Zhu. After graduation, she joined the Harvard T.H. Chan School of Public Health as a postdoc with Judith Lok. She joined NC State as a faculty member in 2016. She was promoted to Associate Professor in 2021 and became a Goodnight Early Career Innovator and a University Faculty Scholar in the same year. Her research interests include causal inference from longitudinal observational data; semiparametric efficient estimation; missing data analysis and imputation methods; spatial data analysis, nonstationary processes and spectral methods; and survey sampling and methodology.


Mireille Schnitzer, Université de Montréal

Mireille Schnitzer is an Associate Professor of Biostatistics at the Université de Montréal. She holds a Canada Research Chair in Causal Inference and Machine Learning in Health Science. Mireille received her PhD in Biostatistics from McGill University in 2012 and was a postdoctoral researcher at the Harvard T.H. Chan School of Public Health in 2013. Mireille's current research interests are causal inference methodology in pharmacoepidemiology, semiparametric efficient estimation in longitudinal and survival settings with an emphasis on targeted maximum likelihood estimation, and individual participant data meta-analysis. Mireille currently holds an NSERC Discovery Grant and a CIHR Project Grant as PI, and is a co-investigator on multiple CIHR-funded health studies.


Cong Jiang, Université de Montréal

Cong Jiang is a postdoctoral researcher at the Université de Montréal under the supervision of Mireille Schnitzer and Denis Talbot. He received his PhD from the University of Waterloo in 2022 under the supervision of Mary Thompson and Michael Wallace, and he specializes in developing methods for dynamic treatment regimes with interference. Currently, his research focuses on machine learning and nonparametric efficiency within causal inference.

