Abstract:
Propensity score matching is often used in observational studies to create treatment and control groups with similar distributions of observed covariates. Typically, propensity scores are estimated using logistic regressions that assume the log-odds of treatment assignment are linear in the predictors. When the actual assignment mechanism does not follow this linear form, matching on the poorly estimated propensity scores might not produce groups with similar covariate distributions. In this paper, we evaluate the use of generalized additive models (GAMs), which use flexible rather than linear functions of the predictors, for estimating propensity scores. Using empirical studies, we compare GAMs to logistic regressions in terms of balancing covariate distributions when matching on estimated propensity scores. We find that, when the distributions of covariates in the treatment and control groups overlap sufficiently, using GAMs can improve overall covariate balance, especially for higher-order moments and fine features of distributions. When the distributions in the two groups overlap insufficiently, GAMs reveal this fact more clearly than logistic regressions do.
Keywords:
Causal inference; Logistic regression; Observational study.
