Statement of the Problem:
One of the most common statistical procedures in quantitative social science research is to examine the association between a key predictor, X, and an outcome, Y, before and after adjusting for another predictor, Z. In some cases X represents a quasi-experimental treatment, Z is a pre-treatment covariate, and the aim is to estimate the extent to which the covariate accounts for the unadjusted difference between treatments. In other cases, X is an exogenous variable known to be associated with Y, and the question is whether this association is "explained" by an endogenous Z viewed as mediating the X-Y relationship. In practice, most investigators simply "eyeball" the regression coefficient for X before and after adding Z. If the absolute value of that coefficient is reduced after adding Z, they infer that Z explains, at least in part, the relationship between X and Y. Clogg, Petkova, and Haritou (1995) and Allison (1995) advised researchers to take a more rigorous approach to this problem by computing standard errors, tests, and confidence intervals for the change in the X coefficient associated with adding Z. The general issue at hand, then, is "comparing regression coefficients between models."
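To make the comparison concrete, the following minimal sketch fits the model for Y with and without Z on simulated data and reports the change in the X coefficient. It is an illustration under stated assumptions, not the procedure recommended in this report: the data, the coefficient values, and the helper `ols` are all hypothetical, and no standard error is computed. It relies only on the standard least-squares identity that the change in the X coefficient equals the coefficient of Z in the full model times the slope from regressing Z on X.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    return [A[r][p] / A[r][r] for r in range(p)]

# Simulated data (assumed values, for illustration only): Z depends on X,
# and Y depends on both X and Z with no X-by-Z interaction.
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.6 * xi + random.gauss(0, 1) for xi in x]
y = [1.0 + 0.5 * xi + 0.8 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

b_reduced = ols([x], y)[1]          # coefficient of X without Z
_, b_full, gamma = ols([x, z], y)   # coefficients of X and Z with Z included
delta = ols([x], z)[1]              # slope from regressing Z on X

change = b_reduced - b_full
print("X coefficient without Z:", round(b_reduced, 3))
print("X coefficient with Z:   ", round(b_full, 3))
print("change:                 ", round(change, 3))
print("gamma * delta:          ", round(gamma * delta, 3))
```

The last two printed values agree up to rounding error, which shows that the "eyeball" comparison is really a statement about the product of two estimated coefficients. Judging whether that change is distinguishable from zero requires a standard error, which this sketch does not provide.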
Not surprisingly, the inferential issues involved in such comparisons have arisen frequently in data analyses contracted by the National Center for Education Statistics (NCES). Concerned about the possible subjectivity of comparisons made by the "eyeball" method, Susan Ahmed, Chief Statistician at NCES, asked the National Institute of Statistical Sciences (NISS) to help NCES develop advice for contractors analyzing NCES data. Under the leadership of Ingram Olkin, a meeting was convened at NISS in October 1996 to consider these issues, and a sub-group of participants volunteered to serve on a Task Force to write this report. As a first step in establishing standards, the report is restricted to the case of a continuous or approximately continuous outcome. The procedures we recommend apply when there is no statistical interaction between X and Z. We illustrate by example how to test for such interactions and how to compare coefficients across models when no such interactions are found.
