Accounting for Missing Data in Educational Surveys

Research Project

The National Center for Education Statistics (NCES) charged the National Institute of Statistical Sciences (NISS) with convening a panel of technical experts to consider the issues of accounting for missing data in educational surveys. In particular, the panel was asked to address the following questions:

  1. Should we analyze and report on datasets for which we have low response rates? What steps should be taken when response rate goals are not met? How should a nonresponse bias study be conducted?
  2. How should nonresponse be measured? Should weighting be used in computing response rates? Can the measurement process be made comparable across all surveys? How do we report response rates for surveys involving screening or several rounds of followup? Should we compound conditional response rates? How do we define a complete case? How do we report response rates when nonrespondents are replaced by substitutes?
  3. Should NCES generally adopt imputation methods in addition to adjusting for unit nonresponse? Should multiple imputation methods be utilized? What are the cost and practical limitations?
  4. Should NCES set minimum response rate standards? If so, what should they be for future surveys in the planning and design stage? What should they be when addressing public release of an existing data set? Should they be the same in both cases?

Summary and Recommendations

1. Evaluating Nonresponse Bias

Nonresponse bias evaluation should be an integral part of the quality evaluation for all NCES surveys. The extent of the evaluation should be scaled to the seriousness of the nonresponse level indicated by initial assessments. Several methods of evaluating nonresponse bias may be employed, ranging from a simple comparison of known characteristics for respondents and nonrespondents to conducting a sample-based followup of nonrespondents on key items. The more intensive methods (followup of nonrespondents) should be implemented when the potential or projected bias is large.
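As a concrete illustration of the simplest of these methods, the sketch below compares base-weighted means of a characteristic known for the whole sample (for example, from the frame or screener) between respondents and nonrespondents, and converts the difference into an approximate bias for a respondents-only estimate. The data layout and variable names are hypothetical.

```python
# Minimal sketch (hypothetical data layout): compare a characteristic known
# for all sampled units between respondents and nonrespondents, and estimate
# the bias that nonresponse would induce in a respondents-only mean.
import numpy as np

def respondent_nonrespondent_comparison(x, responded, base_weight):
    """x: frame/screener characteristic known for all sampled units.
    responded: boolean array, True for responding units.
    base_weight: base (design) weights for all sampled units."""
    x = np.asarray(x, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    w = np.asarray(base_weight, dtype=float)

    mean_resp = np.average(x[responded], weights=w[responded])
    mean_nonresp = np.average(x[~responded], weights=w[~responded])
    nonresp_share = w[~responded].sum() / w.sum()

    # Approximate bias of the respondents-only mean relative to the full sample:
    # (weighted nonrespondent share) * (respondent mean - nonrespondent mean)
    approx_bias = nonresp_share * (mean_resp - mean_nonresp)
    return mean_resp, mean_nonresp, approx_bias
```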

Continue to apply nonresponse adjustment factors at the unit level based on weighting classes, poststratification to known totals, response propensity modeling, or a combination of such techniques, as these are generally effective in reducing nonresponse bias when applied judiciously. At a minimum for the key items, adopt item imputation strategies based on the relationships between missing survey characteristics and reported characteristics. Many methods are available for item imputation, including matched donor methods (e.g., hot deck) and model-based methods that use reported data to predict missing values. Properly conducted, item imputation should also be effective in reducing nonresponse bias. Consider multiple imputation methods to better assess the total error of estimates based on partially imputed data.
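A minimal sketch of two of these techniques, under simplified assumptions (a single flat file with hypothetical column names, and one categorical variable defining both the weighting classes and the imputation cells):

```python
# Minimal sketch, assuming a flat file with hypothetical columns:
#   'cell'   - weighting class / imputation cell identifier
#   'base_w' - base (design) weight
#   'resp'   - 1 if the unit responded, 0 otherwise
#   'y'      - a survey item, NaN when missing
import numpy as np
import pandas as pd

def weighting_class_adjustment(df):
    """Inflate respondent base weights so that, within each class,
    respondents represent the full base-weighted class total.
    Assumes every class contains at least one respondent."""
    totals = df.groupby('cell')['base_w'].transform('sum')
    resp_totals = (df['base_w'] * df['resp']).groupby(df['cell']).transform('sum')
    out = df.copy()
    out['adj_w'] = np.where(out['resp'] == 1, out['base_w'] * totals / resp_totals, 0.0)
    return out

def hot_deck_impute(df, seed=0):
    """Fill missing 'y' values by drawing a donor at random from
    reported values in the same imputation cell."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for cell, idx in out.groupby('cell').groups.items():
        block = out.loc[idx]
        donors = block.loc[block['y'].notna(), 'y'].to_numpy()
        missing = block.index[block['y'].isna()]
        if len(donors) and len(missing):
            out.loc[missing, 'y'] = rng.choice(donors, size=len(missing))
    return out
```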

2. Measuring Nonresponse

Recognize that the response rate is itself a survey estimate based on the particular sample and the base weights applied to that sample.

Continue to use response rates that incorporate the base weights at the level of the unit of analysis. Apply base weights at the screening unit level for the screening rate component and base weights at the analysis unit level for the conditional response rate. Express the overall response rate as a product of these rates. Technical documentation should include not only the overall response rates but also all unweighted and weighted counts that entered into the computation of each unconditional or conditional response rate.
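Under simplified assumptions (a two-stage design with a screener followed by a single analysis-unit interview; all input arrays are hypothetical), the calculation can be sketched as:

```python
# Minimal sketch: overall response rate as a product of weighted component rates.
# screener_w / screener_resp describe screening units; unit_w / unit_resp describe
# analysis units within responding screeners.
import numpy as np

def weighted_rate(base_w, responded):
    base_w = np.asarray(base_w, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    return base_w[responded].sum() / base_w.sum()

def overall_response_rate(screener_w, screener_resp, unit_w, unit_resp):
    screen_rate = weighted_rate(screener_w, screener_resp)  # screening component
    cond_rate = weighted_rate(unit_w, unit_resp)             # conditional component
    return screen_rate * cond_rate, screen_rate, cond_rate
```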

For the rare cases in which matching, rather than probability selection, is used to substitute for nonrespondents, base the reported response rate on the initial sample only. The response rate for the substitutions should be reported separately to give an indication of the amount of substitution that was used.

If reasonable models for improved imputation of eligibility can be developed, use them to allocate unknown cases to eligible and ineligible categories (an elaboration of Standard 2 of NCES Standard III-02-92).
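One common form such an allocation can take (a sketch only, not the wording of the NCES standard) estimates an eligibility rate from the cases whose eligibility is known, or from a model, and applies it to the cases of unknown eligibility before they enter the response rate denominator:

```python
# Minimal sketch: allocate cases of unknown eligibility using an estimated
# eligibility rate (here taken from the cases whose eligibility is known;
# a model-based estimate could be substituted). Counts are hypothetical.
def response_rate_with_unknowns(n_respondents, n_eligible_nonrespondents,
                                n_known_ineligible, n_unknown):
    known_eligible = n_respondents + n_eligible_nonrespondents
    elig_rate = known_eligible / (known_eligible + n_known_ineligible)
    denominator = known_eligible + elig_rate * n_unknown
    return n_respondents / denominator
```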

3. Imputation and Multiple Imputation

Item imputation methods are widely used in government surveys, including NCES surveys.

Continue to use item imputation methods because they can be made effective in reducing nonresponse bias.

In the past, lacking a better alternative, analysts have often treated imputed values as reported values; however, this leads to substantial underestimation of standard errors computed from the data when the amount of missing data is sizeable. Several approaches have been developed, and more are being developed, to properly estimate standard errors when data are partially imputed.

The panel is not prepared to recommend a single methodology for NCES to apply routinely, but nonetheless does recommend using a standard error estimation approach which recognizes that data have been imputed.
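One widely used approach of this kind, applicable when multiple imputation is used, is Rubin's combining rule, in which the total variance is the average within-imputation variance plus an inflated between-imputation component. A minimal sketch:

```python
# Minimal sketch of Rubin's combining rules for M multiply imputed datasets.
# estimates[m] is the point estimate from completed dataset m; variances[m]
# is its complete-data sampling variance.
import numpy as np

def combine_mi(estimates, variances):
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()              # combined point estimate
    w_bar = variances.mean()              # average within-imputation variance
    b = estimates.var(ddof=1)             # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b   # Rubin's total variance
    return q_bar, np.sqrt(total_var)
```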

4. Setting Standards

NCES has taken an important step in developing a Statistical Standards document to guide its statistical activities. These standards should support a process for improving response rates and for improving analytic methods used to deal with nonresponse in all NCES surveys.

There is a danger in setting exact levels of response as a standard because there may be a tendency to be complacent when that level is achieved rather than to strive for continuous improvement in response coverage and the consequent reduction in potential nonresponse bias. Any standards set for individual surveys should be high but within reasonable expectations based on actual experience in similar surveys. A single standard for all surveys does not appear feasible.

Project Goal: 

To consider the issues of accounting for missing data in educational surveys.

Research Team: 

Workshop Participants

Johnny Blair - University of Maryland
James Chromy - Research Triangle Institute
Richard Jaeger - University of North Carolina at Greensboro
Lyle V. Jones - University of North Carolina at Chapel Hill
Graham Kalton - Westat
Roderick Little - University of Michigan
Ingram Olkin - Stanford University
Valerie S. L. Williams - North Carolina Central University

National Center for Education Statistics

Dennis Carroll
Pascal Forgione
Daniel Kasprzyk
Marilyn McMillen
Martin Orland
Gary Phillips

Education Statistics Services Institute

Karol Krotki

Workshop convened by the National Institute of Statistical Sciences

Jerome Sacks - National Institute of Statistical Sciences