For decades NCES has collected data on the state of education, nationally and internationally, via validated assessments, surveys, and collections of administrative data. Many NCES reports of these data focus on “significant” findings. The prime challenges facing NCES are: what to report as significant, how to report it, and how to explain it.
The National Center for Education Statistics (NCES) charged the National Institute of Statistical Sciences (NISS) with convening a panel of technical experts to focus on how the significance of findings is reported in NCES publications, in data summaries presented in a variety of forms on the NCES website, and in other citations of the significance of statistical summaries produced by or for the Center.
The broad charge to the panel was to examine the representation of significance in recent NCES publications, and to deliberate the conceptual issues of defining significance prior to making recommendations to NCES. In addition, the panel was asked specifically to consider possible definitions of significance, including the dichotomy (significant or not, p < 0.05) currently used in NCES reports. A second specific request was for the panel to consider possible publication practices, and in particular whether to restrict publication to significant findings (i.e., those meeting the threshold definition, p < 0.05). Because reports on multiple variables pose special problems, the panel was asked to review practices for handling multiple tests and to recommend ways of ensuring that quoted probabilities (p-values) are correct. A final request was for the panel to provide advice on effectively communicating the meaning of a “significant finding” to a broad readership. The panel met in person in September 2018. This white paper is based on the panel’s report.
The panel’s discussion covered four areas: concepts of significance and importance, statistical issues, standards, and publication practices.
The overarching goal is to reduce the gap in information and understanding between statisticians on one hand and policy makers and the lay public on the other. Therefore, the panel encourages NCES to ensure that reports accurately and fully reflect all the important complexities in the data. Recommendations follow, grouped by area.
Primary Recommendations: Significance and Importance
- Lead with magnitude of effect; follow with significance.
- Communicate importance in terms of magnitude and associated variance, probability (e.g., p-value, interval or other) and strength of evidence or sensitivity.
- Replace dichotomization and eliminate nebulous expressions (e.g., “substantially”).

Statistics and Methodology
- Expand the collection of analytic methods employed to meet the needs of analysis and interpretation. In particular, univariate methods used alone can be seriously misleading because unidimensional analyses cannot reflect interactions, clustering, or differences in responses among subsets of the population.
NOTE: Multivariate analysis is often necessary for accurate interpretation of the data, but such an analysis does not imply causality.
- For multiple tests (or probability statements or intervals), indicate the required adjustments to the calculated probabilities.
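The adjustment recommended above can be illustrated with a minimal sketch. The p-values below are hypothetical, and the functions are textbook implementations of the Bonferroni and Benjamini–Hochberg adjustments, not an NCES procedure:

```python
# Illustrative sketch (hypothetical p-values, not NCES data): adjusting
# p-values from multiple tests so that quoted probabilities remain correct.

def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        val = min(prev, pvals[i] * m / (rank + 1))
        adjusted[i] = val
        prev = val
    return adjusted

raw = [0.001, 0.012, 0.035, 0.20]   # hypothetical p-values from 4 tests
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

The point of the sketch is that a raw p-value of 0.035 is no longer below 0.05 once four tests are accounted for, so quoting it unadjusted would overstate significance.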
Planned and Exploratory Analyses
- Require an analytic plan at the outset that specifies analysis to be done and commits to full reporting of all planned analyses.
- Anticipate and allow discretionary exploratory analyses, but when they are reported, separate and clearly identify them in the text, noting that calculated probabilities cannot be correct without adjustment for conditional decisions and multiplicity.
- Report analysis details and process to provide technical support for interpretations as supplemental material.
Standards and Guidelines
- Review and revise (as needed) Standards and Guidelines every 3 to 5 years with attention to relevant advances in statistical and technological methodology. Start with an immediate comprehensive review.
- Add one or more new Standards (and accompanying Guidelines) in each of the following areas. Seek external consultants with specific expertise where appropriate.
- Statistical graphics and data visualization
- Measuring and reporting model fit for survey and administrative data
- Require that submission of reports for review include a specific response to each Standard or Guideline indicating “consider . . .”
Publication Practices

- Write clearly and accurately, so that the information as interpreted by a broad readership will be consistent with deeper analyses of the data that support the reported results.
- Ensure complete publication of results for all statistical analyses and include the statistical methods employed (especially tests!).
- Disseminate reports at two levels by providing details of the analyses, including the analytic process and supporting statistical information. For example, expand Data Point to supply deeper data-analysis results by appending or linking to the detail required by a more sophisticated reader or policymaker to validate the methodology, results, and conclusions or to make decisions.
- Indicate precision (and/or a probability measure of significance) wherever data are presented – in text, tables, graphs, and other data visualizations.
- Use technology wisely to link elaborations and detailed explanations, additional graphics or data visualizations, and important definitions to simple statements in online reports.
Note on Implementation
The expert panel recognizes that transitioning away from a threshold-based, single-variable-at-a-time conception of significance will require effort, expertise, and time to accomplish. Attainable change will balance feasibility in terms of resources (staff time, funding, etc.) against best practices; however, this does not lessen the urgency of moving forward.
Expert Panel Members

Michael L. Cohen, Ph.D., Senior Program Officer for the Committee on National Statistics at the National Academies of Sciences, Engineering, and Medicine.
Jee-Seon Kim, Ph.D., Professor in the Department of Educational Psychology at the University of Wisconsin-Madison.
Finbarr “Barry” Sloane, Ph.D., Program Director in the Knowledge Building Cluster (EHR/DRL), Building Community and Capacity in Data Intensive Research in Education (BCC-EHR), Division of Research on Learning in Formal and Informal Settings (EHR/DRL) at the National Science Foundation.
Linda J. Young, Ph.D., Chief Mathematical Statistician & Director of Research and Development, USDA’s National Agricultural Statistics Service.
Nell Sedransk, Ph.D., Director, National Institute of Statistical Sciences-DC.