The discussion continues! Nearly 350 participants attended the third in a series of webinars hosted by NISS on the use of p-values.
Three distinguished statisticians shared their insights on the use of p-values in their areas of research, where p-values are used in decision-making and where multiplicity adjustment in the frequentist sense is a core consideration for drawing inference. The speakers for this webinar were Yoav Benjamini, the Nathan and Lily Silver Professor of Applied Statistics at the Department of Statistics and Operations Research at Tel Aviv University; Alicia Laura Carriquiry, Distinguished Professor of Statistics at Iowa State University; and Hsien-Ming James Hung, Director of Division of Biometrics I, Office of Biostatistics, Office of Translational Sciences, Center for Drug Evaluation and Research (CDER), U.S. Food and Drug Administration (FDA).
Setting the stage by reviewing the cry for reproducibility and replicability, Yoav Benjamini began the webinar in a distinctive fashion by providing evidence that the use of p-values is undergoing a “misguided attack.” He then directed his remarks to answering the question, “Is it the p-values’ fault?” As a logical next step, Yoav addressed the problems inherent in selective inference, which in turn led to a discussion of how these issues are being handled in clinical trials and Bayesian methods. He ended his comments by stating that while ignoring the evidence of selective inference in published work is the current practice in many branches of science, “Sweeping the p-values under the rug worsens the situation.”
Alicia Carriquiry’s perspective was very different. She demonstrated how forensic scientists need to take a different approach to evaluating the basic hypothesis, “innocent until proven guilty.” In the first part of her talk, she focused on the areas of forensic science where p-values should NOT be used. She walked through an example where glass fragments are involved as evidence in a crime, demonstrating the problems that exist when using hypothesis testing and p-values in this legal context. Essentially, the hypotheses end up being backwards, making the p-value useless. In the second part of her talk, she provided examples of where p-values are useful. One example was a case involving the prediction of whether a person who is released on bail will show up at trial, i.e., the use of classification-type algorithms in risk assessment tools.
James Hung, the final speaker of the day, directed his comments at the role of multiplicity adjustments within regulatory applications. After a quick review of the drug development process, James zeroed in on clinical trials for proving efficacy, where relationships among the hypotheses are often complex. His explanation demonstrated how the p-value is often used to assess the strength of statistical evidence against the null hypothesis in individual trials when multiplicity control is applied to screen out the null hypotheses for the purpose of regulatory decisions and labeling. He concluded his part of the session by providing an overview of the challenges that remain in this area, posing questions that researchers need to consider.
The talks were followed by a variety of thought-provoking questions that demonstrated the engagement of the attendees. Question topics included: “Are Equivalence Tests for determining no difference in means used in forensic science?”, “What do you think of Holm-Bonferroni procedure?” and “RA Fisher proposed pooling p-values from multiple experiments in his writings. Has that idea been considered by FDA?” Review the recording of this session to see what other questions were asked and how the speakers responded.
The webinar was moderated by James Rosenberger, Professor Emeritus of Statistics at Penn State University, and Director of NISS.
Recording of the Session
Slides Used by the Speakers
Alicia Laura Carriquiry
Hsien-Ming James Hung