About
The NISS AI, Statistics and Data Science in Practice series is a monthly event that brings together leading experts from industry and academia to discuss the latest advances and practical applications in AI, data science, and statistics.
Each session features a keynote presentation on a cutting-edge topic, with time for attendees to engage speakers on the challenges and opportunities of applying these technologies in real-world scenarios. The series is intended for professionals, researchers, and students interested in the intersection of AI, data science, and statistics. It is designed to give participants exposure to, and an understanding of, how modern data analytic methods are applied across industries, offering theoretical insights, practical examples, and discussion of open issues.
During Spring 2026, from January through May, the series will focus on large language models (LLMs) and the statistical and methodological foundations required to develop, evaluate, and deploy them responsibly and effectively. As LLMs become central to scientific research, industry workflows, and societal decision-making, rigorous attention to how training data are constructed, curated, and sampled, and to model training, evaluation, and inference, is essential to ensure reliability, robustness, and transparency and to understand model behavior and limitations. The series will highlight methodological considerations in model training and fine-tuning, including sources of bias, variability, and uncertainty, as well as principled approaches to benchmarking and evaluation that move beyond surface-level performance metrics. Emphasis will be placed on transparent and reproducible evaluation frameworks that support meaningful comparisons across models and use cases, and on statistical perspectives that clarify what LLM outputs do and do not represent. By grounding discussions of LLM development and deployment in sound statistical reasoning, the series aims to promote more reliable, interpretable, and trustworthy language models in practice.
Featured Topics:
- Veridical Data Science - Speaker: Bin Yu, October 15, 2024
- Random Forests: Why they Work and Why that’s a Problem - Speaker: Lucas Mentch, November 19, 2024
- Causal AI in Business Practices - Speakers: Victor Lo and Victor Chen, January 24, 2025
- Large Language Models: Transforming AI Architectures and Operational Paradigms - Speaker: Frank Wei, February 18, 2025
- Machine Learning for Airborne Biological Hazard Detection - Speaker: Jared Schuetter, March 11, 2025
- Trustworthy AI in Weather, Climate, and Coastal Oceanography - Speaker: Amy McGovern, May 13, 2025
- Sequential Causal Inference in Experimental or Observational Settings - Speaker: Aaditya Ramdas, August 26, 2025
- Covariate Adjustment, Intro to Resampling, and Surprises - Speaker: Tim Hesterberg, October 3, 2025
- Bayesian Geospatial Approaches for Prediction of Opioid Overdose Deaths Utilizing the Real-Time Urine Drug Test - Speaker: Joanne Kim, November 18, 2025
- COVID-19 Focused Cost-benefit Analysis of Public Health Emergency Preparedness and Crisis Response Programs - Speaker: Nancy McMillan, December 11, 2025
- LabOS: The AI-XR Co-Scientist That Reasons, Sees and Works With Humans - Speaker: Mengdi Wang, January 20, 2026
- From LLMs to World Foundation Models & Robotics: The Next Frontier of Artificial Intelligence - Speaker: Robert Clark, February 24, 2026
- Recent Advances in the Statistical Foundations of Large Language Models - Speaker: Weijie Su, March 17, 2026
- AI, Statistics & Data Science in Practice Webinar - Speaker: Anastasios N. Angelopoulos, April 17, 2026
Upcoming Webinars in Series
Speaker: Weijie Su | Tuesday, March 17, 2026 - 12:00pm to 1:30pm ET Abstract: In this talk, we advocate for the development of rigorous statistical foundations for large language models (LLMs). We begin by elaborating two key features that motivate statistical perspectives for LLMs: (1) the probabilistic, autoregressive nature of next-token prediction, and (2) the complexity and black box nature of Transformer architectures. To illustrate how statistical insights can directly benefit LLM development and applications, we present two concrete examples. First, we introduce a novel statistical framework to analyze the efficiency of watermarking schemes, with a focus on a watermarking scheme developed by OpenAI for which we derive optimal detection rules that outperform existing ones. Second, we demonstrate statistical inconsistencies and biases arising from the current approach to aligning LLMs with human preference. We propose a regularization term for aligning LLMs that is both necessary and sufficient to ensure consistent alignment. Collectively, these findings showcase how statistical insights can address pressing challenges in LLMs while simultaneously illuminating new research avenues for the broader statistical community to advance responsible generative AI research. This talk is based on arXiv:2404.01245, 2405.16455, 2503.10990, and 2510.22007.
Register on Zoom
Previous Webinars + Recordings
Speaker: Professor Bin Yu | October 15, 2024 Abstract: The rapid advancement of AI relies heavily on the foundation of data science, yet data science education significantly lags its demand in practice. The upcoming book 'Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making' (Yu and Barter, MIT Press, 2024; free online at www.vdsbook.com) tackles this gap by promoting Predictability, Computability, and Stability (PCS) as core principles for trustworthy data insights. PCS for veridical data science (VDS) has been developed in the process of solving scientific data science problems. It thoroughly integrates these principles into the Data Science Life Cycle (DSLC), from problem formulation through data cleansing to result communication, fostering a new standard for responsible data analysis. This talk explores the motivations for PCS and compares the VDS book's approach with traditional ones. I will then describe two PCS projects, on prostate cancer detection and on the discovery of epistatic genetic drivers of a heart disease. I will end with ongoing work on PCS uncertainty quantification in regression and its comparison with conformal prediction, PCS software packages (v-flow, simChe), and MEIRTS guidelines for data-inspired simulations.
Watch Recording
Speaker: Lucas Mentch | November 19, 2024 Abstract: Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success, a full and satisfying explanation for their success has yet to be put forth. In this talk, we will show that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. From a model-complexity perspective, this means that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicit regularization procedures like the lasso. Realizing this, we demonstrate that alternative forms of randomness can provide similarly beneficial stabilization. In particular, we show that augmenting the feature space with additional features consisting of only random noise can substantially improve the predictive accuracy of the model. This surprising fact has been largely overlooked within the statistics community, but has crucial implications for thinking about how best to define and measure variable importance. Numerous demonstrations on both real and synthetic data are provided.
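The noise-augmentation idea described in the abstract can be sketched in a few lines. The following is an illustrative setup on simulated low signal-to-noise data using scikit-learn; it is not the speaker's code or experiments, and the exact scores will vary with the seed:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 300, 5
X = rng.normal(size=(n, p))
# Low signal-to-noise ratio: a weak linear signal buried in noise
y = 0.5 * X[:, 0] + rng.normal(scale=2.0, size=n)

# Augment the feature space with columns of pure random noise
X_aug = np.hstack([X, rng.normal(size=(n, 15))])

def rf_score(features):
    """Fit a random forest and return its held-out R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)

score_plain = rf_score(X)
score_noisy = rf_score(X_aug)
```

The intuition from the talk is that extra noise columns dilute the pool of candidate split variables, much as lowering the mtry (max_features) parameter does, providing implicit regularization in low-SNR settings; on any single simulated draw the comparison can go either way.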
Watch Recording
Speakers: Victor Lo and Victor Chen | January 24, 2025 This webinar will explore the growing role of causal AI in uncovering cause-and-effect relationships within complex systems. The session will highlight how causal AI differs from traditional predictive models, emphasizing its potential to improve decision-making across various domains. Attendees will gain insights into techniques for measuring the impact of interventions and understanding causal mechanisms. Broader examples will illustrate its application in optimizing strategies and enhancing outcomes. Key challenges, such as data reliability and model validation, will also be explored. The webinar will conclude with practical guidance on leveraging causal AI in dynamic and high-impact settings.
Watch Recording
Speaker: Frank Wei | February 18, 2025 Abstract: The emergence of Large Language Models (LLMs) represents a paradigm shift in artificial intelligence, fundamentally transforming our approach to natural language processing and machine learning architectures. In this presentation, we will navigate through the evolutionary trajectory of LLMs, beginning with their historical foundations and theoretical underpinnings that have shaped the current landscape of AI. We will then delve into the architectural intricacies of transformer-based models, examining their self-attention mechanisms, positional encodings, and multi-head architectures that enable unprecedented language understanding and generation capabilities. As we explore the transformative impact of LLMs on traditional machine learning paradigms, we will analyze the evolution from conventional ML to LLM, highlighting the specialized operational frameworks, deployment strategies, and infrastructure requirements that distinguish these approaches. This transition encompasses novel considerations in computational orchestration, model versioning, prompt engineering, and systematic evaluation methodologies. We will critically examine how these operational paradigms are reshaping feature engineering, model architectures, and deployment pipelines in AI systems. To demonstrate these theoretical and operational principles in practice, we will conclude with a demonstration of our innovative LLM-based solution, illustrating how sophisticated architectural designs and robust operational frameworks converge to address complex real-world challenges.
Watch Recording
Speaker: Jared Schuetter | March 11, 2025 Abstract: Biological aerosols are small particles that are pervasive in our environment, airborne and mobile due to their size and frequently inhaled by those who come into contact with them. In most cases, these particles are innocuous, but in some scenarios these aerosols can generate adverse health events when encountered in large enough doses. This can happen with natural spread of a disease (e.g., influenza or the recent COVID pandemic), growth of an organism within a hospitable environment (e.g., mold), or an intentional release (e.g., dissemination of biological warfare agents like anthrax). Detection and identification of such threatening biological aerosols is important for ensuring the safety of a vulnerable population, and a field of sensor development has grown around this need. These sensors produce data, and the data must be analyzed by data scientists!
In this talk, we will discuss the development effort for one such device, Battelle's Resource Effective Bioidentification System (REBS), focusing on how the sensor works, what data it produces, what issues the team ran into during the development process, and how those issues were resolved. No background in this domain is expected and efforts will be made to explain the concepts involved. Unfortunately, there may also be some dad jokes involved, so if you are looking for an entertaining talk, don't hold your breath.
Watch Recording
Speaker: Amy McGovern | May 13, 2025 Abstract: In this talk, Dr. Amy McGovern provided an overview of the work being conducted at the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES). As AI applications in weather forecasting continue to grow exponentially, there is a critical need to ensure these methods are trustworthy before they are widely deployed. Dr. McGovern presented a broad summary of the institute’s research and delved more deeply into efforts focused on understanding and building trust in AI for weather-related applications. She highlighted key findings on how forecasters perceive trust in AI systems and how these perceptions influence the AI development lifecycle. The presentation concluded with a discussion of Extreme Weather Bench, a novel framework for verifying the trustworthiness of AI models in extreme weather scenarios.
Watch Recording
Fall 2025 Theme: Experimental Design
During Fall 2025, the AI, Statistics and Data Science in Practice Series focused on the critical role of experimentation in the development and refinement of artificial intelligence (AI) systems. Incorporating principles of design of experiments and randomization ensures that AI models are trained on reliable, unbiased data, leading to more generalizable and interpretable results. By planning data collection with experimental design and randomization, researchers can minimize bias from uncontrolled variables and improve the statistical validity of their conclusions, whether the models are inferential or predictive. However, in many real-world scenarios, fully controlled experiments may not be feasible. When working with observational data, researchers can employ quasi-experimental techniques to approximate the benefits of randomized trials. These methods help isolate the effects of key variables and adjust for potential confounders, improving the robustness of AI-driven insights. By integrating structured experimentation and causal inference methodologies, AI developers can enhance the reliability and applicability of their models in practice.
Speaker: Tim Hesterberg | October 3, 2025 Abstract: Covariate adjustment is a relatively simple way to improve A/B test accuracy and velocity, and to reduce bias. Huh? Aren't randomized experiments unbiased? Well, yes and no - yes by the usual statistical definition, but no by a more common-sense definition. Furthermore, in practice there are some complications, particularly related to measuring how accurate the results are. Enter resampling (bootstrap, jackknife) - generally useful techniques for measuring accuracy. These bring their own surprises. Remember the old n >= 30 rule for using normal approximations? We'll see just how bad that is.
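As a minimal illustration of the resampling idea central to the abstract, the bootstrap estimates the accuracy of a statistic by recomputing it on samples drawn with replacement from the data. This is a generic standard-library sketch, not code from the talk:

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    reps = [
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    ]
    # Spread of the statistic across resamples approximates its standard error
    return statistics.stdev(reps)

# A small, right-skewed sample: the kind of data where the old
# "n >= 30 implies normal approximations are fine" rule can mislead
sample = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.8, 1.1, 2.5, 9.0]
se_mean = bootstrap_se(sample)
```

Inspecting the full list of bootstrap replicates (rather than just their standard deviation) also reveals the skewness of the sampling distribution, which is one of the "surprises" a normal approximation hides.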
Watch Recording
Speaker: Joanne Kim | November 18, 2025 Abstract: This session of the NISS AI, Statistics, and Data Science in Practice webinar series features Joanne Kim, PhD, Assistant Professor-Clinical, Biomedical Informatics at The Ohio State University, who will discuss the application of Bayesian geospatial modeling for predicting opioid overdose deaths. As the opioid crisis continues to impact communities nationwide, leveraging real-time data sources with geospatial modeling can provide critical insights into geographic and population-level risk factors. This study utilizes the urine drug test (UDT) as a real-time proxy for overdose death prediction to explain the spatiotemporal characteristics of drug overdose trends. The presentation will explore how advanced modeling techniques can enhance surveillance efforts, inform public health interventions, and improve policy decision-making.
Recording Coming Soon!
Speaker: Nancy McMillan | December 11, 2025 Background: The United States (US) Centers for Disease Control and Prevention (CDC) plays a crucial role in supporting state, local, and territorial governments through the Public Health Emergency Preparedness (PHEP) cooperative agreement program. During the COVID-19 pandemic, supplemental funding was available to bolster response efforts through the Public Health Crisis Response (PHCR) cooperative agreement. PHEP and PHCR program implementation data were used to evaluate the effectiveness of the COVID-19 response through a cost-benefit analysis.
Methods: Annual workplans and progress reports provided significant components of the program implementation information for both PHEP and PHCR. Natural language processing was used to recode recipient workplans, which allowed us to standardize common implementation across recipients. Path analysis and lasso regression models were used to assess the relationship between reported activities and outcomes. These methods addressed the issue of handling a big-p (activities), little-n (recipients) problem. Outcomes assessed included time to implement control measures, availability of COVID-19 therapeutics, COVID-19 tests and vaccines administered, and hospital bed availability. The benefits associated with specific implementation decisions (funding allocation, planned activities, and outputs) were estimated for statistically significant relationships.
Results: Activities and outputs were associated with faster non-essential business closures, earlier implementation of mask mandates, more frequent reporting to the public, more COVID-19 test administration, and larger availability of hospital beds and COVID-19 therapeutics during surges. Additionally, funding allocations for 4 of the 6 preparedness capability domain areas (countermeasures and mitigation, incident management, information management, and surge management) were associated with the ability to administer more COVID-19 tests and vaccines and increased hospital bed availability during peak surges.
Conclusions: PHEP and PHCR funding had measurable positive effects on recipients’ ability to respond to the COVID-19 pandemic effectively. Ongoing efforts in specific areas of public health emergency preparedness will improve future responses to COVID-19-like events.
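The big-p (activities), little-n (recipients) variable-selection problem mentioned in the Methods can be illustrated with a generic lasso sketch on simulated data. The dimensions and coefficients below are hypothetical, chosen only to mimic the p >> n setting; this uses scikit-learn and is not the study's actual model or data:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 40, 200  # "big p, little n": far more candidate activities than recipients

X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]  # only 3 predictors truly drive the outcome
y = X @ beta + rng.normal(scale=0.5, size=n)

# The L1 penalty shrinks most coefficients exactly to zero, making
# ordinary-least-squares-style fitting feasible even though p > n
model = Lasso(alpha=0.1).fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))
```

The key property is sparsity: the penalized fit retains only a handful of nonzero coefficients out of 200 candidates, which is what makes relationships estimable when recipients are few and reported activities are many.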
Recording Coming Soon!
Spring 2026 Theme: Large Language Models (LLMs)
During Spring 2026, from January through May, the series will focus on large language models (LLMs) and the statistical and methodological foundations required to develop, evaluate, and deploy them responsibly and effectively. As LLMs become central to scientific research, industry workflows, and societal decision-making, rigorous attention to how training data are constructed, curated, and sampled, and to model training, evaluation, and inference, is essential to ensure reliability, robustness, and transparency and to understand model behavior and limitations. The series will highlight methodological considerations in model training and fine-tuning, including sources of bias, variability, and uncertainty, as well as principled approaches to benchmarking and evaluation that move beyond surface-level performance metrics. Emphasis will be placed on transparent and reproducible evaluation frameworks that support meaningful comparisons across models and use cases, and on statistical perspectives that clarify what LLM outputs do and do not represent. By grounding discussions of LLM development and deployment in sound statistical reasoning, the series aims to promote more reliable, interpretable, and trustworthy language models in practice.
Speaker: Mengdi Wang | January 20, 2026 Abstract: Modern science advances fastest when thought meets action. LabOS represents the first AI co-scientist that unites computational reasoning with physical experimentation through multimodal perception, self-evolving agents, and Extended Reality (XR)-enabled human-AI collaboration. By connecting multi-model AI agents, smart glasses, and robots, LabOS allows AI to see what scientists see, understand experimental context, and assist in real-time execution. Across applications -- from cancer immunotherapy target discovery to stem-cell engineering and materials science -- LabOS shows that AI can move beyond computational design to participation, turning the laboratory into an intelligent, collaborative environment where human and machine discovery evolve together.
Recording Unavailable (Declined Consent)
Speaker: Robert Clark | Tuesday, February 24, 2026 - 12:00pm to 1:30pm ET Abstract: Artificial Intelligence has grown exponentially over the past several years and continues to make its way into our everyday lives. While large language models have proven quite capable in a wide range of tasks, from code generation to solving complex math problems, creative writing, summarization, and more, we are at an inflection point in the AI world as the next frontier of models is being produced. In this session, we will discuss a brief history of LLMs and how they are currently used in industry as tools to aid productivity. We will then transition to the next domains for AI models to learn, such as world foundation models and robotics, including some of the challenges that need to be solved in this space.
Recording Coming Soon!