About
The NISS AI, Statistics and Data Science in Practice series is a monthly event series that brings together leading experts from industry and academia to discuss the latest advances and practical applications in AI, data science, and statistics. Each session features a keynote presentation on a cutting-edge topic, and attendees can engage with speakers on the challenges and opportunities of applying these technologies in real-world scenarios. The series is intended for professionals, researchers, and students interested in the intersection of AI, data science, and statistics. It is designed to show participants how modern data analytic methods are being applied across various industries, offering theoretical insights, practical examples, and discussion of issues.
This year’s series, held during Fall 2025, will focus on the critical role of experimentation in the development and refinement of artificial intelligence (AI) systems. Incorporating principles of design of experiments and randomization helps ensure that AI models are trained on reliable, unbiased data, leading to more generalizable and interpretable results. By planning data collection with experimental design and randomization, researchers can minimize bias from uncontrolled variables and improve the statistical validity of their conclusions, whether the models are inferential or predictive. However, in many real-world scenarios, fully controlled experiments may not be feasible. When working with observational data, researchers can employ quasi-experimental techniques to approximate the benefits of randomized trials. These methods help isolate the effects of key variables and adjust for potential confounders, improving the robustness of AI-driven insights. By integrating structured experimentation and causal inference methodologies, AI developers can enhance the reliability and applicability of their models in practice.
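As a rough illustration of the point about randomization and uncontrolled variables, the following minimal simulation compares a naive difference-in-means estimate under randomized assignment with the same estimate when units self-select into treatment. Everything in it is an assumption made for illustration and is not taken from any talk in the series: the `simulate` function, the true effect of 2.0, the confounder weight of 3.0, and the sample size are all hypothetical.

```python
import random
from statistics import mean


def simulate(n, randomized, true_effect=2.0, seed=0):
    """Return the naive difference-in-means estimate of the treatment effect.

    Hypothetical data-generating process (an assumption for illustration):
    outcome = true_effect * treated + 3.0 * u + noise,
    where u is an unobserved confounder.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unobserved confounder
        if randomized:
            t = rng.random() < 0.5  # coin-flip assignment, independent of u
        else:
            t = u > 0  # units with high u select into treatment
        y = true_effect * t + 3.0 * u + rng.gauss(0, 1)
        (treated if t else control).append(y)
    return mean(treated) - mean(control)


randomized_estimate = simulate(20000, randomized=True)
confounded_estimate = simulate(20000, randomized=False)
print(f"randomized assignment estimate:    {randomized_estimate:.2f}")
print(f"self-selected assignment estimate: {confounded_estimate:.2f}")
```

Under these assumptions, the randomized estimate should land close to the true effect of 2.0, while the self-selected estimate is inflated because the confounder drives both treatment and outcome. Quasi-experimental methods aim to remove that kind of bias when randomization is not available.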
Featured Topics:
- Veridical Data Science - Speaker: Bin Yu, October 15, 2024
- Random Forests: Why They Work and Why That’s a Problem - Speaker: Lucas Mentch, November 19, 2024
- Causal AI in Business Practices - Speakers: Victor Lo and Victor Chen, January 24, 2025
- Large Language Models: Transforming AI Architectures and Operational Paradigms - Speaker: Frank Wei, February 18, 2025
- Machine Learning for Airborne Biological Hazard Detection - Speaker: Jared Schuetter, March 11, 2025
- ML and Bayesian geospatial approaches for prediction of opioid overdose deaths - Speaker: Soledad Fernández, POSTPONED
- Trustworthy AI in Weather, Climate, and Coastal Oceanography - Speaker: Dr. Amy McGovern, May 13, 2025
- Sequential Causal Inference in Experimental or Observational Settings - Speaker: Aaditya Ramdas, August 26, 2025
Upcoming Webinars in Series
Sequential Causal Inference in Experimental or Observational Settings
ML and Bayesian geospatial approaches for prediction of opioid overdose deaths
Speaker: Soledad Fernández | POSTPONED
Join us for the next session of the NISS AI, Statistics, and Data Science in Practice webinar series, featuring Dr. Soledad Fernández, Distinguished Professor and Division Chief of Biostatistics and Population Health at The Ohio State University. Dr. Fernández also serves as the Director of the Center for Biostatistics in the Department of Biomedical Informatics, College of Medicine. In this talk, Dr. Fernández will discuss the application of machine learning (ML) and Bayesian geospatial modeling for predicting opioid overdose deaths. As the opioid crisis continues to impact communities nationwide, leveraging statistical and AI-driven approaches can provide critical insights into geographic and population-level risk factors. This presentation will explore how advanced modeling techniques can enhance surveillance efforts, inform public health interventions, and improve policy decision-making.
Previous Webinars + Recordings
Veridical Data Science
Speaker: Professor Bin Yu | October 15, 2024
Abstract: The rapid advancement of AI relies heavily on the foundation of data science, yet data science education significantly lags behind its demand in practice. The upcoming book 'Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making' (Yu and Barter, MIT Press, 2024; free online at www.vdsbook.com) tackles this gap by promoting Predictability, Computability, and Stability (PCS) as core principles for trustworthy data insights. PCS for veridical data science (VDS) has been developed in the process of solving scientific data science problems. It thoroughly integrates these principles into the Data Science Life Cycle (DSLC), from problem formulation to data cleansing and result communication, fostering a new standard for responsible data analysis. This talk explores the motivations for PCS and compares the VDS book's approach with traditional ones. Then I will describe two PCS projects, on prostate cancer detection and on the discovery of epistatic genetic drivers for a heart disease. I will end with ongoing work on PCS uncertainty quantification in regression and its comparison with conformal prediction, PCS software packages (v-flow, simChef), and MERITS guidelines for data-inspired simulations.
Random Forests: Why They Work and Why That’s a Problem
Speaker: Lucas Mentch | November 19, 2024
Abstract: Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success, a full and satisfying explanation for their success has yet to be put forth. In this talk, we will show that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. From a model-complexity perspective, this means that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicit regularization procedures like the lasso. Realizing this, we demonstrate that alternative forms of randomness can provide similarly beneficial stabilization. In particular, we show that augmenting the feature space with additional features consisting of only random noise can substantially improve the predictive accuracy of the model. This surprising fact has been largely overlooked within the statistics community, but has crucial implications for thinking about how best to define and measure variable importance. Numerous demonstrations on both real and synthetic data are provided.
Causal AI in Business Practices
Speakers: Victor Lo and Victor Chen | January 24, 2025
This webinar will explore the growing role of causal AI in uncovering cause-and-effect relationships within complex systems. The session will highlight how causal AI differs from traditional predictive models, emphasizing its potential to improve decision-making across various domains. Attendees will gain insights into techniques for measuring the impact of interventions and understanding causal mechanisms. Broader examples will illustrate its application in optimizing strategies and enhancing outcomes. Key challenges, such as data reliability and model validation, will also be explored. The webinar will conclude with practical guidance on leveraging causal AI in dynamic and high-impact settings.
Large Language Models: Transforming AI Architectures and Operational Paradigms
Speaker: Frank Wei | February 18, 2025
Abstract: The emergence of Large Language Models (LLMs) represents a paradigm shift in artificial intelligence, fundamentally transforming our approach to natural language processing and machine learning architectures. In this presentation, we will navigate through the evolutionary trajectory of LLMs, beginning with their historical foundations and theoretical underpinnings that have shaped the current landscape of AI. We will then delve into the architectural intricacies of transformer-based models, examining their self-attention mechanisms, positional encodings, and multi-head architectures that enable unprecedented language understanding and generation capabilities. As we explore the transformative impact of LLMs on traditional machine learning paradigms, we will analyze the evolution from conventional ML to LLM, highlighting the specialized operational frameworks, deployment strategies, and infrastructure requirements that distinguish these approaches. This transition encompasses novel considerations in computational orchestration, model versioning, prompt engineering, and systematic evaluation methodologies. We will critically examine how these operational paradigms are reshaping feature engineering, model architectures, and deployment pipelines in AI systems. To demonstrate these theoretical and operational principles in practice, we will conclude with a demonstration of our innovative LLM-based solution, illustrating how sophisticated architectural designs and robust operational frameworks converge to address complex real-world challenges.
Machine Learning for Airborne Biological Hazard Detection
Speaker: Jared Schuetter | March 11, 2025
This session will explore the use of machine learning for detecting and identifying airborne biological hazards. Attendees will learn how supervised and unsupervised learning techniques can analyze spectral data to differentiate between harmful and benign substances. The speaker will discuss challenges related to data preprocessing and model accuracy in dynamic environments. Case studies will illustrate real-world applications in public health and national security. The importance of rapid detection and classification in mitigating risks will be emphasized. Practical strategies for deploying machine learning models in field settings will also be shared.
Trustworthy AI in Weather, Climate, and Coastal Oceanography
Speaker: Amy McGovern | May 13, 2025
Dr. Amy McGovern is a professor in the School of Computer Science and in the School of Meteorology at the University of Oklahoma. Dr. McGovern is also the director of the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography. Her research focuses on developing and applying trustworthy AI and machine learning methods, primarily for severe weather phenomena. Dr. McGovern received her PhD in Computer Science from the University of Massachusetts Amherst in 2002 and was a senior postdoctoral research associate at the University of Massachusetts until joining the University of Oklahoma in January 2005. She received her MS from the University of Massachusetts Amherst (1998) and her BS (honors) from Carnegie Mellon University (1996).
Recording Coming Soon!