NISS at JSM 2026!

Join NISS at JSM 2026! See the full schedule:

Sunday, August 2, 2026

8:00 AM - 12:00 PM

NISS Writing Workshop for Junior Researchers

Chair: David S. Matteson, Cornell University & National Institute of Statistical Sciences

Organizers: Toryn Schafer, Texas A&M University; Emily Griffith, North Carolina State University; Julia L. Sharp, National Institute of Standards and Technology; Alexander Volfovsky, Duke University; Jun Yan, University of Connecticut

NISS annually hosts the two-day Writing Workshop (WW) for Junior Researchers in conjunction with the Joint Statistical Meetings (JSM). The workshop aims to help junior researchers improve their technical writing skills. Tutorial sessions on writing journal articles and research proposals are followed by individual mentoring from experienced editors, proposal reviewers, and funding program officers.

Presentations:

Tutorial 1: Writing Well as an Act of Kindness for Our Students and Colleagues

Tutorial 2: Choosing Where to Publish

Panel 1: Grant Writing

Tutorial 3: Reviewing and Revising

Tutorial 4: How to Write a Collaborative Paper

Panel 2: Statistics and Data Science Journals

Tutorial 5: Ethical Issues and Reproducibility

Short Course 6: An Introduction to ChatGPT

Panel 3: Speaking from Experiences and Career Development

Sunday, August 2, 2026

2:00 PM - 3:50 PM

1423 Invited Paper Session

Chair & Organizer: David S. Matteson, Cornell University & National Institute of Statistical Sciences

Sponsors: National Institute of Statistical Sciences; American Association for the Advancement of Science; Data Science in Science

Data Science in Science

Invited Session for ASA Journal: Data Science in Science. Kathy Ensor (Rice): "Uncovering dynamics between SARS-CoV-2 wastewater concentrations and community infections via Bayesian spatial functional concurrent regression"; Yang Chen (UMich): "Sparse Variational Contaminated Noise Gaussian Process Regression for Forecasting Geomagnetic Perturbations"; Jason Cho (Cornell): "Non-fungible token transactions: Data and challenges"; John Kornak (UCSF): "An MCMC Approach to Bayesian Image Analysis in Fourier Space."

Presentations:

An MCMC Approach to Bayesian Image Analysis in Fourier Space.

Gaussian Process Regression Models for Forecasting Geomagnetic Perturbations.

Non-fungible token transactions: Data and challenges.

Uncovering dynamics between SARS-CoV-2 wastewater concentrations and community infections via Bayesian spatial functional concurrent regression.

Monday, August 3, 2026

8:30 AM - 10:20 AM

1176 Invited Paper Session

Chair: Likai Chen, Washington University in St. Louis

Organizer: Elynn Chen, New York University

Speaker: David S. Matteson, Cornell University & National Institute of Statistical Sciences

Speaker: Elynn Chen, New York University

Speaker: Yao Xie, Georgia Institute of Technology

Speaker: Xiaofeng Shao, Washington University in St. Louis, Department of Statistics and Data Science

Speaker: Likai Chen, Washington University in St. Louis

Main Sponsor: Business and Economic Statistics Section

Co-Sponsors: Committee on Women in Statistics; International Chinese Statistical Association

Dynamics at Scale: Statistical Frontiers in High-Dimensional and Streaming Time Series

Network Modeling of Large-scale Time Series with Cumulative Impulse Response Functions

Network modeling of multivariate time series has emerged as a useful framework for understanding interactions among the components of a dynamical system in many areas of the biological and social sciences. We develop a method to construct a sparse, weighted, directed network in which each edge captures how a shock to one component dynamically manifests in another component, using cumulative impulse response functions (cIRF). This is in sharp contrast with existing work, where network edges primarily capture some form of Granger-causal effects (lead-lag association) among the component time series and rely on a parsimonious vector autoregressive (VAR) representation of the system. Building upon our previous work on large-scale vector autoregressive moving averages (VARMA), we develop an iterative procedure for estimating cIRF. Using simulation experiments, we show that when the data generating process has a sparse vector moving average (VMA) representation, our method outperforms competing alternatives. We also prove that our algorithm, restricted to any finite number of iterations, consistently estimates impulse responses under high-dimensional asymptotics. Finally, we use our method to construct financial networks from realized volatilities of stock prices before, during, and after the US financial crisis of 2007-09.

Monday, August 3, 2026

2:00 PM - 3:50 PM

1735 Topic-Contributed Paper Session (NISS/IMSI)

Causal Inference with Interference: What Network and Spatial Statisticians Can Teach Each Other

Chair: Maryclare Griffin, University of Massachusetts Amherst

Sponsors: ASA Advisory Committee on Climate Change Policy; Biometrics Section; Section on Statistics and the Environment

We develop new methodology to improve our understanding of the causal effects of multivariate air pollution exposures on public health while accounting for mobility. Typically, in environmental health studies, an individual's exposure to air pollution is assigned based on their residential address, though many people spend time in different regions with potentially different levels of air pollution. To account for this, we incorporate estimates of individuals' mobility from cell phone mobility data to obtain a more accurate estimate of their air pollution exposure. We treat this as an interference problem, where individuals in one geographic region can be affected by exposures in other regions due to mobility into those areas. We propose policy-relevant estimands and derive expressions showing the extent of bias one would obtain by ignoring individuals' mobility. We additionally highlight the benefits of the proposed interference framework relative to a measurement error framework to account for mobility. Utilizing flexible Bayesian methodology, we develop novel estimation strategies to estimate causal effects that account for this spatial spillover. Lastly, we use the proposed methodology to study the health effects of ambient air pollution on mortality among Medicare enrollees in the United States.

Presentations:

A Spatial Interference Approach to Account for Mobility in Air Pollution Studies with Multivariate Continuous Treatments.

Closed-form Expressions for Causal Effects for Causal Effect Estimators Under Dependence.

Estimating the Spillover Effect of Animal Feeding Operations on Air and Water Quality in Iowa: A Causal Framework with Stochastic Controls.

Generative AI–Enabled Causal Mediation Analysis for Network-Valued Health Outcomes.

HODOR: A Two-Stage Randomized Design for A/B Tests with Unobserved Network Spillover.

Tuesday, August 4, 2026

8:30 AM - 10:20 AM

1733 Topic-Contributed Paper Session

The NISS–IMSI Data Science at the Intersection of Environment and Public Health Program

Chair: David S. Matteson, Cornell University & National Institute of Statistical Sciences

Discussant: Bo Li, Washington University in St. Louis

Organizer: Whitney Huang, Clemson University

Organizer: David S. Matteson, Cornell University & National Institute of Statistical Sciences

Main Sponsor: ASA Advisory Committee on Climate Change Policy

Co-Sponsors: Section on Statistics and the Environment; Section on Statistics in Epidemiology

This session highlights emerging research directions from the 2025 IMSI–NISS Ideas Lab on Data Science at the Intersection of Public Health and the Environment. The Ideas Lab convened statisticians, environmental scientists, epidemiologists, and data science researchers to tackle complex challenges that link environmental stressors, extreme events, and human health outcomes.

The aim of this session is to showcase early progress from the NISS–IMSI Ideas Lab teams; foster cross-disciplinary dialogue among researchers working in causal inference, spatial statistics, and extreme-value modeling as they integrate their methods to address questions involving environmental exposures and public health outcomes; and illuminate new methodological directions in environmental public-health statistics arising from this collaborative initiative.

Presentations in this session feature methodological innovations in Bayesian calibration for air-pollution exposure data fusion, causal inference for complex environmental mixtures, and causal impact evaluation in the context of extreme events and/or extreme health outcomes. Additional work highlights generalized synthetic control approaches for assessing how hurricanes affect access to opioid use disorder treatment.

Eva Murphy, Jay Xu, and Vijay Kumar, all current postdoctoral fellows, exemplify the emerging talent supported by this program. Whitney Huang, the session co-organizer (joint with David Matteson, the overall program lead), is an early- to mid-career researcher who has benefited substantially from initiatives of this kind, beginning as a SAMSI postdoctoral fellow and later contributing to several IMSI research programs.

The session will conclude with a discussion by Bo Li, one of the program leaders, whose broad research expertise across these domains positions her to explore emerging connections among causal inference, spatial and spatio-temporal statistics, and extreme-value analysis in epidemiological studies of complex environmental exposures.

Presentations:

Bayesian calibration of low-cost air quality sensors.

Continuous-Time Framework for Estimating Time-Varying Survivor Average Causal Effects for a Longitudinally Measured Outcome.

Extremes in Environmental Exposures and Health Outcomes.

Impact of Hurricane Florence on Buprenorphine Transactions for Opioid Use Disorder: A Dual-Perspective Synthetic Control Study.

Wednesday, August 5, 2026

10:30 AM - 12:20 PM

1669 Topic-Contributed Paper Session

Chair & Co-Organizer: Luca Sartore, National Institute of Statistical Sciences and US Department of Agriculture, National Agricultural Statistics Service

Discussant: Valbona Bejleri, US Department of Agriculture, National Agricultural Statistics Service

Organizer: David S. Matteson, Cornell University & National Institute of Statistical Sciences

Main Sponsor: Section on Statistical Computing

Co-Sponsors: Biometrics Section; Government Statistics Section

Navigating High-Dimensional Landscapes: Innovations in Model Estimation and Predictive Inference

Traditional regression approaches are not suitable for analyzing high-dimensional data sets. Recent advances in big-data analytics have enabled the sparse selection of informative variables to enhance the interpretability and predictive accuracy of models for high-dimensional data. However, several challenges in high-dimensional spaces remain unaddressed in the statistical literature. For example, from a frequentist perspective, model selection and its properties are not fully studied in capture-recapture contexts or when dealing with data from heterogeneous domains. From a Bayesian perspective, approaches to modeling high-dimensional data sets focus on stochastic variable selection, adaptive shrinkage, or model averaging. Nevertheless, current state-of-the-art Bayesian methods are not fully equipped to simultaneously handle hierarchical population structures, heteroscedastic designs, various missing data mechanisms, and different levels of missingness. Addressing these challenges requires the development of new methods that improve computational efficiency relative to existing techniques. These innovations are crucial for advancements in fields such as econometrics, healthcare, and the social sciences. Overall, this session presents diverse perspectives to advance high-dimensional analytics, providing reliable and effective alternatives for statistical practitioners.

Habtamu Benecha from the United States Department of Agriculture's National Agricultural Statistics Service will begin the session with an advanced variable selection method designed for the US Census of Agriculture. He will highlight iterative approaches for the initialization and successive optimization of model parameters in high-dimensional settings. Ivy Yuexin Zhang from Stanford University will present a delta-invariant method for feature selection, addressing the challenges of retrieving a stable signal in high-dimensional heterogeneous domains. Johannes Bleher from Hohenheim University will discuss a probabilistic procedure for variable selection when missing covariate data are handled through multiple imputations. He will evaluate his procedure through a Monte Carlo study under several missing data mechanisms and demonstrate its application using survey data. Aliaksandr Hubin from Oslo University will introduce the concept of active paths for accurately identifying true covariates in high-dimensional non-linear systems. He will offer a novel perspective on a sparse representation of latent binary Bayesian neural networks to identify over-parameterized models. Finally, Valbona Bejleri from the United States Department of Agriculture's National Agricultural Statistics Service will conclude the session as a discussant. She will summarize the innovations in high-dimensional methods, highlighting future research directions and opportunities for collaboration among statisticians from various backgrounds.

Presentation:

Variable Selection in Capture-Recapture Models for Adjustment Weight Estimation

USDA's National Agricultural Statistics Service (NASS) conducts the Census of Agriculture every five years. Because the Census Mailing List (CML) is incomplete, NASS uses the June Area Survey (JAS) to assess undercoverage. A capture–recapture framework allows for the estimation of weights to adjust for undercoverage, nonresponse, and misclassification. First, the CML and JAS records are linked, then sigmoidal models are fitted to all records. Standard penalized logistic regression may fail to identify the most important covariates, resulting in higher bias and uncertainty of model-based estimates. We introduce a novel penalty structure that enables joint variable selection across multiple models and yields improved adjustment weights and unbiased Census totals. Our approach combines advanced penalties with fractional gradient descent to handle high-dimensional settings, where predictors and interactions exceed ten million elements. Applied to 2022 Census data, it isolates critical predictors, reduces bias, and preserves parsimony, offering a scalable solution for accurate and efficient agricultural statistics.

Speaker: Habtamu Benecha, US Department of Agriculture, National Agricultural Statistics Service

Co-Authors: Habtamu Benecha, US Department of Agriculture, National Agricultural Statistics Service; Justin van Wart, US Department of Agriculture, National Agricultural Statistics Service; Valbona Bejleri, US Department of Agriculture, National Agricultural Statistics Service; Luca Sartore, National Institute of Statistical Sciences and US Department of Agriculture, National Agricultural Statistics Service

Wednesday, August 5, 2026

1:45 PM - 3:30 PM

1191 Invited Panel Session

Opportunities and Obstacles of the U.S. Department of Health and Human Services (HHS) Open Data Plan

Chair: Theresa Kim, National Institutes of Health, National Institute on Aging

Organizers: Theresa Kim, National Institutes of Health/National Institute on Aging; Christopher Steven Marcum, Data Foundation

Primary Sponsor: Government Statistics Section

Secondary Sponsor: Health Policy Statistics Section

Tertiary Sponsor: The Caucus for Women in Statistics and Data Science

Background: Sharing scientific data is essential for building and maintaining public trust in science and supporting evidence-based policymaking. The transparency that public access to government-funded public health research provides is increasingly critical at a time when disinformation, especially disinformation about scientific research, continues to mount challenges against scientific integrity. As our collective understanding of and insights into the needs for improving the health and wellbeing of Americans deepen, so does our need for findable, accessible, interoperable, and reusable data. The recently updated HHS Open Data Plan provides a pathway for making public data assets related to healthcare more readily available to researchers, industry analysts, policymakers, and more.

The update to the HHS Open Data Plan couldn’t come at a more critical time. In an increasingly data-driven health research landscape, complexities involving record linkage, AI-readiness, and privacy protections challenge the shareability of data for research purposes. For example, there is a need to harmonize disparately collected electronic health records, imaging data, claims, patient surveys, test results, and other data in a manner that is fit for both machine and human interpretation, with the goal of providing more comprehensive care, facilitating greater insights, and ultimately improving health outcomes.

Data sharing creates new data governance and privacy challenges to ensure these complex data work together. Although there are increasing calls for data and meta-data standards for critical data sources, health systems researchers typically adhere to several local-specific, institution-specific, or healthcare vendor-specific standards for data. Since the adoption of the Patient Protection and Affordable Care Act, health systems must use electronic health records to manage patients’ care while increasingly unifying ontologies and vocabularies as well as promoting data harmonization to prevent negative health outcomes such as unexpected emergency department use from medication errors.

Scientific Problem: The US Department of Health and Human Services recognizes that there are challenges associated with data sharing, such as privacy and confidentiality concerns, informed consent procedures, the costs that businesses may pay for data collection, and data administration when multiple institutions or organizations may have collected the data and ownership may be unclear. Increased use of AI algorithms adds further data governance and privacy challenges and complexities, despite the promise of increased efficiency and access to health services at a time of increased healthcare expenditures, healthcare workforce burnout, and health disparities.

Content: This invited panel session will discuss the benefits and limitations of data sharing and the current opportunities and challenges of data sharing plans within the Department of Health and Human Services. The panel will consider how data sharing has informed disease surveillance, improved the readability of unstructured data, created opportunities for greater support of rare diseases through condition-specific registries, and highlighted safety in clinical trials and other health data, while simultaneously maintaining or further creating inefficiencies, costs, and systemic biases.

The panel includes:

1. Christopher Steven Marcum, Data Foundation

2. Claire McKay Bowen, The Urban Institute and the Administration for Children and Families

3. Jason Gerson, Patient-Centered Outcomes Research Institute

4. David S. Matteson, Cornell University & National Institute of Statistical Sciences

5. Nick Hart, Data Foundation

6. Stephanie Psaki, invited, Center for Strategic and International Studies

Thursday, August 6, 2026

10:30 AM - 12:20 PM

1745 Topic-Contributed Paper Session

Modeling Complexity at the Climate–Health Interface

Chair & Co-Organizer: Whitney Huang, Clemson University

Organizer: David S. Matteson, Cornell University & National Institute of Statistical Sciences

Main Sponsor: Section on Statistics and the Environment

Co-Sponsors: Biometrics Section; Section on Statistics in Epidemiology

Presentations:

A Maximum Proxy-Likelihood Estimator for Linear Methods in Multivariate Extremes.

Bayesian Vecchia Gaussian Process Tree Model for High-Dimensional Data.

Blackbox Posterior Approximation via Gaussian Process Surrogates with Applications in Ecology and Pharmacokinetics.

Forecasting and Nowcasting Variant Proportions with Genomic Data at the Regional Level.

Modeling count data in climate-sensitive systems.

Monday, June 8, 2026 by Megan Glenn
