### September, 2020

*Each month, NISS Assistant Director Lingzhou Xue highlights a number of events hosted at NISS Affiliate institutions that you will not want to miss!*

The year 2020 marks the 30th anniversary of the creation of NISS and the 20th anniversary of the formation of the NISS Affiliates program. I am fortunate to witness how NISS and its affiliates are coping with the new normal and growing during these unprecedented, uncertain times. I am thrilled to share several events hosted at NISS affiliate institutions that may interest a broader audience in our community. I hope these events on the horizon will also encourage junior researchers and graduate students to explore some of these topics in the future.

Lingzhou Xue, NISS Assistant Director

*Check these out. Log in to post comments or send them to gjohnson@niss.org.*

#### September 1, Tuesday 3:00 - 4:15 p.m. EDT

**Title**: “On Nearly Assumption-Free Tests of Nominal Confidence Interval Coverage for Causal Parameters Estimated by Machine Learning”
**Host**: Online Seminar on Mathematical Foundations of Data Science
**Event Webpage**: https://sites.google.com/view/seminarmathdatascience/home
**Speaker**: James Robins (https://www.hsph.harvard.edu/james-robins/), Harvard University

**Abstract**: For many causal effect parameters of interest, doubly robust machine learning (DRML) estimators are the state-of-the-art, incorporating the good prediction performance of machine learning; the decreased bias of doubly robust estimators; and the analytic tractability and bias reduction of sample splitting with cross fitting. Nonetheless, even in the absence of confounding by unmeasured factors, the nominal Wald confidence interval may still undercover even in large samples, because the bias may be of the same or even larger order than its standard error.

In this paper, we introduce essentially assumption-free tests that (i) can falsify the null hypothesis that the bias is of smaller order than its standard error, (ii) can provide an upper confidence bound on the true coverage of the Wald interval, and (iii) are valid under the null under no smoothness/sparsity assumptions on the nuisance parameters. The tests, which we refer to as Assumption Free Empirical Coverage Tests (AFECTs), are based on a U-statistic that estimates part of the bias.

Our claims need to be tempered in several important ways. First, no test, including ours, of the null hypothesis that the ratio of the bias to its standard error is smaller than some threshold can be consistent [without additional assumptions (e.g., smoothness or sparsity) that may be incorrect]. Second, the above claims apply only to certain parameters in a particular class. For most of the others, our results are unavoidably less sharp.

#### September 3, Thursday 3:30 - 4:30 p.m. EDT

**Title**: “Probabilistic Inference and Learning with Stein's Method”
**Host**: Penn State University - Department of Statistics
**Event Webpage**: https://science.psu.edu/stat/colloquia/fa20/lester-mackey
**Speaker**: Lester Mackey (https://web.stanford.edu/~lmackey/), Microsoft Research New England

**Abstract**: Stein's method is a powerful tool from probability theory for bounding the distance between probability distributions. In this talk, I'll describe how this tool designed to prove central limit theorems can be adapted to assess and improve the quality of practical inference procedures. I'll highlight applications to Markov chain Monte Carlo sampler selection, goodness-of-fit testing, variational inference, and nonconvex optimization and close with several opportunities for future work.
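The tool the talk builds on can be seen in miniature through Stein's identity: Z ~ N(0,1) if and only if E[f′(Z) − Z f(Z)] = 0 for a wide class of test functions f, and how far a sample drives this functional from zero is what Stein discrepancies measure. The sketch below is my own illustration of this identity (not code from the talk), checked for f(x) = sin(x):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=1_000_000)

# Stein's identity for the standard normal: E[f'(Z) - Z f(Z)] = 0.
# Here f(x) = sin(x), so f'(x) = cos(x); the sample average is near zero.
stein_normal = np.mean(np.cos(z) - z * np.sin(z))

# The same functional is far from zero for a shifted sample, which is
# what Stein discrepancies exploit to detect a misfit distribution.
x = z + 1.0
stein_shifted = np.mean(np.cos(x) - x * np.sin(x))
```

For the N(1,1) sample the functional concentrates near −e^(−1/2) sin(1) ≈ −0.51 rather than zero, so the identity cleanly separates the two distributions.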

#### September 9, Wednesday 12:00 - 1:00 p.m. EDT

**Title**: “Adding Numbers and Shuffling Cards”
**Host**: University of North Carolina at Greensboro - Department of Mathematics and Statistics
**Event Webpage**: https://mathstats.uncg.edu/event/adding-numbers-and-shuffling-cards/
**Speaker**: Persi Diaconis (https://statistics.stanford.edu/people/persi-diaconis), Stanford University

**Abstract**: When numbers are added in the usual way, ‘carries’ occur along the way. It is natural to ask, ‘for typical numbers, how do the carries go?’ It turns out that the carries form a Markov chain with an ‘AMAZING’ transition matrix. This same matrix occurs in analyzing the usual riffle shuffle we use when mixing cards. The matrix also occurs in taking sections of generating functions and in the fractal analysis of Pascal’s triangle. The different appearances interact and remind us that different areas of mathematics all connect. I will explain all of this ‘in English’.
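The Markov structure of carries is easy to see in simulation. The sketch below (my own illustration, not material from the talk) adds pairs of long random base-10 numbers and estimates the transition matrix of the carry chain; for two summands in base 10 the matrix works out to [[0.55, 0.45], [0.45, 0.55]]:

```python
import random

def carry_chain(a_digits, b_digits):
    """Carries produced when adding two base-10 numbers digit by digit."""
    carry, carries = 0, []
    for a, b in zip(a_digits, b_digits):  # least-significant digit first
        s = a + b + carry
        carry = s // 10                   # 0 or 1 when adding two numbers
        carries.append(carry)
    return carries

random.seed(0)
n_digits, trials = 50, 5000
counts = [[0, 0], [0, 0]]                 # counts[i][j]: carry i -> carry j
for _ in range(trials):
    a = [random.randrange(10) for _ in range(n_digits)]
    b = [random.randrange(10) for _ in range(n_digits)]
    c = carry_chain(a, b)
    for prev, nxt in zip(c, c[1:]):
        counts[prev][nxt] += 1

# Empirical transition matrix of the carry chain.
P = [[cnt / sum(row) for cnt in row] for row in counts]
```

The estimated `P` lands close to [[0.55, 0.45], [0.45, 0.55]]: a carry makes the next carry slightly more likely, and the chain's structure carries over to riffle shuffling.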

#### September 10, Thursday 3:30 - 4:30 p.m. EDT

**Title**: “Why Aren’t Network Statistics Accompanied by Uncertainty Statements?”
**Host**: University of Illinois Urbana-Champaign - Department of Statistics
**Event Webpage**: https://calendars.illinois.edu/detail/1439/33377746
**Speaker**: Eric Kolaczyk (http://math.bu.edu/people/kolaczyk/), Boston University

**Abstract**: Tens of thousands of scientific articles have been published in the last 20 years with the word “network” in the title. And the vast majority of these report network summary statistics of one type or another. However, these numbers are rarely accompanied by any quantification of uncertainty. Yet any error inherent in the measurements underlying the construction of the network, or in the network construction procedure itself, necessarily must propagate to any summary statistics reported. Perhaps surprisingly, there is little in the way of formal statistical methodology for this problem. I summarize results from our recent work, for the case of estimating the density of low-order subgraphs. Under a simple model of network error, we show that consistent estimation of such densities is impossible when the rates of error are unknown and only a single network is observed. We then develop method-of-moment estimators of subgraph density and error rates for the case where a minimal number of network replicates are available (i.e., just 2 or 3). These estimators are shown to be asymptotically normal as the number of vertices increases to infinity. We also provide confidence intervals for quantifying the uncertainty in these estimates, implemented through a novel bootstrap algorithm. We illustrate the use of our estimators in the context of gene coexpression networks — the correction for measurement error is found to have potentially substantial impact on standard summary statistics. This is joint work with Qiwei Yao and Jinyuan Chang.
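To see why a few replicates help, consider a toy version of the problem (my own simplification, not the estimators of the paper): suppose each vertex pair is misreported independently with a common false-positive rate α and false-negative rate β. Then each pair's reports follow a two-point mixture, and with three replicates the per-pair moments m_k = p(1−β)^k + (1−p)α^k for k = 1, 2, 3 identify p, α, and β by the method of moments:

```python
import numpy as np

rng = np.random.default_rng(2)
n_pairs = 200_000                       # vertex pairs in the true graph
p, alpha, beta = 0.1, 0.05, 0.2         # edge density, false-pos/neg rates

true_edge = rng.random(n_pairs) < p
reps = []
for _ in range(3):                      # three noisy replicates of the network
    drop = rng.random(n_pairs) < beta   # true edges missed
    add = rng.random(n_pairs) < alpha   # non-edges reported
    reps.append(np.where(true_edge, ~drop, add))
r1, r2, r3 = reps

# Per-pair moments of the mixture: m_k = p(1-beta)^k + (1-p)alpha^k.
m1 = np.mean((r1.astype(float) + r2 + r3) / 3)
m2 = np.mean(((r1 & r2).astype(float) + (r1 & r3) + (r2 & r3)) / 3)
m3 = np.mean(r1 & r2 & r3)

# Two-point support: (1-beta) and alpha are the roots of x^2 - s x + t,
# where the moment recurrence m3 = s m2 - t m1, m2 = s m1 - t gives s, t.
s = (m3 - m1 * m2) / (m2 - m1 ** 2)
t = s * m1 - m2
q0, q1 = np.sort(np.roots([1.0, -s, t]).real)   # q0 -> alpha, q1 -> 1-beta
p_hat = (m1 - q0) / (q1 - q0)                   # recovered edge density
```

The naive single-replicate density `m1` ≈ 0.125 is badly biased for p = 0.1, while the moment-based `p_hat` recovers it, which is the basic point of correcting for measurement error before reporting summaries.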

#### September 17, Thursday 3:30 - 5:00 p.m. EDT

**Title**: “Backfitting for Large Scale Crossed Random Effects Regressions”
**Host**: University of Michigan - Department of Biostatistics
**Event Webpage**: https://sph.umich.edu/events/event.php?ID=8463
**Speaker**: Art Owen (https://statweb.stanford.edu/~owen/), Stanford University

**Abstract**: Large scale genomic and electronic commerce data sets often have a crossed random effects structure, arising from genotypes x environments or customers x products. Regression models with crossed random effect error models can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (2020) present a collapsed Gibbs sampler that costs O(N), but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N) under greatly relaxed though still strict sampling assumptions. Empirically, the backfitting algorithm costs O(N) under further relaxed assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.
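The backfitting idea can be sketched on a toy additive crossed-effects model y = μ + a_i + b_j + ε: alternately re-estimate each factor's effects from the residuals of the other, so every sweep touches each observation once and costs O(N). The sketch below is a simplified illustration of that mechanism with made-up dimensions (plain residual means, not the paper's generalized least squares estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
R, C, N = 200, 300, 5000              # e.g. customers x products, N ratings
i = rng.integers(0, R, N)             # observed (row, column) index pairs
j = rng.integers(0, C, N)
a_true = rng.normal(0.0, 1.0, R)
b_true = rng.normal(0.0, 1.0, C)
y = 3.0 + a_true[i] + b_true[j] + rng.normal(0.0, 0.5, N)

ni = np.maximum(np.bincount(i, minlength=R), 1)   # observations per row level
nj = np.maximum(np.bincount(j, minlength=C), 1)   # observations per col level

# Backfitting: each sweep is O(N) -- no N x N system is ever formed.
mu, a, b = y.mean(), np.zeros(R), np.zeros(C)
for _ in range(50):
    a = np.bincount(i, y - mu - b[j], minlength=R) / ni   # row effects
    a -= a.mean()                                          # identifiability
    b = np.bincount(j, y - mu - a[i], minlength=C) / nj   # column effects
    b -= b.mean()
    mu = np.mean(y - a[i] - b[j])
```

After a few sweeps the estimated effects track `a_true` and `b_true` closely; the same alternating structure, with GLS-style updates, is what makes the O(N) cost of the talk's algorithm plausible.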

#### September 18, Friday 10:30 - 11:30 a.m. EDT

**Title**: “Integrating Domain-Knowledge into Deep Learning”
**Host**: Purdue University - Department of Statistics
**Event Webpage**: https://www.stat.purdue.edu/theme_seminar/index.html
**Speaker**: Ruslan Salakhutdinov (https://www.cs.cmu.edu/~rsalakhu/), Carnegie Mellon University

**Abstract**: I will first discuss deep learning models that can find semantically meaningful representations of words, learn to read documents and answer questions about their content. I will introduce methods that can augment neural representation of text with structured data from Knowledge Bases (KBs) for question answering, and show how we can answer complex multi-hop questions using a text corpus as a virtual KB. In the second part of the talk, I will show how we can design modular hierarchical reinforcement learning agents for visual navigation that can perform tasks specified by natural language instructions, perform efficient exploration and long-term planning, learn to build the map of the environment, while generalizing across domains and tasks.

#### September 23, Wednesday 12:00 - 1:00 p.m. EDT

**Title**: “Space-Filling Designs for Computer Experiments”
**Host**: University of North Carolina at Greensboro - Department of Mathematics and Statistics
**Event Webpage**: https://mathstats.uncg.edu/event/space-filling-designs-for-computer-expe...
**Speaker**: Roshan Joseph (https://sites.gatech.edu/roshan-joseph/), Georgia Tech

**Abstract**: Space-filling properties are important in designing computer experiments. In this talk, first I will review the popular space-filling designs such as Latin hypercube, uniform, maximin, and minimax designs. Then, I will present a useful design known as maximum projection design in detail. Application of these designs in computer experiments, Bayesian computation, and model calibration will be illustrated with several examples.
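As a concrete example of the simplest construction mentioned in the abstract, the sketch below generates a random Latin hypercube design: n points in [0,1]^d with exactly one point in each of the n equal-width bins along every coordinate. (This is a generic illustration, not the maximum projection designs of the talk.)

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Random Latin hypercube design: n points in [0,1]^d, one point in
    each of the n equal-width bins along every coordinate."""
    u = rng.random((n, d))                                  # jitter in-bin
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (perms + u) / n                                  # bin k -> [k/n, (k+1)/n)

rng = np.random.default_rng(1)
X = latin_hypercube(8, 2, rng)
# Every column of X hits each of the 8 bins exactly once, so all
# one-dimensional projections are evenly spread -- the defining property.
```

Criteria such as maximin distance or maximum projection then select among designs like this one to improve how the points fill the full d-dimensional space, not just its axes.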
