The NISS Graduate Student Network is very excited to announce a two-day graduate student research conference!
This two-day event will take place on June 12 and 13, 2021, from noon to 5 pm ET and will feature graduate student presentations, invited speakers, and a social networking hour.
Graduate Student Presentations
Students can choose to present either an oral presentation or a poster presentation within the following categories:
- Original Research (their own research work),
- Literature Research (presentation of a published paper that is not authored by the presenter), or
- Literature Review (presentation on recent developments in an area of interest -- a chance to highlight a couple of related papers.)
Selected oral presentations will involve a 20-minute live presentation, including 5 minutes of Q&A.
NISS Affiliate Graduate Students - Use the link below to submit your interest in presenting at this conference! (Please Note: You will be asked to submit the title of your presentation and an abstract of up to 250 words. For literature research presentations, please be sure to include a couple of sentences in your abstract that highlight your interest in the paper!) Presenters will be notified of the selection results for live presentations on or around May 28.
At this time, the Graduate Student Network Committee is reviewing submissions and is no longer accepting abstracts.
Submission Deadline: May 7th at 5 pm ET
There will be two special presentations - titles coming soon! These presentations will align with the goal of the conference.
Networking Social Hour
There will be a Networking Social Hour at the end of the conference, beginning at 5 pm ET on June 13th!
Register to Attend this Event Today!
$20 registration for Graduate Students. All other registrations are $50.
NISS Affiliates - this event is Affiliate Award Fund Eligible.
Please Note: NISS affiliates can use their Affiliate Award Funds toward registration. Please contact your NISS liaison (check the List of NISS Affiliates) to learn more about these funds. Students registering from NISS-affiliated universities are eligible for reimbursement after the conference.
Conference Organizing Committee Members
Esra Kurum (chair) (University of California, Riverside),
Sumanta Basu (Cornell University),
Rebecca Kurtz-Garcia (University of California, Riverside),
Xinjun Wang (University of Pittsburgh),
David Kent (Cornell University),
Piaomu Liu (Bentley University),
Kevin Lee (Western Michigan University), and
Hannah Waddel (Emory University)
Saturday, June 12th
(Please Note: All Times ET)
12:00 – 12:10 Welcome
Jim Rosenberger, (Director, NISS)
12:10 – 12:20 Opening Remarks
Esra Kurum, (University of California, Riverside)
12:20 – 13:25 High-dimensional statistical methods with applications
Session Chair: Kevin Lee, (Western Michigan University)
12:20 – 12:40 Lei Fang, (University of Kentucky)
Distance measure on independence with respect to a statistical functional of interest
Abstract: In this article, we propose a general framework of independence measures with respect to a statistical functional of interest, which unifies some existing independence measures, such as distance covariance, the Hilbert-Schmidt independence criterion, and martingale difference correlation. Under this framework, we propose a new metric, the martingale difference correlation with reproducing kernel, to measure conditional mean independence. This generalizes the existing martingale difference correlation by extending the distance in Euclidean space to a reproducing kernel Hilbert space, a broader class that allows different choices of kernels. In simulations and real data applications, for the purpose of variable screening, the sample counterpart of our new metric can effectively select variables that marginally contribute to the mean or quantile of the response variable. To address the potential issue of missing important variables that have zero marginal utility with the response, we further propose a forward variable screening method based on a new conditional martingale difference correlation, which measures conditional mean independence given a third variable. Under regularity conditions, it is able to select variables that jointly but not marginally contribute to the mean or quantile of the response variable.
12:40 – 13:00 Sara Venkatraman, (Cornell University)
A Bayesian dynamical systems approach to clustering gene expression time series data
Abstract: Gene expression time series data provides insight into the dynamics of complex biological processes, such as immune response and disease progression. It is of interest to identify genes with similar expression patterns over time because such genes often share similar biological characteristics. However, this task is challenging due to the high dimensionality of gene expression datasets and the nonlinearity of gene expression time dynamics. We propose a Bayesian approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive similarity metrics that can be used to identify groups of genes with co-moving or time-delayed expression patterns. These metrics can be used for clustering and network analysis. A key feature of this method is that it leverages biological databases that document known interactions between genes. This information is used to define informative prior distributions on the ODE model’s parameters, and acts as a regularizer that helps filter out unlikely associations. Using real gene expression data, we demonstrate that our biologically-informed similarity metrics allow us to recover sparse gene networks with clear interpretations in genetics, as well as identify less well-known genes whose time dynamics are visibly similar to those of better understood ones.
13:00 – 13:20 Gregory Hawk, (University of Kentucky)
Interpreting Linear Models in the Era of Big Data: Deriving the Distribution and Exploring the Utility of the Coefficient of Partial Determination
Abstract: A central goal in the world of statistics and data science is the construction of linear regression and ANOVA models for continuous variables of interest. Often, our objective is to examine the impact of one or more explanatory variables after adjusting for demographic variables or some other known/relevant factor(s). While the traditional methodology in this scenario is to use a combination of partial F-tests and individual t-tests to determine statistical significance, we know that p-values obtained from such methods are heavily dependent on sample size. This is particularly problematic for large datasets or “overpowered” studies, where even the tiniest of effects will appear to be highly significant. As computing capabilities and cloud-enhanced data sharing have accelerated exponentially in the 21st century, our access to Big Data has revolutionized the way we see data around the world, from healthcare to investments to manufacturing to retail and supply chains. While machine learning and AI are improving our ability to make predictions, we need better statistics to improve our ability to understand and translate our models into meaningful and actionable insights. The coefficient of partial determination (also known as "partial R-squared") is widely used in the applied sciences to supplement hypothesis testing, but little work has been done to understand its statistical properties. A quick sketch of the derivation of the complete distribution of partial R-squared will be given, along with simulated and real-world data examples that show the advantages of adding it to your next analysis of Big Data.
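For readers less familiar with the quantity discussed above: the coefficient of partial determination compares the residual sum of squares of a reduced model (without the predictor of interest) to that of the full model. The sketch below is purely illustrative, using synthetic data and hypothetical variable names; it is not the presenter's code or derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)          # adjustment covariate
x2 = rng.normal(size=n)          # covariate of interest
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on X (with intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

sse_reduced = sse(x1.reshape(-1, 1), y)        # model without x2
sse_full = sse(np.column_stack([x1, x2]), y)   # model with x2

# Partial R^2: the share of the reduced model's remaining variation
# that is explained by adding x2.
partial_r2 = (sse_reduced - sse_full) / sse_reduced
print(round(partial_r2, 3))
```

The talk's point is that, unlike a p-value, this proportion does not inflate mechanically as the sample size grows.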
13:20 – 13:25 Floor discussion
13:30 – 14:30 Dr. John Bailer, (ISI President; Distinguished Professor and Chair, Department of Statistics, Miami University)
Getting ready to tell the statistics behind the stories and the stories behind the statistics
Session Chair: Sumanta Basu, (Cornell University)
14:30 – 14:50 Break
14:50 – 16:15 Recent advances in statistical models and their applications
Session Chair: Rebecca Kurtz-Garcia, (University of California, Riverside)
14:50 – 15:10 Jericho Lawson, (University of California, Riverside)
Real-Time Classification of Atrial Fibrillation using ECG Characteristics
Abstract: 2.7 million Americans currently have atrial fibrillation (AFib), a heart condition described as a “quivering or irregular heartbeat.” AFib can lead to other health issues, such as blood clots, stroke, and heart failure. Using AFib data from the MIT-BIH Atrial Fibrillation Database and the 2017 PhysioNet Challenge Dataset, as well as methods from Moody and Mark’s 1983 paper, we explore simple models using various classification methods and resampling techniques to detect AFib. Proportions of transition states between RR intervals are used as covariates in logistic regression, LDA, QDA, boosting, and XGBoost models. With 5-fold cross-validation, we can achieve up to 97.1% prediction accuracy and 97.0% sensitivity on the MIT-BIH dataset. Similar results can be obtained with fewer covariates and dimension reduction techniques, which is important if these methods are to be implemented in real-time heart rate devices, such as Fitbits. Additionally, covariates such as RR interval variance and dimensions from multidimensional scaling of pairwise differences in Kolmogorov-Smirnov tests show potential usefulness for AFib classification.
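As general background on the kind of pipeline described above, one of the listed classifiers (logistic regression) can be evaluated with 5-fold cross-validation on synthetic stand-in features. Everything below is hypothetical and self-contained; it is not the presenter's code or data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Synthetic stand-in for RR-interval transition-state proportions (3 features).
X = rng.uniform(size=(n, 3))
logits = 4.0 * (X[:, 0] - X[:, 1])   # class depends on two of the proportions
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(float)

def fit_logistic(X, y, lr=0.5, steps=500):
    """Plain gradient-descent logistic regression with an intercept term."""
    Xb = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return (1 / (1 + np.exp(-Xb @ w)) > 0.5).astype(float)

# 5-fold cross-validation: hold out each fold once, train on the rest.
folds = np.array_split(rng.permutation(n), 5)
accs = []
for k in range(5):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    w = fit_logistic(X[train_idx], y[train_idx])
    accs.append(float(np.mean(predict(w, X[test_idx]) == y[test_idx])))
print(round(float(np.mean(accs)), 3))
```

The cross-validated accuracy reported in the abstract comes from the same kind of held-out evaluation, applied to real ECG-derived covariates.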
15:10 – 15:30 Hannah Waddel, (Emory University)
Bayesian Inverse Reinforcement Learning for Collective Animal Movement
Abstract: “Bayesian Inverse Reinforcement Learning for Collective Animal Movement” by Toryn L.J. Schafer, Christopher K. Wikle, and Mevin B. Hooten is the first application of Bayesian inverse reinforcement learning (IRL) to the problem of collective animal movement. Agent-based models attempt to recreate natural processes by defining a priori mechanistic models to govern individual behavior. However, they suffer from problems of automatic behavior, difficulties incorporating interactions with the environment, and a lack of memory. Inverse reinforcement learning moves in the opposite direction: it estimates a reward or cost function used in a Markov decision process (MDP) by observing an "expert" agent’s actions as the agent moves between states. Collective animal movement is a complex agent-based model, so this application of Bayesian inverse reinforcement learning can infer the cost function that individual animals use to govern their movement within a group. The authors extend the variational Bayes algorithm for IRL by approximating the posterior cost function over a continuous state space with Gaussian basis functions to increase computational efficiency. They apply their method to a simulated self-propelled particle model and real data gathered from guppy movement experiments. Using their method, they conclude that guppies prefer collective movement with their neighbors to targeted movement toward a reward.
15:30 – 15:50 Mai N Nguyen, (University of Kentucky)
Examining associations of trauma, aggression, and family history of psychiatric problems among adolescent psychiatric inpatients by sexual orientation
Abstract: People who are sexual minorities often face substantial mental and physical health risks and disparities compared to cisgender and heterosexual peers. Adolescence can be a particularly critical time to help these individuals, given the normal developmental transitions that often include risk behaviors and experiences. A retrospective chart review of adolescent patients (n = 432, mean age 15.00±1.70, 70.1% female) was conducted on individuals who were hospitalized on a university psychiatric inpatient unit in 2012 and 2017. The overall goal of this study was to examine associations between adolescent patients’ sexual orientation and demographics, trauma histories, treatment histories, and family histories using chi-square and Fisher’s exact tests. Fewer adolescents identified as non-heterosexual in 2012 (3.7% identified as homosexual, 3.7% as bisexual, and 0.9% as other) than in 2017 (5.1% identified as homosexual, 15.7% as bisexual, and 3.7% as other). For both years, there were positive associations between non-heterosexual identity and trauma history (p=0.007, φ=0.183), aggression (p=0.008, φ=0.181), living in urban areas (p=0.085, φ=0.137), and family psychiatric history (p=0.019, φ=0.177). In 2012, there were positive associations between non-heterosexual identity and trauma history (p=0.057, φ=0.220) and family psychiatric history (p=0.016, φ=0.370). In 2017, there were positive associations between non-heterosexual identity and trauma history (p=0.007, φ=0.252) and aggression (p=0.011, φ=0.219). Further studies are needed to clarify the implications of these findings.
15:50 – 16:10 Anran Liu, (University at Buffalo)
Statistical Distances in Goodness-of-Fit
Abstract: One of the conventional approaches to the problem of model selection is to view it as a hypothesis testing problem. When the hypothesis testing framework for model selection is adopted, one usually thinks about likely alternatives to the model, or alternatives that seem most dangerous to the inference, such as “heavy tails”. In this context, goodness-of-fit problems constitute a fundamental component of model selection viewed through the lens of hypothesis testing.
Statistical distances or divergences have a long history in the scientific literature, where they are used for a variety of purposes, including that of testing for goodness of fit. We develop a goodness of fit test that is locally quadratic. Our proposed test statistic for testing a simple null hypothesis is based on measures of statistical distance. The asymptotic distribution of the statistic is obtained and a test of normality is presented as an example of the derived distributional results. Our simulation study shows the test statistic is powerful and able to detect alternatives close to the null hypothesis.
16:10 – 16:15 Floor discussion
16:20 – 17:00 Methods for Spatial, Temporal, and Spatio-Temporal Data – I
Session Chair: Xinjun Wang, (University of Pittsburgh)
16:20 – 16:40 Stephan Komladzei, (University of New Orleans)
Co-localization Analysis of Bivariate Spatial Point Pattern
Abstract: Spatial point pattern analysis investigates the localizations of random events in a defined spatial space. The spatial distribution of two types of events observed in these images reflects their underlying interactions, which is the focus of co-localization analysis in spatial statistics. In real data applications, co-localization analysis based on spatial data is often needed to study the interactions between two species using events’ locations, for example, the co-clustering of distinct types of trees, the dispersion of different types of cells, and so on. Second-order statistics based on the analysis of point pairs are the most popular methods, such as nearest distances, cross-correlation, Ripley’s K functions, etc. Coordinate-based Colocalization (CBC) analysis is one method that was recently developed by Malkusch et al. in 2012. The method was oriented toward co-localization analysis of microscopic data, owing to the rapid development of microscopic imaging over the past 10 years and hence the availability of high-resolution microscopic images. However, the CBC method does not incorporate edge corrections using raw point proportions and also ignores correlations of point proportions across incremental observational regions. Hence, the method often yields false positive results even for completely random distributions. In this research, we propose the K(r) function Coordinate-based Colocalization (KCBC) method for analyzing co-localization of events between two channels for spatial data. Simulation studies for the complete spatial randomness (CSR) case are conducted to demonstrate the improvement of the KCBC method. An application to real-life data is provided to illustrate the applicability of our method.
16:40 – 17:00 Isaac Quintanilla Salinas, (University of California, Riverside)
Overview of Joint Longitudinal-Survival Models: Modeling the Association Between Dependent Outcomes
Abstract: In follow-up studies implemented in applied fields, such as medicine and epidemiology, two types of outcomes may arise: a longitudinal outcome collected over time and the time to an event of interest, such as death or development of a disease. In these types of studies, it is common to analyze the outcomes separately with marginal models: the longitudinal outcome using a mixed-effects regression model and the time-to-event outcome (such as time-to-death) using a survival model. However, this approach ignores possible dependence among these outcomes and cannot be employed when the main interest is to study the association between these outcomes. Thus, a new class of models, namely joint models for longitudinal and time-to-event data, has been developed. Joint models can determine the association between the longitudinal and time-to-event outcomes, and the models can identify the response-predictor relationships, that is, the relationship between each outcome and a set of predictors such as demographics. In addition, it has been shown that joint modeling leads to considerably more precise estimators than marginal modeling. In this literature review, we present the most common joint modeling approaches and the recent developments in the field, along with a brief demonstration of how to utilize these models in R.
Sunday, June 13th
12:00 – 13:05 New advances in risk analysis
Session Chair: David Kent, (Cornell University)
12:00 – 12:20 Thilini Mahanama, (Texas Tech University)
Global Index on Financial Losses due to Crime in the United States
Abstract: Crime can have a volatile impact on investments. Despite the potential importance of crime rates in investments, there are no indices dedicated to evaluating the financial impact of crime in the United States. As such, this paper presents an index-based insurance portfolio for crime in the United States by utilizing the financial losses reported by the Federal Bureau of Investigation for property crimes and cybercrimes. Our research intends to help investors envision risk exposure in our portfolio, gauge investment risk based on their desired risk level, and hedge strategies for potential losses due to economic crashes. Underlying the index, we hedge the investments by issuing marketable European call and put options and providing risk budgets (diversifying risk to each type of crime). We find that real estate, ransomware, and government impersonation are the main risk contributors. We then evaluate the performance of our index to determine its resilience to economic crisis. The unemployment rate potentially demonstrates a high systemic risk on the portfolio compared to the economic factors used in this study. In conclusion, we provide a basis for the securitization of insurance risk from certain crimes that could forewarn investors to transfer their risk to capital market investors.
12:20 – 12:40 Imon Banerjee, (Purdue University)
PAC-Bayes Bounds on Variational Tempered Posteriors for Markov Models
Abstract: Datasets displaying temporal dependencies abound in science and engineering applications, with Markov models representing a simplified and popular view of the temporal dependence structure. In this paper, we consider Bayesian settings that place prior distributions over the parameters of the transition kernel of a Markov model, and seek to characterize the resulting, typically intractable, posterior distributions. We present a Probably Approximately Correct (PAC)-Bayesian analysis of variational Bayes (VB) approximations to tempered Bayesian posterior distributions, bounding the model risk of the VB approximations. Tempered posteriors are known to be robust to model misspecification, and their variational approximations do not suffer the usual problems of overconfident approximations. Our results tie the risk bounds to the mixing and ergodic properties of the Markov data generating model. We illustrate the PAC-Bayes bounds through a number of example Markov models, and also consider the situation where the Markov model is misspecified.
12:40 – 13:00 Ifeanyichukwu V. Ukenazor, (University of North Texas)
A Simple Formula for the Expected Rate of Return of an Option over a Finite Holding Period
Abstract: Using conditions that do not violate the Black-Scholes-Merton formulation, we derive a simple formula for calculating the expected rate of return of an option over a finite holding period h, possibly less than the time to expiration t. Under these conditions, surprisingly, the expected future price of a European option, including prior to expiration, is given by the Black-Scholes formula, except that the stock price is replaced by its expected future price at the end of the holding period h, the strike price is treated similarly, and the asset price's volatility is revised; r remains the risk-neutral interest rate and t is unchanged. We also give an example illustrating the use of the formula.
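For reference, the standard Black-Scholes European call formula that the result above modifies can be written compactly. This is a textbook sketch with generic parameter names, not the presenter's derivation or revised formula.

```python
from math import log, sqrt, exp, erf

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def bs_call(S, K, r, sigma, t):
    """Standard Black-Scholes price of a European call.

    S: spot price, K: strike, r: risk-free rate,
    sigma: volatility, t: time to expiration (years).
    """
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return S * norm_cdf(d1) - K * exp(-r * t) * norm_cdf(d2)

# At-the-money example: S = K = 100, r = 5%, sigma = 20%, t = 1 year.
print(round(bs_call(100.0, 100.0, 0.05, 0.2, 1.0), 2))  # about 10.45
```

The talk's formula reuses this structure but substitutes expected future prices and a revised volatility for the holding-period setting.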
13:00 – 13:05 Floor discussion
13:10 – 14:10 Dr. Regina Nuzzo (Senior Advisor for Statistics Communication and Media Innovation, American Statistical Association)
Making your research make sense: quick tips for talking to non-experts
Session Chair: Piaomu Liu, (Bentley University)
14:10 – 14:30 Break
14:30 – 15:35 Methods for Spatial, Temporal, and Spatio-Temporal Data – II
Session Chair: Hannah Waddel, (Emory University)
14:30 – 14:50 Daniel Gulti Kebede, (Purdue University)
Testing Smooth Structural Changes in Predictive Regression Models
Abstract: Prediction is a central activity of econometrics, and predictive regressions are important tools for evaluating and testing economic models. We consider a test for smooth structural changes, as well as abrupt breaks with known or unknown change points, in a predictive regression setup with strongly persistent regressors. Here the model parameters are unknown deterministic functions of time, except at a finite number of points. In a simulation study, this test demonstrated better power than popular tests for structural breaks (mainly the SupF and LM test statistics). In an application, we test the stability of stock return prediction models and strongly reject their stability in the postwar and post-oil-shock periods.
14:50 – 15:10 Rebecca Kurtz-Garcia, (University of California, Riverside)
Recent Advances in Time-Average Covariance Matrix Estimation
Abstract: The time-average covariance matrix (TACM) is the variance of the sample mean when there is serially correlated data. Estimation of the TACM is of interest in various fields such as time series analysis, econometrics, and Markov chain Monte Carlo simulations. Currently, a weighted average of sample lag covariance matrices via a kernel function is one of the most common estimation methods. We will discuss kernel estimators, and two newly developed estimation methods which utilize linear combinations of kernel estimators. We will compare the estimators and discuss benefits and limitations of the three methods.
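As background for the kernel estimators mentioned above, which weight sample lag covariances by a kernel such as the Bartlett window, here is a minimal univariate sketch with illustrative names and an AR(1) test series; it is not the presenter's code.

```python
import numpy as np

def bartlett_tacm(x, bandwidth):
    """Kernel estimate of the time-average (long-run) variance of x:
    a Bartlett-weighted sum of sample autocovariances up to `bandwidth`."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    est = xc @ xc / n                        # lag-0 sample autocovariance
    for lag in range(1, bandwidth + 1):
        gamma = xc[lag:] @ xc[:-lag] / n     # lag-k sample autocovariance
        est += 2 * (1 - lag / (bandwidth + 1)) * gamma   # Bartlett weight
    return float(est)

# AR(1) example: the true long-run variance is sigma^2 / (1 - phi)^2 = 4.0 here.
rng = np.random.default_rng(2)
phi, n = 0.5, 20000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
print(round(bartlett_tacm(x, bandwidth=50), 2))
```

The linear-combination estimators discussed in the talk build on exactly this kind of kernel-weighted sum, combining several of them to trade off bias and variance.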
15:10 – 15:30 Eva Murphy, (Clemson University)
Modeling of Wind Speed and Wind Direction
Abstract: Near-surface wind plays an important role in different fields, for instance, renewable energy, air pollution control, building engineering, and the routing of aircraft and ships. It is, therefore, important to accurately model the variability of wind characteristics. However, modeling wind speed and wind direction comes with some challenges. Observational wind data are typically sparse in space and hence have a limited capacity to identify regions with high (or low) winds. Furthermore, wind assessments are commonly done over large areas with high-resolution (in both space and time) climate model output, resulting in a massive and complex data set. In such cases, computations become extensive, which can significantly slow down the statistical inference process. In this presentation we first review some commonly used probability distributions of near-surface wind speed and wind direction, respectively. Univariate wind speed models range from parametric (e.g., Weibull) to mixture distributions. On the other hand, wind direction, due to its circular nature, is typically modeled through circular distributions (e.g., von Mises) or finite mixtures of these circular distributions. In addition, we discuss strategies to jointly model the spatio-temporal variation of wind speed and wind direction.
15:30 – 15:35 Floor discussion
15:40 –16:45 Recent contributions in methodological statistics
Session Chair: Esra Kurum, (University of California, Riverside)
15:40 – 16:00 Andrews Anum, (The University of Texas at El Paso)
A New Hybrid Optimization Method for Implementing the Minimum Density Power Divergence Estimator
Abstract: We develop a new globally convergent optimization method for solving the constrained minimization problem underlying the minimum density power divergence estimator for univariate Gaussian data in the presence of outliers. Our hybrid procedure combines the classical Newton's method with a gradient descent iteration, using a step control mechanism based on Armijo's rule to ensure global convergence. Based on extensive simulations, we compare the resulting estimation procedure with a prominent robust competitor, the Minimum Covariance Determinant (MCD) estimator, in terms of efficiency across a wide range of breakdown point values. Applications to estimation and inference for real-world data are also given. This is joint work with Michael Pokojovy (UTEP).
16:00 – 16:20 Sagar K N Ksheera, (Purdue University)
Precision Matrix Estimation under the Horseshoe-like Prior-Penalty Dual
Abstract: The problem of precision matrix estimation in a multivariate Gaussian model is fundamental to network estimation. Although there exist both Bayesian and frequentist approaches to this, it is difficult to obtain good Bayesian and frequentist properties under the same prior-penalty dual, complicating justification. It is well known, for example, that the Bayesian version of the popular lasso estimator has poor posterior concentration properties. To bridge this gap for the precision matrix estimation problem, our contribution is a novel prior-penalty dual that closely approximates the popular graphical horseshoe prior and penalty and performs well in both Bayesian and frequentist senses. A chief difficulty with the horseshoe prior is a lack of closed form expression of the density function, which we overcome in this article, allowing us to directly study the penalty function. In terms of theory, we establish posterior convergence rate of the precision matrix that matches the oracle rate, in addition to the frequentist consistency of the maximum a posteriori estimator. In addition, our results also provide theoretical justifications for previously developed approaches that have been unexplored so far, e.g. for the graphical horseshoe prior. Computationally efficient Expectation Conditional Maximization and Markov chain Monte Carlo algorithms are developed respectively for the penalized likelihood and fully Bayesian estimation problems, using the same latent variable framework. In numerical experiments, the horseshoe-based approaches echo their superior theoretical properties by comprehensively outperforming the competing methods. A protein-protein interaction network estimation in B-cell lymphoma is considered to validate the proposed methodology.
16:20 – 16:40 Gauri Kamat, (Brown University)
Leveraging Random Assignment to Impute Missing Covariates in Causal Studies
Abstract: Baseline covariates in randomized experiments are often used in the estimation of treatment effects, for example, when estimating treatment effects within covariate-defined subgroups. In practice, however, covariate values may be missing for some data subjects. To handle missing values, analysts can use imputation methods to create completed datasets, from which they can estimate treatment effects. Common imputation methods include mean imputation, single imputation via regression, and multiple imputation. For each of these methods, we investigate the benefits of leveraging randomized treatment assignment in the imputation routines, that is, making use of the fact that the true covariate distributions are the same across treatment arms. We do so using simulation studies that compare the quality of inferences when we respect or disregard the randomization. We consider this question for imputation routines implemented using covariates only, and imputation routines implemented using the outcome variable. In either case, accounting for randomization offers only small gains in accuracy for our simulation scenarios. Our results also shed light on the performances of these different procedures for imputing missing covariates in randomized experiments when one seeks to estimate heterogeneous treatment effects.
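To make the distinction in the abstract concrete: mean imputation can either pool all observed covariate values or be carried out within each treatment arm, the latter being one simple way to "respect" the randomization. The toy sketch below uses synthetic data and hypothetical names; it is not the presenters' code or simulation design.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
arm = rng.integers(0, 2, size=n)       # randomized treatment assignment
x = rng.normal(size=n)                 # baseline covariate: same distribution per arm
missing = rng.uniform(size=n) < 0.3    # 30% missing completely at random

# Variant 1: pooled mean imputation, ignoring treatment arm.
x_pooled = x.copy()
x_pooled[missing] = x[~missing].mean()

# Variant 2: arm-wise mean imputation, imputing within each treatment arm.
x_armwise = x.copy()
for a in (0, 1):
    observed_in_arm = (~missing) & (arm == a)
    x_armwise[missing & (arm == a)] = x[observed_in_arm].mean()

print(round(float(x_pooled.mean()), 3), round(float(x_armwise.mean()), 3))
```

Because randomization makes the true covariate distribution identical across arms, the two variants target the same quantity, which is consistent with the abstract's finding that accounting for randomization offers only small gains.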
16:40 – 16:45 Floor discussion
16:50 – 17:00 Closing remarks
Xinjun Wang, (University of Pittsburgh)
17:00 – 18:00 Networking Happy Hour
Event Host: NISS GSN Executive Committee
About the Invited Speakers
John Bailer is a University Distinguished Professor of Statistics at Miami University in Ohio, USA. He is also affiliated with the Departments of Media, Journalism and Film; Biology; and Sociology and Gerontology at Miami. His work focuses on risk assessment in occupational health and on connecting journalism with statistics. He created the podcast Stats+Stories, which “Addresses The Story Behind The Statistics And The Statistics Behind The Stories.” It is sponsored by the American Statistical Association and is available on National Public Radio podcasts and other podcast platforms. In 2021, Bailer and his podcast colleagues received the JBPM Communications Award.
Bailer received his PhD in Biostatistics from the University of North Carolina in 1986. He is a fellow of the American Statistical Association and of the Society for Risk Analysis, an elected member of the International Statistical Institute, and an elected fellow of the American Association for the Advancement of Science. He has received several Distinguished Teaching awards at Miami University, as well as the Founders Award of the American Statistical Association. He served on the Executive Committee of the International Statistical Institute from 2013 to 2021, serving the last two years as president. He has published some 150 peer-reviewed papers and five books.
Dr. Regina Nuzzo is a freelance science writer and professor in Washington, DC. After studying engineering as an undergraduate, she earned her PhD in Statistics from Stanford University. Currently she teaches statistics in American Sign Language at Gallaudet University, the world’s only liberal arts college for deaf and hard-of-hearing students. Dr. Nuzzo is also a graduate of the Science Communication program at the University of California, Santa Cruz. Her science journalism specialties center on data, probability, statistics, and the research process. Her work has appeared in Nature, the Los Angeles Times, the New York Times, Reader’s Digest, New Scientist, and Scientific American, among others. Dr. Nuzzo has been invited to speak to a variety of audiences about her work, on topics such as why we just can’t understand p-values, how our brains can fool us during data analysis, what happens when people abuse and misuse statistics, and tips and tricks for communicating anything involving numbers and statistics.