The National Agricultural Statistics Service (NASS) conducts the U.S. Census of Agriculture every five years. In 2012, NASS began using a capture-recapture approach to adjust the Census estimates for under-coverage, non-response, and misclassification. This approach requires two independent samples. NASS has kept its Census Mailing List (CML) independent from its area frame, which is used for the June Area Survey (JAS) every June. NASS is exploring the use of web-scraping to develop a third list-frame (TL) that would be independent of the CML and the area frame. In this paper, a Triple-System Estimation (TSE) methodology based on regularized multinomial regression is proposed to investigate possible dependence between the CML and the TL. A simulation study is performed to compare the performance of the estimator based on the proposed methodology, which can take the frame dependence into account, with others already presented in the literature.

%B JSM 2017 %G eng %U https://www.niss.org/sites/default/files/Sartore_RestMultiReg_TSE_20170901.pdf %0 Journal Article %J Chance %D 2011 %T Research access to restricted-use data %A A. F. Karr %A S. K. Kinney %B Chance %V 24 %P 41-45 %G eng %0 Journal Article %J International Statistical Review %D 2011 %T Risk-utility paradigms for statistical disclosure limitation: How to think, but not how to act (with discussion) %A A. F. Karr %A L. H. Cox %A S. K. Kinney %XRisk-utility formulations for problems of statistical disclosure limitation are now common. We argue that these approaches are powerful guides to official statistics agencies in regard to how to think about disclosure limitation problems, but that they fall short in essential ways from providing a sound basis for acting upon the problems. We illustrate this position in three specific contexts—transparency, tabular data and survey weights, with shorter consideration of two key emerging issues—longitudinal data and the use of administrative data to augment surveys.

%B International Statistical Review %V 79 %P 160-199 %G eng %R 10.1111/j.1751-5823.2011.00140.x %0 Conference Paper %B Proc. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality %D 2009 %T The role of transparency in statistical disclosure limitation %A A. F. Karr %B Proc. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality %C Bilbao, Spain %8 December %G eng %U http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.46/2009/wp.41.e.pdf %0 Journal Article %J Pharmacogenomics %D 2005 %T Recursive partitioning as a tool for pharmacogenetic studies of complex diseases: II. Statistical considerations %A Zaykin, D.V. %A Young, S.S. %XIdentifying genetic variations predictive of important phenotypes, such as disease susceptibility, drug efficacy, and adverse events, remains a challenging task. There are individual polymorphisms that can be tested one at a time, but there is the more difficult problem of the identification of combinations of polymorphisms or even more complex interactions of genes with environmental factors. Diseases, drug responses or side effects can result from different mechanisms. Identification of subgroups of people where there is a common mechanism is a problem for diagnosis and prescribing of treatment. Recursive partitioning (RP) is a simple statistical tool for segmenting a population into non-overlapping groups where the response of interest, disease susceptibility, drug efficacy and adverse events are more homogeneous within the segments. We suggest that the use of RP is not only more technically feasible than other search methods but it is less susceptible to multiple-testing problems. The number of combinations of gene-gene and gene-environment interactions is potentially astronomical and RP greatly reduces the effective search and inference space. Moreover, the certain reliance of RP on the presence of marginal effects is justifiable as was found by using analytical and numerical arguments.
In the context of haplotype analysis, results suggest that the analysis of individual SNPs is likely to be successful even when susceptibilities are determined by haplotypes. Retrospective clinical studies where cases and controls are collected will be a common design. This report provides methods that can be used to adjust the RP analysis to reflect the population incidence of the response of interest. Confidence limits on the incidence of the response in the segmented subgroups are also discussed. RP is a straightforward way to create realistic subgroups, and prediction intervals for the within-subgroup disease incidence are easily obtained.

%B Pharmacogenomics %V 6 %P 77-89 %G eng %R 10.1517/14622416.6.1.77 %0 Conference Paper %B Proc. dg.o 2004, National Conference on Digital Government Research %D 2004 %T Regression on distributed databases via secure multi-party computation %A A. F. Karr %A X. Lin %A J. P. Reiter %A A. P. Sanil %B Proc. dg.o 2004, National Conference on Digital Government Research %P 405-406 %G eng %0 Journal Article %J Phys Rev E Stat Nonlin Soft Matter Phys %D 2003 %T Random-walk-based estimates of transport properties in small specimens of composite materials %A Jeffrey D. Picka %A Chermakani, Karthik %K Advanced Traveler Information Systems %K random walks %XA method based on random walks is developed for estimating the dc conductance and similar transport properties in small specimens of composite materials. The method is valid over a much wider range of material structures than are asymptotic methods, and requires only that the internal structure of the material be known. The error in its estimates is limited primarily by CPU speed. It is found to work best for composites consisting of a bulk conducting phase and inclusions of lower conductivity.

%B Phys Rev E Stat Nonlin Soft Matter Phys %V 4 %G eng %0 Journal Article %J Proceedings of the National Academy of Sciences %D 2003 %T Robust singular value decomposition analysis of microarray data %A Liu L %A Hawkins DM %A Ghosh S %A Young SS %B Proceedings of the National Academy of Sciences %V 100 %P 13167-13172 %G eng %0 Book Section %B Generalized Linear Models: A Bayesian Perspective %D 2000 %T Random effects in generalized linear mixed models (GLMMs) %A Sun, Dongchu %A Speckman, Paul %A Tsutakawa, R. K. %B Generalized Linear Models: A Bayesian Perspective %I Marcel Dekker, Inc. %P 23-40 %G eng %0 Journal Article %J Environmetrics %D 2000 %T Regression models for air pollution and daily mortality: analysis of data from Birmingham, Alabama %A Richard L. Smith %A J.M. Davis %A Jerome Sacks %A Speckman, Paul %A P. Styer %K Air Pollutants/adverse effects %K Air Pollutants/analysis %K Air Pollution/adverse effects %K Air Pollution/analysis %K Air Pollution/statistics & numerical data %K Alabama/epidemiology %K Humans %K Mortality %K Poisson Distribution %K Regression Analysis %K Risk %K Sensitivity and Specificity %K Statistical Models %XSeveral recent studies have reported associations between common levels of particulate air pollution and small increases in daily mortality. This study examined whether a similar association could be found in the southern United States, with different weather patterns than the previous studies, and examined the sensitivity of the results to different methods of analysis and covariate control. Data were available in Birmingham, Alabama, from August 1985 through 1988. Regression analyses controlled for weather, time trends, day of the week, and year of study and removed any long-term patterns (such as seasonal and monthly fluctuations) from the data by trigonometric filtering.
A significant association was found between inhalable particles and daily mortality in Poisson regression analysis (relative risk = 1.11, 95% confidence interval 1.02-1.20). The relative risk was estimated for a 100 µg/m³ increase in inhalable particles. Results were unchanged when least squares regression was used, when robust regression was used, and under an alternative filtering scheme. Diagnostic plots showed that the filtering successfully removed long wavelength patterns from the data. The generalized additive model, which models the expected number of deaths as nonparametric smoothed functions of the covariates, was then used to ensure adequate control for any nonlinearities in the weather dependence. Essentially identical results for inhalable particles were seen, with no evidence of a threshold down to the lowest observed exposure levels. The association also was unchanged when all days with particulate air pollution levels in excess of the National Ambient Air Quality Standards were deleted. The magnitude of the effect is consistent with recent estimates from Philadelphia, Steubenville, Detroit, Minneapolis, St. Louis, and Utah Valley.

%B Environmetrics %V 11 %P 719-743 %G eng %0 Conference Proceedings %B 14th International Symposium on Transportation and Traffic Theory %D 1999 %T Route flow entropy maximization in origin-based traffic assignment, transportation and traffic theory %A Bar-Gera, H. %A Boyce, D. E. %B 14th International Symposium on Transportation and Traffic Theory %I Elsevier Science %G eng %0 Book Section %B Knowledge and Networks in a Dynamic Economy %D 1998 %T Roadway Incident Analysis with a Dynamic User-Optimal Route Choice Model %A Boyce, D. E. %A Lee, D.-H. %A Janson, B.N. %E Beckmann, Martin J. %E Johannsson, Börje %E Snickars, Folke %E Thord, Roland %XThe transportation system conveys interdependencies. When analysing the costs and benefits of transport investment projects, it is therefore necessary to address the question of linkages among projects. Such linkages can occur in terms of economies of scale arising from the combination of projects during the construction phase. Intelligent Transportation Systems (ITS), also known as Intelligent Vehicle Highway Systems (IVHS), are applying advanced technologies (such as navigation, automobile, computer science, telecommunication, electronic engineering, automatic information collection and processing) in an effort to bring revolutionary improvements in traffic safety, network capacity utilization, vehicle emission reductions, travel time and fuel consumption savings, etc. Within the framework of ITS, Advanced Traffic Management Systems (ATMS) and Advanced Traveler Information Systems (ATIS) both aim to manage and predict traffic congestion and provide historical and real-time network-wide traffic information to support drivers’ route choice decisions. To enable ATMS/ATIS to achieve the above-described goals, traffic flow prediction models are needed for system operation and evaluation. Linkages may also arise in supply through interaction among network components, or among the producers of transportation services.
Linkages may also emerge in demand through the creation of new opportunities for interaction.

%B Knowledge and Networks in a Dynamic Economy %I Springer Berlin Heidelberg %P 371-390 %@ 978-3-642-64350-7 %G eng %U http://dx.doi.org/10.1007/978-3-642-60318-1_21 %R 10.1007/978-3-642-60318-1_21 %0 Book Section %B Case Studies in Bayesian Statistics %D 1997 %T A Random-Effects Multinomial Probit Model of Car Ownership Choice %A Nobile, Agostino %A Bhat, Chandra R. %A Pas, Eric I. %E Gatsonis, Constantine %E Hodges, James S. %E Kass, Robert E. %E McCulloch, Robert %E Rossi, Peter %E Singpurwalla, Nozer D. %K car ownership %K longitudinal data %K Multinomial probit model %XThe number of cars in a household has an important effect on its travel behavior (e.g., choice of number of trips, mode to work and non-work destinations), hence car ownership modeling is an essential component of any travel demand forecasting effort. In this paper we report on a random effects multinomial probit model of car ownership level, estimated using longitudinal data collected in the Netherlands. A Bayesian approach is taken and the model is estimated by means of a modification of the Gibbs sampling with data augmentation algorithm considered by McCulloch and Rossi (1994). The modification consists in performing, after each Gibbs sampling cycle, a Metropolis step along a direction of constant likelihood. An examination of the simulation output illustrates the improved performance of the resulting sampler.

%B Case Studies in Bayesian Statistics %S Lecture Notes in Statistics %I Springer New York %V 121 %P 419-434 %@ 978-0-387-94990-1 %G eng %U http://dx.doi.org/10.1007/978-1-4612-2290-3_13 %R 10.1007/978-1-4612-2290-3_13