%0 Conference Proceedings %B JSM 2017 %T Design Weight and Calibration %A Toppin, K. %A Sartore, L. %A Spiegelman, C. %K Calibration %K Dual System Estimation %K Weights %K Census of Agriculture %X

The USDA’s National Agricultural Statistics Service (NASS) conducts the U.S. Census of Agriculture in years ending in 2 and 7. Population estimates from the census are adjusted for under-coverage, non-response, and misclassification and are calibrated to known population totals. These adjustments are reflected in weights attached to each responding unit. Calculating these weights has been a two-part procedure: first, initial (Dual System Estimation, or DSE) weights are computed to account for under-coverage, non-response, and misclassification; second, calibration adjusts those weights by forcing the weighted estimates from the first step to match known population totals. Recently, a calibration algorithm, Integer Calibration (INCA), was developed to produce the integer calibrated weights required in NASS publications. This paper considers combining the two weight-calculation steps into one. The new algorithm is based on a regularized constrained dual system estimation methodology that combines capture-recapture and calibration (CaRC).
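Purely to make the two-step weighting concrete, the following minimal Python sketch applies a DSE-style inverse-capture-probability weight and then a single ratio calibration to one benchmark total; all numbers are invented, and this is not the INCA or CaRC algorithm itself.

    # Minimal sketch (not NASS's production code) of the two-step weighting described above:
    # a Dual System Estimation (DSE) style initial weight, followed by a simple ratio
    # calibration to a known population total. All values are toy numbers for illustration.
    import numpy as np

    responses = np.array([120.0, 80.0, 200.0, 55.0])   # reported item values for 4 respondents
    p_capture = np.array([0.8, 0.5, 0.9, 0.6])         # assumed probability each unit is covered/responds

    # Step 1: DSE-style initial weights (inverse of estimated capture probability).
    w_dse = 1.0 / p_capture

    # Step 2: calibrate so the weighted total hits a known benchmark.
    benchmark_total = 700.0
    adjustment = benchmark_total / np.sum(w_dse * responses)
    w_cal = w_dse * adjustment

    print("initial weighted total:", np.sum(w_dse * responses))
    print("calibrated weighted total:", np.sum(w_cal * responses))  # equals the benchmark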

%B JSM 2017 %G eng %U https://www.niss.org/sites/default/files/Toppin_CaRC_20170926.pdf

%0 Conference Proceedings %B JSM 2017 %T Estimated Covariance Matrices Associated with Calibration %A Sartore, L. %A Toppin, K. %A Spiegelman, C. %K Agriculture %K Calibration %K Census %K Estimation %K NASS %K Survey %K Variance %K Weighting %X

Surveys often provide numerous estimates of population parameters. Some population values may be known, with a high level of certainty, to lie within a small range. Calibration is used to adjust the survey weights associated with the observations in a data set; this process ensures that the “sample” estimates for the target population totals (benchmarks) lie within the anticipated ranges of those population values. The additional uncertainty due to the calibration process needs to be captured. In this paper, methods for estimating the variance of the population totals are proposed for an algorithmic calibration process based on minimizing the L1-norm relative error. The estimated covariance matrices for the calibration totals are produced either by linear approximations or by bootstrap techniques. Specific data structures are required to allow for the computation of very large covariance matrices; in particular, the implementation of the proposed algorithms exploits sparse matrices to reduce the computational burden and memory usage. Computational efficiency is demonstrated through a simulation study.
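As one way to picture the bootstrap option mentioned above, the sketch below recalibrates resampled data with a simple ratio adjustment (not the paper's L1-norm solver) and takes the empirical covariance of the calibrated totals; the data, weights, and benchmark are all simulated.

    # Bootstrap sketch of covariance estimation for calibrated totals, under a
    # deliberately simplified ratio calibration; everything here is illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    X = rng.lognormal(mean=2.0, sigma=0.5, size=(n, 2))  # two survey variables
    w0 = rng.uniform(1.0, 3.0, size=n)                    # design weights
    benchmark = np.sum(w0 * X[:, 0]) * 1.05               # assumed known total for variable 0

    def calibrated_totals(idx):
        w = w0[idx] * (benchmark / np.sum(w0[idx] * X[idx, 0]))  # force variable-0 total to benchmark
        return X[idx].T @ w                                      # calibrated totals for both variables

    B = 200
    reps = np.array([calibrated_totals(rng.integers(0, n, size=n)) for _ in range(B)])
    print("estimated covariance of calibrated totals:\n", np.cov(reps, rowvar=False))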

%B JSM 2017 %G eng %U https://www.niss.org/sites/default/files/Sartore_Variance_Estim_20170926.pdf %0 Conference Proceedings %B JSM 2017 %T Estimation of Capture Probabilities by Accounting for Sample Designs %A Abernethy, J. %A Sartore, L. %A Benecha, H. %A Spiegelman, C. %K Agriculture %K Capture-Recapture %K Estimation %K government %K NASS %K Research %K Sample Designs %K Weights %X

The United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) conducts the Census of Agriculture every five years to estimate the number of U.S. farms, as well as other agriculturally related population totals. NASS applies a Dual-System Estimation (DSE) methodology to data collected from the Census and the June Area Survey (JAS) to estimate the number of farms in the U.S. Traditional multinomial-based capture-recapture methodology requires a model to estimate the probability of capture for every captured operation on either survey. However, the selection probabilities associated with the JAS area frame design differ from those associated with the Census, which makes it difficult to compute the exact JAS selection probabilities for farm records captured only by the Census. For this reason, we propose and compare three methods for estimating the overall capture probability: the first two approximate the JAS selection probabilities, and the third conditions them out. We investigate the precision of these three techniques through a simulation study.
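For readers unfamiliar with dual-system estimation, the toy sketch below computes a Lincoln-Petersen-style population estimate and the implied capture probabilities under independence; the counts are invented, and the paper's three JAS-selection-probability methods are not reproduced.

    # Toy dual-system (capture-recapture) quantities; counts are made up.
    n_census = 9000      # operations captured by the Census
    n_jas = 1200         # operations captured by the JAS
    n_both = 800         # operations captured by both (matched records)

    N_hat = n_census * n_jas / n_both          # Lincoln-Petersen style estimate of total farms
    p_census = n_both / n_jas                  # estimated probability of capture by the Census
    p_jas = n_both / n_census                  # estimated probability of capture by the JAS
    p_either = 1 - (1 - p_census) * (1 - p_jas)  # overall capture probability under independence

    print(f"N_hat = {N_hat:.0f}, P(captured by either list) = {p_either:.3f}")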

%B JSM 2017 %G eng %U https://www.niss.org/sites/default/files/Abernethy_Capture_Probs_20170920.pdf %1

In Proceedings of the Government Statistics Section, JSM 2017.

%0 Conference Proceedings %B JSM 2017 %T Evaluation of a New Approach for Estimating the Number of U.S. Farms %A Benecha, H. %A Abreu, D. %A Abernethy, J. %A Sartore, L. %A Young, L. Y. %K Agriculture %K Area-frame %K BigData %K Capture-Recapture %K List Frame %K Logistic Regression %K Misclassification Error %K NASS %X

USDA’s National Agricultural Statistics Service (NASS) employs the June Area Survey (JAS) to produce annual estimates of U.S. farm numbers. The JAS is an area-frame-based survey conducted every year during the first two weeks of June. NASS also publishes an independent estimate of the number of farms from the quinquennial Census of Agriculture. Studies conducted by NASS have shown that farm number estimates from the JAS can be biased, mainly due to misclassification of agricultural tracts during the pre-screening and data collection processes. To adjust for the bias, NASS has developed a capture-recapture model that uses NASS’s list frame as the second sample, where estimation is performed based on records in the JAS with matches in the list frame. In the current paper, we describe an alternative capture-recapture approach that uses all available data from the JAS and the Census of Agriculture to correct for biases due to misclassification and to produce more stable farm number estimates.
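The sketch below illustrates only the generic building block of such a capture-recapture adjustment: a logistic model for record-level capture probability followed by inverse-probability weighting. It uses simulated data in which capture status is known for every record (something that is not observed in practice, where it comes from matching against a second frame), and it is not NASS's model.

    # Generic capture-probability sketch: logistic regression plus inverse-probability weighting.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 2000
    size = rng.lognormal(3.0, 1.0, n)                    # farm-size covariate (simulated)
    p_true = 1 / (1 + np.exp(-(-2.0 + 0.04 * size)))     # true capture probability (simulated)
    captured = rng.binomial(1, p_true)

    model = LogisticRegression().fit(size.reshape(-1, 1), captured)
    p_hat = model.predict_proba(size.reshape(-1, 1))[:, 1]

    # Horvitz-Thompson style estimate of the total number of farms from captured records only
    N_hat = np.sum(1.0 / p_hat[captured == 1])
    print(f"estimated farm count: {N_hat:.0f} (true count in this simulation: {n})")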

%B JSM 2017 %G eng %U https://www.niss.org/sites/default/files/Benecha_Estim_Farms_20170929.pdf %0 Journal Article %T Multidimensionality in the Performance-based Online Reading Comprehension Assessment %A W. Cui %A Nell Sedransk %G eng %0 Journal Article %T The Performance Characteristics of Three Formats for Assessing Internet Research Skills in Science %A Kulikowich, J.M. %A Leu, D. %A Nell Sedransk %A Coiro, J. %A Forzani, E. %G eng %0 Journal Article %T Psychometric Invariance of Online Reading Comprehension Assessment across Measurement Conditions %A W. Cui %A Nell Sedransk %G eng %0 Conference Proceedings %B JSM 2017 %T Restricted Multinomial Regression for a Triple-System Estimation with List Dependence %A Sartore, L. %A Benecha, H. %A Toppin, K. %A Spiegelman, C. %K Agriculture %K BigData %K Capture %K DataScience %K Dependence %K Estimation %K NASS %K Probability %K Triple-System %K Weights %X

The National Agricultural Statistics Service (NASS) conducts the U.S. Census of Agriculture every five years. In 2012, NASS began using a capture-recapture approach to adjust the Census estimates for under-coverage, non-response, and misclassification. This requires two independent samples. NASS has kept its Census Mailing List (CML) independent of its area frame, which is used for the June Area Survey (JAS). NASS is exploring the use of web-scraping to develop a third list frame (TL) that would be independent of both the CML and the area frame. In this paper, a Triple-System Estimation (TSE) methodology based on regularized multinomial regression is proposed to investigate possible dependence between the CML and the TL. A simulation study compares the estimator based on the proposed methodology, which can account for frame dependence, with others already presented in the literature.
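A bare-bones version of the log-linear machinery behind triple-system estimation is sketched below: a Poisson regression on the seven observable capture histories, with one interaction term standing in for CML-TL dependence, is used to predict the unobserved cell and hence the total. The cell counts are invented, and the regularization used in the paper is omitted.

    # Log-linear (Poisson regression) sketch for triple-system estimation with one dependence term.
    import numpy as np
    import statsmodels.api as sm

    # capture histories (CML, JAS, TL) and their observed counts (toy numbers)
    histories = np.array([[1,0,0],[0,1,0],[0,0,1],[1,1,0],[1,0,1],[0,1,1],[1,1,1]])
    counts = np.array([5200, 300, 900, 450, 700, 60, 110])

    X = np.column_stack([histories, histories[:, 0] * histories[:, 2]])  # main effects + CML:TL interaction
    X = sm.add_constant(X)
    fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()

    # predict the unobserved (0,0,0) cell from the fitted model
    x00 = sm.add_constant(np.array([[0, 0, 0, 0]]), has_constant='add')
    n_unobserved = float(fit.predict(x00)[0])
    N_hat = counts.sum() + n_unobserved
    print(f"estimated number of farms: {N_hat:.0f}")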

%B JSM 2017 %G eng %U https://www.niss.org/sites/default/files/Sartore_RestMultiReg_TSE_20170901.pdf %0 Journal Article %J Reading Horizons %D 2016 %T The Common Core Writing Standards: A descriptive study of content and alignment with a sample of former state standards %A Troia, G. A. %A Olinghouse, N. G. %A Wilson, J. %A Stewart, K. O. %A Mo, Y. %A Hawkins, L. %A Kopke, R.A. %B Reading Horizons %G eng %0 Journal Article %J Computers and Geosciences %D 2016 %T spMC: an R-package for 3D Lithological Reconstructions Based on Spatial Markov Chains %A Sartore, L. %A Fabbri, P. %A Gaetan, C. %X

The paper presents the spatial Markov chains (spMC) R package and a case study of subsoil prediction/simulation at a plain site in NE Italy. spMC is a fairly complete collection of advanced methods for data inspection, and it implements Markov chain models to estimate experimental transition probabilities of categorical lithological data. The package also provides the best-known estimation/simulation methods, such as indicator kriging and co-kriging, as well as more advanced methods such as path methods and a Bayesian procedure exploiting maximum entropy. Because spMC was designed for intensive geostatistical computations, part of the code is parallelized with OpenMP constructs, making it possible to handle more than five lithologies while retaining computational efficiency. A final analysis of this computational efficiency compares prediction/simulation results obtained with different numbers of CPU cores, using the example data set from the case study that is available in the package.
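spMC is an R package; purely to illustrate the most basic quantity it works with, the Python snippet below estimates a one-step transition-probability matrix from a single categorical (lithology) sequence along a borehole. The sequence is made up, and the package's multidimensional transiograms, kriging, and simulation methods go far beyond this sketch.

    # Empirical one-step transition probabilities for a categorical lithology sequence.
    import numpy as np

    codes = {"sand": 0, "clay": 1, "gravel": 2}
    log_sequence = ["sand", "sand", "clay", "clay", "clay", "gravel", "sand", "clay"]
    seq = np.array([codes[c] for c in log_sequence])

    k = len(codes)
    counts = np.zeros((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1                      # count observed transitions
    trans_prob = counts / counts.sum(axis=1, keepdims=True)
    print(trans_prob)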

%B Computers and Geosciences %V 94 %G eng %U http://www.sciencedirect.com/science/article/pii/S0098300416301479 %& 40-47 %R http://dx.doi.org/10.1016/j.cageo.2016.06.001 %0 Journal Article %J Journal of Survey Statistics and Methodology %D 2015 %T Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples %A Schifeling, T. %A Cheng, C. %A Jerome Reiter %A Hillygus, D.C. %B Journal of Survey Statistics and Methodology %V 3 %P 265–295 %8 18 August 2015 %G eng %N 3 %0 Journal Article %J Molecular Cell Proteomics %D 2015 %T Large-Scale Interlaboratory Study to Develop, Analytically Validate and Apply Highly Multiplexed, Quantitative Peptide Assays to Measure Cancer-Relevant Proteins in Plasma. %A Susan Abbatiello %A Birgit Schilling %A D.R. Mani %A L.I. Shilling %A S.C. Hall %A B. McLean %A M. Albetolle %A S. Allen %A M. Burgess %A M.P. Cusack %A M Gosh %A V Hedrick %A J.M. Held %A H.D. Inerowicz %A A. Jackson %A H. Keshishian %A C.R. Kinsinger %A Lyssand, JS %A Makowski L %A Mesri M %A Rodriguez H %A Rudnick P %A Sadowski P %A Nell Sedransk %A Shaddox K %A Skates SJ %A Kuhn E %A Smith D %A Whiteaker, JR %A Whitwell C %A Zhang S %A Borchers CH %A Fisher SJ %A Gibson BW %A Liebler DC %A M.J. McCoss %A Neubert TA %A Paulovich AG %A Regnier FE %A Tempst, P %A Carr, SA %X

There is an increasing need in biology and clinical medicine to robustly and reliably measure tens to hundreds of peptides and proteins in clinical and biological samples with high sensitivity, specificity, reproducibility, and repeatability. Previously, we demonstrated that LC-MRM-MS with isotope dilution has suitable performance for quantitative measurements of small numbers of relatively abundant proteins in human plasma and that the resulting assays can be transferred across laboratories while maintaining high reproducibility and quantitative precision. Here, we significantly extend that earlier work, demonstrating that 11 laboratories using 14 LC-MS systems can develop, determine analytical figures of merit, and apply highly multiplexed MRM-MS assays targeting 125 peptides derived from 27 cancer-relevant proteins and seven control proteins to precisely and reproducibly measure the analytes in human plasma. To ensure consistent generation of high quality data, we incorporated a system suitability protocol (SSP) into our experimental design. The SSP enabled real-time monitoring of LC-MRM-MS performance during assay development and implementation, facilitating early detection and correction of chromatographic and instrumental problems. Low to subnanogram/ml sensitivity for proteins in plasma was achieved by one-step immunoaffinity depletion of 14 abundant plasma proteins prior to analysis. Median intra- and interlaboratory reproducibility was <20%, sufficient for most biological studies and candidate protein biomarker verification. Digestion recovery of peptides was assessed and quantitative accuracy improved using heavy-isotope-labeled versions of the proteins as internal standards. Using the highly multiplexed assay, participating laboratories were able to precisely and reproducibly determine the levels of a series of analytes in blinded samples used to simulate an interlaboratory clinical study of patient samples. Our study further establishes that LC-MRM-MS using stable isotope dilution, with appropriate attention to analytical validation and appropriate quality control measures, enables sensitive, specific, reproducible, and quantitative measurements of proteins and peptides in complex biological matrices such as plasma.

%B Molecular Cell Proteomics %V 14 %P 2357-74 %8 09/2015 %G eng %N 9 %R 10.1074/mcp.M114.047050 %0 Journal Article %J Biometrics %D 2014 %T Calibration using Constrained Smoothing with Application to Mass Spectrometry Data %A Feng, X. %A Sedransk, N. %A Xia, J-Q %B Biometrics %V 70 %P 398-408 %G eng %U http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291541-0420 %& 398 %R 10.1111/biom.12135 %0 Book Section %D 2014 %T Confidentiality and Data Access in the Use of Big Data: Theory and Practical Approaches %A A. F. Karr %A J. P. Reiter %E J. Lane %E V. Stodden %E H. Nissenbaum %E S. Bender %I Cambridge University Press %G eng %& Analytical frameworks for data release: A statistical view %0 Journal Article %J Molecular & Cellular Proteomics %D 2014 %T Improved Normalization of Systematic Biases Affecting Ion Current Measurements in Label-free Proteomics Data %A P. A. Rudnick %A X. Wang %A E. Yan %A Sedransk, N. %A S. E. Stein %B Molecular & Cellular Proteomics %V 13 %P 1341-1351 %G eng %N 5 %0 Book Section %D 2014 %T The New Literacies of Online Research and Comprehension: Assessing and Preparing Students for the 21st Century with Common Core State Standards %A Sedransk, N. %A Leu, D. %A Forzani, E. %A Burlingame, C. %A Kulikowich, J. %A Coiro, J. %A Kennedy, C. %E Neuman, S. B. %E Gambrell, L.B. %I International Reading Association %P to appear %G eng %& to appear %0 Journal Article %J Analytical Chemistry %D 2014 %T QC Metrics from CPTAC Raw LC-MS/MS Data Interpreted through Multivariate Statistics %A X. Wang %A M. C. Chambers %A L. J. Vega-Montoto %A D. M. Bunk %A S. E. Stein %A D. Tabb %X
Shotgun proteomics experiments integrate a complex sequence of processes, any of which can introduce variability. Quality metrics computed from LC-MS/MS data have relied upon identifying MS/MS scans, but a new mode for the QuaMeter software produces metrics that are independent of identifications. Rather than evaluating each metric independently, we have created a robust multivariate statistical toolkit that accommodates the correlation structure of these metrics and allows for hierarchical relationships among data sets. The framework enables visualization and structural assessment of variability. Study 1 for the Clinical Proteomics Technology Assessment for Cancer (CPTAC), which analyzed three replicates of two common samples at each of two time points among 23 mass spectrometers in nine laboratories, provided the data to demonstrate this framework, and CPTAC Study 5 provided data from complex lysates under Standard Operating Procedures (SOPs) to complement these findings. Identification-independent quality metrics enabled the differentiation of sites and run times through robust principal components analysis and subsequent factor analysis. Dissimilarity metrics revealed outliers in performance, and a nested ANOVA model revealed the extent to which all metrics or individual metrics were impacted by mass spectrometer and run time. Study 5 data revealed that even when SOPs have been applied, instrument-dependent variability remains prominent, although it may be reduced, while within-site variability is reduced significantly. Finally, identification-independent quality metrics were shown to be predictive of identification sensitivity in these data sets. QuaMeter and the associated multivariate framework are available from http://fenchurch.mc.vanderbilt.edu and http://homepages.uc.edu/~wang2x7/, respectively.
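A small sketch of the kind of multivariate summary described above: robust principal components are approximated here by ordinary PCA on standardized, identification-independent QC metrics (simulated), with one deliberately aberrant run flagged by its distance in score space. The actual QuaMeter metrics and the nested ANOVA are not reproduced.

    # PCA-based outlier flagging on simulated QC metrics.
    import numpy as np

    rng = np.random.default_rng(2)
    metrics = rng.normal(size=(30, 6))          # 30 LC-MS/MS runs x 6 QC metrics (simulated)
    metrics[0] += 6.0                           # one run with aberrant metrics

    z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
    U, s, Vt = np.linalg.svd(z, full_matrices=False)
    scores = U[:, :2] * s[:2]                   # first two principal component scores

    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    print("most dissimilar run:", int(np.argmax(dist)))   # flags run 0 in this simulation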
%B Analytical Chemistry %V 86 %P 2497-2509 %G eng %U http://pubs.acs.org/doi/pdf/10.1021/ac4034455 %R 10.1021/ac4034455 %0 Journal Article %J Technomet %D 2013 %T Analysis of high-dimensional structure-activity screening datasets using the optimal bit string Tree %A Zhang K %A Hughes-Oliver JM %A Young SS %K Classification %K Drug discovery %K High throughput screening %K Prediction %K QSAR %K Simulated annealing %X

We propose a new classification method called the Optimal Bit String Tree (OBSTree) to identify quantitative structure-activity relationships (QSARs). The method introduces the concept of a chromosome to describe the presence/absence context of a combination of descriptors. A descriptor set and its optimal chromosome form the splitting variable. A new stochastic searching scheme that contains a weighted sampling scheme, simulated annealing, and a trimming procedure optimizes the choice of splitting variable. Simulation studies and an application to screening monoamine oxidase inhibitors show that OBSTree is advantageous in accurately and effectively identifying QSAR rules and finding different classes of active compounds. Details of the algorithm, SAS code, and simulated and real datasets are available online as supplementary materials.
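Not the OBSTree algorithm itself, but a bare-bones illustration of the kind of stochastic search it relies on: simulated annealing over a presence/absence "chromosome" (bit string), scored here by a toy objective with a known optimum rather than by split quality on real assay data.

    # Simulated annealing over a bit string ("chromosome").
    import numpy as np

    rng = np.random.default_rng(3)
    target = rng.integers(0, 2, size=20)               # hidden "best" chromosome (toy objective)

    def score(bits):                                    # stand-in for a split-quality measure
        return -np.sum(bits != target)

    current = rng.integers(0, 2, size=20)
    best = current.copy()
    for step in range(2000):
        temp = max(0.01, 1.0 - step / 2000)
        candidate = current.copy()
        flip = rng.integers(0, 20)
        candidate[flip] ^= 1                            # flip one descriptor bit
        delta = score(candidate) - score(current)
        if delta >= 0 or rng.random() < np.exp(delta / temp):
            current = candidate                         # accept better moves, and worse moves with decaying probability
        if score(current) > score(best):
            best = current.copy()

    print("mismatches of best chromosome from optimum:", -score(best))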

%B Technomet %V 55 %P 161-173 %G eng %R 10.1080/00401706.2012.760489 %0 Generic %D 2013 %T Combining NAEP Items into a Baseline Offline Reading Assessment %A Sedransk, N. %A W. Cui %I U. S. Department of Education %G eng %0 Journal Article %J Molecular and Cellular Proteomics %D 2013 %T Design, Implementation and Multisite Evaluation of a System Suitability Protocol for the Quantitative Assessment of Instrument Performance in Liquid Chromatography-Multiple Reaction Monitoring-MS (LC-MRM-MS) %A Abbatiello, S. %A Feng, X. %A Sedransk, N. %A Mani, DR %A Schilling, B %A Maclean, B %A Zimmerman, LJ %A Cusack, MP %A Hall, SC %A Addona, T %A Allen, S %A Dodder, NG %A Ghosh, M %A Held, JM %A Hedrick, V %A Inerowicz, HD %A Jackson, A %A Keshishian, H %A Kim, JW %A Lyssand, JS %A Riley, CP %A Rudnick, P %A Sadowski, P %A Shaddox, K %A Smith, D %A Tomazela, D %A Wahlander, A %A Waldemarson, S %A Whitwell, CA %A You, J %A Zhang, S %A Kinsinger, CR %A Mesri, M %A Rodriguez, H %A Borchers, CH %A Buck, C %A Fisher, SJ %A Gibson, BW %A Liebler, D %A Maccoss, M %A Neubert, TA %A Paulovich, A %A Regnier, F %A Skates, SJ %A Tempst, P %A Wang, M %A Carr, SA %X

Multiple reaction monitoring (MRM) mass spectrometry coupled with stable isotope dilution (SID) and liquid chromatography (LC) is increasingly used in biological and clinical studies for precise and reproducible quantification of peptides and proteins in complex sample matrices. Robust LC-SID-MRM-MS-based assays that can be replicated across laboratories and ultimately in clinical laboratory settings require standardized protocols to demonstrate that the analysis platforms are performing adequately. We developed a system suitability protocol (SSP), which employs a predigested mixture of six proteins, to facilitate performance evaluation of LC-SID-MRM-MS instrument platforms, configured with nanoflow-LC systems interfaced to triple quadrupole mass spectrometers. The SSP was designed for use with low multiplex analyses as well as high multiplex approaches when software-driven scheduling of data acquisition is required. Performance was assessed by monitoring of a range of chromatographic and mass spectrometric metrics including peak width, chromatographic resolution, peak capacity, and the variability in peak area and analyte retention time (RT) stability. The SSP, which was evaluated in 11 laboratories on a total of 15 different instruments, enabled early diagnoses of LC and MS anomalies that indicated suboptimal LC-MRM-MS performance. The observed range in variation of each of the metrics scrutinized serves to define the criteria for optimized LC-SID-MRM-MS platforms for routine use, with pass/fail criteria for system suitability performance measures defined as peak area coefficient of variation <0.15, peak width coefficient of variation <0.15, standard deviation of RT <0.15 min (9 s), and the RT drift <0.5min (30 s). The deleterious effect of a marginally performing LC-SID-MRM-MS system on the limit of quantification (LOQ) in targeted quantitative assays illustrates the use and need for a SSP to establish robust and reliable system performance. Use of a SSP helps to ensure that analyte quantification measurements can be replicated with good precision within and across multiple laboratories and should facilitate more widespread use of MRM-MS technology by the basic biomedical and clinical laboratory research communities.
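A minimal sketch of the pass/fail checks quoted in the abstract (peak-area CV < 0.15, peak-width CV < 0.15, retention-time SD < 0.15 min, RT drift < 0.5 min), applied to simulated replicate injections of a single peptide. The thresholds come from the abstract; everything else is illustrative and not the SSP software itself.

    # Pass/fail system-suitability checks on simulated replicate injections.
    import numpy as np

    rng = np.random.default_rng(4)
    peak_area = rng.normal(1.0e6, 5.0e4, size=10)
    peak_width = rng.normal(0.30, 0.02, size=10)      # minutes
    rt = rng.normal(24.0, 0.05, size=10)              # minutes, in injection order

    checks = {
        "peak area CV < 0.15": peak_area.std(ddof=1) / peak_area.mean() < 0.15,
        "peak width CV < 0.15": peak_width.std(ddof=1) / peak_width.mean() < 0.15,
        "RT standard deviation < 0.15 min": rt.std(ddof=1) < 0.15,
        "RT drift < 0.5 min": abs(rt[-1] - rt[0]) < 0.5,
    }
    for name, ok in checks.items():
        print(f"{name}: {'PASS' if ok else 'FAIL'}")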

%B Molecular and Cellular Proteomics %V 12 %P 2623-2639 %G eng %R 10.1074/mcp.M112.027078 %0 Journal Article %J Statistics in Medicine %D 2013 %T A New Functional Data Based Biomarker for Modeling Cardiovascular Behavior %A Zhou, Y-C. %A Sedransk, N. %K electrocardiogram %K QT interval %K ventricular repolarization %X

Cardiac safety assessment in drug development concerns ventricular repolarization (represented by the electrocardiogram (ECG) T-wave) abnormalities of a cardiac cycle, which are widely believed to be linked with torsades de pointes, a potentially life-threatening arrhythmia. The most often used biomarker for such abnormalities is prolongation of the QT interval, which relies on correct annotation of the onset of the QRS complex and the offset of the T-wave on the ECG. A new biomarker generated from a functional data-based methodology is developed to quantify T-wave morphology changes from placebo to drug interventions. Comparisons of T-wave form characteristics through a multivariate linear mixed model are made to assess the cardiovascular risk of drugs. Data from a study with 60 subjects participating in a two-period placebo-controlled crossover trial, with repeat ECGs obtained at baseline and 12 time points after interventions, are used to illustrate this methodology; different types of waveform changes were characterized, motivating further investigation.

%B Statistics in Medicine %V 32 %P 153-164 %G eng %R 10.1002/sim.5518 %0 Book Section %B Quality Reading Instruction in the Age of Common Core Standards %D 2013 %T The New Literacies of Online Research and Comprehension: Assessing and Preparing Students for the 21st Century with Common Core State Standards %A Leu, D. %A Sedransk, N. %E Neuman, S. %E Gambrell, L. %B Quality Reading Instruction in the Age of Common Core Standards %I International Reading Association %G eng %& 16 %0 Journal Article %J Cheminformatics %D 2012 %T ChemModLab: A web-based cheminformatics modeling laboratory %A Hughes-Oliver JM %A Brooks A %A Welch W %A Khaldei MG %A Hawkins DM %A Young SS %A Patil K %A Howell GW %A Ng RT %A Chu MT %X

ChemModLab, written by the ECCR @ NCSU consortium under NIH support, is a toolbox for fitting and assessing quantitative structure-activity relationships (QSARs). Its elements are: a cheminformatic front end used to supply molecular descriptors for use in modeling; a set of methods for fitting models; and methods for validating the resulting model. Compounds may be input as structures from which standard descriptors will be calculated using the freely available cheminformatic front end PowerMV; PowerMV also supports compound visualization. In addition, the user can directly input their own choices of descriptors, so the capability for comparing descriptors is effectively unlimited. The statistical methodologies comprise a comprehensive collection of approaches whose validity and utility have been accepted by experts in the fields. As far as possible, these tools are implemented in open-source software linked into the flexible R platform, giving the user the capability of applying many different QSAR modeling methods in a seamless way. As promising new QSAR methodologies emerge from the statistical and data-mining communities, they will be incorporated in the laboratory. The web site also incorporates links to public-domain data sets that can be used as test cases for proposed new modeling methods. The capabilities of ChemModLab are illustrated using a variety of biological responses, with different modeling methodologies being applied to each. These show clear differences in quality of the fitted QSAR model, and in computational requirements. The laboratory is web-based, and use is free. Researchers with new assay data, a new descriptor set, or a new modeling method may readily build QSAR models and benchmark their results against other findings. Users may also examine the diversity of the molecules identified by a QSAR model. Moreover, users have the choice of placing their data sets in a public area to facilitate communication with other researchers; or can keep them hidden to preserve confidentiality.

%B Cheminformatics %V 11 %P 61-81 %G eng %R 10.3233/CI-2008-0016 %0 Book Section %D 2012 %T Current and emerging design and data analysis approaches %A Kulikowich, J.M. %A Sedransk, N. %I APA Handbook of Educational Psychology, American Psychological Association %G eng %0 Journal Article %J Statistics, Politics and Policy %D 2012 %T Data, Statistics and Controversy: Making Scientific Data Intelligible %A Sedransk, N. %A Young, L. %A Spiegelman, C. %K data availability %K Daubert rule %K inference verification %K meta-data %K proprietary data %K publication bias %K reuse of data %K secondary analysis %K synthetic data %X

Making published, scientific research data publicly available can benefit scientists and policy makers only if there is sufficient information for these data to be intelligible. Thus the necessary meta-data go beyond the scientific, technological detail and extend to the statistical approach and methodologies applied to these data. The statistical principles that give integrity to researchers’ analyses and interpretations of their data require documentation. This is true when the intent is to verify or validate the published research findings; it is equally true when the intent is to utilize the scientific data in conjunction with other data or new experimental data to explore complex questions; and it is profoundly important when the scientific results and interpretations are taken outside the world of science to establish a basis for policy, for legal precedent or for decision-making. When research draws on already public data bases, e.g., a large federal statistical data base or a large scientific data base, selection of data for analysis, whether by selection (subsampling) or by aggregating, is specific to that research so that this (statistical) methodology is a crucial part of the meta-data. Examples illustrate the role of statistical meta-data in the use and reuse of these public datasets and the impact on public policy and precedent.

%B Statistics, Politics and Policy %V 3 %P 1-20 %G eng %R 10.1515/2151-7509.1046 %0 Journal Article %J Significance %D 2011 %T Deming, data and observational studies. A process out of control and needing fixing %A Young SS %A Karr Alan %K observational studies %X

Any claim coming from an observational study is most likely to be wrong. Startling, but true. Coffee causes pancreatic cancer. Type A personality causes heart attacks. Trans-fat is a killer. Women who eat breakfast cereal give birth to more boys. All these claims come from observational studies; yet when the studies are carefully examined, the claimed links appear to be incorrect. What is going wrong? Some have suggested that the scientific method is failing, that nature itself is playing tricks on us. But it is our way of studying nature that is broken and that urgently needs mending, say S. Stanley Young and Alan Karr; and they propose a strategy to fix it.

%B Significance %V 8 %P 116-120 %8 September %G eng %R 10.1111/j.1740-9713.2011.00506.x %0 Journal Article %J Statistical Science %D 2011 %T Make research data public? - Not always so simple: A Dialogue for statisticians and science editors %A Nell Sedransk %A Lawrence H. Cox %A Deborah Nolan %A Keith Soper %A Cliff Spiegelman %A Linda J. Young %A Katrina L. Kelner %A Robert A. Moffitt %A Ani Thakar %A Jordan Raddick %A Edward J. Ungvarsky %A Richard W. Carlson %A Rolf Apweiler %X

Putting data into the public domain is not the same thing as making those data accessible for intelligent analysis. A distinguished group of editors and experts, already engaged in one way or another with the issues inherent in making research data public, came together with statisticians to initiate a dialogue about the policies and practicalities of requiring published research to be accompanied by publication of the research data. The dialogue carried beyond the broad issues of advisability, intellectual integrity, and scientific exigency to the relevance of these issues to statistics as a discipline, and to the relevance of statistics, from inference to modeling to data exploration, to science and social science policies on these issues.

%B Statistical Science %V 5 %P 41-50 %G eng %R 10.1214/10-STS320 %0 Conference Paper %B Proceedings, American Society for Engineering Education %D 2011 %T Studying the Reliability and Validity of Test Scores for Mathematical and Spatial Reasoning Tasks for Engineering Students %A Pauley, L. %A Kulikowich, J. %A Sedransk, N. %A Engel, R. %B Proceedings, American Society for Engineering Education %G eng %0 Journal Article %J PACE %D 2011 %T Systematic decrements in QTc between the first and second day of contiguous daily ECG recordings under controlled conditions %A Beasley CM Jr %A Benson C %A Xia JQ %A Young SS %A Haber H %A Mitchell MI %A Loghin C %K ECG %K QT interval %X

BACKGROUND: Many thorough QT (TQT) studies use a baseline day and double delta analysis to account for potential diurnal variation in QTc. However, little is known about systematic changes in the QTc across contiguous days when normal volunteers are brought into a controlled inpatient environment.

%B PACE %V 34 %P 1116-1127 %8 April %G eng %R 10.1111/j.1540-8159.2011.03117.x %0 Journal Article %J PLoS ONE %D 2011 %T Variance Component Analysis of a Multi-Site Study of Multiple Reaction Monitoring Measurements of Peptides and Proteins in Human Plasma %A Xia, J. %A Sedransk, N. %A Feng, X. %K analysis of Variance %K blood plasma %K experimental design %K Instrument calibration %K linear regression analysis %K peptides %K plasma proteins %K proteomic databases %X

In the Addona et al. paper (Nature Biotechnology 2009), a large-scale multi-site study was performed to quantify Multiple Reaction Monitoring (MRM) measurements of proteins spiked in human plasma. The unlabeled signature peptides derived from the seven target proteins were measured at nine different concentration levels, and their isotopic counterparts served as the internal standards.

%B PLoS ONE %V 6 %P e14590 %G eng %R 10.1371/journal.pone.0014590 %0 Journal Article %J Journal of Clinical Chemistry %D 2010 %T Analytical Validation of Proteomic-Based Multiplex Assays: A Workshop Report by the NCI-FDA Interagency Oncology Task Force on Molecular Diagnostics %A Stephan A. Carr %A Nell Sedransk %A Henry Rodriguez %A Zivana Tezak %A Mehdi Mesri %A Daniel C. Liebler %A Susan J. Fisher %A Paul Tempst %A Tara Hiltke %A Larry G. Kessler %A Christopher R. Kinsinger %A Reena Philip %A David F. Ransohoff %A Steven J. Skates %A Fred E. Regnier %A N. Leigh Anderson %A Elizabeth Mansfield %A on behalf of the Workshop Participants %X

Clinical proteomics has the potential to enable the early detection of cancer through the development of multiplex assays that can inform clinical decisions. However, there has been some uncertainty among translational researchers and developers as to the specific analytical measurement criteria needed to validate protein-based multiplex assays. To begin to address the causes of this uncertainty, a day-long workshop titled “Interagency Oncology Task Force Molecular Diagnostics Workshop” was held in which members of the proteomics and regulatory communities discussed many of the analytical evaluation issues that the field should address in development of protein-based multiplex assays for clinical use. This meeting report explores the issues raised at the workshop and details the recommendations that came out of the day’s discussions, such as a workshop summary discussing the analytical evaluation issues that specific proteomic technologies should address when seeking US Food and Drug Administration approval.

%B Journal of Clinical Chemistry %V 56 %P 237-243 %G eng %R 10.1373/clinchem.2009.136416 %0 Conference Paper %B Proceedings, American Society for Engineering Education %D 2010 %T Constructing mathematical and spatial-reasoning measures for engineering students %A Pauley, L. %A Kulikowich, J.M. %A Sedransk, N. %A Engel, R. %B Proceedings, American Society for Engineering Education %G eng %0 Journal Article %J Statistics in Biopharmaceutical Research %D 2010 %T Marking the Ends of T-waves: Algorithms and Experts %A Zhou, Y-C. %A Sedransk, N. %K Bayesian algorithm %K Functional data analysis %K QT interval %X

The prolongation of QT interval on electrocardiogram (ECG) is the current measure for cardiac safety that is used in drug development and drug approval. Although in thorough QT studies pharmaceutical companies need to measure QT intervals for thousands of beats, they mainly rely on experts to mark the QT interval endpoints. However, selected beats of data show that the difference between two experts’ marks can easily exceed 10 milliseconds. Note that for QT analyses presented to the FDA, if the maximal difference over all time points between QT measures comparing control to drug exceeds 10 milliseconds, the question of cardiac safety requires further discussion. Indeed experts appear to use the slope and curvature of the waveform differently in judging the end of the T-wave. This article develops a Bayesian approach combining both slope and curvature information. We show that the difference between the automatic Bayesian marks and either of the experts’ marks is not statistically larger than the difference between two experts’ marks, thus this approach is successful in closely approximating the experts’ results in marking T-wave end, and it is much faster and cost efficient. Being algorithmic, our method offers the opportunity to be more consistent.
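The paper develops a Bayesian rule for marking the end of the T-wave; as a much simpler point of reference, the sketch below implements the classical tangent method (intersection of the steepest descending tangent with the baseline) on a synthetic Gaussian-shaped T-wave. It is not the authors' algorithm, and the waveform is simulated.

    # Tangent-method T-wave end on a synthetic T-wave with a zero baseline.
    import numpy as np

    t = np.linspace(0.0, 0.40, 401)                                  # seconds
    twave = 0.3 * np.exp(-((t - 0.15) ** 2) / (2 * 0.03 ** 2))       # synthetic T-wave

    slope = np.gradient(twave, t)
    peak = np.argmax(twave)
    steepest = peak + np.argmin(slope[peak:])                        # most negative slope after the peak

    # intersect the tangent at that point with the baseline (amplitude 0)
    t_end = t[steepest] - twave[steepest] / slope[steepest]
    print(f"tangent-method T-wave end: {t_end * 1000:.1f} ms")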

%B Statistics in Biopharmaceutical Research %V 2 %P 359-367 %G eng %R 10.1198/sbr.2009.08085 %0 Conference Paper %B Social Statistics and Higher Education Conference Volume %D 2010 %T Psychometric and Statistical Modeling for the Study of Retention and Graduation in Undergraduate Engineering %A Sedransk, N. %A Kulikowich, J.M. %A Engel, R. %A X. Wang %A Gunning, P. %A Fleming, A. %B Social Statistics and Higher Education Conference Volume %G eng %0 Journal Article %J Journal of Official Statistics %D 2010 %T Statistical Careers in US Government Science Agencies %A Sedransk, N. %K complex system models %K engineering statistics %K high-dimensional data %K History of statistics %K metrology %X

The role of statistics in those U.S. government agencies that focus on progress in science and engineering became prominent at the end of the Second World War. The success of statistics in that historical period came from the power of statistics to enable science to advance more rapidly and with great assurance in the interpretation of experimental results. Over the past three quarters of a century, technology has changed both the practice of science and the practice of statistics. However, the comparative advantage of statistics still rests in the ability to achieve greater precision with fewer errors and a deeper understanding. Examples illustrate some of the challenges that complex science now presents to statisticians, demanding both creativity and technical skills.

%B Journal of Official Statistics %V 26 %P 443-453 %G eng %0 Journal Article %J Annals of Applied Statistics %D 2009 %T Functional Data Analytic Approach of Modeling ECG T-wave shape to Measure Cardiovascular Behavior %A Zhou, Y-C. %A Sedransk, N. %K cardiac safety %K ECG T-wave %K Functional data analysis %K QT interval %K T-wave morphology %X

The T-wave of an electrocardiogram (ECG) represents the ventricular repolarization that is critical in restoration of the heart muscle to a pre-contractile state prior to the next beat. Alterations in the T-wave reflect various cardiac conditions; and links between abnormal (prolonged) ventricular repolarization and malignant arrhythmias have been documented. Cardiac safety testing prior to approval of any new drug currently relies on two points of the ECG waveform: onset of the Q-wave and termination of the T-wave; and only a few beats are measured. Using functional data analysis, a statistical approach extracts a common shape for each subject (reference curve) from a sequence of beats, and then models the deviation of each curve in the sequence from that reference curve as a four-dimensional vector. The representation can be used to distinguish differences between beats or to model shape changes in a subject’s T-wave over time. This model provides physically interpretable parameters characterizing T-wave shape, and is robust to the determination of the endpoint of the T-wave. Thus, this dimension reduction methodology offers the strong potential for definition of more robust and more informative biomarkers of cardiac abnormalities than the QT (or QT corrected) interval in current use.
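A sketch of the general idea described above: take a set of aligned beats, use their mean as the subject's reference curve, and summarize each beat's deviation from that reference by its coordinates on the first few principal component curves. The beats are simulated, and the four specific, physically interpretable parameters of the paper are not reproduced.

    # Reference curve plus low-dimensional per-beat deviation vectors.
    import numpy as np

    rng = np.random.default_rng(7)
    t = np.linspace(0, 1, 200)
    beats = np.array([np.sin(np.pi * t) * (1 + 0.05 * rng.normal())
                      + 0.02 * rng.normal(size=t.size) for _ in range(50)])

    reference = beats.mean(axis=0)                  # subject-level reference curve
    deviations = beats - reference

    U, s, Vt = np.linalg.svd(deviations, full_matrices=False)
    scores = U[:, :4] * s[:4]                       # 4-dimensional summary of each beat's shape deviation
    print("per-beat deviation vectors, shape:", scores.shape)   # (50, 4)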

%B Annals of Applied Statistics %V 3 %P 1382-1402 %G eng %R 10.1214/09-AOAS273 %0 Journal Article %J Journal of Official Statistics %D 2009 %T Privacy-preserving analysis of vertically partitioned data using secure matrix products %A A. F. Karr %A X. Lin %A J. P. Reiter %A A. P. Sanil %B Journal of Official Statistics %V 25 %P 125-138 %G eng %0 Generic %D 2009 %T Task Force Report on Computer Adaptive Testing %A Sedransk, N. %I National Center for Education Statistics %G eng %0 Journal Article %J Proceedings - Royal Society B %D 2008 %T Cereal-induced gender selection? Most likely a multiple testing false positive %A Young SS %A Bang H %A Oktay K %B Proceedings - Royal Society B %V 276 %P 1211-1212 %G eng %U http://rspb.royalsocietypublishing.org/content/276/1660/1211.full %R 10.1098/rspb.2008.1405 %0 Book Section %B Terrorism Informatics %D 2008 %T Homeland Insecurity %A Stephen E. Fienberg %E Chen, Hsinchun %E Reid, Edna %E Sinai, Joshua %E Silke, Andrew %E Ganor, Boaz %X

Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. There have also been claims that prospective datamining could be used to find the “signature” of terrorist cells embedded in larger networks. We present an overview of why the public has concerns about such activities and describe some proposals for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literatures on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of “selective revelation” and their confidentiality implications.

%B Terrorism Informatics %S Integrated Series In Information Systems %I Springer US %V 18 %P 197-218 %@ 978-0-387-71612-1 %G eng %U http://dx.doi.org/10.1007/978-0-387-71613-8_10 %R 10.1007/978-0-387-71613-8_10 %0 Journal Article %J Journal of National Cancer Institute %D 2008 %T Low-fat dietary pattern and cancer incidence in the Women’s Health Initiative Dietary Modification Randomized Controlled Trial %A Young SS %B Journal of National Cancer Institute %V 100 %P 284 %G eng %U http://jnci.oxfordjournals.org/content/100/4/284.1.extract# %R 10.1093/jnci/djm309 %0 Journal Article %J Q. Applied Mathematics %D 2008 %T Sensitivity to noise variance in a social network dynamics model %A H. T. Banks %A H. K. Nguyen %A J. R. Samuels, Jr. %A A. F. Karr %B Q. Applied Mathematics %V 66 %P 233-247 %G eng %0 Journal Article %J Annals of Statistics %D 2007 %T Computer Model Validation with Functional Output %A M.J. Bayarri %A J. Berger %A Garcia-Donato, G. %A Liu, F. %A R. Paulo %A Jerome Sacks %A Palomo, J. %A Walsh, D. %A J. Cafeo %A Parthasarathy, R. %B Annals of Statistics %V 35 %P 1874-190 %G eng %N 5 %0 Journal Article %J Journal of Chemical Information and Modeling %D 2007 %T Exploration of cluster structure-activity relationship analysis in efficient high-throughput screening %A Wang, X. S. %A Salloum, G.A. %A Chipman, H.A. %A Welch, W.J. %A Young, S.S. %X

Sequential screening has become increasingly popular in drug discovery. It iteratively builds quantitative structure-activity relationship (QSAR) models from successive high-throughput screens, making screening more effective and efficient. We compare cluster structure-activity relationship analysis (CSARA) as a QSAR method with recursive partitioning (RP), by designing three strategies for sequential collection and analysis of screening data. Various descriptor sets are used in the QSAR models to characterize chemical structure, including high-dimensional sets and some that by design have many variables not related to activity. The results show that CSARA outperforms RP. We also extend the CSARA method to deal with a continuous assay measurement.

%B Journal of Chemical Information and Modeling %V 47 %P 1206-1214 %G eng %R 10.1021/ci600458n %0 Conference Paper %B Bulletin of International Statistics Institute %D 2007 %T Secure logistic regression with distributed databases %A A. F. Karr %A S. E. Fienberg %A Y. Nardi %A A. Slavkovic %B Bulletin of International Statistics Institute %G eng %0 Journal Article %J Journal of Data Science %D 2007 %T Statistics in metrology: International key comparisons and interlaboratory studies %A Sedransk, N. %A Rukhin, A. %B Journal of Data Science %V 5 %P 393-412 %G eng %0 Journal Article %J IEEE TRANSACTIONS ON SOFTWARE ENGINEERING %D 2007 %T Techniques for classifying executions of deployed software to support software engineering tasks %A Murali Haran %A Alan Karr %A Michael Last %A Alessandro Orso %A Adam A. Porter %A Ashish Sanil %A Sandro Fouché %B IEEE TRANSACTIONS ON SOFTWARE ENGINEERING %V 33 %P 287-304 %G eng %0 Journal Article %J Statistical Methodology %D 2006 %T Data quality: A statistical perspective %A Alan F. Karr %A Ashish P. Sanil %A David L. Banks %B Statistical Methodology %V 3 %P 137–173 %G eng %0 Journal Article %J The American Statistician %D 2006 %T A framework for evaluating the utility of data altered to protect confidentiality %A A. F. Karr %A C. N. Kohnen %A A. Oganyan %A J. P. Reiter %A A. P. Sanil %B The American Statistician %V 60 %P 224-232 %G eng %0 Journal Article %J Journal of Chemical Information and Modeling %D 2006 %T PharmID: Pharmacophore identification using Gibbs sampling %A Feng J. %A Sanil A %A Young SS %X

The binding of a small molecule to a protein is inherently a 3D matching problem. As crystal structures are not available for most drug targets, there is a need to be able to infer from bioassay data the key binding features of small molecules and their disposition in space, the pharmacophore. Fingerprints of 3D features and a modification of Gibbs sampling to align a set of known flexible ligands, where all compounds are active, are used to discern possible pharmacophores. A clique detection method is used to map the features back onto the binding conformations. The complete algorithm is described in detail, and it is shown that the method can find common superimposition for several test data sets. The method reproduces answers very close to the crystal structure and literature pharmacophores in the examples presented. The basic algorithm is relatively fast and can easily deal with up to 100 compounds and tens of thousands of conformations. The algorithm is also able to handle multiple binding mode problems, which means it can superimpose molecules within the same data set according to two different sets of binding features. We demonstrate the successful use of this algorithm for multiple binding modes for a set of D2 and D4 ligands.

%B Journal of Chemical Information and Modeling %V 46 %P 1352-1359 %G eng %R 10.1021/ci050427v %0 Journal Article %J Metrologia %D 2006 %T Statistical analysis for multiple artifact problem in key comparisons with linear trends %A Zhang, N.-F. %A Strawderman, W. %A Liu, H.-k. %A Sedransk, N. %K computational physics %K instrumentation and measurement %X

A statistical analysis for key comparisons with linear trends and multiple artefacts is proposed. This is an extension of a previous paper for a single artefact. The approach has the advantage that it is consistent with the no-trend case. The uncertainties for the key comparison reference value and the degrees of equivalence are also provided. As an example, the approach is applied to key comparison CCEM–K2.

%B Metrologia %V 43 %P 21-26 %G eng %R 10.1088/0026-1394/43/1/003 %0 Journal Article %J Technom %D 2006 %T Statistical design of pools using optimal coverage and minimal collision %A Remlinger KS %A Hughes-Oliver JM %A Young SS %A Lam RL %K Pharmaceutical industry %K Pooled data %K Pooling %K Screening %K Throughput %X

The screening of large chemical libraries to identify new compounds can be simplified by testing compounds in pools. Two criteria for designing pools are considered: optimal coverage of the chemical space and minimal collision between compounds. Four pooling designs are applied to a public database and evaluated by determining how well the design criteria are met and whether the methods are able to find diverse active compounds. While one pool was outstanding, all designed pools outperformed randomly designed pools.
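A toy illustration of the two design criteria named in the abstract: assign 100 compounds (points in a two-dimensional "chemical space") to 20 pools of 5, and compare a random assignment with a simple round-robin assignment after sorting, using the average within-pool pairwise distance as a crude proxy for spreading similar compounds across pools. The actual pooling designs evaluated in the paper are not reproduced.

    # Random vs. round-robin pool assignment on simulated descriptors.
    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(5)
    X = rng.random((100, 2))                      # toy 2-D descriptor space
    n_pools, pool_size = 20, 5

    def mean_within_pool_distance(assignment):
        dists = []
        for p in range(n_pools):
            members = X[assignment == p]
            dists += [np.linalg.norm(a - b) for a, b in combinations(members, 2)]
        return np.mean(dists)

    random_assign = rng.permutation(np.repeat(np.arange(n_pools), pool_size))
    # round-robin after sorting on the first descriptor: similar compounds land in different pools
    round_robin = np.empty(100, dtype=int)
    round_robin[np.argsort(X[:, 0])] = np.arange(100) % n_pools

    print("random design:      ", mean_within_pool_distance(random_assign))
    print("round-robin design: ", mean_within_pool_distance(round_robin))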

%B Technom %V 48 %P 133-143 %G eng %R 10.1198/004017005000000481 %0 Conference Proceedings %B Proc. ACM SIGSOFT Symposium Foundations of Software Engineering 2005 %D 2005 %T Applying classification techniques to remotely-collected program execution data %A A. F. Karr %A M. Haran %A A. A. Porter %A A. Orso %A A. P. Sanil %B Proc. ACM SIGSOFT Symposium Foundations of Software Engineering 2005 %I ACM %C New York %G eng %0 Journal Article %J Statistical Science %D 2005 %T Data dissemination and disclosure limitation in a world without microdata: A risk-utility framework for remote access analysis servers %A A. F. Karr %A J. Feng %A X. Lin %A J. P. Reiter %A A. P. Sanil %A Young, S.S. %B Statistical Science %V 20 %P 163-177 %G eng %0 Conference Paper %B Bull. International Statistical Inst., 55th Session %D 2005 %T Data quality and data confidentiality for microdata: implications and strategies %A A. F. Karr %A A. P. Sanil %B Bull. International Statistical Inst., 55th Session %G eng %0 Journal Article %J Journal of Official Statistics %D 2005 %T Data Swapping as a Decision Problem %A Shanti Gomatam %A Alan F. Karr %A Ashish P. Sanil %K categorical data %K data confidentiality %K Data swapping %K data utility %K disclosure risk %K risk-utility frontier %X

We construct a decision-theoretic formulation of data swapping in which quantitative measures of disclosure risk and data utility are employed to select one release from a possibly large set of candidates. The decision variables are the swap rate, swap attribute(s) and, possibly, constraints on the unswapped attributes. Risk–utility frontiers, consisting of those candidates not dominated in (risk, utility) space by any other candidate, are a principal tool for reducing the scale of the decision problem. Multiple measures of disclosure risk and data utility, including utility measures based directly on use of the swapped data for statistical inference, are introduced. Their behavior and resulting insights into the decision problem are illustrated using data from the U.S. Current Population Survey, the well-studied “Czech auto worker data” and data on schools and administrators generated by the U.S. National Center for Education Statistics.
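A small sketch of the risk-utility frontier idea: given candidate releases with (disclosure risk, data utility) scores, keep those not dominated by any other candidate, i.e., those for which no alternative has both lower risk and higher utility. The scores are simulated; the paper's specific risk and utility measures for swapped data are not implemented.

    # Dominance filter for a simulated set of candidate releases.
    import numpy as np

    rng = np.random.default_rng(6)
    risk = rng.random(50)
    utility = 0.6 * (1 - risk) + 0.4 * rng.random(50)   # utility tends to fall as risk falls

    def on_frontier(i):
        dominated_by = (risk <= risk[i]) & (utility >= utility[i]) & \
                       ((risk < risk[i]) | (utility > utility[i]))
        return not dominated_by.any()

    frontier = [i for i in range(50) if on_frontier(i)]
    print("candidates on the risk-utility frontier:", frontier)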

%B Journal of Official Statistics %V 21 %P 635–655 %G eng %0 Journal Article %J Biostatistics %D 2005 %T Sample size calculation for multiple testing in microarray data analysis %A Jung SH %A Bang H %A Young SS %B Biostatistics %V 6 %P 157-169 %G eng %0 Journal Article %J J. Computer-Aided Molecular Design %D 2005 %T Secure analysis of distributed chemical databases without data integration %A Alan F. Karr %A Jun Feng %A Xiaodong Lin %A Ashish P. Sanil %A S. Stanley Young %A Jerome P. Reiter %B J. Computer-Aided Molecular Design %V 19 %P 739-747 %8 November %G eng %0 Journal Article %J J. Computational and Graphical Statist %D 2005 %T Secure Regression on Distributed Databases %A Alan F. Karr %A Xiaodong Lin %A Ashish P. Sanil %A Jerome P. Reiter %B J. Computational and Graphical Statist %V 14 %P 263–279 %G eng %0 Conference Paper %B In Statistical Methods in Counterterrorism: Game Theory, Modeling, Syndromic Surveillance, and Biometric Authentication %D 2005 %T Secure statistical analysis of distributed databases using partially trusted third parties %A Alan F. Karr %A Xiaodong Lin %A Ashish P. Sanil %A Jerome P. Reiter %E D. Olwell %E A. G. Wilson %E G. Wilson %B In Statistical Methods in Counterterrorism: Game Theory, Modeling, Syndromic Surveillance, and Biometric Authentication %I Springer–Verlag %C New York %G eng %0 Conference Paper %B Proceedings of 2004 Workshop on Verification & Validation of Computer Models of High-consequence Engineering Systems %D 2005 %T A statistical meteorologist looks at computational system models %A Sedransk, N. %B Proceedings of 2004 Workshop on Verification & Validation of Computer Models of High-consequence Engineering Systems %G eng %0 Generic %D 2005 %T Title IX Data Collection: Technical Manual for Developing the User’s Guide %A A. F. Karr %A A. P. Sanil %I National Institute of Statistical Sciences %G eng %0 Journal Article %J Chance %D 2004 %T Analysis of integrated data without data integration %A A. F. Karr %A X. Lin %A J. P. Reiter %A A. P. Sanil %B Chance %V 17 %P 26-29 %G eng %0 Journal Article %J Current Opinion in Drug Discovery & Development %D 2004 %T Design of diversity and focused combinatorial libraries in drug discovery %A Young SS %A Ge N %B Current Opinion in Drug Discovery & Development %V 7 %P 318-324 %G eng %0 Journal Article %J Chance %D 2004 %T Disclosure Risk vs Data Utility: The R-U Confidentiality Map %A Duncan, George T. %A Stokes, S. Lynne %B Chance %7 3 %V 17 %P 16-20 %G eng %R 10.1080/09332480.2004.10554908 %0 Conference Paper %B Proc. Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining %D 2004 %T Privacy preserving regression modelling via distributed computation %A A. F. Karr %A X. Lin %A J. P. Reiter %A A. P. Sanil %B Proc. Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining %P 677-682 %G eng %0 Conference Paper %B Proc. dg.o 2004, National Conference on Digital Government Research %D 2004 %T Regression on distributed databases via secure multi-party computation %A A. F. Karr %A X. Lin %A J. P. Reiter %A A. P. Sanil %B Proc. dg.o 2004, National Conference on Digital Government Research %P 405-406 %G eng %0 Conference Paper %B ASA Proceedings 2004 %D 2004 %T Secure regression for vertically partitioned, partially overlapping data %A A. F. Karr %A C. N. Kohnen %A X. Lin %A J. P. Reiter %A A. P. Sanil %B ASA Proceedings 2004 %G eng %0 Conference Paper %B Proc.
dg.o 2003, National Conference on Digital Government Research %D 2003 %T Data swapping: A risk–utility framework and Web service implementation %A A. F. Karr %A S. Gomatam %A C. Liu %A A. P. Sanil %B Proc. dg.o 2003, National Conference on Digital Government Research %I Digital Government Research Center %P 245-248 %G eng %0 Journal Article %J Journal of Chemistry Information and Computer Sciences %D 2003 %T Design of diverse and focused combinatorial libraries using an alternating algorithm %A Young SS %A Wang M %A Gu F %B Journal of Chemistry Information and Computer Sciences %V 43 %P 1916-1921 %G eng %0 Journal Article %J Chance %D 2003 %T Exploring blood spectra for signs of ovarian cancer %A Hawkins, D.M. %A Wolfinger, R.D. %A L. Liu %A Young. S.S. %B Chance %V 16 %P 19-23 %G eng %R 10.1080/09332480.2003.10554870 %0 Thesis %D 2003 %T Methods for Calibrating and Validating Stochastic Micro-Simulation Traffic Models %A N. Siddiqui %I North Carolina State University %C Raleigh %V Masters %G eng %9 masters %0 Journal Article %J J. Statist. Software %D 2003 %T NISS WebSwap: A Web Service for Data Swapping %A Ashish Sanil %A Shanti Gomatam %A Alan F. Karr %B J. Statist. Software %V 8 %P 2003 %G eng %0 Journal Article %J STATISTICS AND COMPUTING %D 2003 %T Preserving confidentiality of high-dimensional tabular data: Statistical and computational issues %A Adrian Dobra %A Alan F. Karr %A Ashish P. Sanil %B STATISTICS AND COMPUTING %V 8 %P 363–370 %G eng %0 Journal Article %J Proceedings of the National Academy of Sciences %D 2003 %T Robust singular value decomposition analysis of microarray data %A Liu L %A Hawkins DM %A Ghosh S %A Young SS %B Proceedings of the National Academy of Sciences %V 100 %P 13167-13172 %G eng %0 Journal Article %J Comm. ACM %D 2003 %T Table servers protect confidentiality in tabular data releases %A Alan F. Karr %A Adrian Dobra %A Ashish P. Sanil %B Comm. ACM %V 46 %P 57–58 %G eng %0 Book Section %D 2002 %T Advances in Digital Government %A A. F. Karr %A J. Lee %A A. P. Sanil %A J. Hernandez %A S. Karimi %A K. Litwin %E E. Elmagarmid %E W. M. McIver %X

The Internet provides an efficient mechanism for Federal agencies to distribute their data to the public. However, it is imperative that such data servers have built-in mechanisms to ensure that confidentiality of the data, and the privacy of individuals or establishments represented in the data, are not violated. We describe a prototype dissemination system developed for the National Agricultural Statistics Service that uses aggregation of adjacent geographical units as a confidentiality-preserving technique. We also outline a Bayesian approach to statistical analysis of the aggregated data.
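
[Editor's illustration, not part of the cited chapter: a minimal sketch of the aggregation idea, greedily merging adjacent geographic units until every released aggregate meets a minimum respondent count. The adjacency structure, counts, threshold, and merge rule below are assumptions for illustration, not the NISS prototype's actual rule.]

# Illustrative only: greedy aggregation of adjacent units until every
# released aggregate contains at least `min_count` respondents.
def aggregate_units(counts, neighbors, min_count=25):
    """counts: {unit: n_respondents}; neighbors: {unit: set of adjacent units}."""
    groups = {u: {u} for u in counts}          # each unit starts as its own group
    group_of = {u: u for u in counts}          # unit -> representative group id

    def group_count(g):
        return sum(counts[u] for u in groups[g])

    changed = True
    while changed:
        changed = False
        for g in list(groups):
            if g in groups and group_count(g) < min_count:
                # merge with the smallest adjacent group (a simple heuristic)
                adj = {group_of[v] for u in groups[g] for v in neighbors[u]} - {g}
                if not adj:
                    continue
                h = min(adj, key=group_count)
                groups[h] |= groups.pop(g)
                for u in groups[h]:
                    group_of[u] = h
                changed = True
    return groups

counties = {"A": 8, "B": 30, "C": 5, "D": 40}                      # fake counts
adjacency = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
print(aggregate_units(counties, adjacency))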

%I Kluwer %C Boston %P 181-196 %@ 978-1-4020-7067-9 %G eng %& Web-based systems that disseminate information from data but preserve confidentiality %R 10.1007/0-306-47374-7_11 %0 Book Section %B Trade, Networks and Hierarchies %D 2002 %T Combined Model of Interregional Commodity Flows on a Transportation Network %A Boyce, David %E Hewings, Geoffrey J.D. %E Sonis, Michael %E Boyce, David %X

This chapter is motivated by two ongoing research objectives of the author. The first concerns models of flows on transportation networks. Whether the subject is personal travel or freight transportation, representation of the transportation network is necessary to determine realistically interzonal/interregional travel/transportation costs. The methodological effort required to achieve such results is nontrivial, but the issues raised by such an attempt are enlightening and worthwhile. This insight is demonstrated once more by the models considered here.

%B Trade, Networks and Hierarchies %S Advances in Spatial Science %I Springer Berlin Heidelberg %P 29-40 %@ 978-3-642-07712-8 %G eng %U http://dx.doi.org/10.1007/978-3-662-04786-6_3 %R 10.1007/978-3-662-04786-6_3 %0 Journal Article %J Journal of Chemical Information and Computer Science %D 2002 %T The construction and assessment of a statistical model for the prediction of protein assay data %A Jennifer Pittman Clarke %A Jerome Sacks %A S. Stanley Young %X

The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.

%B Journal of Chemical Information and Computer Science %V 42 %P 729-741 %G eng %R 10.1021/ci0103828 %0 Conference Proceedings %B Workshop on Foundations for Modeling and Simulation %D 2002 %T A Framework for Validating Computer Models %A M.J. Bayarri %A J. Berger %A D. Higdon %A M. Kottas %A R. Paulo %A J. Sacks %A J. Cafeo %A J. Cavendish %A C. Lin %A J. Tu %B Workshop on Foundations for Modeling and Simulation %I Society for Computer Simulation %8 2002 %G eng %0 Conference Paper %B Proc. dgo.2002, National Conference on Digital Government Research %D 2002 %T Optimal tabular releases from confidential data %A A. F. Karr %A A. Dobra %A A. P. Sanil %B Proc. dgo.2002, National Conference on Digital Government Research %G eng %0 Journal Article %J ASCE J. Materials %D 2002 %T Permeability of Cracked Steel Fiber–Reinforced Concrete %A Julie Rapoport %A Corina-Maria Aldea %A Surendra P. Shah %A Bruce Ankenman %A Alan F. Karr %X

This research explores the relationship between permeability and crack width in cracked, steel fiber–reinforced concrete. In addition, it inspects the influence of steel fiber reinforcement on concrete permeability. The feedback–controlled splitting tension test (also known as the Brazilian test) is used to induce cracks of up to 500 microns (0.02 in) in concrete specimens without reinforcement, and with steel fiber reinforcement volumes of both 0.5% and 1%. The cracks relax after induced cracking. The steel fibers decrease permeability of specimens with relaxed cracks larger than 100 microns.

%B ASCE J. Materials %V 14 %P 355–358 %G eng %0 Journal Article %J Int. Journal of Uncertainty, Fuzziness and Knowledge Based Systems %D 2002 %T Software Systems for Tabular Data Releases %A Adrian Dobra %A Alan F. Karr %A Ashish P. Sanil %A Stephen E. Fienberg %B Int. Journal of Uncertainty, Fuzziness and Knowledge Based Systems %V 10 %P 529-544 %G eng %0 Journal Article %J Journal of Transportation and Statistics %D 2002 %T Statistically-Based Validation of Computer Simulation Models in Traffic Operations and Management %A Jerome Sacks %A Nagui M. Rouphail %A B. Brian Park %A Piyushimita Thakuriah %K Advanced traffic management systems %K computer simulation %K CORSIM %K model validation %K transportation policy %X

The process of model validation is crucial for the use of computer simulation models in transportation policy, planning, and operations. This article lays out obstacles and issues involved in performing a validation. We describe a general process that emphasizes five essential ingredients for validation: context, data, uncertainty, feedback, and prediction. We use a test bed to generate specific (and general) questions as well as to give concrete form to answers and to the methods used in providing them. The traffic simulation model CORSIM serves as the test bed; we apply it to assess signal-timing plans on a street network of Chicago. The validation process applied in the test bed demonstrates how well CORSIM can reproduce field conditions, identifies flaws in the model, and shows how well CORSIM predicts performance under new (untried) signal conditions. We find that CORSIM, though imperfect, is effective with some restrictions in evaluating signal plans on urban networks.

%B Journal of Transportation and Statistics %V 5 %G eng %0 Journal Article %J Transportation Research Record C %D 2002 %T Variability of travel times on arterial streets: effects of signals and volume %A A. F. Karr %A T.L. Graves %A A. Mockus %A P. Schuster %B Transportation Research Record C %V 10 %P 000-000 %G eng %0 Journal Article %J INTERACTIONS %D 2002 %T Visualizing Software Changes %A Stephen G. Eick %A Paul Schuster %A Audris Mockus %A Todd L. Graves %A Alan F. Karr %B INTERACTIONS %V 17 %P 29–31 %G eng %0 Journal Article %J Res. Official Statist %D 2001 %T Analysis of aggregated data in survey sampling with application to fertilizer/pesticide usage surveys %A Jaeyong Lee %A Christopher Holloman %A Alan F. Karr %A Ashish P. Sanil %X

In many cases, the public release of survey or census data at fine geographical resolution (for example, counties) may endanger the confidentiality of respondents. A strategy for such cases is to aggregate neighboring regions into larger units that satisfy confidentiality requirements. An aggregation procedure employed in a prototype system for the US National Agricultural Statistics Service is used as context to investigate the impact of aggregation on statistical properties of the data. We propose a Bayesian simulation approach for the analysis of such aggregated data. As a consequence, we are able to specify the type of additional information (such as certain sample sizes) that needs to be released in order to enable the user to perform meaningful analyses with the aggregated data.

%B Res. Official Statist %V 4 %P 11–6 %G eng %0 Journal Article %J Transportation Research Record %D 2001 %T Assessment of Stochastic Signal Optimization Method Using Microsimulation %A Byungkyu Park %A Nagui M. Rouphail %A Jerome Sacks %X

A stochastic signal optimization method based on a genetic algorithm (GA-SOM) that interfaces with the microscopic simulation program CORSIM is assessed. A network in Chicago consisting of nine signalized intersections is used as an evaluation test bed. Taking CORSIM as the best representation of reality, the performance of the GA-SOM plan sets a ceiling on how good any (fixed) signal plan can be. An important aspect of this approach is its accommodation of variability. Also discussed is the robustness of an optimal plan under changes in demand. This benchmark is used to assess the best signal plan generated by TRANSYT-7F (T7F), Version 8.1, from among 12 reasonable strategies. The performance of the best T7F plan falls short of the benchmark on several counts, reflecting the need to account for variability in the highly stochastic system of traffic operations, which is not possible under the deterministic conditions intrinsic to T7F. As a sidelight, the performance of the GA-SOM plan within T7F is also computed and it is found to perform nearly as well as the optimum T7F plan.

%B Transportation Research Record %V 1748 %P 40-45 %G eng %R 10.3141/1748-05 %0 Conference Paper %B Concrete Under Severe Conditions, Proceedings of the Third International Conference on Concrete Under Severe Conditions %D 2001 %T Combined effect of cracking and water permeability of fiber-reinforced concrete %A A. F. Karr %A C.-M. Aldea %A J. Rapoport %A S. P. Shah %B Concrete Under Severe Conditions, Proceedings of the Third International Conference on Concrete Under Severe Conditions %P 71-78 %G eng %0 Journal Article %J IEEE Computer %D 2001 %T Disseminating information but protecting confidentiality %A A. F. Karr %A J. Hernandez %A S. Karimi %A J. Lee %A K. Litwin %A A. Sanil %B IEEE Computer %V 34 %P 36-37 %G eng %0 Conference Paper %B 2001 International Symposium on Advanced Highway Technology %D 2001 %T A Framework for Traffic Simulation Model Validation Procedure Using CORSIM as a Test-Bed %A Park, B. %A N. M. Rouphail %A J. Sacks %B 2001 International Symposium on Advanced Highway Technology %8 2001 %G eng %0 Journal Article %J Statistica Sinica %D 2001 %T Propriety of posteriors with improper priors in hierarchical linear mixed models %A Sun, Dongchu %A Tsutakawa, R. K. %A Z. He %B Statistica Sinica %V 2 %P 77-95 %G eng %0 Conference Paper %B Advances in Digital Government. Kluwer, Amsterdam %D 2001 %T Web-Based Systems that Disseminate Information but Protect Confidential Data %A Alan F. Karr %A Ashish P. Sanil %B Advances in Digital Government. Kluwer, Amsterdam %I Kluwer %G eng %0 Generic %D 2001 %T Workshop Report: Affiliates Workshop on Data Quality %A A. F. Karr %A A. P. Sanil %A J. Sacks %A A. Elmagarmid %I National Institute of Statistical Sciences %G eng %0 Generic %D 2001 %T Workshop Report: Workshop on Statistics and Information Technology %A A. F. Karr %A J. Lee %A A. P. Sanil %I National Institute of Statistical Sciences %G eng %0 Book Section %B Molecular Modeling and Prediction of Bioactivity %D 2000 %T Analysis of a Large, High-Throughput Screening Data Using Recursive Partitioning %A Young, S. Stanley %A Jerome Sacks %E Gundertofte, Klaus %E Jørgensen, Flemming Steen %X

As biological drug targets multiply through the human genome project and as the number of chemical compounds available for screening becomes very large, the expense of screening every compound against every target becomes prohibitive. We need to improve the efficiency of the drug screening process so that active compounds can be found for more biological targets and turned over to medicinal chemists for atom-by-atom optimization. We create a method for analysis of the very large, complex data sets coming from high throughput screening, and then integrate the analysis with the selection of compounds for screening so that the structure-activity rules derived from an initial compound set can be used to suggest additional compounds for screening. Cycles of screening and analysis become sequential screening rather than the mass screening of all available compounds. We extend the analysis method to deal with multivariate responses. Previously, a screening campaign might screen hundreds of thousands of compounds; sequential screening can cut the number of compounds screened by up to eighty percent. Sequential screening also gives SAR rules that can be used to mathematically screen compound collections or virtual chemical libraries.
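
[Editor's illustration, not the paper's actual procedure: one round of the "analyze, then pick the next batch" loop sketched above, using a single regression tree as a stand-in for the recursive-partitioning rules. The descriptors, activities, batch sizes, and tree settings are invented.]

# Illustrative sequential-screening round: fit a tree on compounds already
# assayed, then rank unscreened compounds by predicted activity.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_all = rng.normal(size=(5000, 20))            # molecular descriptors (fake)
activity = X_all[:, 0] - 0.5 * X_all[:, 3] + rng.normal(scale=0.5, size=5000)

screened = rng.choice(5000, size=500, replace=False)         # initial batch
unscreened = np.setdiff1d(np.arange(5000), screened)

tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=20)
tree.fit(X_all[screened], activity[screened])                # crude SAR rules

pred = tree.predict(X_all[unscreened])
next_batch = unscreened[np.argsort(pred)[::-1][:200]]        # top predicted actives
print("compounds selected for the next screening cycle:", len(next_batch))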

%B Molecular Modeling and Prediction of Bioactivity %I Springer US %P 149-156 %@ 978-1-4613-6857-1 %G eng %U http://dx.doi.org/10.1007/978-1-4615-4141-7_17 %R 10.1007/978-1-4615-4141-7_17 %0 Journal Article %J Statistics in Medicine %D 2000 %T Bayesian Analysis of Mortality Rates with Disease Maps %A Sun, Dongchu %A Tsutakawa, R. K. %A Kim, H. %A Z. He %X

This article summarizes our research on estimation of age-specific and age-adjusted mortality rates for chronic obstructive pulmonary disease (COPD) for white males. Our objectives are more precise and informative displays (than previously available) of geographic variation of the age-specific mortality rates for COPD, and investigation of the relationships between the geographic variation in mortality rates and the corresponding variation in selected covariates. For a given age class, our estimates are displayed in a choropleth map of mean rates. We develop a variation map that identifies the geographical areas where inferences are reliable. Here, the variation is measured by considering a set of maps produced using samples from the posterior distribution of the population mortality rates. Finally, we describe the spatial patterns in the age-specific maps and relate these to patterns in potential explanatory covariates such as smoking rate, annual rainfall, population density, elevation, and measures of air quality.

%B Statistics in Medicine %V 19 %P 2015-2035 %G eng %0 Conference Paper %B Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining %D 2000 %T Defection detection: Using online activity profiles to predict ISP customer vulnerability %A A. F. Karr %A N. Raghavan %A R. Bell %A M. Schonlau %A D. Pregibon %B Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining %P 506-515 %G eng %R 10.1145/347090.347193 %0 Conference Proceedings %B XI Pan American Conference in Traffic and Transportation Engineering %D 2000 %T Direct Signal Timing Optimization: Strategy Development and Results %A Rouphail, N. %A Park, B. %A J. Sacks %B XI Pan American Conference in Traffic and Transportation Engineering %P 19-23 %8 2000 %G eng %0 Journal Article %J ACI Materials Journal %D 2000 %T Estimation of water flow through cracked concrete under load %A A. F. Karr %A C.-M. Aldea %A M. Ghandehari %A S. P. Shah %X

This research studied the relationship between cracking and water permeability of normal-strength concrete under load and compared the experimental results with theoretical models. A feedback-controlled wedge splitting test was used to generate width-controlled cracks. Speckle interferometry was used to record the cracking history. Water permeability of the loaded specimens was evaluated by a low-pressure water permeability test at the designed crack mouth opening displacements (CMODs). Water permeability results were compared with those previously obtained for unloaded specimens for which cracks were induced by a feedback-controlled splitting tension test. The experimental results indicate that water permeability of cracked material significantly increases with increasing crack width. The flow for the same cracking level is repeatable regardless of the procedure used for inducing the cracks. No direct relationship between water flow and crack length was observed, whereas clear relationships existed between CMOD or crack area and flow characteristics. Experimentally measured flow was compared with theoretical models of flow through cracked rocks with parallel walls and a correction factor accounting for the tortuosity of the crack was determined. Calculated flow through cracks induced by a wedge-splitting test provided an acceptable approximation of the measured flow.
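
[Editor's note, not text from the paper: the parallel-wall comparison mentioned above is usually the cubic law with a reduction factor for crack tortuosity and roughness; the symbols below are generic.]

q \;=\; \xi \, \frac{w^{3}}{12\,\mu} \, \frac{\Delta p}{L},

[where q is the volumetric flow per unit crack length, w the crack width, µ the fluid viscosity, Δp/L the pressure gradient along the flow path, and ξ ≤ 1 the correction factor that the authors determine from the measured flow.]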

%B ACI Materials Journal %V 97 %P 567-575 %G eng %0 Journal Article %J ASTM Cement, Concrete and Aggregates %D 2000 %T Experimental and statistical study of chloride permeability of cracked high strength concrete %A A. F. Karr %A C.-M. Aldea %A J.D. Picka %A S. P. Shah %A S.S. Jaiswal %A T. Igusa %X


%B ASTM Cement, Concrete and Aggregates %V 22 %P 000-000 %8 December %G eng %R 10.1520/CCA10473J %0 Conference Paper %B Proc. 12th Engrg. Mechanics Conf %D 2000 %T Impact of the interfacial transition zone on the chloride permeability of concrete %A A. F. Karr %A S. P. Shah %A S.S. Jaiswal %A B.E. Ankenman %A J.D. Picka %A T. Igusa %B Proc. 12th Engrg. Mechanics Conf %P 1134-1137 %G eng %0 Journal Article %J IEEE Transactions on Software Engineering %D 2000 %T Predicting fault incidence using software change history %A A. F. Karr %A S. G. Eick %A T.L. Graves %A J. S. Marron %A H. Siy %K aging %K change history %K degradation %K management of change %K software fault tolerance %K software maintenance %X

This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: For instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on the average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight.
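
[Editor's illustration of the kind of weighted sum described in the last sentence; the exponential down-weighting by change age, the time constant, and the use of lines changed as "size" are assumptions for this sketch, not the paper's fitted model.]

# Illustrative fault-potential score: sum contributions of past changes,
# with larger and more recent changes contributing more.
import math

def fault_potential(changes, tau_days=365.0):
    """changes: list of (lines_changed, age_in_days) for one module."""
    return sum(size * math.exp(-age / tau_days) for size, age in changes)

module_a = [(120, 30), (15, 400), (300, 900)]   # (size, age) of past deltas
module_b = [(10, 30), (10, 60), (10, 90)]
print(fault_potential(module_a), fault_potential(module_b))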

%B IEEE Transactions on Software Engineering %V 26 %P 653-661 %G eng %R 10.1109/32.859533 %0 Journal Article %J Cement Concrete and Aggregates %D 2000 %T Quantitative description of coarse aggregate volume fraction gradients %A A. F. Karr %A S.S. Jaiswal %A T. Igusa %A J.D. Picka %A S. P. Shah %X

Within any cast cylinder of concrete, the coarse aggregate will tend to be inhomogeneously distributed. This variability may arise as a result of segregation caused by gravity or as a result of the wall effect that is caused by the inability of the aggregate to penetrate the walls of the mold. Using methods from image analysis, stereology, and statistics, local estimates of aggregate inhomogeneity are defined that quantify phenomena that have been qualitatively described in the past. These methods involve modification of the two-dimensional images to prepare them for analysis, as well as simple diagnostic statistics for determining the presence of a wall effect. While the techniques presented herein are developed specifically for cast cylinders, they can be generalized to other cast or cored concrete specimens.

%B Cement Concrete and Aggregates %V 22 %P 151-159 %G eng %R 10.1520/CCA10473J %0 Book Section %B Generalized Linear Models: A Bayesian Perspective %D 2000 %T Random effects in generalized linear mixed models (GLMMs) %A Sun, Dongchu %A Speckman, Paul %A Tsutakawa, R. K. %B Generalized Linear Models: A Bayesian Perspective %I Marcel Dekker, Inc. %P 23-40 %G eng %0 Journal Article %J Environmetrics %D 2000 %T Regression models for air pollution and daily mortality: analysis of data from Birmingham, Alabama %A Richard L. Smith %A J.M. Davis %A Jerome Sacks %A Speckman, Paul %A P. Styer %K Air Pollutants/adverse effects %K Air Pollutants/analysis %K Air Pollution/adverse effects %K Air Pollution/analysis %K Air Pollution/statistics & numerical data %K Alabama/epidemiology %K Humans %K Mortality %K Poisson Distribution %K Regression Analysis %K Risk %K Sensitivity and Specificity %K Statistical Models %X

Several recent studies have reported associations between common levels of particulate air pollution and small increases in daily mortality. This study examined whether a similar association could be found in the southern United States, with different weather patterns than the previous studies, and examined the sensitivity of the results to different methods of analysis and covariate control. Data were available in Birmingham, Alabama, from August 1985 through 1988. Regression analyses controlled for weather, time trends, day of the week, and year of study and removed any long-term patterns (such as seasonal and monthly fluctuations) from the data by trigonometric filtering. A significant association was found between inhalable particles and daily mortality in Poisson regression analysis (relative risk = 1.11, 95% confidence interval 1.02-1.20). The relative risk was estimated for a 100-micrograms/m3 increase in inhalable particles. Results were unchanged when least squares regression was used, when robust regression was used, and under an alternative filtering scheme. Diagnostic plots showed that the filtering successfully removed long wavelength patterns from the data. The generalized additive model, which models the expected number of deaths as nonparametric smoothed functions of the covariates, was then used to ensure adequate control for any nonlinearities in the weather dependence. Essentially identical results for inhalable particles were seen, with no evidence of a threshold down to the lowest observed exposure levels. The association also was unchanged when all days with particulate air pollution levels in excess of the National Ambient Air Quality Standards were deleted. The magnitude of the effect is consistent with recent estimates from Philadelphia, Steubenville, Detroit, Minneapolis, St. Louis, and Utah Valley.
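
[Editor's illustration, not text from the cited paper: in a Poisson log-linear model the reported relative risk for a 100 µg/m³ increase in inhalable particles corresponds to the fitted coefficient via]

\mathrm{RR} = \exp(100\,\beta) = 1.11
\quad\Longrightarrow\quad
\beta = \frac{\ln(1.11)}{100} \approx 0.00104 \ \text{per } \mu\mathrm{g/m^3},

[so the reported confidence interval 1.02–1.20 translates to roughly 0.0002–0.0018 on the coefficient scale; this back-calculation is ours, not quoted from the paper.]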

%B Environmetrics %V 11 %P 719-743 %G eng %0 Journal Article %J Concrete Science and Engineering %D 2000 %T Statistical studies of the conductivity of concrete using ASTM C1202-94 %A A. F. Karr %A S.S. Jaiswal %A J.D. Picka %A T. Igusa %A S. P. Shah %A B.E. Ankenman %A P. Styer %B Concrete Science and Engineering %V 2 %P 97-105 %G eng %0 Conference Proceedings %B American Society of Civil Engineers %D 2000 %T Traffic Signal Offset Optimization Using Microscopic Simulation Program with Stochastic Process Model %A Park, B. %A N. M. Rouphail %A J. Sacks %B American Society of Civil Engineers %G eng %0 Journal Article %J Journal of Agricultural Biological and Environmental Statistics %D 1999 %T A bivariate Bayes method for improving the estimates of mortality rates with a twofold conditional autoregressive model %A Woodard, R. %A Sun, Dongchu %A Z. He %A Sheriff, S. %X

The Missouri Turkey Hunting Survey (MTHS) is a post-season mail survey conducted by the Missouri Department of Conservation to monitor and aid in the regulation of the turkey hunting season. Questionnaires are distributed after the hunting season to a simple random sample of persons who purchased permits to hunt wild turkey during the spring season. For the 1996 turkey hunting season 95,801 persons purchased hunting permits. From these individuals a simple random sample of 6,999 hunters was selected for the survey and 5,005 of these responded.

%B Journal of Agricultural Biological and Environmental Statistics %G eng %0 Journal Article %J Mathematical Geology %D 1999 %T Design and Analysis for Modeling and Predicting Spatial Contamination %A Abt, Markus %A Welch, William J. %A Jerome Sacks %K best linear unbiased prediction %K dioxin contamination %K Gaussian stochastic process %K lognormal kriging %K ordinary kriging %K spatial statistics %X

Sampling and prediction strategies relevant at the planning stage of the cleanup of environmental hazards are discussed. Sampling designs and models are compared using an extensive set of data on dioxin contamination at Piazza Road, Missouri. To meet the assumptions of the statistical model, such data are often transformed by taking logarithms. Predicted values may be required on the untransformed scale, however, and several predictors are also compared. Fairly small designs turn out to be sufficient for model fitting and for predicting. For fitting, taking replicates ensures a positive measurement error variance and smooths the predictor. This is strongly advised for standard predictors. Alternatively, we propose a predictor linear in the untransformed data, with coefficients derived from a model fitted to the logarithms of the data. It performs well on the Piazza Road data, even with no replication.

%B Mathematical Geology %I Kluwer Academic Publishers-Plenum Publishers %V 31 %P 1-22 %G eng %U http://dx.doi.org/10.1023/A%3A1007504329298 %R 10.1023/A:1007504329298 %0 Journal Article %J ASCE Journal of Materials in Civil Engineering %D 1999 %T Effect of cracking on water and chloride permeability of concrete %A A. F. Karr %A C.-M. Aldea %A S. P. Shah %X

The goal of this research was to study the relationship between cracking and concrete permeability, and to support treating permeability and cracking resistance, in addition to strength, as criteria in mix design for durable concrete. The effect of material composition [normal-strength concrete (NSC) and high-strength concrete (HSC) with two different mix designs] and crack width (ranging from 50 to 400 µm) on water and chloride permeability were examined. Cracks of designed widths were induced in the concrete specimens using a feedback-controlled splitting tensile test. Chloride permeability of the cracked samples was evaluated using a rapid chloride permeability test and the water permeability of cracked concrete was then evaluated by a low-pressure water permeability test. Uncracked HSC was less water permeable than NSC, as expected, but cracking changed the material behavior in terms of permeability. Both NSC and HSC were affected by cracking, and the water permeability of cracked samples increased with increasing crack width. Among the tested materials, chloride permeability was sensitive to cracking only for the HSC with a very low water-to-cement ratio. Results indicate that the water permeability is significantly more sensitive than the chloride permeability with respect to the crack widths used in this study.

%B ASCE Journal of Materials in Civil Engineering %V 11 %P 181-187 %G eng %R http://dx.doi.org/10.1061/(ASCE)0899-1561(1999)11:3(181) %0 Journal Article %J Transportation Research Record %D 1999 %T Effect of microcracking on durability of high strength concrete %A A. F. Karr %A C.-M. Aldea %A S. P. Shah %X

The relationship between cracking and chloride and water permeability of high-strength concrete (HSC) was studied. Two different mix designs were used: HSC_1 (w/b = 0.31) and HSC_2 (w/b = 0.25). The effects of crack width and sample thickness on permeability were examined. Cracks of designed widths were induced in the concrete specimens using the feedback-controlled splitting tensile test. Chloride permeability of the cracked samples was evaluated by using a rapid chloride permeability test. The water permeability of cracked concrete was then evaluated by a low-pressure water permeability test. Among the materials tested, conductivity is sensitive to cracking only for the high-strength concrete with a very low water-to-cement ratio. The water permeability of cracked HSC significantly increases with increasing crack width. Among the parameters considered, crack parameters significantly affect water permeability, and there is little thickness effect. The results indicate that the water permeability is significantly more sensitive than conductivity with respect to the crack width used.

%B Transportation Research Record %V 1668 %P 86-90 %G eng %R 10.3141/1668-13 %0 Journal Article %J Papers in Regional Science %D 1999 %T Estimation of Demand due to Welfare Reform %A Sen, Ashish %A P. Metaxatos %A Sööt, Siim %A Piyushimita Thakuriah %B Papers in Regional Science %V 78 %P 195–211 %G eng %0 Journal Article %J Environmetrics %D 1999 %T Meteorologically-dependent trends in urban ozone %A Huang, Li-Shan %A Richard L. Smith %K ANOVA %K empirical Bayes %K regression tree %X

Ozone concentrations are affected by precursor emissions and by meteorological conditions. As part of a broad study to assess the effects of standards imposed by the U.S. Environmental Protection Agency (EPA), it is of interest to analyze trends in ozone after adjusting for meteorological influences. Previous papers have studied this problem for ozone data from Chicago, using a variety of regression techniques. This paper presents a different approach, in which the meteorological influence is treated nonlinearly through a regression tree. A particular advantage of this approach is that it allows us to consider different trends within the clusters produced by the regression tree analysis. The variability of trend estimates between clusters is reduced by applying an empirical Bayes adjustment. The results confirm the findings of previous authors that there is an overall downward trend in Chicago ozone values, but they also go beyond previous analyses by showing that the trend is stronger at higher levels of ozone. Copyright © 1999 John Wiley & Sons, Ltd.
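
[Editor's illustration of the two-stage idea described above (tree on meteorology, then shrink within-cluster trends toward their mean). The variables, tree depth, and the simple precision-weighted shrinkage are assumptions, not the paper's exact estimator; the data are synthetic.]

# Illustrative: cluster days by meteorology with a regression tree, estimate an
# ozone trend within each leaf, then shrink the leaf trends toward their mean.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 4000
met = rng.normal(size=(n, 3))                     # temperature, wind, humidity (fake)
year = rng.integers(0, 10, size=n)                # ten "years" of data
ozone = 50 + 8 * met[:, 0] - 0.4 * year + rng.normal(scale=5, size=n)

tree = DecisionTreeRegressor(max_depth=3).fit(met, ozone)
leaf = tree.apply(met)                            # meteorological cluster labels

trends, variances = [], []
for g in np.unique(leaf):
    idx = leaf == g
    b, a = np.polyfit(year[idx], ozone[idx], 1)   # slope = within-cluster trend
    resid = ozone[idx] - (a + b * year[idx])
    var_b = resid.var(ddof=2) / ((year[idx] - year[idx].mean()) ** 2).sum()
    trends.append(b)
    variances.append(var_b)

trends, variances = np.array(trends), np.array(variances)
tau2 = max(trends.var(ddof=1) - variances.mean(), 1e-6)    # crude between-cluster variance
shrunk = (tau2 * trends + variances * trends.mean()) / (tau2 + variances)
print(np.round(trends, 3), np.round(shrunk, 3))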

%B Environmetrics %V 10 %P 103–118 %G eng %0 Journal Article %J Materials and Structures %D 1999 %T Permeability of cracked concrete %A A. F. Karr %A C.-M. Aldea %A S. P. Shah %X

The goal of the research presented here was to study the relationship between cracking and water permeability. A feedback-controlled test was used to generate width-controlled cracks. Water permeability was evaluated by a low-pressure water permeability test. The factors chosen for the experimental design were material type (paste, mortar, normal and high strength concrete), thickness of the sample and average width of the induced cracks (ranging from 50 to 350 micrometers). The water permeability test results indicated that the relationships between permeability and material type differ for uncracked and cracked material, and that there was little thickness effect. Permeability of uncracked material decreased from paste, mortar, normal strength concrete (NSC) to high strength concrete (HSC). Water permeability of cracked material significantly increased with increasing crack width. For cracks above 100 microns, NSC showed the highest permeability coefficient, whereas mortar showed the lowest one.

%B Materials and Structures %V 32 %P 370-376 %G eng %R 10.1007/BF02479629 %0 Conference Paper %B Proceedings of the International Symposium on High Performance and Reactive Powder Concretes %D 1999 %T Permeability of cracked high strength concrete %A A. F. Karr %A C.-M. Aldea %A S. P. Shah %E P. C. Aïtcin %E Y. Delagrave %X

The goal of the research presented here was to study the relationship between cracking and water permeability. A feedback-controlled test was used to generate width-controlled cracks. Water permeability was evaluated by a low-pressure water permeability test. The factors chosen for the experimental design were material type (paste, mortar, normal and high strength concrete), thickness of the sample and average width of the induced cracks (ranging from 50 to 350 micrometers). The water permeability test results indicated that the relationships between permeability and material type differ for uncracked and cracked material, and that there was little thickness effect. Permeability of uncracked material decreased from paste, mortar, normal strength concrete (NSC) to high strength concrete (HSC). Water permeability of cracked material significantly increased with increasing crack width. For cracks above 100 microns, NSC showed the highest permeability coefficient, whereas mortar showed the lowest one.

%B Proceedings of the International Symposium on High Performance and Reactive Powder Concretes %P 211-219 %G eng %0 Journal Article %J Biometrika %D 1999 %T Posterior distribution of hierarchical models using CAR(1) distributions %A Sun, Dongchu %A Tsutakawa, R. K. %A Speckman, Paul %K Gibbs sampling %K Linear mixed model %K Multivariate normal %K Partially informative normal distribution %X

We examine properties of the conditional autoregressive model, or CAR(1) model, which is commonly used to represent regional effects in Bayesian analyses of mortality rates. We consider a Bayesian hierarchical linear mixed model where the fixed effects have a vague prior such as a constant prior and the random effect follows a class of CAR(1) models including those whose joint prior distribution of the regional effects is improper. We give sufficient conditions for the existence of the posterior distribution of the fixed and random effects and variance components. We then prove the necessity of the conditions and give a one-way analysis of variance example where the posterior may or may not exist. Finally, we extend the result to the generalised linear mixed model, which includes as a special case the Poisson log-linear model commonly used in disease mapping.
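
[Editor's note for orientation, not the paper's exact specification: a commonly used intrinsic (pairwise-difference) member of the CAR family places a joint prior on the regional effects θ that penalizes differences between neighbouring regions,]

p(\theta \mid \tau^{2}) \;\propto\; \exp\!\Big(-\frac{1}{2\tau^{2}} \sum_{i \sim j} (\theta_i - \theta_j)^{2}\Big),

[which is improper because it is unchanged when a constant is added to every θ_i; the paper's conditions identify when the resulting posterior is nevertheless proper. The paper treats a broader CAR(1) class, so this display is only a representative example.]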

%B Biometrika %V 86 %P 341-350 %G eng %R 10.1093/biomet/86.2.341 %0 Book Section %D 1999 %T Probe-based surveillance for travel time information in ITS %A A. F. Karr %A P. Thakuriah %A A. Sen %E R. Emmerink %E P. Nijkamp %I Ashgate Publishing Ltd %P 393-425 %G eng %& 17 %0 Journal Article %J International Transactions in Operational Research %D 1999 %T Variances of link travel time estimates: Implications for optimal routes %A A. F. Karr %A A. Sen %A P. Thakuriah %A X. Zhu %K Advanced Traveler Information System %K Covariance of travel times %K Dependence in travel time observations %K Intelligent Transportation System %K Probe vehicles %K Variance of travel time estimates %K Vehicle simulation model %X

In this paper, we explore the consequences of using link travel time estimates with high variance to compute the minimum travel time route between an origin and destination pair. Because of platoon formation or for other reasons, vehicles on a link separated by small headways tend to have similar travel times. In other words, the covariance of link travel times of distinct vehicles which are close together may not be zero. It follows that the variance of the mean of travel times obtained from a sample of n vehicles on the same link over small time intervals is of the form a+b/n where a and b would usually be positive. This result has an important implication for the quality of road network travel time information given by Intelligent Transportation Systems (ITS): that the variance of the estimate of mean travel time does not go to zero with increasing n. Thus the quality of information disseminated by ITS is not necessarily improved by increasing the market penetration of vehicles monitoring the system with the necessary equipment (termed probe vehicles). Estimates of a and b for a set of links are presented in the paper and consequences for probe-based ITS are explored by means of a simulation of such a system which is operational on an actual network.
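
[Editor's numerical check of the a + b/n behaviour under an assumed equicorrelated travel-time model; the correlation and variance values are invented, and the point is only that the variance of the sample mean levels off at a > 0 rather than vanishing.]

# Illustrative: if travel times on a link share a common correlation rho, then
# Var(mean of n probes) = rho*sigma^2 + (1 - rho)*sigma^2 / n  ->  a + b/n,
# which tends to a = rho*sigma^2 > 0 instead of 0 as n grows.
import numpy as np

rng = np.random.default_rng(2)
sigma2, rho = 100.0, 0.3                        # assumed variance and correlation

def var_of_mean(n, reps=20000):
    # equicorrelated times: common "platoon" shock plus individual noise
    common = rng.normal(scale=np.sqrt(rho * sigma2), size=(reps, 1))
    indiv = rng.normal(scale=np.sqrt((1 - rho) * sigma2), size=(reps, n))
    return (common + indiv).mean(axis=1).var()

for n in (1, 5, 25, 125):
    print(n, round(var_of_mean(n), 2),
          "theory:", round(rho * sigma2 + (1 - rho) * sigma2 / n, 2))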

%B International Transactions in Operational Research %V 6 %P 75-87 %8 January %G eng %R 10.1111/j.1475-3995.1999.tb00144.x %0 Journal Article %J Papers in Regional Science %D 1999 %T Welfare reform and spatial matching between clients and jobs %A Sen, Ashish %A Metaxatos, Paul %A Sööt, Siim %A Thakuriah, Vonu %K C13 %K C51 %K C52 %K entry-level job openings. %K I31 %K J23 %K JEL classification:C12 %K Key words:Welfare to work %K R12 %K R41 %K R53 %K targeted service %K travel demand %X

The recent Welfare Reform Act requires several categories of public assistance recipients to transition to the work force. In most metropolitan areas public assistance clients reside great distances from areas of entry-level jobs. Any program designed to provide access to these jobs, for those previously on public aid, needs relevant transportation services when the job search process begins. Therefore it is essential that the latent demand for commuting among public aid clients be assessed in developing public transportation services. The location of entry-level jobs must also be known or, as in this article, estimated using numerous data sources. This article reports on such a demand estimation effort, focusing primarily on the use of Regional Science methods.

%B Papers in Regional Science %I Springer-Verlag %V 78 %P 195-211 %G eng %U http://dx.doi.org/10.1007/s101100050021 %R 10.1007/s101100050021 %0 Book Section %B Case Studies in Environmental Statistics %D 1998 %T Categorical Exposure-Response Regression Analysis of Toxicology Experiments %A Xie, Minge %A Simpson, Douglas %E Nychka, Douglas %E Piegorsch, Walter W. %E Lawrence H. Cox %X

In the mid-1980s, an accident at the Union Carbide pesticides plant in Bhopal, India released the toxic gas methylisocyanate (MIC) in that densely populated region, killing more than 4000 people and injuring 500,000 others. Even today, many people in Bhopal are affected by illnesses related to that earlier exposure. This notorious industrial disaster not only forced scientists to pay greater attention to the identification and handling of hazardous chemicals but also prompted greater awareness of those common industrial products that contain hazardous pollutants.

%B Case Studies in Environmental Statistics %S Lecture Notes in Statistics %I Springer US %V 132 %P 121-141 %@ 978-0-387-98478-0 %G eng %U http://dx.doi.org/10.1007/978-1-4612-2226-2_7 %R 10.1007/978-1-4612-2226-2_7 %0 Journal Article %J Journal of the Royal Statistical Society: Series C %D 1998 %T Circuit optimization via sequential computer experiments: design of an output buffer %A Aslett, Robert %A Buck, Robert J. %A Duvall, Steven G. %A Jerome Sacks %A Welch, William J. %K Circuit simulator %K Computer code %K Computer model %K Engineering design %K Parameter design %K Stochastic process %K Visualization %X

In electrical engineering, circuit designs are now often optimized via circuit simulation computer models. Typically, many response variables characterize the circuit’s performance. Each response is a function of many input variables, including factors that can be set in the engineering design and noise factors representing manufacturing conditions. We describe a modelling approach which is appropriate for the simulator’s deterministic input–output relationships. Non-linearities and interactions are identified without explicit assumptions about the functional form. These models lead to predictors to guide the reduction of the ranges of the designable factors in a sequence of experiments. Ultimately, the predictors are used to optimize the engineering design. We also show how a visualization of the fitted relationships facilitates an understanding of the engineering trade-offs between responses. The example used to demonstrate these methods, the design of a buffer circuit, has multiple targets for the responses, representing different trade-offs between the key performance measures.

%B Journal of the Royal Statistical Society: Series C %V 47 %P 31-48 %G eng %0 Journal Article %J Mathematical and Computer Modelling %D 1998 %T Estimation of static travel times in a dynamic route guidance system—II %A Sen, Ashish %A Sööt, Siim %A Piyushimita Thakuriah %A Condie, Helen %K Advanced Traveler Information Systems %K Dynamic Route Guidance %K Link travel times %K Static estimates %X

In an earlier paper a method for computing static profiles of link travel times was given. In this paper, the centrality of such profiles for ATIS is examined and the methods given in the earlier paper are applied to actual data. Except for a minor, easily correctable problem, the methods are shown to work very well under real-life conditions.

%B Mathematical and Computer Modelling %V 27 %P 67–85 %G eng %R 10.1016/S0895-7177(98)00052-1 %0 Journal Article %J Lecture Notes-Monograph Series %D 1998 %T Global versus Local Search in Constrained Optimization of Computer Models %A M. Schonlau %A Welch, William J. %A Jones, Donald R. %K Bayesian global optimization %K Computer code %K sequential design %K Stochastic process %X

Engineering systems are now frequently optimized via computer models. The input-output relationships in these models are often highly nonlinear deterministic functions that are expensive to compute. Thus, when searching for the global optimum, it is desirable to minimize the number of function evaluations. Bayesian global optimization methods are well-suited to this task because they make use of all previous evaluations in selecting the next search point. A statistical model is fit to the sampled points which allows predictions to be made elsewhere, along with a measure of possible prediction error (uncertainty). The next point is chosen to maximize a criterion that balances searching where the predicted value of the function is good (local search) with searching where the uncertainty of prediction is large (global search). We extend this methodology in several ways. First, we introduce a parameter that controls the local-global balance. Secondly, we propose a method for dealing with nonlinear inequality constraints from additional response variables. Lastly, we adapt the sequential algorithm to proceed in stages rather than one point at a time. The extensions are illustrated using a shape optimization problem from the automotive industry.
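
[Editor's stripped-down sketch of the local/global trade-off described above; the lower-confidence-bound criterion with weight kappa is a stand-in for the paper's generalized expected-improvement criterion and constraint handling, and the test function and settings are invented.]

# Illustrative Bayesian-optimization step: fit a GP surrogate to sampled points,
# then pick the next point by trading off a good predicted value (local search)
# against high predictive uncertainty (global search).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_function(x):                      # stand-in for the simulator
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(3)
X = rng.uniform(0, 3, size=(6, 1))              # initial design
y = expensive_function(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
gp.fit(X, y)

grid = np.linspace(0, 3, 301).reshape(-1, 1)
mu, sd = gp.predict(grid, return_std=True)
kappa = 2.0                                     # larger kappa -> more global search
next_x = grid[np.argmin(mu - kappa * sd)]       # minimize lower confidence bound
print("next evaluation point:", next_x)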

%B Lecture Notes-Monograph Series %V 34 %P 11-25 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1998 %T Projecting to the NAEP Scale: Results from the North Carolina End-of-Grade Testing Program %A Williams, Valerie %A Billeaud, Kathleen %A Davis, Lori A. %A Thissen, David %A Sanford, Eleanor E. %X

Data from the North Carolina End-of-Grade test of eighth-grade mathematics are used to estimate the achievement results on the scale of the National Assessment of Educational Progress (NAEP) Trial State Assessment. Linear regression models are used to develop projection equations to predict state NAEP results in the future, and the results of such predictions are compared with those obtained in the 1996 administration of NAEP. Standard errors of the parameter estimates are obtained using a bootstrap resampling technique.
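
[Editor's bare-bones version of the bootstrap step mentioned in the last sentence; the data are synthetic stand-ins, and the actual projection equations use the state end-of-grade and NAEP scales.]

# Illustrative: standard errors of projection-equation coefficients via
# resampling (x, y) pairs and refitting the linear regression each time.
import numpy as np

rng = np.random.default_rng(4)
state_score = rng.normal(250, 25, size=300)               # fake end-of-grade scores
naep_score = 0.8 * state_score + 60 + rng.normal(0, 10, 300)

def fit(x, y):
    return np.polyfit(x, y, 1)                            # slope, intercept

boot = np.array([
    fit(state_score[idx], naep_score[idx])
    for idx in (rng.integers(0, 300, size=300) for _ in range(2000))
])
print("coef estimates:", np.round(fit(state_score, naep_score), 3))
print("bootstrap SEs :", np.round(boot.std(axis=0, ddof=1), 3))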

%B Journal of Educational Measurement %V 35 %P 277-296 %G eng %0 Book Section %B Knowledge and Networks in a Dynamic Economy %D 1998 %T Roadway Incident Analysis with a Dynamic User-Optimal Route Choice Model %A Boyce, D. E. %A Lee, D.-H. %A Janson, B.N. %E Beckmann, Martin J. %E Johannsson, Börje %E Snickars, Folke %E Thord, Roland %X

The transportation system conveys interdependencies. When analysing the costs and benefits of transport investment projects, it is therefore necessary to address the question of linkages among projects. Such linkages can occur in terms of economies of scale arising from the combination of projects during the construction phase. Intelligent Transportation Systems (ITS), also known as Intelligent Vehicle Highway Systems (IVHS), are applying advanced technologies (such as navigation, automobile, computer science, telecommunication, electronic engineering, automatic information collection and processing) in an effort to bring revolutionary improvements in traffic safety, network capacity utilization, vehicle emission reductions, travel time and fuel consumption savings, etc. Within the framework of ITS, Advanced Traffic Management Systems (ATMS) and Advanced Traveler Information Systems (ATIS) both aim to manage and predict traffic congestion and provide historical and real time network-wide traffic information to support drivers’ route choice decisions. To enable ATMS/ATIS to achieve the above described goals, traffic flow prediction models are needed for system operation and evaluation. Linkages may also arise in supply through interaction among network components, or among the producers of transportation services. Linkages may also emerge in demand through the creation of new opportunities for interaction.

%B Knowledge and Networks in a Dynamic Economy %I Springer Berlin Heidelberg %P 371-390 %@ 978-3-642-64350-7 %G eng %U http://dx.doi.org/10.1007/978-3-642-60318-1_21 %R 10.1007/978-3-642-60318-1_21 %0 Journal Article %J Transportation Research Record %D 1998 %T Transportation Planning Process for Linking Welfare Recipients to Jobs %A Metaxatos, Paul %A Sööt, Siim %A Piyushimita Thakuriah %A Sen, Ashish %B Transportation Research Record %V 1626 %P 149-158 %G eng %0 Journal Article %J Ecological Modeling %D 1997 %T Characterization of Parameters in Mechanistic Models: A Case Study of PCB Fate and Transport in Surface Waters %A Steinberg, Laura J. %A Reckhow, Kenneth H. %A Wolpert, Robert L. %B Ecological Modeling %V 97 %G eng %N 1 %0 Journal Article %J Journal of Transportation Engineering, ASCE %D 1997 %T Frequency of probe vehicle reports and variances of link travel time estimates %A A. Sen %A P. Thakuriah %A X. Zhu %A A. F. Karr %X

An important design issue relating to probe-based Advanced Traveler Information Systems (ATISs) and Advanced Traffic Management Systems is the sample size of probes (or the number of link traversals by probe vehicles) per unit time used in order to obtain reliable network information in terms of link travel time estimates. The variance of the mean of travel times obtained from n probes for the same link over a fixed time period may be shown to be of the form a+b/n where a and b are link-specific parameters. Using probe travel time data from a set of signalized arterials, it is shown that a is positive for well-traveled signalized links. This implies that the variance does not go to zero with increasing n. Consequences of this fact for probe-based systems are explored. While the results presented are for a specific set of links, we argue that because of the nature of the underlying travel time process, the broad conclusions would hold for most well-traveled links with signal control.

%B Journal of Transportation Engineering, ASCE %V 123 %P 290-297 %G eng %R http://dx.doi.org/10.1061/(ASCE)0733-947X(1997)123:4(290) %0 Conference Paper %B Brittle Matrix Composites - International Symposium %D 1997 %T Influence of microstructure and fracture on the transport properties in cement-based materials %A S. Jaiswal %A T. Igusa %A T. Styer %A A. F. Karr %B Brittle Matrix Composites - International Symposium %V 5 %P 199-220 %G eng %0 Journal Article %J Cement Concrete Res. %D 1997 %T Permeability study of cracked concrete %A K. Wang %A D.C. Jansen %A S. P. Shah %A A. F. Karr %X

Cracks in concrete generally interconnect flow paths and increase concrete permeability. The increase in concrete permeability due to the progression of cracks allows more water or aggressive chemical ions to penetrate into the concrete, facilitating deterioration. The present work studies the relationship between crack characteristics and concrete permeability. In this study, feedback controlled splitting tests are introduced to generate crack width-controlled concrete specimens. Sequential crack patterns with different crack widths are viewed under a microscope. The permeability of cracked concrete is evaluated by water permeability tests. The preliminary results indicate that crack openings generally accelerate water flow rate in concrete. When a specimen is loaded to have a crack opening displacement smaller than 50 microns prior to unloading, the crack opening has little effect on concrete permeability. When the crack opening displacement increases from 50 microns to about 200 microns, concrete permeability increases rapidly. After the crack opening displacement reaches 200 microns, the rate of water permeability increases steadily. The present research may provide insight into developing design criteria for a durable concrete and in predicting service life of a concrete structure.

%B Cement Concrete Res. %V 27 %P 381-393 %G eng %R http://dx.doi.org/10.1016/S0008-8846(97)00031-8 %0 Book Section %B Case Studies in Bayesian Statistics %D 1997 %T A Random-Effects Multinomial Probit Model of Car Ownership Choice %A Nobile, Agostino %A Bhat, Chandra R. %A Pas, Eric I. %E Gatsonis, Constantine %E Hodges, James S. %E Kass, Robert E. %E McCulloch, Robert %E Rossi, Peter %E Singpurwalla, Nozer D. %K car ownership %K longitudinal data %K Multinomial probit model %X

The number of cars in a household has an important effect on its travel behavior (e.g., choice of number of trips, mode to work and non-work destinations), hence car ownership modeling is an essential component of any travel demand forecasting effort. In this paper we report on a random effects multinomial probit model of car ownership level, estimated using longitudinal data collected in the Netherlands. A Bayesian approach is taken and the model is estimated by means of a modification of the Gibbs sampling with data augmentation algorithm considered by McCulloch and Rossi (1994). The modification consists in performing, after each Gibbs sampling cycle, a Metropolis step along a direction of constant likelihood. An examination of the simulation output illustrates the improved performance of the resulting sampler.

%B Case Studies in Bayesian Statistics %S Lecture Notes in Statistics %I Springer New York %V 121 %P 419-434 %@ 978-0-387-94990-1 %G eng %U http://dx.doi.org/10.1007/978-1-4612-2290-3_13 %R 10.1007/978-1-4612-2290-3_13 %0 Book Section %B Modelling Longitudinal and Spatially Correlated Data %D 1997 %T Scaled Link Functions for Heterogeneous Ordinal Response Data %A Xie, Minge %A Simpson, Douglas G %A Carroll, Raymond J. %E Gregoire, Timothy G. %E Brillinger, David R. %E Diggle, Peter J. %E Russek-Cohen, Estelle %E Warren, William G. %E Wolfinger, Russell D. %K Aggregated observations %K Generalized likelihood inference %K Marginal modeling approach %K Ordinal regression %X

This paper describes a class of ordinal regression models in which the link function has scale parameters that may be estimated along with the regression parameters. One motivation is to provide a plausible model for group level categorical responses. In this case a natural class of scaled link functions is obtained by treating the group level responses as threshold averages of possibly correlated latent individual-level variables. We find scaled link functions also arise naturally in other circumstances. Our methodology is illustrated through environmental risk assessment data where (correlated) individual level responses and group level responses are mixed.

%B Modelling Longitudinal and Spatially Correlated Data %S Lecture Notes in Statistics %I Springer New York %V 122 %P 23-36 %@ 978-0-387-98216-8 %G eng %U http://dx.doi.org/10.1007/978-1-4612-0699-6_3 %R 10.1007/978-1-4612-0699-6_3 %0 Journal Article %J Atmospheric Environment %D 1996 %T Accounting for Meteorological Effects in Measuring Urban Ozone Levels and Trends %A Bloomfield, Peter %A Royle, Andy %A Steinberg, Laura J. %A Yang, Qing %K median polish %K meteorological adjustment %K nonlinear regression %K nonparametric regression %K Ozone concentration %X

Observed ozone concentrations are valuable indicators of possible health and environmental impacts. However, they are also used to monitor changes and trends in the sources of ozone and of its precursors, and for this purpose the influence of meteorological variables is a confounding factor. This paper examines ozone concentrations and meteorology in the Chicago area. The data are described using least absolute deviations and local regression. The key relationships observed in these analyses are then used to construct a nonlinear regression model relating ozone to meteorology. The model can be used to estimate that part of the trend in ozone levels that cannot be accounted for by trends in meteorology, and to ‘adjust’ observed ozone concentrations for anomalous weather conditions.

%B Atmospheric Environment %V 30 %P 3067–3077 %G eng %N 17 %0 Journal Article %J Journal of Environmental Engineering %D 1996 %T Bayesian Model for Fate and Transport of Polychlorinated Biphenyl in Upper Hudson River %A Steinberg, Laura J. %A Reckhow, Kenneth H. %A Wolpert, Robert L. %K Bayesian analysis %K Hudson River %K PCB %K simulation models %K transport phenomena %X

Modelers of contaminant fate and transport in surface waters typically rely on literature values when selecting parameter values for mechanistic models. While the expert judgment with which these selections are made is valuable, the information contained in contaminant concentration measurements should not be ignored. In this full-scale Bayesian analysis of polychlorinated biphenyl (PCB) contamination in the upper Hudson River, these two sources of information are combined using Bayes’ theorem. A simulation model for the fate and transport of the PCBs in the upper Hudson River forms the basis of the likelihood function while the prior density is developed from literature values. The method provides estimates for the anaerobic biodegradation half-life, aerobic biodegradation plus volatilization half-life, contaminated sediment depth, and resuspension velocity of 4,400 d, 3.2 d, 0.32 m, and 0.02 m/yr, respectively. These are significantly different than values obtained with more traditional methods, and are shown to produce better predictions than those methods when used in a cross-validation study.

%B Journal of Environmental Engineering %V 122 %P 341-349 %8 May 1996 %G eng %R http://dx.doi.org/10.1061/(ASCE)0733-9372(1996)122:5(341) %0 Journal Article %J Journal of Agricultural Biological and Environmental Statistics %D 1996 %T Interval Censoring And Marginal Analysis In Ordinal Regression %A Simpson, Douglas G %A Carroll, Raymond %A Xie, Minge %K categorical data %K categorical response %K environmental statistics %X

This paper develops methodology for regression analysis of ordinal response data subject to interval censoring. This work is motivated by the need to analyze data from multiple studies in toxicological risk assessment. Responses are scored on an ordinal severity scale, but not all responses can be scored completely. For instance, in a mortality study, information on nonfatal but adverse outcomes may be missing. In order to address possible within-study correlations, we develop a generalized estimating approach to the problem, with appropriate adjustments to uncertainty statements. We develop expressions relating parameters of the implied marginal model to the parameters of a conditional model with random effects, and, in a special case, we note an interesting equivalence between conditional and marginal modeling of ordinal responses. We illustrate the methodology in an analysis of a toxicological database.
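For the interval-censoring aspect, the likelihood contribution of an incompletely scored response is the probability mass summed over the set of severity categories consistent with what was observed. Below is a minimal sketch under a cumulative-logit (proportional-odds) model; the category labels, cutpoints, and linear predictor are hypothetical and not taken from the paper.

```python
import numpy as np

def cumulative_logit_probs(eta, cutpoints):
    """Category probabilities under a cumulative-logit (proportional-odds) model."""
    cdf = 1.0 / (1.0 + np.exp(-(np.asarray(cutpoints) - eta)))
    cdf = np.concatenate([[0.0], cdf, [1.0]])
    return np.diff(cdf)

# Severity categories 0..3; cutpoints and linear predictor are illustrative.
cutpoints = [-1.0, 0.5, 2.0]
eta = 0.8  # x @ beta for one subject

probs = cumulative_logit_probs(eta, cutpoints)

# Fully observed response in category 2:
lik_complete = probs[2]
# Interval-censored response, known only to be "adverse but nonfatal"
# (category 1 or 2): sum the probabilities over the censoring set.
lik_censored = probs[1] + probs[2]
print(lik_complete, lik_censored)
```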

%B Journal of Agricultural Biological and Environmental Statistics %V 4 %G eng %R 10.2307/1400524 %0 Journal Article %J Transportation Research Record %D 1996 %T Non-response and Urban Travel Models %A Piyushimita Thakuriah %A Sen, Ashish %A Sööt, Siim %A Christopher, Ed J. %B Transportation Research Record %V 1551 %P 82-87 %G eng %0 Journal Article %J Journal of Agricultural, Biological, and Environmental Statistics %D 1996 %T Predicting ozone levels and trends with semiparametric modeling %A Gao, Feng %A Jerome Sacks %A Welch, William %B Journal of Agricultural, Biological, and Environmental Statistics %V 1 %P 404-425 %G eng %& 404 %0 Journal Article %J Transportation Research Part C: Emerging Technologies %D 1996 %T Quality of Information given by Advanced Traveler Information Systems %A Piyushimita Thakuriah %A Sen, Ashish %B Transportation Research Part C: Emerging Technologies %V 4 %P 249-266 %G eng %0 Journal Article %J Environmental Health Perspectives %D 1995 %T Effect of outdoor airborne particulate matter on daily death count %A P. Styer %A McMillan, N %A Gao, F %A Davis, J %A Jerome Sacks %X

To investigate the possible relationship between airborne particulate matter and mortality, we developed regression models of daily mortality counts using meteorological covariates and measures of outdoor PM10. Our analyses included data from Cook County, Illinois, and Salt Lake County, Utah. We found no evidence that particulate matter ≤ 10 μm (PM10) contributes to excess mortality in Salt Lake County, Utah. In Cook County, Illinois, we found evidence of a positive PM10 effect in spring and autumn, but not in winter and summer. We conclude that the reported effects of particulates on mortality are unconfirmed.
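A minimal sketch of this kind of daily time-series regression, fitting a Poisson GLM to simulated counts; the paper's actual models include richer seasonal and meteorological structure, and all variables and coefficients below are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 730                                     # two years of daily data
day = np.arange(n)
temp = 10 + 12 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 3, n)
pm10 = rng.gamma(3.0, 12.0, n)              # daily PM10 (ug/m^3), illustrative

# Simulate daily death counts with a small PM10 effect on the log scale.
log_mu = np.log(40) + 0.01 * (temp - temp.mean()) + 0.0008 * pm10
deaths = rng.poisson(np.exp(log_mu))

# Poisson regression of counts on meteorology and PM10.
X = sm.add_constant(np.column_stack([temp, pm10]))
fit = sm.GLM(deaths, X, family=sm.families.Poisson()).fit()
print(fit.summary())
```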

%B Environmental Health Perspectives %V 103 %P 490–497 %G eng %0 Journal Article %J Mathematical and Computer Modelling %D 1995 %T Estimation of Static Travel Times in a Dynamic Route Guidance System %A Sen, Ashish %A Piyushimita Thakuriah %K Advanced Travel Information System %K Autonomous route guidance %K Dynamic Route Guidance %K Link travel time estimate %K Link Travel Time Process %X

In an Advanced Traveler Information System where route guidance is provided, a driver chooses a route before actually traversing the links in the route. For such systems, link travel times need to be forecast. However, information on several thousand links would take a fair amount of time to convey to the driver, and very few drivers would be willing to wait very long to get route information. In the ADVANCE demonstration, to be implemented in suburban Chicago, the in-vehicle unit in each participating vehicle will be able to access default travel time information, giving the vehicle an autonomous navigation capability. The default estimates will be overwritten by dynamic, up-to-the-minute forecasts whenever such forecasts differ from the default estimates. This paper describes the approach used to compute default travel time estimates.

%B Mathematical and Computer Modelling %V 22 %P 83–101 %G eng %0 Journal Article %J Annual Review of Psychology %D 1995 %T Multiple Hypothesis Testing: A Review %A Shaffer, Juliet Popper %B Annual Review of Psychology %V 46 %P 561-584 %G eng %0 Journal Article %J Atmospheric Environment %D 1995 %T Point process approach to modeling trends in tropospheric ozone based on exceedances of a high threshold %A Smith, R.L. %A Shively, Thomas S. %B Atmospheric Environment %V 29 %P 3489–3499 %G eng %& 3489 %R 10.1016/1352-2310(95)00030-3 %0 Journal Article %J Journal of Geophysical Research: Oceans %D 1994 %T Arctic sea ice variability: Model sensitivities and a multidecadal simulation %A Chapman, W.L. %A Welch, W. %A Bowman, K.P. %A Jerome Sacks %A Walsh, J.E. %K Arctic region %K Climate and interannual variability %K Ice mechanics and air/sea/ice exchange processes %K Numerical modeling %X

A dynamic-thermodynamic sea ice model is used to illustrate a sensitivity evaluation strategy in which a statistical model is fit to the output of the ice model. The statistical model response, evaluated in terms of certain metrics or integrated features of the ice model output, is a function of a selected set of d (= 13) prescribed parameters of the ice model and is therefore equivalent to a d-dimensional surface. The d parameters of the ice model are varied simultaneously in the sensitivity tests. The strongest sensitivities arise from the minimum lead fraction, the sensible heat exchange coefficient, and the atmospheric and oceanic drag coefficients. The statistical model shows that the interdependencies among these sensitivities are strong and physically plausible. A multidecadal simulation of Arctic sea ice is made using atmospheric forcing fields from 1960 to 1988 and parametric values from the approximate midpoints of the ranges sampled in the sensitivity tests. This simulation produces interannual variations consistent with submarine-derived data on ice thickness from 1976 and 1987 and with ice extent variations obtained from satellite passive microwave data. The ice model results indicate that (1) interannual variability is a major contributor to the differences of ice thickness and extent over timescales of a decade or less, and (2) the timescales of ice thickness anomalies are much longer than those of ice-covered areas. However, the simulated variations of ice coverage have less than 50% of their variance in common with observational data, and the temporal correlations between simulated and observed anomalies of ice coverage vary strongly with longitude.
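The strategy described here, fitting a statistical surrogate to model output over the 13-dimensional parameter space and then interrogating the surrogate, can be sketched as below. The ice model is replaced by a hypothetical stub, and a Gaussian-process regressor stands in for the authors' statistical model; the design, metric, and parameter ranges are all illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(3)
d, n_runs = 13, 80

def run_ice_model(theta):
    """Hypothetical stand-in for the sea ice model: returns one summary metric
    (e.g. mean ice thickness) as a nonlinear function of the parameters."""
    return 2.0 + 1.5 * theta[0] - 0.8 * theta[1] * theta[2] + 0.3 * np.sum(theta**2)

# Vary all d parameters simultaneously over [0, 1]^d and record the output.
design = rng.uniform(0.0, 1.0, size=(n_runs, d))
response = np.array([run_ice_model(t) for t in design])

# Fit the surrogate ("statistical model") to the model runs.
surrogate = GaussianProcessRegressor(normalize_y=True).fit(design, response)

# Crude main-effect screen: vary one parameter across its range with the
# others held at their midpoints, and measure the predicted swing.
base = np.full(d, 0.5)
for j in range(d):
    grid = np.tile(base, (11, 1))
    grid[:, j] = np.linspace(0.0, 1.0, 11)
    pred = surrogate.predict(grid)
    print(f"parameter {j}: predicted range {pred.max() - pred.min():.3f}")
```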

%B Journal of Geophysical Research: Oceans %V 99 %P 919-935 %G eng %& 919 %R 10.1029/93JC02564 %0 Journal Article %D 1994 %T Multiworker Household Travel Demand %A Sööt, Siim %A Sen, Ashish %A Marston, J. %A Piyushimita Thakuriah %K Automobile ownership %K Demographics %K Employed %K Highway travel %K Households %K Income %K New products %K Population density %K Travel behavior %K Travel surveys %K Trip generation %K Urban areas %K Vehicle miles of travel %X The purpose of this study is to examine the travel behavior and related characteristics of multiworker households (MWHs), defined as households with at least two workers, and how they contribute to the ever-increasing demand for transportation services. On average, MWHs have incomes that exceed the national household average, often own multiple automobiles, and generate a considerable number of trips. The near absence of previous studies of MWHs makes an overview of their characteristics and travel behavior necessary. This study reveals that the number of MWHs has continued to grow, as has their use of highways; they are found in disproportionate numbers in low-density urban areas distant from public transportation. They also have new vehicles and drive each vehicle more miles than other households. As households, MWHs travel more than other households do. However, an individual worker’s ability and desire to travel are constrained by time factors, among others, and transportation use by MWHs, when calculated on a per-worker basis, is relatively low. %I Federal Highway Administration %V 1 %P 30 p %G eng %U http://nhts.ornl.gov/1990/doc/demographic.pdf %0 Generic %D 1993 %T Multivariate Threshold Methods %A Smith, Richard L. %G eng %0 Journal Article %D 1993 %T Non-response Bias and Trip Generation Models %A Piyushimita Thakuriah %A Sen, Ashish %A Sööt, Siim %A Christopher, Ed J. %K Bias (Statistics) %K Travel surveys %K Trip generation %X

There is serious concern that travel surveys often overrepresent smaller households with higher incomes and better education levels and, in general, that nonresponse is nonrandom. However, when the data are used to build linear models, such as trip generation models, and the model is correctly specified, parameter estimates are unbiased regardless of the nature of the respondents, and the issues of low response rates and nonresponse bias are ameliorated. The more important task is then the complete specification of the model, without leaving out variables that have some effect on the variable to be predicted. The theoretical basis for this reasoning is given, along with an example of how bias may be assessed in estimates of trip generation model parameters. Some of the methods used are quite standard, but the manner in which these and other less standard methods are systematically combined to assess bias in the estimates shows that careful model building, not concern over bias in the data, is the key issue in developing trip generation and other models.
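The central claim, that estimates from a correctly specified linear model remain unbiased even when response depends on the covariates, is easy to illustrate with a small simulation; the income-based response mechanism, variable names, and coefficients below are hypothetical and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 5000, 200
beta = np.array([0.5, 1.2, 0.8])   # "true" trip-generation coefficients (illustrative)

estimates = []
for _ in range(reps):
    income = rng.lognormal(mean=10.5, sigma=0.5, size=n)
    hh_size = rng.poisson(2.0, n) + 1
    X = np.column_stack([np.ones(n), np.log(income), hh_size])
    trips = X @ beta + rng.normal(0, 1.0, n)

    # Nonrandom response: higher-income households respond more often.
    p_respond = 1.0 / (1.0 + np.exp(-(np.log(income) - 10.5)))
    responded = rng.uniform(size=n) < p_respond

    # OLS on responders only; response depends on covariates, not on the error.
    b_hat, *_ = np.linalg.lstsq(X[responded], trips[responded], rcond=None)
    estimates.append(b_hat)

print("true beta:    ", beta)
print("mean estimate:", np.mean(estimates, axis=0).round(3))
```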

%I Transportation Research Board %P 64-70 %@ 0309055598 %G eng %0 Journal Article %J Wiley StatsRef: Statistics Reference Online %D 0 %T Combining Estimates from Multiple Surveys %A Elliott, M. R. %A Raghunathan, T. E. %A Schenker, N. %K dual frame %K imputation %K missing data %K non-probability samples %K small-area estimation %K Weighting %X

Combining estimates from multiple surveys can be very useful, especially when the question of interest cannot be addressed well by a single existing survey. In this paper, we provide a brief review of methodology for combining estimates, with a focus on dual-frame, weighting-based, joint-modeling, missing-data, and small-area methods. Many such methods are useful in situations outside the realm of combining estimates from surveys, such as combining information from surveys with administrative data and combining probability-sample data with non-probability-sample, or “big,” data. We also provide examples of comparability issues that must be kept in mind when information from different sources is being combined.
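As one concrete instance of the weighting-based methods surveyed here, two independent survey estimates of the same total can be pooled by inverse-variance weighting; this is a standard textbook combination rather than the paper's only method, and the numbers are illustrative.

```python
import numpy as np

# Two independent survey estimates of the same population total,
# with their estimated standard errors (illustrative numbers).
estimates = np.array([2.05e6, 1.92e6])
std_errors = np.array([0.10e6, 0.06e6])

# Inverse-variance weights give the minimum-variance linear combination.
weights = 1.0 / std_errors**2
weights /= weights.sum()

combined = np.sum(weights * estimates)
combined_se = np.sqrt(1.0 / np.sum(1.0 / std_errors**2))
print(f"combined estimate: {combined:,.0f} (SE {combined_se:,.0f})")
```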

%B Wiley StatsRef: Statistics Reference Online %G eng %U https://www.niss.org/sites/default/files/Elliott%2C%20Raghunathan%2C%20%26%20Schenker%20for%20Wiley%20StatsRef.pdf %1

Download: https://www.niss.org/sites/default/files/Elliott%2C%20Raghunathan%2C%20%26%20Schenker%20for%20Wiley%20StatsRef.pdf