%0 Journal Article %J Journal of Survey Statistics and Methodology %D 2015 %T Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples %A Schifeling, T. %A Cheng, C. %A Jerome Reiter %A Hillygus, D.C. %B Journal of Survey Statistics and Methodology %V 3 %P 265–295 %8 18 August 2015 %G eng %N 3 %0 Book Section %B Confidentiality and Data Access in the Use of Big Data: Theory and Practical Approaches. %D 2014 %T Analytical frameworks for data release: A statistical view %A A. F. Karr %A J. P. Reiter %B Confidentiality and Data Access in the Use of Big Data: Theory and Practical Approaches. %I Cambridge University Press %C New York City, NY %G eng %0 Journal Article %J Technometrics %D 2013 %T Analysis of high-dimensional structure-activity screening datasets using the Optimal Bit String Tree %A Zhang K %A Hughes-Oliver JM %A Young SS %K Classification %K Drug discovery %K High throughput screening %K Prediction %K QSAR %K Simulated annealing %X

We propose a new classification method called the Optimal Bit String Tree (OBSTree) to identify quantitative structure-activity relationships (QSARs). The method introduces the concept of a chromosome to describe the presence/absence context of a combination of descriptors. A descriptor set and its optimal chromosome form the splitting variable. A new stochastic searching scheme that contains a weighted sampling scheme, simulated annealing, and a trimming procedure optimizes the choice of splitting variable. Simulation studies and an application to screening monoamine oxidase inhibitors show that OBSTree is advantageous in accurately and effectively identifying QSAR rules and finding different classes of active compounds. Details of the algorithm, SAS code, and simulated and real datasets are available online as supplementary materials.

%B Technometrics %V 55 %P 161-173 %G eng %R 10.1080/00401706.2012.760489 %0 Journal Article %J Clinical Chemistry %D 2010 %T Analytical Validation of Proteomic-Based Multiplex Assays: A Workshop Report by the NCI-FDA Interagency Oncology Task Force on Molecular Diagnostics %A Steven A. Carr %A Nell Sedransk %A Henry Rodriguez %A Zivana Tezak %A Mehdi Mesri %A Daniel C. Liebler %A Susan J. Fisher %A Paul Tempst %A Tara Hiltke %A Larry G. Kessler %A Christopher R. Kinsinger %A Reena Philip %A David F. Ransohoff %A Steven J. Skates %A Fred E. Regnier %A N. Leigh Anderson %A Elizabeth Mansfield %A on behalf of the Workshop Participants %X

Clinical proteomics has the potential to enable the early detection of cancer through the development of multiplex assays that can inform clinical decisions. However, there has been some uncertainty among translational researchers and developers as to the specific analytical measurement criteria needed to validate protein-based multiplex assays. To begin to address the causes of this uncertainty, a day-long workshop titled “Interagency Oncology Task Force Molecular Diagnostics Workshop” was held in which members of the proteomics and regulatory communities discussed many of the analytical evaluation issues that the field should address in development of protein-based multiplex assays for clinical use. This meeting report explores the issues raised at the workshop and details the recommendations that came out of the day’s discussions, such as a workshop summary discussing the analytical evaluation issues that specific proteomic technologies should address when seeking US Food and Drug Administration approval.

%B Clinical Chemistry %V 56 %P 237-243 %G eng %R 10.1373/clinchem.2009.136416 %0 Conference Proceedings %B Proc. ACM SIGSOFT Symposium Foundations of Software Engineering 2005 %D 2005 %T Applying classification techniques to remotely-collected program execution data %A A. F. Karr %A M. Haran %A A. A. Porter %A A. Orso %A A. P. Sanil %B Proc. ACM SIGSOFT Symposium Foundations of Software Engineering 2005 %I ACM %C New York %G eng %0 Journal Article %J Chance %D 2004 %T Analysis of integrated data without data integration %A A. F. Karr %A X. Lin %A J. P. Reiter %A A. P. Sanil %B Chance %V 17 %P 26-29 %G eng %0 Book Section %D 2002 %T Advances in Digital Government %A A. F. Karr %A J. Lee %A A. P. Sanil %A J. Hernandez %A S. Karimi %A K. Litwin %E E. Elmagarmid %E W. M. McIver %X

The Internet provides an efficient mechanism for Federal agencies to distribute their data to the public. However, it is imperative that such data servers have built-in mechanisms to ensure that the confidentiality of the data and the privacy of individuals or establishments represented in the data are not violated. We describe a prototype dissemination system developed for the National Agricultural Statistics Service that uses aggregation of adjacent geographical units as a confidentiality-preserving technique. We also outline a Bayesian approach to statistical analysis of the aggregated data.

%I Kluwer %C Boston %P 181-196 %@ 978-1-4020-7067-9 %G eng %& Web-based systems that disseminate information from data but preserve confidentiality %R 10.1007/0-306-47374-7_11 %0 Conference Paper %B Bayesian Statistics 7, Proceedings of the Seventh Valencia International Meeting on Bayesian Statistics %D 2002 %T Assessing the Risk of Disclosure of Confidential Categorical Data %A Dobra, A. %A Fienberg, S.E. %A Trottini, M. %B Bayesian Statistics 7, Proceedings of the Seventh Valencia International Meeting on Bayesian Statistics %I Oxford University Press %G eng %0 Journal Article %J Research in Official Statistics %D 2001 %T Analysis of aggregated data in survey sampling with application to fertilizer/pesticide usage surveys %A Jaeyong Lee %A Christopher Holloman %A Alan F. Karr %A Ashish P. Sanil %X

In many cases, the public release of survey or census data at fine geographical resolution (for example, counties) may endanger the confidentiality of respondents. A strategy for such cases is to aggregate neighboring regions into larger units that satisfy confidentiality requirements. An aggregation procedure employed in a prototype system for the US National Agricultural Statistics Service is used as context to investigate the impact of aggregation on statistical properties of the data. We propose a Bayesian simulation approach for the analysis of such aggregated data. As a consequence, we are able to specify the type of additional information (such as certain sample sizes) that needs to be released in order to enable the user to perform meaningful analyses with the aggregated data.

%B Research in Official Statistics %V 4 %P 11–6 %G eng %0 Journal Article %J Transportation Research Record %D 2001 %T Assessment of Stochastic Signal Optimization Method Using Microsimulation %A Byungkyu Park %A Nagui M. Rouphail %A Jerome Sacks %X

A stochastic signal optimization method based on a genetic algorithm (GA-SOM) that interfaces with the microscopic simulation program CORSIM is assessed. A network in Chicago consisting of nine signalized intersections is used as an evaluation test bed. Taking CORSIM as the best representation of reality, the performance of the GA-SOM plan sets a ceiling on how good any (fixed) signal plan can be. An important aspect of this approach is its accommodation of variability. Also discussed is the robustness of an optimal plan under changes in demand. This benchmark is used to assess the best signal plan generated by TRANSYT-7F (T7F), Version 8.1, from among 12 reasonable strategies. The performance of the best T7F plan falls short of the benchmark on several counts, reflecting the need to account for variability in the highly stochastic system of traffic operations, which is not possible under the deterministic conditions intrinsic to T7F. As a sidelight, the performance of the GA-SOM plan within T7F is also computed and it is found to perform nearly as well as the optimum T7F plan.

%B Transportation Research Record %V 1748 %P 40-45 %G eng %R 10.3141/1748-05 %0 Book Section %B Molecular Modeling and Prediction of Bioactivity %D 2000 %T Analysis of a Large, High-Throughput Screening Data Using Recursive Partitioning %A Young, S. Stanley %A Jerome Sacks %E Gundertofte, Klaus %E Jørgensen, Flemming Steen %X

As biological drug targets multiply through the human genome project and as the number of chemical compounds available for screening becomes very large, the expense of screening every compound against every target becomes prohibitive. We need to improve the efficiency of the drug screening process so that active compounds can be found for more biological targets and turned over to medicinal chemists for atom-by-atom optimization. We create a method for analysis of the very large, complex data sets coming from high throughput screening, and then integrate the analysis with the selection of compounds for screening so that the structure-activity rules derived from an initial compound set can be used to suggest additional compounds for screening. Cycles of screening and analysis become sequential screening rather than the mass screening of all available compounds. We extend the analysis method to deal with multivariate responses. Previously, a screening campaign might screen hundreds of thousands of compounds; sequential screening can cut the number of compounds screened by up to eighty percent. Sequential screening also gives SAR rules that can be used to mathematically screen compound collections or virtual chemical libraries.

%B Molecular Modeling and Prediction of Bioactivity %I Springer US %P 149-156 %@ 978-1-4613-6857-1 %G eng %U http://dx.doi.org/10.1007/978-1-4615-4141-7_17 %R 10.1007/978-1-4615-4141-7_17 %0 Journal Article %J Atmospheric Environment %D 1996 %T Accounting for Meteorological Effects in Measuring Urban Ozone Levels and Trends %A Bloomfield, Peter %A Royle, Andy %A Steinberg, Laura J. %A Yang, Qing %K median polish %K meteorological adjustment %K nonlinear regression %K nonparametric regression %K Ozone concentration %X

Observed ozone concentrations are valuable indicators of possible health and environmental impacts. However, they are also used to monitor changes and trends in the sources of ozone and of its precursors, and for this purpose the influence of meteorological variables is a confounding factor. This paper examines ozone concentrations and meteorology in the Chicago area. The data are described using least absolute deviations and local regression. The key relationships observed in these analyses are then used to construct a nonlinear regression model relating ozone to meteorology. The model can be used to estimate that part of the trend in ozone levels that cannot be accounted for by trends in meteorology, and to ‘adjust’ observed ozone concentrations for anomalous weather conditions.

%B Atmospheric Environment %V 30 %P 3067–3077 %G eng %N 17 %R 10.1016/1352-2310(95)00347-9 %0 Journal Article %J Environmetrics %D 1995 %T The ability of wet deposition networks to detect temporal trends %A Oehlert, Gary W. %K discrete smoothing %K wet deposition networks %X

We use the spatial/temporal model developed in Oehlert (1993) to estimate the detectability of trends in wet-deposition sulphate. Precipitation volume adjustments of sulphate concentration dramatically improve the detectability and quantifiability of trends. Anticipated decreases in sulphate of about 30 per cent in the Eastern U.S. by 2005 predicted by models should be detectable much earlier, say, 1997, but accurate quantification of the true decrease will require several additional years of monitoring. It is possible to delete a few stations from the East without materially affecting the detectability or quantifiability of trends. Careful siting of new stations can provide substantial improvement to regional trend estimation.

%B Environmetrics %V 6 %P 327–339 %G eng %R 10.1002/env.3170060402 %0 Journal Article %J Journal of Geophysical Research: Oceans %D 1994 %T Arctic sea ice variability: Model sensitivities and a multidecadal simulation %A Chapman, W.L. %A Welch, W. %A Bowman, K.P. %A Jerome Sacks %A Walsh, J.E. %K Arctic region %K Climate and interannual variability %K Ice mechanics and air/sea/ice exchange processes %K Numerical modeling %X

A dynamic-thermodynamic sea ice model is used to illustrate a sensitivity evaluation strategy in which a statistical model is fit to the output of the ice model. The statistical model response, evaluated in terms of certain metrics or integrated features of the ice model output, is a function of a selected set of d (= 13) prescribed parameters of the ice model and is therefore equivalent to a d-dimensional surface. The d parameters of the ice model are varied simultaneously in the sensitivity tests. The strongest sensitivities arise from the minimum lead fraction, the sensible heat exchange coefficient, and the atmospheric and oceanic drag coefficients. The statistical model shows that the interdependencies among these sensitivities are strong and physically plausible. A multidecadal simulation of Arctic sea ice is made using atmospheric forcing fields from 1960 to 1988 and parametric values from the approximate midpoints of the ranges sampled in the sensitivity tests. This simulation produces interannual variations consistent with submarine-derived data on ice thickness from 1976 and 1987 and with ice extent variations obtained from satellite passive microwave data. The ice model results indicate that (1) interannual variability is a major contributor to the differences of ice thickness and extent over timescales of a decade or less, and (2) the timescales of ice thickness anomalies are much longer than those of ice-covered areas. However, the simulated variations of ice coverage have less than 50% of their variance in common with observational data, and the temporal correlations between simulated and observed anomalies of ice coverage vary strongly with longitude.

%B Journal of Geophysical Research: Oceans %V 99 %P 919-935 %G eng %& 919 %R 10.1029/93JC02564