<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>10</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Benecha, H.</style></author><author><style face="normal" font="default" size="100%">Abreu, D.</style></author><author><style face="normal" font="default" size="100%">Abernethy, J.</style></author><author><style face="normal" font="default" size="100%">Sartore, L.</style></author><author><style face="normal" font="default" size="100%">Young, L. Y.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%"> Evaluation of a New Approach for Estimating the Number of U.S. Farms</style></title><secondary-title><style face="normal" font="default" size="100%">JSM 2017</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Agriculture</style></keyword><keyword><style  face="normal" font="default" size="100%">Area-frame</style></keyword><keyword><style  face="normal" font="default" size="100%">BigData</style></keyword><keyword><style  face="normal" font="default" size="100%">Capture-Recapture</style></keyword><keyword><style  face="normal" font="default" size="100%">List Frame</style></keyword><keyword><style  face="normal" font="default" size="100%">Logistic Regression</style></keyword><keyword><style  face="normal" font="default" size="100%">Misclassification Error</style></keyword><keyword><style  face="normal" font="default" size="100%">NASS</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">Submitted</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">https://www.niss.org/sites/default/files/Benecha_Estim_Farms_20170929.pdf</style></url></web-urls></urls><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" 
size="100%">&lt;p&gt;USDA’s National Agricultural Statistics Service (NASS) employs the June Area Survey (JAS) to produce annual estimates of U.S. farm numbers. The JAS is an area-frame-based survey conducted every year during the first two weeks of June. NASS also publishes an independent estimate of the number of farms from the quinquennial Census of Agriculture. Studies conducted by NASS have shown that farm number estimates from the JAS can be biased, mainly due to misclassification of agricultural tracts during the pre-screening and data collection processes. To adjust for the bias, NASS has developed a capture-recapture model that uses NASS’s list frame as the second sample, where estimation is performed based on records in the JAS with matches in the list frame. In the current paper, we describe an alternative capture-recapture approach that uses all available data from the JAS and the Census of Agriculture to correct for biases due to misclassification and to produce more stable farm number estimates.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">P. A. Rudnick</style></author><author><style face="normal" font="default" size="100%">X. Wang</style></author><author><style face="normal" font="default" size="100%">E. Yan</style></author><author><style face="normal" font="default" size="100%">Sedransk, N.</style></author><author><style face="normal" font="default" size="100%">S. E. Stein</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Improved Normalization of Systematic Biases Affecting Ion Current Measurements in Label-free Proteomics Data</style></title><secondary-title><style face="normal" font="default" size="100%">Molecular &amp; Cellular Proteomics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2014</style></year></dates><volume><style face="normal" font="default" size="100%">13</style></volume><pages><style face="normal" font="default" size="100%">1341-1351</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><issue><style face="normal" font="default" size="100%">5</style></issue></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Abbatiello, S.</style></author><author><style face="normal" font="default" size="100%">Feng, X.</style></author><author><style face="normal" font="default" size="100%">Sedransk, N.</style></author><author><style face="normal" font="default" size="100%">Mani, DR</style></author><author><style face="normal" font="default" size="100%">Schilling, B</style></author><author><style face="normal" font="default" size="100%">Maclean, B</style></author><author><style face="normal" font="default" 
size="100%">Zimmerman, LJ</style></author><author><style face="normal" font="default" size="100%">Cusack, MP</style></author><author><style face="normal" font="default" size="100%">Hall, SC</style></author><author><style face="normal" font="default" size="100%">Addona, T</style></author><author><style face="normal" font="default" size="100%">Allen, S</style></author><author><style face="normal" font="default" size="100%">Dodder, NG</style></author><author><style face="normal" font="default" size="100%">Ghosh, M</style></author><author><style face="normal" font="default" size="100%">Held, JM</style></author><author><style face="normal" font="default" size="100%">Hedrick, V</style></author><author><style face="normal" font="default" size="100%">Inerowicz, HD</style></author><author><style face="normal" font="default" size="100%">Jackson, A</style></author><author><style face="normal" font="default" size="100%">Keshishian, H</style></author><author><style face="normal" font="default" size="100%">Kim, JW</style></author><author><style face="normal" font="default" size="100%">Lyssand, JS</style></author><author><style face="normal" font="default" size="100%">Riley, CP</style></author><author><style face="normal" font="default" size="100%">Rudnick, P</style></author><author><style face="normal" font="default" size="100%">Sadowski, P</style></author><author><style face="normal" font="default" size="100%">Shaddox, K</style></author><author><style face="normal" font="default" size="100%">Smith, D</style></author><author><style face="normal" font="default" size="100%">Tomazela, D</style></author><author><style face="normal" font="default" size="100%">Wahlander, A</style></author><author><style face="normal" font="default" size="100%">Waldemarson, S</style></author><author><style face="normal" font="default" size="100%">Whitwell, CA</style></author><author><style face="normal" font="default" size="100%">You, J</style></author><author><style face="normal" font="default" 
size="100%">Zhang, S</style></author><author><style face="normal" font="default" size="100%">Kinsinger, CR</style></author><author><style face="normal" font="default" size="100%">Mesri, M</style></author><author><style face="normal" font="default" size="100%">Rodriguez, H</style></author><author><style face="normal" font="default" size="100%">Borchers, CH</style></author><author><style face="normal" font="default" size="100%">Buck, C</style></author><author><style face="normal" font="default" size="100%">Fisher, SJ</style></author><author><style face="normal" font="default" size="100%">Gibson, BW</style></author><author><style face="normal" font="default" size="100%">Liebler, D</style></author><author><style face="normal" font="default" size="100%">Maccoss, M</style></author><author><style face="normal" font="default" size="100%">Neubert, TA</style></author><author><style face="normal" font="default" size="100%">Paulovich, A</style></author><author><style face="normal" font="default" size="100%">Regnier, F</style></author><author><style face="normal" font="default" size="100%">Skates, SJ</style></author><author><style face="normal" font="default" size="100%">Tempst, P</style></author><author><style face="normal" font="default" size="100%">Wang, M</style></author><author><style face="normal" font="default" size="100%">Carr, SA</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Design, Implementation and Multisite Evaluation of a System Suitability Protocol for the Quantitative Assessment of Instrument Performance in Liquid Chromatography-Multiple Reaction Monitoring-MS (LC-MRM-MS)</style></title><secondary-title><style face="normal" font="default" size="100%">Molecular and Cellular Proteomics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2013</style></year></dates><volume><style face="normal" font="default" size="100%">12</style></volume><pages><style face="normal" 
font="default" size="100%">2623-2639</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Multiple reaction monitoring (MRM) mass spectrometry coupled with stable isotope dilution (SID) and liquid chromatography (LC) is increasingly used in biological and clinical studies for precise and reproducible quantification of peptides and proteins in complex sample matrices. Robust LC-SID-MRM-MS-based assays that can be replicated across laboratories and ultimately in clinical laboratory settings require standardized protocols to demonstrate that the analysis platforms are performing adequately. We developed a system suitability protocol (SSP), which employs a predigested mixture of six proteins, to facilitate performance evaluation of LC-SID-MRM-MS instrument platforms, configured with nanoflow-LC systems interfaced to triple quadrupole mass spectrometers. The SSP was designed for use with low multiplex analyses as well as high multiplex approaches when software-driven scheduling of data acquisition is required. Performance was assessed by monitoring of a range of chromatographic and mass spectrometric metrics including peak width, chromatographic resolution, peak capacity, and the variability in peak area and analyte retention time (RT) stability. The SSP, which was evaluated in 11 laboratories on a total of 15 different instruments, enabled early diagnoses of LC and MS anomalies that indicated suboptimal LC-MRM-MS performance. The observed range in variation of each of the metrics scrutinized serves to define the criteria for optimized LC-SID-MRM-MS platforms for routine use, with pass/fail criteria for system suitability performance measures defined as peak area coefficient of variation &amp;lt;0.15, peak width coefficient of variation &amp;lt;0.15, standard deviation of RT &amp;lt;0.15 min (9 s), and the RT drift &amp;lt;0.5min (30 s). 
The deleterious effect of a marginally performing LC-SID-MRM-MS system on the limit of quantification (LOQ) in targeted quantitative assays illustrates the use of and need for an SSP to establish robust and reliable system performance. Use of an SSP helps to ensure that analyte quantification measurements can be replicated with good precision within and across multiple laboratories and should facilitate more widespread use of MRM-MS technology by the basic biomedical and clinical laboratory research communities.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Sedransk, N.</style></author><author><style face="normal" font="default" size="100%">Young, L.</style></author><author><style face="normal" font="default" size="100%">Spiegelman, C.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Data, Statistics and Controversy: Making Scientific Data Intelligible</style></title><secondary-title><style face="normal" font="default" size="100%">Statistics, Politics and Policy</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">data availability</style></keyword><keyword><style  face="normal" font="default" size="100%">Daubert rule</style></keyword><keyword><style  face="normal" font="default" size="100%">inference verification</style></keyword><keyword><style  face="normal" font="default" size="100%">meta-data</style></keyword><keyword><style  face="normal" font="default" size="100%">proprietary data</style></keyword><keyword><style  face="normal" font="default" size="100%">publication bias</style></keyword><keyword><style  face="normal" font="default" size="100%">reuse of data</style></keyword><keyword><style  face="normal" font="default" size="100%">secondary analysis</style></keyword><keyword><style  face="normal" font="default" size="100%">synthetic data</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2012</style></year></dates><number><style face="normal" font="default" size="100%">1</style></number><volume><style face="normal" font="default" size="100%">3</style></volume><pages><style face="normal" font="default" size="100%">1-20</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" 
size="100%">&lt;p&gt;Making published, scientific research data publicly available can benefit scientists and policy makers only if there is sufficient information for these data to be intelligible. Thus the necessary meta-data go beyond the scientific, technological detail and extend to the statistical approach and methodologies applied to these data. The statistical principles that give integrity to researchers’ analyses and interpretations of their data require documentation. This is true when the intent is to verify or validate the published research findings; it is equally true when the intent is to utilize the scientific data in conjunction with other data or new experimental data to explore complex questions; and it is profoundly important when the scientific results and interpretations are taken outside the world of science to establish a basis for policy, for legal precedent or for decision-making. When research draws on already public data bases, e.g., a large federal statistical data base or a large scientific data base, selection of data for analysis, whether by selection (subsampling) or by aggregating, is specific to that research so that this (statistical) methodology is a crucial part of the meta-data. Examples illustrate the role of statistical meta-data in the use and reuse of these public datasets and the impact on public policy and precedent.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Nell Sedransk</style></author><author><style face="normal" font="default" size="100%">Lawrence H. Cox</style></author><author><style face="normal" font="default" size="100%">Deborah Nolan</style></author><author><style face="normal" font="default" size="100%">Keith Soper</style></author><author><style face="normal" font="default" size="100%">Cliff Spiegelman</style></author><author><style face="normal" font="default" size="100%">Linda J. Young</style></author><author><style face="normal" font="default" size="100%">Katrina L. Kelner</style></author><author><style face="normal" font="default" size="100%">Robert A. Moffitt</style></author><author><style face="normal" font="default" size="100%">Ani Thakar</style></author><author><style face="normal" font="default" size="100%">Jordan Raddick</style></author><author><style face="normal" font="default" size="100%">Edward J. Ungvarsky</style></author><author><style face="normal" font="default" size="100%">Richard W. Carlson</style></author><author><style face="normal" font="default" size="100%">Rolf Apweiler</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Make research data public? 
- Not always so simple: A Dialogue for statisticians and science editors</style></title><secondary-title><style face="normal" font="default" size="100%">Statistical Science</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2011</style></year></dates><number><style face="normal" font="default" size="100%">1</style></number><volume><style face="normal" font="default" size="100%">5</style></volume><pages><style face="normal" font="default" size="100%">41-50</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Putting data into the public domain is not the same thing as making those data accessible for intelligent analysis. A distinguished group of editors and experts who were already engaged in one way or another with the issues inherent in making research data public came together with statisticians to initiate a dialogue about policies and practicalities of requiring published research to be accompanied by publication of the research data. This dialogue carried beyond the broad issues of the advisability, the intellectual integrity, the scientific exigencies to the relevance of these issues to statistics as a discipline and the relevance of statistics, from inference to modeling to data exploration, to science and social science policies on these issues.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Fogel, P.</style></author><author><style face="normal" font="default" size="100%">Gobinet, C.</style></author><author><style face="normal" font="default" size="100%">Young, S.S.</style></author><author><style face="normal" font="default" size="100%">Zugaj, D.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Evaluation of unmixing methods for the separation of Quantum Dot sources</style></title><secondary-title><style face="normal" font="default" size="100%">Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, 2009. WHISPERS ’09. First Workshop on</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Bayesian methods</style></keyword><keyword><style  face="normal" font="default" size="100%">Bayesian positive source separation</style></keyword><keyword><style  face="normal" font="default" size="100%">BPSS</style></keyword><keyword><style  face="normal" font="default" size="100%">cadmium compounds</style></keyword><keyword><style  face="normal" font="default" size="100%">CdSe</style></keyword><keyword><style  face="normal" font="default" size="100%">consensus nonnegative matrix factorization</style></keyword><keyword><style  face="normal" font="default" size="100%">Fluorescence</style></keyword><keyword><style  face="normal" font="default" size="100%">hyperspectral images</style></keyword><keyword><style  face="normal" font="default" size="100%">Hyperspectral imaging</style></keyword><keyword><style  face="normal" font="default" size="100%">hyperspectral system</style></keyword><keyword><style  face="normal" font="default" size="100%">ICA</style></keyword><keyword><style  face="normal" font="default" size="100%">II-VI 
semiconductors</style></keyword><keyword><style  face="normal" font="default" size="100%">independent component analysis</style></keyword><keyword><style  face="normal" font="default" size="100%">Nanobioscience</style></keyword><keyword><style  face="normal" font="default" size="100%">Nanocrystals</style></keyword><keyword><style  face="normal" font="default" size="100%">nanometer dimensions</style></keyword><keyword><style  face="normal" font="default" size="100%">NMF</style></keyword><keyword><style  face="normal" font="default" size="100%">Photonic crystals</style></keyword><keyword><style  face="normal" font="default" size="100%">Probes</style></keyword><keyword><style  face="normal" font="default" size="100%">quantum dot sources</style></keyword><keyword><style  face="normal" font="default" size="100%">Quantum dots</style></keyword><keyword><style  face="normal" font="default" size="100%">semiconductor crystals</style></keyword><keyword><style  face="normal" font="default" size="100%">semiconductor quantum dots</style></keyword><keyword><style  face="normal" font="default" size="100%">Source separation</style></keyword><keyword><style  face="normal" font="default" size="100%">spatial localization</style></keyword><keyword><style  face="normal" font="default" size="100%">ultraviolet spectra</style></keyword><keyword><style  face="normal" font="default" size="100%">unmixing methods</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2009</style></year></dates><pages><style face="normal" font="default" size="100%">1-4</style></pages><isbn><style face="normal" font="default" size="100%">978-1-4244-4686-5</style></isbn><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Quantum Dots (QDs) are semiconductor crystals with nanometer dimensions, which have fluorescence properties that can be adjusted through controlling their diameter. 
Under ultraviolet light excitation, these nanocrystals re-emit photons in the visible spectrum, with a wavelength ranging from red to blue as their size diminishes. We created an experiment to evaluate unmixing methods for hyperspectral images. The wells of a 3 × 3 matrix were filled with individual or up to three of five QDs. The matrix was imaged by a hyperspectral system (Photon Etc., Montreal, QC, CA) and a data “cube” of 512 rows × 512 columns × 63 wavelengths was generated. For unmixing, we tested three approaches: independent component analysis (ICA), Bayesian positive source separation (BPSS) and our new consensus non-negative matrix factorization (CNMF) method. For each of these methods, we assessed the ability to separate the different sources from both spectral and spatial localization points of view. In this situation, we showed that BPSS and CNMF model estimates were very close to the original design of our experiment and were better than the ICA results. However, the time needed for the BPSS model to converge is substantially longer than for CNMF. In addition, we show how the CNMF coefficients can be used to provide reasonable bounds for the number of sources, a key issue for unmixing methods, and allow for an effective segmentation of the spatial signal.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Michael Last</style></author><author><style face="normal" font="default" size="100%">Gheorghe Luta</style></author><author><style face="normal" font="default" size="100%">Alex Orso</style></author><author><style face="normal" font="default" size="100%">Adam Porter</style></author><author><style face="normal" font="default" size="100%">Stan Young</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Pooled ANOVA</style></title><secondary-title><style face="normal" font="default" size="100%">Computational Statistics &amp; Data Analysis</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2008</style></year></dates><volume><style face="normal" font="default" size="100%">52</style></volume><pages><style face="normal" font="default" size="100%">5215</style></pages><language><style face="normal" font="default" size="100%">eng</style></language></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Wang, X. 
S.</style></author><author><style face="normal" font="default" size="100%">Salloum, G.A.</style></author><author><style face="normal" font="default" size="100%">Chipman, H.A.</style></author><author><style face="normal" font="default" size="100%">Welch, W.J.</style></author><author><style face="normal" font="default" size="100%">Young, S.S.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Exploration of cluster structure-activity relationship analysis in efficient high-throughput screening</style></title><secondary-title><style face="normal" font="default" size="100%">Journal of Chemical Information and Modeling</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2007</style></year></dates><volume><style face="normal" font="default" size="100%">47</style></volume><pages><style face="normal" font="default" size="100%">1206-1214</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Sequential screening has become increasingly popular in drug discovery. It iteratively builds quantitative structure-activity relationship (QSAR) models from successive high-throughput screens, making screening more effective and efficient. We compare cluster structure-activity relationship analysis (CSARA) as a QSAR method with recursive partitioning (RP), by designing three strategies for sequential collection and analysis of screening data. Various descriptor sets are used in the QSAR models to characterize chemical structure, including high-dimensional sets and some that by design have many variables not related to activity. The results show that CSARA outperforms RP. We also extend the CSARA method to deal with a continuous assay measurement.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Fogel, P.</style></author><author><style face="normal" font="default" size="100%">Young, S.S.</style></author><author><style face="normal" font="default" size="100%">Hawkins, D.M.</style></author><author><style face="normal" font="default" size="100%">Ledirac, N</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Inferential, robust non-negative matrix factorization analysis of microarray data</style></title><secondary-title><style face="normal" font="default" size="100%">Bioinformatics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2007</style></year></dates><volume><style face="normal" font="default" size="100%">23</style></volume><pages><style face="normal" font="default" size="100%">44-49</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Motivation: Modern methods such as microarrays, proteomics and metabolomics often produce datasets where there are many more predictor variables than observations. Research in these areas is often exploratory; even so, there is interest in statistical methods that accurately point to effects that are likely to replicate. Correlations among predictors are used to improve the statistical analysis. We exploit two ideas: non-negative matrix factorization methods that create ordered sets of predictors; and statistical testing within ordered sets which is done sequentially, removing the need for correction for multiple testing within the set. Results: Simulations and theory point to increased statistical power. Computational algorithms are described in detail. 
The analysis and biological interpretation of a real dataset are given. In addition to the increased power, the benefit of our method is that the organized gene lists are likely to lead to a better understanding of the biology. Availability: An SAS JMP executable script is available from http://www.niss.org/irMF&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Young, S.S.</style></author><author><style face="normal" font="default" size="100%">Fogel, P.</style></author><author><style face="normal" font="default" size="100%">Hawkins, D.M.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Clustering Scotch Whiskies using Non-Negative Matrix Factorization</style></title><secondary-title><style face="normal" font="default" size="100%">Q&amp;SPES News</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2006</style></year></dates><volume><style face="normal" font="default" size="100%">14</style></volume><pages><style face="normal" font="default" size="100%">11-13</style></pages><language><style face="normal" font="default" size="100%">eng</style></language></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Alan F. Karr</style></author><author><style face="normal" font="default" size="100%">Fulp, WJ</style></author><author><style face="normal" font="default" size="100%">F. Vera</style></author><author><style face="normal" font="default" size="100%">Young, S.S.</style></author><author><style face="normal" font="default" size="100%">X. Lin</style></author><author><style face="normal" font="default" size="100%">J. P. 
Reiter</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Secure, privacy-preserving analysis of distributed databases</style></title><secondary-title><style face="normal" font="default" size="100%">Technometrics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2006</style></year></dates><volume><style face="normal" font="default" size="100%">48</style></volume><pages><style face="normal" font="default" size="100%">133-143</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;There is clear value, in both industrial and government settings, derived from performing statistical analyses that, in effect, integrate data in multiple, distributed databases. However, the barriers to actually integrating the data can be substantial or even insurmountable. Corporations may be unwilling to share proprietary databases such as chemical databases held by pharmaceutical manufacturers, government agencies are subject to laws protecting confidentiality of data subjects, and even the sheer volume of the data may preclude actual data integration. In this paper, we show how tools from modern information technology, specifically secure multiparty computation and networking, can be used to perform statistically valid analyses of distributed databases. The common characteristic of the methods we describe is that the owners share sufficient statistics computed on the local databases in a way that protects each owner from the others. That is, while each owner can calculate the “complement” of its contribution to the analysis, it cannot discern which other owners contributed what to that complement. Our focus is on horizontally partitioned data: the data records rather than the data attributes are spread among the owners. 
We present protocols for secure regression, contingency tables, maximum likelihood and Bayesian analysis. For low-risk situations, we describe a secure data integration protocol that integrates the databases but prevents owners from learning the source of data records other than their own. Finally, we outline three current research directions: a software system implementing the protocols, secure EM algorithms, and partially trusted third parties, which reduce incentives to owners not to be honest.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">A. F. Karr</style></author><author><style face="normal" font="default" size="100%">J. Feng</style></author><author><style face="normal" font="default" size="100%">X. Lin</style></author><author><style face="normal" font="default" size="100%">J. P. Reiter</style></author><author><style face="normal" font="default" size="100%">A. P. Sanil</style></author><author><style face="normal" font="default" size="100%">Young, S.S.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Data dissemination and disclosure limitation in a world without microdata: A risk-utility framework for remote access analysis servers</style></title><secondary-title><style face="normal" font="default" size="100%">Statistical Science</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2005</style></year></dates><number><style face="normal" font="default" size="100%">2</style></number><volume><style face="normal" font="default" size="100%">20</style></volume><pages><style face="normal" font="default" size="100%">163-177</style></pages><language><style face="normal" font="default" size="100%">eng</style></language></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Liu, J.</style></author><author><style face="normal" font="default" size="100%">J. 
Feng</style></author><author><style face="normal" font="default" size="100%">Young, S.S.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">PowerMV: A Software Environment for Molecular Viewing, Descriptor Generation, Data Analysis and Hit Evaluation</style></title><secondary-title><style face="normal" font="default" size="100%">Journal of Chemical Information and Modeling</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2005</style></year></dates><volume><style face="normal" font="default" size="100%">45</style></volume><pages><style face="normal" font="default" size="100%">515-522</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Ideally, a team of biologists, medicinal chemists and information specialists will evaluate the hits from high throughput screening. In practice, it often falls to nonmedicinal chemists to make the initial evaluation of HTS hits. Chemical genetics and high content screening both rely on screening in cells or animals where the biological target may not be known. There is a need to place active compounds into a context to suggest potential biological mechanisms. Our idea is to build an operating environment to help the biologist make the initial evaluation of HTS data. To this end the operating environment provides viewing of compound structure files, computation of basic biologically relevant chemical properties and searching against biologically annotated chemical structure databases. The benefit is to help the nonmedicinal chemist, biologist and statistician put compounds into a potentially informative biological context. Although there are several similar public and private programs used in the pharmaceutical industry to help evaluate hits, these programs are often built for computational chemists. 
Our program is designed for use by biologists and statisticians.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Zaykin, D.V.</style></author><author><style face="normal" font="default" size="100%">Young, S.S.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Recursive partitioning as a tool for pharmacogenetic studies of complex diseases: II. Statistical considerations</style></title><secondary-title><style face="normal" font="default" size="100%">Pharmacogenomics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2005</style></year></dates><volume><style face="normal" font="default" size="100%">6</style></volume><pages><style face="normal" font="default" size="100%">77-89</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Identifying genetic variations predictive of important phenotypes, such as disease susceptibility, drug efficacy, and adverse events, remains a challenging task. There are individual polymorphisms that can be tested one at a time, but there is the more difficult problem of the identification of combinations of polymorphisms or even more complex interactions of genes with environmental factors. Diseases, drug responses or side effects can result from different mechanisms. Identification of subgroups of people where there is a common mechanism is a problem for diagnosis and prescribing of treatment. Recursive partitioning (RP) is a simple statistical tool for segmenting a population into non-overlapping groups where the response of interest, disease susceptibility, drug efficacy and adverse events are more homogeneous within the segments. 
We suggest that the use of RP is not only more technically feasible than other search methods but it is less susceptible to multiple-testing problems. The numbers of combinations of gene-gene and gene-environment interactions is potentially astronomical and RP greatly reduces the effective search and inference space. Moreover, the certain reliance of RP on the presence of marginal effects is justifiable as was found by using analytical and numerical arguments. In the context of haplotype analysis, results suggest that the analysis of individual SNPs is likely to be successful even when susceptibilities are determined by haplotypes. Retrospective clinical studies where cases and controls are collected will be a common design. This report provides methods that can be used to adjust the RP analysis to reflect the population incidence of the response of interest. Confidence limits on the incidence of the response in the segmented subgroups are also discussed. RP is a straightforward way to create realistic subgroups, and prediction intervals for the within-subgroup disease incidence are easily obtained.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Alan F. Karr</style></author><author><style face="normal" font="default" size="100%">Jun Feng</style></author><author><style face="normal" font="default" size="100%">Xiaodong Lin</style></author><author><style face="normal" font="default" size="100%">Ashish P. Sanil</style></author><author><style face="normal" font="default" size="100%">S. Stanley Young</style></author><author><style face="normal" font="default" size="100%">Jerome P. Reiter</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Secure analysis of distributed chemical databases without data integration</style></title><secondary-title><style face="normal" font="default" size="100%">J. Computer-Aided Molecular Design</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2005</style></year><pub-dates><date><style  face="normal" font="default" size="100%">November</style></date></pub-dates></dates><number><style face="normal" font="default" size="100%">9-10</style></number><volume><style face="normal" font="default" size="100%">19</style></volume><pages><style face="normal" font="default" size="100%">739-747</style></pages><language><style face="normal" font="default" size="100%">eng</style></language></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Jennifer Pittman Clarke</style></author><author><style face="normal" font="default" size="100%">Jerome Sacks</style></author><author><style face="normal" font="default" size="100%">S. 
Stanley Young</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">The construction and assessment of a statistical model for the prediction of protein assay data</style></title><secondary-title><style face="normal" font="default" size="100%">Journal of Chemical Information and Computer Sciences</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2002</style></year></dates><volume><style face="normal" font="default" size="100%">42</style></volume><pages><style face="normal" font="default" size="100%">729-741</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>5</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Young, S.Stanley</style></author><author><style face="normal" font="default" size="100%">Jerome Sacks</style></author></authors><secondary-authors><author><style face="normal" font="default" size="100%">Gundertofte, Klaus</style></author><author><style face="normal" font="default" size="100%">Jørgensen, Flemming Steen</style></author></secondary-authors></contributors><titles><title><style face="normal" font="default" size="100%">Analysis of a Large, High-Throughput Screening Data Using Recursive Partitioning</style></title><secondary-title><style face="normal" font="default" size="100%">Molecular Modeling and Prediction of Bioactivity</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2000</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">http://dx.doi.org/10.1007/978-1-4615-4141-7_17</style></url></web-urls></urls><publisher><style face="normal" font="default" size="100%">Springer US</style></publisher><pages><style face="normal" font="default" size="100%">149-156</style></pages><isbn><style face="normal" font="default" size="100%">978-1-4613-6857-1</style></isbn><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;As biological drug targets multiply through the human genome project and as the number of chemical compounds available for screening becomes very large, the expense of screening every compound against every target becomes prohibitive. We need to improve the efficiency of the drug screening process so that active compounds can be found for more biological targets and turned over to medicinal chemists for atom-by-atom optimization. 
We create a method for analysis of the very large, complex data sets coming from high throughput screening, and then integrate the analysis with the selection of compounds for screening so that the structure-activity rules derived from an initial compound set can be used to suggest additional compounds for screening. Cycles of screening and analysis become sequential screening rather than the mass screening of all available compounds. We extend the analysis method to deal with multivariate responses. Previously, a screening campaign might screen hundreds of thousands of compounds; sequential screening can cut the number of compounds screened by up to eighty percent. Sequential screening also gives SAR rules that can be used to mathematically screen compound collections or virtual chemical libraries.&lt;/p&gt;
</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>5</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Susan Paddock</style></author><author><style face="normal" font="default" size="100%">Michael West</style></author><author><style face="normal" font="default" size="100%">S. Stanley Young</style></author><author><style face="normal" font="default" size="100%">M. Clyde</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Bayesian Mixture Models in Exploration of Structure-Activity Relationships in Drug Design</style></title><secondary-title><style face="normal" font="default" size="100%">Statistics in Science and Technology: Case Studies 4</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">1998</style></year></dates><publisher><style face="normal" font="default" size="100%">Springer-Verlag</style></publisher><language><style face="normal" font="default" size="100%">eng</style></language></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Bloomfield, Peter</style></author><author><style face="normal" font="default" size="100%">Royle, Andy</style></author><author><style face="normal" font="default" size="100%">Yang, Qing</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Accounting for meteorological effects in measuring urban ozone levels and trends</style></title><secondary-title><style face="normal" font="default" size="100%">Atmospheric Environment</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">median polish</style></keyword><keyword><style  face="normal" font="default" size="100%">meteorological 
adjustment</style></keyword><keyword><style  face="normal" font="default" size="100%">nonlinear regression</style></keyword><keyword><style  face="normal" font="default" size="100%">nonparametric regression</style></keyword><keyword><style  face="normal" font="default" size="100%">Ozone concentration</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">1996</style></year></dates><volume><style face="normal" font="default" size="100%">30</style></volume><pages><style face="normal" font="default" size="100%">3067-3077</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Observed ozone concentrations are valuable indicators of possible health and environmental impacts. However, they are also used to monitor changes and trends in the sources of ozone and of its precursors, and for this purpose the influence of meteorological variables is a confounding factor. This paper examines ozone concentrations and meteorology in the Chicago area. The data are described using least absolute deviations and local regression. The key relationships observed in these analyses are then used to construct a nonlinear regression model relating ozone to meteorology. The model can be used to estimate that part of the trend in ozone levels that cannot be accounted for by trends in meteorology, and to ‘adjust’ observed ozone concentrations for anomalous weather conditions.&lt;/p&gt;
</style></abstract><section><style face="normal" font="default" size="100%">3067</style></section></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Bloomfield, Peter</style></author><author><style face="normal" font="default" size="100%">Royle, Andy</style></author><author><style face="normal" font="default" size="100%">Steinberg, Laura J.</style></author><author><style face="normal" font="default" size="100%">Yang, Qing</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Accounting for Meteorological Effects in Measuring Urban Ozone Levels and Trends</style></title><secondary-title><style face="normal" font="default" size="100%">Atmospheric Environment</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">median polish</style></keyword><keyword><style  face="normal" font="default" size="100%">meteorological adjustment</style></keyword><keyword><style  face="normal" font="default" size="100%">nonlinear regression</style></keyword><keyword><style  face="normal" font="default" size="100%">nonparametric regression</style></keyword><keyword><style  face="normal" font="default" size="100%">Ozone concentration</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">1996</style></year></dates><volume><style face="normal" font="default" size="100%">30</style></volume><pages><style face="normal" font="default" size="100%">3067–3077</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Observed ozone concentrations are valuable indicators of possible health and environmental impacts. 
However, they are also used to monitor changes and trends in the sources of ozone and of its precursors, and for this purpose the influence of meteorological variables is a confounding factor. This paper examines ozone concentrations and meteorology in the Chicago area. The data are described using least absolute deviations and local regression. The key relationships observed in these analyses are then used to construct a nonlinear regression model relating ozone to meteorology. The model can be used to estimate that part of the trend in ozone levels that cannot be accounted for by trends in meteorology, and to ‘adjust’ observed ozone concentrations for anomalous weather conditions.&lt;/p&gt;
</style></abstract><issue><style face="normal" font="default" size="100%">17</style></issue></record></records></xml>