NISS-NASS Cross-Sector Research in Residence Program

Research Project


The National Institute of Statistical Sciences (NISS) established a Cross-Sector Research in Residence Program in partnership with the National Agricultural Statistics Service (NASS), the survey and estimation arm of the U.S. Department of Agriculture. This new collaborative venture by NISS and the USDA was the first project of a NISS initiative to host academic-government research teams focused on specific federal agency objectives.

Each team of five comprised a faculty researcher in statistics, a NASS researcher, a NISS mentor, a postdoctoral fellow and a graduate student, who worked intensively together at NISS during the summers of 2009 and 2010 to solve research questions posed by NASS.

The projects varied, but all focused on advancing statistical methodology for use in USDA surveys and in the analysis of survey results.


Multivariate Imputation Mechanisms and Valid Mean Squared Error Estimation: Agricultural Resource Management Survey – Phase III

One objective of the Agricultural Resource Management Survey (ARMS) Phase III was to allow statisticians and economists to conduct multivariate statistical analyses of the farm economy, with valid estimates of the potential error in model estimates and forecasts. NASS had been using univariate approaches to both imputation and mean squared error estimation, but multivariate approaches were needed to support multiple estimates and simultaneous forecasts for multiple crops. This team worked on developing a multiple-imputation scheme that could handle the complexities of heterogeneous data as well as the semi-continuous nature of agricultural data, in which many items take the value zero for some farms and continuous positive values for others. A second challenge was to determine whether the method remains valid when the prediction models underlying its imputations fail.
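For readers less familiar with multiple imputation, the sketch below shows how point estimates and variances from several imputed datasets are commonly pooled with Rubin's combining rules, so that the reported error includes the uncertainty added by imputation. It is a minimal scalar illustration only: the page does not say which combining method the team used, the multivariate case pools covariance matrices instead of scalars, and the function name and input values are hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Pool results from M imputed datasets with Rubin's combining rules.

    estimates: one point estimate per completed (imputed) dataset
    within_variances: the estimated sampling variance of each estimate
    Returns (pooled estimate, total variance). Scalar case only; this is an
    illustrative sketch, not the ARMS team's method.
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 divisor)
    total = u_bar + (1 + 1 / m) * b    # within variance plus inflated between variance
    return q_bar, total


# Hypothetical results from three imputations of one farm-economy total
pooled, total_var = rubin_combine([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

With three imputations giving estimates 10, 12 and 11, each with within-imputation variance 1, the pooled estimate is 11 and the total variance is 1 + (4/3)(1), about 2.33, noticeably larger than the value of 1 obtained by ignoring imputation. These rules yield valid variances only when the imputation model is appropriate, which is the concern behind the team's second challenge.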

New Design and Estimation Methodologies for Biased Self-Exclusion (Under-coverage): Estimation of Small Farms from Census Mail List

NASS accounts for the incompleteness of its Census Mail List (CML) by adjusting the weights of Census respondents to reflect the estimated number of farms that are found in the area frame but not on the CML. When the 2007 Census was processed, NASS also identified several valid farms that were not captured by the area frame, even though they were located in sampled area segments. This raises the question of how many farms are missed by both sources, the CML and the area frame. The challenge was to develop statistical procedures to estimate the number of farms missing from both frames and to incorporate that estimate into the Census weights. Cognitive issues were also addressed, since many small operations that qualify as farms do not consider themselves farms and therefore do not return the survey forms.
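The question of how many farms both sources miss resembles a classic two-list (capture-recapture) problem. As a hedged illustration, and not necessarily the approach the team adopted, the textbook dual-system (Lincoln-Petersen) estimator combines the number of farms found by each frame with the number found by both; the function name and counts below are hypothetical.

```python
def dual_system_estimate(n_list, n_area, n_both):
    """Textbook dual-system (Lincoln-Petersen) estimate from two frames.

    n_list: farms found on the list frame (for example, the CML)
    n_area: farms found by the area frame
    n_both: farms found by both frames
    Assumes the frames capture farms independently and that every farm has
    the same chance of capture within each frame (illustrative assumptions).
    Returns (estimated total farms, estimated farms missed by both frames).
    """
    if n_both <= 0:
        raise ValueError("need at least one farm found by both frames")
    if n_both > min(n_list, n_area):
        raise ValueError("overlap cannot exceed either frame's count")
    n_hat = n_list * n_area / n_both
    observed = n_list + n_area - n_both  # distinct farms seen by either frame
    return n_hat, n_hat - observed


# Hypothetical counts for one enumerated region
print(dual_system_estimate(80, 60, 48))  # (100.0, 8.0)
```

With 80 farms on the list, 60 found by the area frame and 48 found by both, the estimate is 100 farms, 8 of them missed by both frames. Independence is the weak point: if small farms are systematically missed by both sources, as the cognitive issue described above suggests, the overlap is inflated relative to independence and the estimator understates the number of farms missed.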

New Statistical Editing and Imputation Methods That Preserve Data Quality: Quarterly Agricultural Survey

Many NASS surveys rely on data cleaning and editing procedures that use expert review and manual intervention to correct data values falling outside normally expected ranges. This manual editing process is time-consuming and inconsistent, and it can introduce edit effects that are not reflected in the measurement error process. The objective of this project was to create automated statistical and selective editing and imputation strategies that could reduce nonsampling errors and lower survey costs by reducing the extensive staff resources currently devoted to data cleaning.
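Selective editing typically ranks records by how much a suspicious value could move a published estimate and sends only the high-impact records to analysts. The sketch below uses one common form of score function, the sampling weight times the gap between the reported and an anticipated value, scaled by the estimated total. It is an assumption for illustration, not the method the team developed, and the record layout, threshold and function names are hypothetical.

```python
def selective_edit_scores(records, estimated_total):
    """Score each record by its potential impact on an estimated total.

    records: iterable of (record_id, weight, reported, expected) tuples, where
        'expected' is an anticipated value such as a prior-period report or a
        model prediction (hypothetical layout chosen for illustration).
    estimated_total: current estimate of the survey total for this item.
    Returns {record_id: weight * |reported - expected| / estimated_total}.
    """
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return {
        rid: weight * abs(reported - expected) / estimated_total
        for rid, weight, reported, expected in records
    }


def flag_for_review(scores, threshold):
    """Return ids whose score meets the (hypothetical) threshold, sorted.

    Flagged records would go to analysts; the rest would be left to
    automated edits and imputation.
    """
    return sorted(rid for rid, score in scores.items() if score >= threshold)


# Hypothetical records: (id, sampling weight, reported value, expected value)
records = [("A", 10, 500, 480), ("B", 25, 1200, 300), ("C", 5, 90, 100)]
scores = selective_edit_scores(records, estimated_total=50_000)
print(flag_for_review(scores, threshold=0.01))  # ['B']
```

With an estimated total of 50,000 and a threshold of 0.01, only record B (score 0.45) is flagged; records A (0.004) and C (0.001) would be handled by automated editing and imputation. The threshold trades analyst workload against the risk of leaving influential errors in the data.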

Statistical Multi-Source Predictive Models and Error Estimates: Major USDA Crop Production Forecasts and Estimates

The USDA produces multiple forecasts of crop production throughout the growing season and estimates production at the end of the season or after harvest. Information is collected from multiple sources (USDA surveys and administrative or auxiliary information, including weather and remotely sensed data) and then synthesized by a panel of experts on USDA’s Agricultural Statistics Board (ASB), producing the official forecasts and estimates that are published. These forecasts are later compared with the utilization of the crops and assessed for accuracy. Once actual yields are known, can the process be improved through greater use of data modeling or other approaches? And how can such models or techniques be validated within the short time analysts have to review the inputs and publish time-sensitive official estimates?

The research teams worked on these focus areas over two consecutive summers. The program began in the summer of 2009, when the full teams met at NISS. Each team's postdoctoral fellow and graduate student spent that summer at NISS working on the project under NISS mentorship, meeting periodically with the faculty researcher and the NASS researcher. During the academic year, the postdoctoral fellow was based at the USDA, continuing the work with the NASS researcher. In the summer of 2010, the teams reconvened at NISS and completed their work.

Postdoctoral Fellows: Patricia Gunning, Michael Robbins and Jianqiang Wang

Project Goal: 

Help the National Agricultural Statistics Service develop more efficient and accurate ways to count farms across the United States.

Research Team: 

Team One: Sujit K. Ghosh, North Carolina State University (NCSU); Barry Goodwin, NCSU; Darcy Miller, NASS; Tim Keller, NASS; Peter Quan, NASS; Kirk White, USDA Economic Research Service; Michael Robbins, NISS; Joshua Habiger, NISS
Team Two: Linda Young, University of Florida; Pam Arroway, North Carolina State University; Andrea Lamas, NASS; Denise Abreu, NASS; Patricia Gunning, NISS; Kenneth Lopiano, NISS
Team Three: Balgobin Nandram, Worcester Polytechnic Institute; Scott Holan, University of Missouri; Wendy Barboza, NASS; Edwin Anderson, NASS; Jianqiang (Jay) Wang, NISS; Criselda Toto, NISS

 

Research Presentations:
Habiger, J., Robbins, M., and Ghosh, S. (2010). An Assessment of imputation methods for the USDA’s Area Resource Management Survey. Proceedings of the Joint Statistical Meetings, www.amstat.org/membersonly/proceedings/2010
Robbins, M., Ghosh, S., and Habiger, J. (2010). Innovative imputation techniques designed for the Agricultural Resource Management Survey. Proceedings of the Joint Statistical Meetings, www.amstat.org/membersonly/proceedings/2010
Miller, D., Robbins, M., and Habiger, J. (2010). Examining the challenges of missing data analysis in Phase Three of the Agricultural Resource Management Survey. Proceedings of the Joint Statistical Meetings, www.amstat.org/membersonly/proceedings/2010
Ghosh, S., Robbins, M., Habiger, J., and Miller, D. (2010). Multivariate imputation methods for the Agricultural Resource Management Survey (ARMS) Data. Proceedings of the Joint Statistical Meetings, www.amstat.org/membersonly/proceedings/2010
Abreu, D., Lamas, A.C., Sang, H., Lopiano, K.K., Arroway, P., and Young, L.J. On the Feasibility of Using NASS's Sampling List Frame to Evaluate Misclassification Errors of the June Area Survey. USDA NASS Research Report #RDD-11-01. Internally reviewed.
Sang, H., Arroway, P., Lopiano, K.K., Abreu, D., Lamas, A.C., and Young, L.J. Annual Land Utilization Survey (ALUS): Design and Methodology. USDA NASS Research Report #RDD-11-02. Internally reviewed.
Lopiano, K.K., Abreu, D., Lamas, A.C., Sang, H., Arroway, P., and Young, L.J. Adjusting the June Area Survey Estimate of the Number of U.S. Farms for Misclassification and Non-response. USDA NASS Research Report #RDD-11-04. Internally reviewed.