Proposed Research Topics for NISS-EIA Project
Development and Testing of Alternative Editing and Imputation Methods for EIA Establishment Surveys
Research Question: Can new model-based or other approaches to microdata editing and imputation improve the accuracy of EIA’s microdata and/or simplify EIA’s data processing systems?
Background: For some of EIA’s monthly surveys, regression, time series, and other model-based imputation techniques have been used to impute individual respondent data that are missing because of non-selection in the sample, non-response, or edit failures. Many other editing and imputation techniques are used within the Federal Statistical System. Alternative model-based or non-model-based approaches to imputation should be tested with the goals of improving accuracy and/or simplifying the imputation processes.
Short Description: EIA will provide monthly data at respondent level from the largest (sampled) companies, and annual data at respondent level from the census of all companies. This research project has two parts:
- Study the data sets provided and identify a minimum of three alternative imputation techniques that may be appropriate for imputing certain key variables. Some of the variables of interest will have seasonal patterns. All the proposed imputation methods should be appropriate for implementation in the context of a large-scale statistical production system.
- Empirically test the proposed methods and evaluate their accuracy relative to each other and to the methods currently used by EIA. Prepare a detailed report describing the research and the results.
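One way to carry out the empirical test described above is to mask values that were actually reported, impute them with each candidate method, and compare the imputed values with the known ones. The following is a minimal sketch of that idea, not EIA's procedure. The three imputation rules (overall mean, same month in an adjacent year, and ratio to an auxiliary variable), the function names, and the synthetic seasonal series are all illustrative assumptions.

```python
import math
import random
import statistics


def mean_impute(series, i):
    """Impute position i with the mean of all other observed values."""
    observed = [v for j, v in enumerate(series) if j != i and v is not None]
    return statistics.fmean(observed)


def seasonal_impute(series, i, period=12):
    """Impute position i with the same month one year earlier, else one year later."""
    for j in (i - period, i + period):
        if 0 <= j < len(series) and series[j] is not None:
            return series[j]
    return mean_impute(series, i)  # fallback when no same-month value exists


def ratio_impute(series, aux, i):
    """Impute position i as (sum of observed y / sum of matching aux) * aux[i]."""
    pairs = [(v, a) for j, (v, a) in enumerate(zip(series, aux))
             if j != i and v is not None]
    ratio = sum(v for v, _ in pairs) / sum(a for _, a in pairs)
    return ratio * aux[i]


def evaluate(series, aux, n_masked=12, seed=0):
    """Mask known values one at a time; return mean absolute error per method."""
    rng = random.Random(seed)
    errors = {"mean": [], "seasonal": [], "ratio": []}
    for i in rng.sample(range(len(series)), n_masked):
        truth = series[i]
        work = list(series)
        work[i] = None  # pretend this report is missing
        errors["mean"].append(abs(mean_impute(work, i) - truth))
        errors["seasonal"].append(abs(seasonal_impute(work, i) - truth))
        errors["ratio"].append(abs(ratio_impute(work, aux, i) - truth))
    return {name: statistics.fmean(errs) for name, errs in errors.items()}


def make_synthetic(n_months=36, seed=1):
    """Synthetic monthly series with a seasonal cycle (NOT EIA data)."""
    rng = random.Random(seed)
    series = [100 + 20 * math.sin(2 * math.pi * m / 12) + rng.gauss(0, 2)
              for m in range(n_months)]
    aux = [0.5 * v + rng.gauss(0, 1) for v in series]  # hypothetical covariate
    return series, aux


series, aux = make_synthetic()
results = evaluate(series, aux)
for name, mae in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:9s} MAE = {mae:.2f}")
```

In a real study the masking pattern should mimic the actual missingness (for example, small non-sampled companies rather than months chosen at random), and accuracy would also be judged at the level of published aggregates, not only respondent by respondent.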
Data: EIA data sets will most likely contain data collected from establishments in the electrical power, oil and/or natural gas industries. In addition to the datasets that EIA prepares specifically for this project, the researcher may identify and use other publicly available data, as appropriate.
Methodology: A wide variety of statistical editing and imputation techniques are discussed in the statistics literature. Many involve regression and/or time series modeling techniques. Other, simpler methods (e.g., hot deck) are commonly used by statistical agencies but may require creative modification to be suitable for new applications. The researcher should be familiar with standard imputation methods, survey estimation techniques and techniques for testing alternative imputation procedures.
Selected References:
- Rubin, Donald B., Multiple Imputation for Nonresponse in Surveys, Wiley, 1987
- Kish, Leslie, Survey Sampling, Wiley, 1965
- Kirkendall, N. and Sedransk, J., Data Analysis for the EIA-826: Test Results, presented at the 2005 Spring Meeting of the American Statistical Association
Combined Heat and Power Plant Fuel Allocation Methodology
Research Question: How should fuel used at Combined Heat and Power (CHP) plants be partitioned into fuel used for generating electricity and fuel used for other purposes (such as process heat)? The issue arises because the cogeneration systems used in industry and in some commercial applications are efficient precisely because they produce both electricity and heat. EIA wishes to capture the amount of fuel used for electricity generation alone. Companies can and do report good data for total electricity generation and for total fuel used.
Background: Initially, EIA collected data from CHPs on total fuel use and on "useful thermal output" measured in Btu. The data were used in a model to estimate the fuel used to produce the useful thermal output. In about 2002, EIA modelers stated that the data were not accurate and that they understated the fuel used to generate electricity. In 2003, EIA changed its survey forms to ask respondents to estimate the percent of fuel that is used for electricity generation. In 2005, EIA modelers observed that data collected in this new way tended to overstate the amount of fuel used to generate electricity. EIA needs a better way to estimate the amount of fuel used to generate electricity. What data should be collected, and what model should be used?
Short Description: The researcher will develop, test, and evaluate a new methodology for allocating the reported fuel consumption by CHP plants between fuel for electricity and fuel for useful thermal output. The new methodology must be designed to eliminate the persistent data collection problems EIA has encountered in this area. Specific tasks will include:
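The two collection approaches described in the background can be contrasted with a small sketch. It is an illustration under stated assumptions, not EIA's actual model. The thermal-output function assumes that the fuel needed for heat equals the reported useful thermal output divided by a fixed conversion efficiency, and the default efficiency of 0.8, the function names, and the example figures are hypothetical.

```python
def allocate_by_thermal_output(total_fuel_btu, useful_thermal_output_btu,
                               assumed_efficiency=0.8):
    """Thermal-output approach: back out the fuel needed to make the reported heat.

    assumed_efficiency is a hypothetical placeholder, not an EIA value.
    Returns (fuel_for_electricity, fuel_for_heat) in Btu.
    """
    if not 0 < assumed_efficiency <= 1:
        raise ValueError("assumed_efficiency must be in (0, 1]")
    fuel_for_heat = useful_thermal_output_btu / assumed_efficiency
    fuel_for_heat = min(fuel_for_heat, total_fuel_btu)  # cannot exceed total fuel
    return total_fuel_btu - fuel_for_heat, fuel_for_heat


def allocate_by_reported_share(total_fuel_btu, percent_for_electricity):
    """Reported-share approach: apply the respondent's own percentage estimate.

    Returns (fuel_for_electricity, fuel_for_heat) in Btu.
    """
    if not 0 <= percent_for_electricity <= 100:
        raise ValueError("percent_for_electricity must be in [0, 100]")
    fuel_for_electricity = total_fuel_btu * percent_for_electricity / 100
    return fuel_for_electricity, total_fuel_btu - fuel_for_electricity


# Hypothetical plant-month: the two approaches can disagree for the same plant.
total = 1000.0
print(allocate_by_thermal_output(total, useful_thermal_output_btu=400.0))
print(allocate_by_reported_share(total, percent_for_electricity=60.0))
```

Comparing the two allocations plant by plant is one simple way to see where the approaches diverge and by how much.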
- Review current and past Electric Power Division (EPD) approaches taken to this problem.
- Become familiar with EPD data collection instruments and systems.
- Perform a literature review on allocation methods and issues.
- Recommend a new methodology for performing the allocation, and identify the respondent-level data that would be needed.
Data: Form EIA-920 - Monthly Nonutility Power Plant Report, which contains monthly data on electricity generation and fuel use by type of fuel. Information about plant equipment and boiler units is also available. What data would we give them? Could take a few observations that illustrated some things in Kaplan's talk.
Methodology:
Selected References:
- Aviel Verbruggen, Combined Heat and Power (CHP) essentials, International Journal of Energy Technology and Policy 2007 - Vol. 5, No.1 pp. 1 - 16.
- Stan Kaplan, briefing to the ASA Committee on Energy Statistics, "Making Adjustments to Survey Data When Collected Data Do Not Meet Expectations," April 2006. http://www.eia.doe.gov/smg/asa_meeting_2006/spring/files/adjsurveyd.ppt
Potential Use of Cointegration Analysis in EIA Projection Models
Research Question: Are there cointegration relationships between energy price series (oil, gas, coal, electricity, etc.)? If there are such relationships, what are they, and how should (can) the Energy Information Administration adapt its short and/or medium term models to accommodate these relationships?
Background: West Texas Intermediate (WTI) oil spot prices and Henry Hub (HH) natural gas spot prices appear to have moved together in the past 15 years for substantial amounts of time; they have also moved in opposite directions for other 3-6 month periods. Some energy experts argue that oil and gas prices must be related in the longer term and that they tend to converge toward some long term Btu price relationship which is confused or obfuscated by the variability of weather or other unusual shocks affecting supply and demand. Others argue that the relationship is far more complex and involves all energy forms and possibly technological alternatives that may convert one form of energy to another (like coal to liquids for transportation or coal to electricity in generation).
Previous EIA analysis [7] suggested that oil and gas were cointegrated. Two major concerns about the original EIA analysis were the need to use (a) a trend term and (b) many seasonal dummy variables to smooth out some of the data points. Although a trend term is common in most time-series data, it is unsatisfying simply to say that a series will continue increasing exponentially over 20 years. The possible explanation in this case is that there may be missing critical variables, like the price of coal. Finally, the number of deleted data points, treated by using dummy variables for them, seemed uncharacteristically large.
Short Description: This project has two parts: 1) fitting cointegration models to energy, and possibly related non-energy, price series to establish long-term and short-term price relationships and to determine whether these have changed over time; and 2) advising EIA concerning how identified relationships should (could) be used in EIA’s short and midterm models to enhance their projection capabilities. Part 1 involves considering frequency of data (daily, weekly, monthly, annual) and modeling details such as advantages and disadvantages of including trend terms and other economic variables or commodities. Part 2 could involve work with appropriate EIA staff, possibly on location, to develop sufficient understanding of EIA models to recommend how the findings could be incorporated into the short term and midterm models.
Data: EIA can provide (monthly?) time series for xxx that have been used in previous analysis, and the daily data that were used to construct them. The researcher will identify and use other publicly available time series, as appropriate.
Methodology: Cointegration analysis is a method of analysis for non-stationary time series that identifies short and long-term relationships between series. The second part of the project will require reviewing EIA’s short and long-term modeling documentation to assess how the cointegration results might be used.
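As an illustration of what a cointegration test involves, the following sketch implements a bare-bones two-step Engle-Granger procedure: regress one series on the other, then check whether the residuals are stationary using a Dickey-Fuller statistic. It is a simplified teaching example on simulated random walks, not the method used in the earlier EIA analysis. It omits lag augmentation, trend terms, and seasonal dummies, and all function names and simulation parameters are assumptions.

```python
import random


def ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def df_stat(resid):
    """Dickey-Fuller t-statistic for rho in d(u_t) = rho * u_(t-1) + e_t.

    No constant and no lagged differences are included (unaugmented test).
    """
    lagged = resid[:-1]
    diff = [b - a for a, b in zip(resid[:-1], resid[1:])]
    sxx = sum(u * u for u in lagged)
    rho = sum(u * d for u, d in zip(lagged, diff)) / sxx
    sse = sum((d - rho * u) ** 2 for u, d in zip(lagged, diff))
    std_err = (sse / (len(diff) - 1) / sxx) ** 0.5
    return rho / std_err


def engle_granger(x, y):
    """Step 1: cointegrating regression. Step 2: unit-root test on residuals."""
    intercept, slope = ols(x, y)
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return slope, df_stat(resid)


def simulate(n=300, seed=42):
    """Simulated series (NOT price data): y shares x's trend, z is unrelated."""
    rng = random.Random(seed)
    x, level = [], 50.0
    for _ in range(n):
        level += rng.gauss(0, 1)
        x.append(level)
    y = [2.0 + 0.8 * xi + rng.gauss(0, 1) for xi in x]
    z, level = [], 50.0
    for _ in range(n):
        level += rng.gauss(0, 1)
        z.append(level)
    return x, y, z


x, y, z = simulate()
print("x vs y: slope=%.3f, DF statistic=%.2f" % engle_granger(x, y))
print("x vs z: slope=%.3f, DF statistic=%.2f" % engle_granger(x, z))
```

A strongly negative statistic is evidence that the residuals are stationary and the series cointegrated. Because the residuals come from an estimated regression, the statistic must be compared with Engle-Granger critical values, which are more negative than the standard Dickey-Fuller ones. For real work, an established routine such as `statsmodels.tsa.stattools.coint`, or a Johansen-type system approach when more than two series are involved, would normally be preferred over hand-rolled code.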
Selected References:
- Alexander, Carol, Market Models, A Guide to Financial Data Analysis, Wiley, 2001
- Chan, Ngai Hang, Time Series Applications to Finance, Wiley, 2002
- Hendry, David and Juselius, Katarina, “Explaining Cointegration Analysis: Part I,” The Energy Journal, Volume 21, Number 1, 2000, pages 1-42
- Hendry, David and Juselius, Katarina, “Explaining Cointegration Analysis: Part II,” The Energy Journal, Volume 22, Number 1, 2001, pages 75-120
- EIA Documentation for mid-term model, xxx module of the NEMS
- EIA Documentation for Short Term Energy Model
- Villar, Jose and Joutz, Frederick, “The Relationship between Crude Oil and Natural Gas Prices,” feature article in EIA’s Natural Gas Monthly, 2006 (submitted for publication), available at http://www.deia.doe.gov/pub/oil_gas/natural_gas/feature_articles/2006/re...
EIA Contact:
Dr. Andy S. Kydes
202-586-0883
akydes@eia.doe.gov
