Friday, September 24, 2021 (9 am – 12:00 Noon ET)
TSE for Data Collections about COVID
Speaker 1: Carina Cornesse, SFB 884, University of Mannheim
Title: Errors of Representation in the Mannheim Corona Study
Authors: Carina Cornesse, SFB 884, University of Mannheim; Ulrich Krieger, SFB 884, University of Mannheim; The MCS Research Group, University of Mannheim
Abstract: The outbreak of COVID-19 has sparked a sudden demand for fast, frequent, and accurate data on the societal impact of the pandemic. At the onset of the pandemic, our interdisciplinary research group at the University of Mannheim therefore set up the Mannheim Corona Study (MCS), a rotating panel survey with daily data collection, on the basis of the long-standing probability-based online panel infrastructure of the German Internet Panel (GIP). In a team effort, our research group was able to inform political decision makers and the general public with key information to understand the social and economic developments from as early as March 2020 as well as advance social scientific knowledge through in-depth interdisciplinary research. Given the high visibility and the importance of the research topic we closely monitored data quality during fieldwork.
This presentation provides insights into potential errors of representation in the MCS. Questions we aim to address include: How much of the error in the MCS may be attributable to errors in the GIP as its underlying data infrastructure? How much of the error may be introduced due to the online mode of data collection? Do the rotating daily subsamples in the MCS differ in their errors? If so, is this attributable to the sampling process or rather due to systematic differences in nonresponse? Overall, we will explore potential errors of representation in the MCS data, link them back to TSE error components, and discuss paths for future research into errors of representation in flexible probability-based online panel infrastructures.
Speaker 2: Jason Fields, U. S. Census Bureau
Title: Revisiting the Balance: The Household Pulse Survey in a Total Survey Quality Framework
Authors: Jason Fields, U. S. Census Bureau
Abstract: The Household Pulse Survey (HPS) was developed in March 2020 to measure the impacts of the coronavirus pandemic on households within all fifty states, the District of Columbia, and the fifteen largest metropolitan statistical areas (Fields et al., 2020). All data collection was done virtually, with sample units contacted by text message and/or email. Overall response rates were expected to be low (and have been), which raised concerns about nonresponse bias and led to the accompanying release of the initial HPS nonresponse bias analysis (Peterson, Toribio, Farber, & Hornick, 2021). This work walks through the components of the development, implementation, and Total Survey Quality (TSQ) of the HPS. We identify specific areas to target for improvement and outline a roadmap for a rapid response data system that could grow out of the experience gained through the HPS.
Nonresponse bias, coverage errors, item nonresponse, and the absence of a comprehensive set of edits and imputations are components of the HPS that raise data quality concerns within the U.S. Census Bureau, and they are critical characteristics that keep the current Household Pulse design an experimental survey. Biemer (2010) discusses the trade-off between quality from the data collection perspective and quality from the data user perspective as aspects of TSQ. The HPS consciously made trade-offs in those domains as methods, available resources, and systems were brought together to create the survey. A major priority was to emphasize speed, content, and transparency in order to provide data that met pressing data user needs. This was a dramatic example of the trade-offs in the TSQ framework, and it tipped heavily toward data user needs over the customary caution and focus on data collection quality. As we revisit the design and consider modifications to maintain speed and utility, we will also look to increase overall TSQ by revisiting procedures that can help reduce the error component of TSQ.
Biemer, P. P. (2010). Total Survey Error: Design, Implementation, and Evaluation. Public Opinion Quarterly, 74(5), 817-848. https://doi.org/10.1093/poq/nfq058
Fields, J., Hunter-Childs, J., Tersine, A., Sisson, J., Parker, E., Velkoff, V., & Shin, H. (2020). Design and Operation of the 2020 Household Pulse Survey. Washington, DC: U.S. Census Bureau. https://www2.census.gov/programs-surveys/demo/technical-documentation/hhp/2020_HPS_Background.pdf
Peterson, S., Toribio, N., Farber, J., & Hornick, D. (2021). Nonresponse Bias Report for the 2020 Household Pulse Survey. Washington, DC: U.S. Census Bureau. https://www2.census.gov/programs-surveys/demo/technical-documentation/hhp/2020_HPS_NR_Bias_Report-final.pdf
Speaker 3: Jamie C. Moore, Institute of Social and Economic Research, University of Essex, UK
Title: Bias Prevention and Bias Adjustment in a National Longitudinal COVID-19 Survey
Authors: Jamie C. Moore, Institute of Social and Economic Research, University of Essex, UK; Michaela Benzeval, Institute of Social and Economic Research, University of Essex, UK; Jon Burton, Institute of Social and Economic Research, University of Essex, UK; Thomas F. Crossley, European University Institute, Florence, Italy; Paul Fisher, Institute of Social and Economic Research, University of Essex, UK; Colin Gardiner, Ipsos MORI Social Research Institute, UK; Annette Jäckle, Institute of Social and Economic Research, University of Essex, UK
Abstract: The 2020 COVID-19 pandemic has created a number of challenges for survey practitioners. Perhaps foremost of these has been to develop and implement new survey designs with unprecedented speed whilst still seeking to maximise survey dataset quality. The need to understand the impacts of the pandemic on human populations has led policy makers to demand more timely information than ever before. Moreover, in many regions the restrictions placed on human contact in response to the pandemic have prevented face to face interviewing, previously a critical mode of survey data collection.
In the UK, Understanding Society – the UK Household Longitudinal Study (UKHLS) is the nation’s largest household panel survey. It has sampled the UK population concerning a range of health, economic and social topics, using a combination of face to face, web and telephone interviewing, since 2009. However, from April 2020, survey participants were also invited to a series of primarily web surveys designed to capture higher frequency information than the yearly UKHLS (the Understanding Society COVID-19 Study). In this paper, we describe and evaluate the impacts of the strategy employed in the COVID-19 Study to minimise a key component of total survey error, non-response error (bias). Our strategy includes both bias prevention measures aimed at reducing biases during data collection (use of a telephone follow up for web non-respondents), and bias adjustment measures aimed at reducing remaining biases (nonresponse weights). We evaluate the efficacy of our bias prevention measures by quantifying the socio-demographic characteristics of different types of respondents. We evaluate the efficacy of our bias adjustment measures by using a novel statistical test to compare weighted UKHLS main survey and COVID-19 study estimates. We find that both sets of measures reduce biases. We then discuss the implications of our findings for the design of web (and other) surveys.
Speaker 4: Andrew Phelps, U. K. Office for National Statistics
Title: Quick but Not Dirty: Balancing the Requirement to Set Up UK COVID-19-related Primary Data Collections at Pace with Maximising Data Quality for Official Statistics
Authors: Andrew Phelps, U. K. Office for National Statistics; James Scruton, U. K. Office for National Statistics
Abstract: In the spring of 2020, there was an urgent need to stand up primary data collection activity in order to understand the spread of COVID-19 in the UK and its social impacts on the population. Two key collections were stood up at an unprecedented pace. The COVID-19 Infection Survey aims to estimate infection rates and antibody response within private UK households by collecting a series of nose and throat swabs and blood samples from study participants. The survey of social impacts aims to understand how the coronavirus (COVID-19) pandemic is affecting life in Great Britain and is a weekly collection of information on people’s experiences and opinions relating to the pandemic.
This presentation reflects on the activities involved in the early days of both the survey of social impacts and the COVID-19 Infection Survey that aimed to minimize measurement, coverage, sampling and non-response error in a very short period of time.
Discussant: Paul J. Lavrakas (Private consultant)
Floor Discussion: 30 minutes
Friday, October 1, 2021 (9 am – 12:00 Noon ET)
Effects of COVID on Other Data Collection
Speaker 1: Jan van den Brakel, Methodology Department, Statistics Netherlands and Department of Quantitative Economics, Maastricht University School of Business and Economics, Netherlands
Title: Official Statistics Based on the Dutch Health Survey during the COVID-19 Pandemic
Authors: Jan van den Brakel, Methodology Department, Statistics Netherlands and Department of Quantitative Economics, Maastricht University School of Business and Economics, Netherlands; Marc Smeet, Methodology Department, Statistics Netherlands
Abstract: The Dutch Health Survey (DHS), conducted by Statistics Netherlands, is designed to produce reliable direct estimates about health, medical contacts, lifestyle and preventive behaviour of the Dutch population at an annual frequency. Data collection is based on a sequential mixed-mode design where a combination of internet participation (CAWI – Computer-assisted web interviewing) and face-to-face interviewing (CAPI – Computer-assisted personal interviewing) is applied. The COVID-19 pandemic resulted in two problems. Due to the lockdown measures, face-to-face interviewing partially stopped in the years 2020 and 2021. It can be expected that this results in a sudden change in measurement and selection effects in the survey outcomes of the DHS. On top of that, this crisis made painfully clear that the production of annual data about the effect of COVID-19 on health-related themes with a delay of about one year heavily compromises the relevance of this survey. In the second quarter of 2020, there was indeed a strong external demand for more timely figures of the DHS. The sample size of the DHS, however, doesn’t allow the production of sufficiently precise direct estimates for shorter reference periods.
Both issues are solved by developing a model-based inference method to estimate quarterly figures for the eight most important key variables of the DHS. The method is based on a bivariate structural time series model. The input series are quarterly direct estimates based on the complete response of CAPI and CAWI and a series of direct estimates based on the CAWI response only. During the lockdown, the direct estimates for the complete response are missing and the time series model provides an optimal nowcast for this figure based on the observed relation in the past between both series and the observed direct estimates based on the CAWI response during the lockdown. The sample size of the DHS is too small to produce quarterly figures with a direct estimator. The aforementioned time series model is therefore used as a form of small area estimation that borrows sample information observed in previous reference periods to improve the precision of the quarterly direct estimates. In this way, timely and relevant statistics that describe the effects of the corona crisis on the development of health, medical contacts, lifestyle and preventive behaviour in the Netherlands are published. Another advantage of quarterly figures is that the period of the corona crisis can be better delineated. As a result, the effect of this crisis on health is more readily apparent in the published figures.
In the presentation the time series model underlying this approach is explained and results including estimates for the bias due to the loss of CAPI will be presented.
Speaker 2: Martina Helme, U.K. Office for National Statistics
Title: Mitigating the Impact of the Pandemic on Data Quality in ONS Social Surveys, with a Specific Focus on the Labour Force Survey
Authors: Martina Helme, U.K. Office for National Statistics; Salah Merad, U.K. Office for National Statistics
Abstract: As public awareness of COVID-19 increased, survey response rates in the UK started to be affected from the beginning of March 2020. With the onset of lockdown shortly after, face-to-face data collection ceased by the end of March, and a number of measures were put in place across ONS social surveys to switch to telephone data collection, including several ways of obtaining phone contact details for the selected samples.
The switch to telephone mode resulted in response bias in that certain parts of the population that are generally harder to reach (e.g. respondents living in rented accommodation, or non-UK-born respondents) were represented to a lesser extent in the achieved sample.
In October 2020 an initial adjustment to the weights was made by introducing tenure in the weight calibration, which mitigated the issue to some extent. An additional refinement will be applied as part of the reweighting in 2021 to improve LFS estimates further. This refinement is based on administrative data from the HM Revenue and Customs (HMRC) Real Time Information system, which is the source of monthly employee payroll statistics. Linked to the Migrant Worker Scan, the data helps infer the nationality of employees. It showed smaller falls in employment by non-UK nationals than the LFS estimates suggest. To further address this issue, we are introducing an additional control in the weighting in relation to the structure of the population by country of birth.
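The idea of adding a control such as tenure to the weight calibration can be illustrated with a minimal sketch. This is not the ONS procedure: it uses simple post-stratification (the most basic form of calibration) on a single variable, and the function name, sample, design weights and population totals are all hypothetical values chosen for illustration.

```python
from collections import defaultdict


def calibrate_to_totals(categories, design_weights, population_totals):
    """Post-stratify: rescale weights so that, within each category,
    they sum to the known population total for that category."""
    # Sum of the design weights observed in each category of the sample.
    weighted_sums = defaultdict(float)
    for category, weight in zip(categories, design_weights):
        weighted_sums[category] += weight
    # Multiply each weight by (population total / weighted sample total).
    return [
        weight * population_totals[category] / weighted_sums[category]
        for category, weight in zip(categories, design_weights)
    ]


# Hypothetical toy sample in which renters are under-represented.
tenure = ["owner", "owner", "owner", "renter"]
design_weights = [100.0, 100.0, 100.0, 100.0]
# Hypothetical known population totals by tenure.
population_totals = {"owner": 240.0, "renter": 160.0}

calibrated = calibrate_to_totals(tenure, design_weights, population_totals)
```

In this toy case the single renter's weight is scaled up and the owners' weights are scaled down, so the weighted sample reproduces the assumed tenure distribution. Calibration in practice typically balances several controls at once, which needs an iterative or regression-based method rather than this one-variable rescaling.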
Speaker 3: Duncan Elliott and Louisa Blackwell, U. K. Office for National Statistics
Title: Estimating How Levels of International Migration to and from the UK in 2020 Have Been Affected by the Coronavirus (COVID-19) Pandemic
Authors: Duncan Elliott, U. K. Office for National Statistics; Nicky Rogers, U. K. Office for National Statistics; Louisa Blackwell, U. K. Office for National Statistics
Abstract: Traditionally, data from the International Passenger Survey (IPS) are the main source for estimating international migration to and from the UK. The data primarily give estimates of immigration, emigration and net migration by citizenship and reason for migration. These estimates feed into the annual Mid-Year Population estimates for England and Wales that are published in June each year. In March 2020 the IPS was suspended due to the coronavirus pandemic. The pandemic has disrupted our assumptions, methods and data sources around migration, necessitating innovative use of other data and modelling methods to produce timely measures. We have, therefore, used a combination of survey and administrative data in time series models to estimate international migration for March – July 2020.
In this presentation we describe the development and use of multivariate state space models to estimate international migration. We describe how our error framework for longitudinal administrative data sources supported the use of non-survey data in our models. Estimating international migration using non-survey data sources is complex and challenging, and was so even before COVID-19. We had already started to see changes in migrant behaviour because of Brexit. Definitionally, 12 months or more need to pass before we define someone who has travelled to or from the UK as a migrant. Non-survey data sources are often not designed to measure international migration flows: often there are coverage issues, they are less timely, and they have inherent time lags due to (a) when an individual interacts with a service, for example, and (b) the definitional constraints for international migration. At best we are repurposing these data and using them as a proxy for international migration.
Discussant: John Eltinge (US Census Bureau)
Floor Discussion: 30 minutes
Friday, October 8, 2021 (9 am – 12:00 Noon ET)
Total Quality for Censuses
Population census data are an essential component of national statistics because they are used for forming policies, distributing funds, calibrating surveys and many other important purposes. Errors in the census, if uncontrolled, can defeat these purposes and may even destroy public trust in official statistics. Regardless of the methodology used to ensure the highest quality, the census is never perfect and there will always be errors that affect quality. In 2020, there were additional challenges from COVID-19 that may have further increased the error risks. In addition, there is increasing use of administrative data to mitigate errors or even replace census data, and this leads to other types of quality concerns. The focus of this session is the evaluation of census errors to understand how errors may affect census fitness for use.
Introductory Comments: Session Chair: TBA
Speaker 1 - J. David Brown, U.S. Census Bureau
Title: Using Administrative Records to Evaluate the Quality of the 2020 U.S. Census
Author: J. David Brown, U.S. Census Bureau
Abstract: We examine the quality of the 2020 U.S. Census through comparison to administrative data. We document the extent and characteristics of persons apparently counted multiple times, not alive, and not residents on the reference date. Persons in the administrative data, but not the 2020 Census, are studied to understand the magnitude and characteristics of potential omissions. We show how discrepancies between the sources vary by survey response mode.
Speaker 2 - Owen Abbott, Office for National Statistics, UK
Title: Busting the Census Count Myth
Authors: Owen Abbott, Jon Wroth-Smith and Cal Ghee, Office for National Statistics, U.K.
Abstract: The 2021 Census in England and Wales, like its predecessors in 2011 and 2001, does not produce a count of the population. All of the outputs are estimates. The estimates are derived from a series of statistical methods designed to mitigate any errors that result from the way in which the population responds to the census collection process. This includes item imputation to deal with non-response to individual questions, identification and removal of duplicates, measurement of under-coverage using a Census Coverage Survey and imputation of households (and persons) to correct for missed households (and persons). The principle is to reduce the bias due to errors, turning the collected data into approximately unbiased statistical estimates which are subject to confidence intervals. This presentation will outline the series of methods used to produce the 2021 Census estimates and the additional challenges faced due to the COVID-19 pandemic. Administrative data is playing a larger role in 2021 than in previous censuses, and we will outline where this is being used. Lastly, to educate census users in the use of the estimates, we will describe efforts to measure the variability introduced by the statistical processes and how these will be disseminated.
Speaker 3 - Stefano Falorsi, Italian National Institute of Statistics
Title: The Italian Experience for the Population Census in the Year of COVID
Authors: Stefano Falorsi, Danila Filipponi, Silvia Loriga and Marco Di Zio.
Abstract: The paper addresses the problem of census estimation for the 2020 Census round, in which the census master sample was not carried out due to the COVID-19 outbreak. In particular, the contribution will describe how the census estimation will be done using administrative register information and data coming from the 2018 and 2019 surveys. A reduced set of hypercubes will be produced for 2020, referring to variables arising mainly from administrative information: demographic figures and level of education. Furthermore, a particular focus will be devoted to the estimation of coverage rates of the Population Register, exploiting administrative ‘signals of life’ more extensively than in previous years.
Speaker 4 - Christiane Laperrière - Statistics Canada
Title: Quality-Driven Collection Operations during the 2021 Canadian Census of Population
Author: Christiane Laperrière – Chief Methodologist of Census Operations, Statistics Canada
Abstract: The Canadian Census of Population involves a large-scale collection operation covering a vast territory whose different regional characteristics call for various collection methodologies. Many efforts were undertaken to ensure the success of the Census operations amid the COVID-19 pandemic – self-response was strongly encouraged (in particular via the electronic questionnaire) and adjustments were made to the follow-up operations to minimise in-person contacts to protect the health and safety of Canadians. Furthermore, the use of administrative data was explored in the context of a statistical contingency. This presentation will highlight the strategies that were implemented to reduce errors and to ensure high quality data for the 2021 Census. Some of these operations include telephone follow-ups to resolve ambiguities related to coverage, when respondents were uncertain about who to include on their questionnaire, and field visits to verify the occupancy status of a sample of dwellings. Field enumeration also took place when self-response could not be obtained. During these non-response follow-up activities, quality indicators were monitored to measure progress and identify errors (such as dwelling occupancy classification errors) while there were still field enumerators available to make verifications and corrections. This monitoring also ensured that high response rates were achieved uniformly across the country. In this presentation, we will introduce these quality-driven operations and provide preliminary results based on the recent Canadian Census experience.
Discussant: Nancy Potok
Floor Discussion: 30 minutes