Record Linkage Workshop

September 20, 2019

Explore how subject matter experts in four distinct fields of work use machine learning methods to link databases together. What works? What doesn’t? Leverage their collective knowledge and discover new approaches for overcoming common methodological challenges.

This event will include four interactive presentations, a Q&A panel with all speakers, and a poster competition awarding $250 to one winner. 

Poster Competition

Participants are invited to submit a poster proposal for presentation at the workshop. Submissions are due August 1, 2019, and will be reviewed by the planning committee. Those invited to present will receive free admission to the event, free poster printing, and a chance to win $250. Submissions should address research that uses machine learning methods to link records. Information about the poster competition, including deadlines and judging criteria, can be found on the conference website.


9:30 a.m. | Introduction 

Ansu Chatterjee, Institute for Research in Statistics and its Applications

9:45 - 10:30 a.m. | Linking Individuals Over Time Using Historical Census Data 

Jonas Helgertz, Minnesota Population Center

A presentation outlining methods of record linkage developed within the Multigenerational Longitudinal Panel project, aiming to link men and women across U.S. full-count censuses between 1850 and 1940.

10:30 - 11:15 a.m. | Crowdsourcing Structured Data with the Zooniverse Project Builder

Samantha Blickhan, Zooniverse, Adler Planetarium

This talk will consider the opportunities of crowdsourcing in relation to record linkage, from generating data via tasks like text transcription or tagging, to verifying initial attempts of machine learning models. Specific focus will be on the Zooniverse Project Builder, a free platform where anyone can create their own crowdsourcing project.

11:30 a.m. - 12:45 p.m. | Poster Competition and Lunch (provided)

1:00 - 1:45 p.m. | Entity Resolution with Societal Impacts in Machine Learning

Rebecca C. Steorts, Duke University 

A discussion of methods for combining datasets into one database using entity resolution (also known as record linkage or de-duplication). Using the Salvadoran conflict as a case study, we review the benefits and challenges associated with this process.

1:45 - 2:30 p.m. | Personalization, Beyond Recommenders

Edward Chenard, Cyberian Data

Personalization is the most powerful method for customer engagement today, but recommender systems used by traditional data science teams provide limited returns. Learn why new, effective methods for personalization involve collaboration with behavioral scientists and behavioral economists.

2:30 p.m. | Coffee Break

2:45 - 3:30 p.m. | Panel 

All presenters

3:30 p.m. | Closing Statements

Rob Warren, Minnesota Population Center




University of Minnesota, Institute for Research in Statistics and its Applications


Minnesota Population Center
National Institute of Statistical Sciences
The Commons for Research in the Social Sciences
The Life Course Center


University of Minnesota
United States