Explore how subject matter experts in four distinct fields of work use machine learning methods to link databases together. What works? What doesn’t? Leverage their collective knowledge and discover new approaches for overcoming common methodological challenges.
This event will include four interactive presentations, a Q&A panel with all speakers, and a poster competition awarding $250 to one winner.
Participants are invited to submit a poster proposal for presentation at the workshop. Submissions are due August 1, 2019, and will be reviewed by the planning committee. Those invited to present will receive free admission to the event, free poster printing, and a chance to win $250. Submissions should address research that uses machine learning methods to link records. Information about the poster competition, including deadlines and judging criteria, can be found on the conference website.
9:30 a.m. | Introduction
Ansu Chatterjee, Institute for Research in Statistics and its Applications
9:45 - 10:30 a.m. | Linking Individuals Over Time Using Historical Census Data
Jonas Helgertz, Minnesota Population Center
A presentation outlining methods of record linkage developed within the Multigenerational Longitudinal Panel project, aiming to link men and women across U.S. full-count censuses between 1850 and 1940.
10:30 - 11:15 a.m. | Crowdsourcing Structured Data with the Zooniverse Project Builder
Samantha Blickhan, Zooniverse, Adler Planetarium
This talk will consider the opportunities crowdsourcing offers for record linkage, from generating data through tasks like text transcription or tagging to verifying the initial output of machine learning models. It will focus on the Zooniverse Project Builder, a free platform where anyone can create their own crowdsourcing project.
11:30 a.m. - 12:45 p.m. | Poster Competition and Lunch (provided)
1:00 - 1:45 p.m. | Entity Resolution with Societal Impacts in Machine Learning
Rebecca C. Steorts, Duke University
A discussion of methods for combining datasets into one database using entity resolution (also known as record linkage or de-duplication). Using the Salvadoran conflict as a case study, we review the benefits and challenges associated with this process.
1:45 - 2:30 p.m. | Personalization, Beyond Recommenders
Edward Chenard, Cyberian Data
Personalization is the most powerful method for customer engagement today, but recommender systems used by traditional data science teams provide limited returns. Learn why new, effective methods for personalization involve collaboration with behavioral scientists and behavioral economists.
2:30 p.m. | Coffee Break
2:45 - 3:30 p.m. | Panel
3:30 p.m. | Closing Statements
Rob Warren, Minnesota Population Center