Record Linkage in Python

December 8, 2017 @ 2:00 pm - 4:00 pm

Modern Languages Building (MLB), Room 2001B

Record linkage is the task of finding records, within or between data sources, that refer to the same entity. This workshop will introduce participants to the Python Record Linkage Toolkit. We’ll start with a brief explanation of Anaconda and the Jupyter notebook environment (participants are not required to use these tools, but the instructor will). Using several examples, we’ll demonstrate: 1) how to clean and standardize the data; 2) how to make pairs of records with smart indexing methods; 3) how to compare records using a large number of comparison and similarity measures for different types of variables; 4) how to classify record pairs with supervised and unsupervised algorithms; and 5) how to use common record linkage evaluation tools. Although it is not required, we recommend that all participants have a basic knowledge of Python and Pandas, a flexible data analysis library for Python.
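
For participants who would like a feel for the five steps before the session, here is a minimal sketch of the workflow using only the Python standard library. It does not use the Python Record Linkage Toolkit's API; the toy records, the helper names (clean, candidate_pairs, similarity), the choice of birth year as the blocking key, and the 0.85 threshold are all assumptions made for illustration. The classification step here is a plain similarity threshold, standing in for the supervised and unsupervised algorithms covered in the workshop.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy data: (record id, name, birth year).
source_a = [
    ("a1", "  Jonathan SMITH ", 1980),
    ("a2", "Maria Garcia", 1975),
    ("a3", "Wei Chen", 1990),
]
source_b = [
    ("b1", "jonathan smyth", 1980),
    ("b2", "maria  garcia", 1975),
    ("b3", "Peter Jones", 1990),
]
# Known true links, used only for evaluation (also hypothetical).
true_matches = {("a1", "b1"), ("a2", "b2")}


def clean(name):
    """Step 1: standardize a name (lowercase, collapse whitespace)."""
    return " ".join(name.lower().split())


def candidate_pairs(left, right):
    """Step 2: indexing by blocking; only pair records sharing a birth year."""
    return [(l, r) for l, r in product(left, right) if l[2] == r[2]]


def similarity(l, r):
    """Step 3: compare the cleaned names with a string similarity in [0, 1]."""
    return SequenceMatcher(None, clean(l[1]), clean(r[1])).ratio()


# Step 4: classify pairs with a simple threshold (an assumed value).
THRESHOLD = 0.85
pairs = candidate_pairs(source_a, source_b)
predicted = {(l[0], r[0]) for l, r in pairs if similarity(l, r) >= THRESHOLD}

# Step 5: evaluate the predicted links against the known true links.
true_positives = len(predicted & true_matches)
precision = true_positives / len(predicted) if predicted else 0.0
recall = true_positives / len(true_matches) if true_matches else 0.0

print(sorted(predicted))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Blocking cuts the nine possible pairs down to three candidates, which is the purpose of the indexing step: on real data, comparing every record with every other record quickly becomes too expensive.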


