Data Matching Research at the Australian National University



Room P702 (Paulinum)



Techniques for matching, linking, and integrating data from different sources are becoming increasingly important in many application areas, including health, census, taxation, immigration, social welfare, crime and fraud detection, national security intelligence, business, bibliometrics, and the social sciences.

Today, data matching (also known as entity resolution, duplicate detection, and data or record linkage) faces not only computational challenges due to the increasing size and complexity of data collections, but also operational challenges as many applications move from static environments to real-time processing and analysis of potentially large and fast data streams, where records must be matched in real time. Finally, with growing public concern about how personal data are used, privacy and confidentiality often need to be considered when personal information is linked and shared between organisations.

In this talk I will give a short introduction to data matching, describe the challenges discussed above, and provide an overview of three areas of data matching research currently being conducted at the Australian National University:

1) Scalable real-time entity resolution on dynamic databases

2) Scalable privacy-preserving record linkage techniques

3) Efficient matching of historical census data across time


Peter Christen is an Associate Professor at the Research School of Computer Science at the Australian National University. Before moving to Australia, he received his Diploma in Computer Science Engineering from ETH Zurich in 1995 and his PhD in Computer Science from the University of Basel in 1999 (both in Switzerland). His research interests are in data mining and data matching (record linkage). He has published over 80 articles in these areas, as well as the book 'Data Matching', published by Springer in 2012. He is the principal developer of the Febrl (Freely Extensible Biomedical Record Linkage) open source data cleaning, deduplication and record linkage system. He has served on the program committees of various data mining conferences and workshops, and has been on the organising committee for the Australasian Data Mining (AusDM) conferences since 2006. He has also served as a reviewer for a variety of top-tier international journals and books, and as an assessor for the Australian and Canadian Research Councils.