http://dx.doi.org/10.5351/KJAS.2021.34.5.849

A study on the probabilistic record linkage and its application  

Choi, Yeonok (Statistics Korea, and Department of Information and Statistics, Chungnam National University)
Lee, Sangin (Department of Information and Statistics, Chungnam National University)
Publication Information
The Korean Journal of Applied Statistics / v.34, no.5, 2021, pp. 849-861
Abstract
This paper introduces the basic concept of probabilistic record linkage and its statistical framework, and describes the specific process and principles of performing it, using a real example from Statistics Korea. First, we briefly describe deterministic record linkage and compare it with probabilistic record linkage. We then introduce the Fellegi-Sunter model framework for record linkage and its related parameters: m-probability, u-probability, match weight, and decision rule. Finally, we show the detailed process of record linkage under the Fellegi-Sunter model framework and evaluate the linkage results, using sample data from the register-based census and the Population and Housing Census survey of Statistics Korea.
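As a rough illustration of the parameters named in the abstract, the following minimal Python sketch computes a Fellegi-Sunter style match weight for one record pair and applies a two-threshold decision rule. It assumes the usual definitions (m-probability: chance that a field agrees given a true match; u-probability: chance that it agrees given a non-match) and conditional independence across fields. The field names, probability values, thresholds, and function names are all hypothetical and are not taken from the paper or from Statistics Korea data.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (illustrative only).
M_PROB = {"name": 0.95, "birth_date": 0.98, "sex": 0.99}
U_PROB = {"name": 0.01, "birth_date": 0.005, "sex": 0.5}


def match_weight(agreements, m_prob=M_PROB, u_prob=U_PROB):
    """Sum the per-field log2 likelihood ratios for one record pair.

    `agreements` maps each field name to True (values agree) or False.
    Summing assumes the fields are conditionally independent.
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = m_prob[field], u_prob[field]
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, lower, upper):
    """Two-threshold decision rule: link, non-link, or possible link."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Example pair: name and sex agree, birth date disagrees.
pair = {"name": True, "birth_date": False, "sex": True}
w = match_weight(pair)
# Thresholds are placeholders; in practice they follow from tolerated error rates.
print(round(w, 2), classify(w, lower=-5.0, upper=10.0))
```

In practice the m- and u-probabilities are usually estimated from the data rather than fixed by hand, for example with the EM algorithm, and pairs classified as possible links are typically sent to clerical review.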
Keywords
probabilistic record linkage; Fellegi-Sunter model; m-probability; u-probability; match weight
References
1 Christen P (2007). A two-step classification approach to unsupervised record linkage. In Proceedings of the Sixth Australasian Conference on Data Mining and Analytics, 70, 111-119.
2 Jaro MA (1989). Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida, Journal of the American Statistical Association, 84, 414-420, Taylor & Francis Group.
3 Newcombe HB, Kennedy JM, Axford SJ, and James AP (1959). Automatic linkage of vital records, Science, 130, 954-959, JSTOR.
4 Winkler WE (1993). Improved decision rules in the Fellegi-Sunter model of record linkage, 56, Citeseer.
5 Winkler WE (2000). Using the EM algorithm for weight computation in the Fellegi-Sunter model of record linkage. In Proceedings of the Section on Survey Research Methods, US Bureau of the Census, Washington, DC.
6 Winkler WE (1995). Matching and record linkage, Business Survey Methods, 1, 355-384, New York.
7 Winkler WE (1990). String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. In Proceedings of the Section on Survey Research Methods, US. ERIC.
8 Christen P and Goiser K (2007). Quality and complexity measures for data linkage and deduplication, Quality Measures in Data Mining, 127-151, Springer.
9 Dunn HL (1946). Record linkage, American Journal of Public Health and the Nations Health, 36, 1412-1416, American Public Health Association.
10 Dempster AP, Laird NM, and Rubin DB (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B (Methodological), 39, 1-22, Wiley Online Library.
11 Feigenbaum JJ (2016). Automated census record linking: A machine learning approach (Working Paper), Harvard University, US.
12 Elfeky MG, Verykios VS, Elmagarmid AK, Ghanem TM, and Huwait AR (2003). Record linkage: A machine learning approach, a toolbox, and a digital government web service, Citeseer.
13 Goeken R, Huynh L, Lynch TA, and Vick R (2011). New methods of census record linking, Historical Methods, 44, 7-14, Taylor & Francis.
14 Hand D and Christen P (2018). A note on using the F-measure for evaluating record linkage algorithms, Statistics and Computing, 28, 539-547, Springer.
15 Winkler WE and Thibaudeau Y (1991). An application of the Fellegi-Sunter model of record linkage to the 1990 US decennial census (Working Paper), United States Census Bureau.
16 Fellegi IP and Sunter AB (1969). A theory for record linkage, Journal of the American Statistical Association, 64, 1183-1210, Taylor & Francis.
17 Herzog TN, Scheuren FJ, and Winkler WE (2007). Data Quality and Record Linkage Techniques, Springer Science & Business Media, New York.