Name Disambiguation using Cycle Detection Algorithm Based on Social Networks  

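Shin, Dong-Wook (Department of Computer Engineering, Hanyang University)
Kim, Tae-Hwan (Department of Computer Engineering, Hanyang University)
Jeong, Ha-Na (Department of Computer Engineering, Hanyang University)
Choi, Joong-Min (Department of Computer Engineering, Hanyang University)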
Abstract
A name is a key feature for distinguishing people, yet names often fail to identify individuals uniquely because one author may appear under several names and several authors may share the same name. Such name ambiguity degrades the performance of document retrieval, web search, and database integration. Bibliographic data in particular may contain many errors, since different authors can share a name and an author's name may be misspelled or abbreviated. To solve these problems, the names entered into a database must be disambiguated. In this paper, we propose a method that resolves name ambiguity by using social networks constructed from the relations between authors. We evaluated the effectiveness of the proposed system on DBLP data, which provide bibliographic information for computer science.
Keywords
Social Networks; Name Disambiguation; Identity Uncertainty; DBLP; Cycle Detection
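
The abstract does not spell out the procedure, so the following is only a rough sketch of the general idea suggested by the title and keywords, not the authors' algorithm. It assumes that each paper is a list of author names, that every occurrence of an ambiguous name is kept as a separate node in a co-authorship graph, and that two occurrences are treated as likely the same person when a short path of other authors connects them (so that merging the two nodes would close a loop). The function names `build_coauthor_graph` and `closes_cycle`, the `max_len` threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_coauthor_graph(papers, ambiguous_name):
    """Build an undirected co-authorship graph.

    Each occurrence of ambiguous_name becomes its own node, (name, paper_index),
    so that occurrences can later be compared pairwise (illustrative assumption).
    """
    graph = defaultdict(set)
    for idx, authors in enumerate(papers):
        nodes = [(a, idx) if a == ambiguous_name else a for a in authors]
        for u in nodes:
            graph[u]  # register the node even if the paper has a single author
        for u, v in combinations(nodes, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def closes_cycle(graph, a, b, max_len=4):
    """Return True if a path of at most max_len edges joins a and b.

    Such a path would become a closed loop if a and b were merged into one node.
    Implemented as a breadth-first search with a depth limit.
    """
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return True
        if dist == max_len:
            continue
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return False


# Hypothetical toy data: "J. Kim" appears in papers 0, 1 and 3.
papers = [
    ["J. Kim", "A. Lee", "B. Park"],
    ["J. Kim", "C. Choi"],
    ["B. Park", "C. Choi"],
    ["J. Kim", "Z. Smith"],
]
graph = build_coauthor_graph(papers, "J. Kim")
print(closes_cycle(graph, ("J. Kim", 0), ("J. Kim", 1)))
print(closes_cycle(graph, ("J. Kim", 0), ("J. Kim", 3)))
```

In this toy data the occurrences in papers 0 and 1 are linked through B. Park and C. Choi, while the occurrence in paper 3 shares no collaborators with the others. A bounded breadth-first search is used here for brevity; enumerating elementary cycles, for example with Johnson's algorithm, would be an alternative.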