Name Disambiguation Using a Cycle Detection Algorithm Based on Social Networks

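  • 신동욱 (Department of Computer Engineering, Hanyang University);
  • 김태환 (Department of Computer Engineering, Hanyang University);
  • 정하나 (Department of Computer Engineering, Hanyang University);
  • 최중민 (Department of Computer Engineering, Hanyang University)
  • Published: 2009.04.15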

Abstract

A name is a key feature for distinguishing people, but a name alone often fails to identify a person, because one author may use several names and several authors may share the same name. Such name ambiguity degrades the performance of document retrieval, web search, and database integration. Bibliographic data in particular can contain many errors, since different authors may share a name and an author's name may be misspelled or abbreviated. To address these problems, the author names entered into a database must be disambiguated. In this paper, we propose a method that resolves name ambiguity using social networks constructed from the relationships between authors. We evaluated the effectiveness of the proposed system through experiments on data from DBLP (Digital Bibliography & Library Project), which provides bibliographic information for computer science.
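The abstract describes the method only at a high level, so the following Python sketch is one plausible reading of "cycle detection on a coauthor network", not the authors' algorithm. It assumes that each paper is given as a list of author-name strings and that two papers carrying the same ambiguous name are likely written by the same person when a coauthor of one can reach a coauthor of the other without passing through that name, so that merging the two occurrences would close a short cycle through the name. In a large, densely connected collaboration graph an unbounded search would likely merge almost every occurrence, so the sketch bounds the cycle length. All function names (`build_graph`, `within_hops`, `disambiguate`), the parameter `max_cycle_len`, and the sample author strings are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(papers, exclude):
    """Undirected coauthor graph over every author name except `exclude`."""
    graph = defaultdict(set)
    for authors in papers:
        others = [a for a in authors if a != exclude]
        for a, b in combinations(others, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def within_hops(graph, sources, targets, max_hops):
    """Breadth-first search: is any target reachable from any source in <= max_hops edges?"""
    targets = set(targets)
    seen = set(sources)
    frontier = deque((s, 0) for s in sources)
    while frontier:
        node, dist = frontier.popleft()
        if node in targets:
            return True
        if dist == max_hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def disambiguate(papers, name, max_cycle_len=4):
    """Group the papers listing `name` into clusters presumed to be one person.

    Hypothetical heuristic: two occurrences are merged when a coauthor a of one
    reaches a coauthor b of the other without passing through `name`, i.e. the
    cycle name -> a -> ... -> b -> name has length hops + 2 <= max_cycle_len.
    A shared coauthor (a == b) counts as length 2.
    """
    if max_cycle_len < 2:
        raise ValueError("max_cycle_len must be at least 2")
    occurrences = [i for i, authors in enumerate(papers) if name in authors]
    graph = build_graph(papers, exclude=name)
    parent = {i: i for i in occurrences}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for p, q in combinations(occurrences, 2):
        coauthors_p = [a for a in papers[p] if a != name]
        coauthors_q = [a for a in papers[q] if a != name]
        if within_hops(graph, coauthors_p, coauthors_q, max_cycle_len - 2):
            parent[find(p)] = find(q)

    clusters = defaultdict(list)
    for i in occurrences:
        clusters[find(i)].append(i)
    return sorted(clusters.values())


# Hypothetical toy records: each inner list is the author list of one paper.
sample_papers = [
    ["J. Kim", "A. Lee", "B. Park"],
    ["J. Kim", "B. Park", "C. Choi"],
    ["A. Lee", "D. Han"],
    ["J. Kim", "D. Han"],
    ["J. Kim", "X. Yoon", "Y. Seo"],
]

print(disambiguate(sample_papers, "J. Kim"))  # [[0, 1, 3], [4]]
```

Here papers 0 and 1 share the coauthor "B. Park", and paper 3 joins them because "D. Han" and "A. Lee" coauthored paper 2, closing a cycle of length 3 through "J. Kim"; paper 4 has no such connection and stays separate. With `max_cycle_len=2`, only occurrences that share a coauthor are merged, and paper 3 then forms its own cluster.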
