Review of Author Name Disambiguation Techniques for Citation Analysis

A Review of Methods for Identifying Ambiguous Author Names in Citation Analysis

  • Hyunjung Kim (김현정) (Department of Library and Information Science, College of Social Sciences, Seoul Women's University)
  • Received : 2012.09.18
  • Accepted : 2012.09.24
  • Published : 2012.09.30

Abstract

In citation analysis, author names are often used as the unit of analysis, yet distinct authors are frequently indexed under the same name in the bibliographic databases from which citation counts are obtained. Many techniques for author name disambiguation have been proposed, using supervised, unsupervised, or semi-supervised learning algorithms. Unsupervised approaches use machine learning algorithms to extract the necessary bibliographic information from large-scale databases and digital libraries, while supervised approaches rely on manually built training datasets of author groups, which are combined with learning algorithms to disambiguate author names. This study examines various techniques for author name disambiguation, in the hope of finding aids to improve the precision of citation counts in citation analysis and to obtain better results in information retrieval.

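Disambiguating author names is one of the steps that must be completed before a citation analysis based on bibliographic databases can proceed. Most bibliographic databases record only an author's surname and the initials of the given name. Because a great many researchers from Asian countries such as China and Korea share the same surname, and quite often the same initials as well, it is difficult to identify the intended author by a name search alone. The problem is all the more serious in research fields with particularly many scholars from Asian countries, and it can be an important factor not only in citation analysis but also in general information retrieval. Methods for disambiguating author names include identifying each author with automated algorithms, manually building datasets in order to obtain author clusters, and semi-automated methods that combine the two. This study reviews the various methods that have been developed for "author name disambiguation."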
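
The ambiguity described in the abstract can be illustrated with a minimal sketch. It is not one of the techniques surveyed in the paper. It assumes made-up records, a hypothetical `name_key` helper that reduces a name to surname plus first initial (as many bibliographic databases do), and a simple greedy co-author overlap heuristic standing in for an unsupervised method.

```python
from collections import defaultdict


def name_key(full_name):
    """Reduce 'Surname, Given Names' to 'Surname I' (surname + first initial)."""
    surname, given = [part.strip() for part in full_name.split(",", 1)]
    return f"{surname} {given[0]}"


def group_by_coauthors(records):
    """Greedy heuristic: a record joins the first cluster that shares a co-author."""
    clusters = []  # each cluster holds a co-author set and a list of titles
    for rec in records:
        coauthors = set(rec["coauthors"])
        for cluster in clusters:
            if cluster["coauthors"] & coauthors:
                cluster["coauthors"] |= coauthors
                cluster["records"].append(rec["title"])
                break
        else:
            # No existing cluster shares a co-author: start a new one.
            clusters.append({"coauthors": coauthors, "records": [rec["title"]]})
    return clusters


# Hypothetical records: two different people who both index as "Kim H".
records = [
    {"author": "Kim, Hyun-woo", "title": "Paper A", "coauthors": ["Lee, S", "Park, J"]},
    {"author": "Kim, Hye-jin", "title": "Paper B", "coauthors": ["Choi, M"]},
    {"author": "Kim, Hyun-woo", "title": "Paper C", "coauthors": ["Park, J"]},
]

# Step 1: indexing by surname + initial merges all three records under one key.
by_key = defaultdict(list)
for rec in records:
    by_key[name_key(rec["author"])].append(rec)

# Step 2: split the ambiguous group using co-author overlap only.
clusters = group_by_coauthors(by_key["Kim H"])
for cluster in clusters:
    print(cluster["records"])
```

The greedy pass depends on record order and never merges two existing clusters, so it is only a toy. Methods of the kind the abstract mentions typically combine more evidence, such as titles, venues, and affiliations, and supervised variants would learn from manually labeled author groups instead.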
