• Title/Summary/Keyword: Test Set for Author Disambiguation

A Large-scale Test Set for Author Disambiguation (저자 식별을 위한 대용량 평가셋 구축)

  • Kang, In-Su;Kim, Pyung;Lee, Seung-Woo;Jung, Han-Min;You, Beom-Jong
    • The Journal of the Korea Contents Association, v.9 no.11, pp.455-464, 2009
  • To move beyond article-oriented search functions and provide author-oriented ones, the namesake problem for author names must be solved. Author disambiguation, proposed as a solution, assigns identifiers of real individuals to author name entities. Although recent state-of-the-art approaches to author disambiguation have reported performance above 90%, few academic information services have adopted author-resolving functions. This paper describes a large-scale test set for author disambiguation that was created by KISTI to foster research on author resolution. The results of such research can be applied to academic information systems to improve their services. The test set was constructed from DBLP data through web searches and manual inspection. It currently consists of 881 author names, 41,673 author name entities, and 6,921 person identifiers.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seung-Woo;Jung, Han-Min;Kim, Pyung;Koo, Hee-Kwan;Lee, Mi-Kyung;Sung, Won-Kyung;Park, Dong-In
    • The Journal of the Korea Contents Association, v.8 no.2, pp.41-47, 2008
  • There is a many-to-many mapping between persons and their names: a person may have multiple names, and different persons may share the same name. Synonymous names can severely degrade the recall of person search, and homonymous names its precision. This study examines the characteristics of features for resolving homonymous author names in citation data. As disambiguation features, previous work has employed citation-internal features such as co-authorship, article titles, and publication titles, as well as citation-external features such as email addresses, affiliations, and Web evidence. To the best of our knowledge, however, no previous study has examined the influence of individual features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.
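
As a rough illustration of how a citation-internal feature such as co-authorship can separate homonymous authors, the sketch below groups records that carry the same ambiguous name whenever they share at least one co-author. It is a minimal example under stated assumptions, not the method evaluated in the paper: the record layout (`id`, `coauthors`) and the function name `group_by_shared_coauthors` are hypothetical.

```python
from itertools import combinations


def group_by_shared_coauthors(records):
    """Cluster records of one ambiguous author name (hypothetical helper).

    Two records land in the same cluster if they share a co-author,
    directly or through a chain of other records (single linkage).
    Each record is a dict with an "id" and a set of "coauthors".
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if records[i]["coauthors"] & records[j]["coauthors"]:
            parent[find(i)] = find(j)

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record["id"])
    return sorted(sorted(ids) for ids in clusters.values())


# Made-up records that all list the same ambiguous author name.
records = [
    {"id": "r1", "coauthors": {"A", "B"}},
    {"id": "r2", "coauthors": {"B", "C"}},
    {"id": "r3", "coauthors": {"D"}},
    {"id": "r4", "coauthors": {"C"}},
]
clusters = group_by_shared_coauthors(records)
```

Each resulting cluster stands for one presumed person. Other features named in the abstract (titles, affiliations, email addresses) could be plugged into the same pairwise test, which is one way to compare the effect of individual features.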

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seungwoo;Jung, Hanmin;Kim, Pyung;Goo, HeeKwan;Lee, MiKyung;Sung, Won-Kyung;Park, DongIn
    • Proceedings of the Korea Contents Association Conference, 2007.11a, pp.107-111, 2007
  • There is a many-to-many mapping between persons and their names: a person may have multiple names, and different persons may share the same name. Synonymous names can severely degrade the recall of person search, and homonymous names its precision. This study examines the characteristics of features for resolving homonymous author names in citation data. As disambiguation features, previous work has employed citation-internal features such as co-authorship, article titles, and publication titles, as well as citation-external features such as email addresses, affiliations, and Web evidence. To the best of our knowledge, however, no previous study has examined the influence of individual features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

Email Extraction and Utilization for Author Disambiguation (저자 식별을 위한 전자메일의 추출 및 활용)

  • Kang, In-Su
    • The Journal of the Korea Contents Association, v.8 no.6, pp.261-268, 2008
  • In a bibliographic record, the author of a paper is represented by his or her personal name. However, using names to indicate authors may degrade the recall and precision of paper and/or author search, since the same name can be shared by many different individuals and a person can write his or her name in different forms. To solve this problem, author name entities that share the same name must be disambiguated into different persons. As features for author resolution, previous studies have exploited bibliographic attributes such as co-authors, titles, and publication information. This study attempts to apply authors' email addresses to disambiguate author names. To this end, we first address the extraction of email addresses from full-text papers, and then evaluate and analyze the effect of email addresses on author resolution using a large-scale test set.
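
The two steps named in the abstract, extracting email addresses from full text and using them as a disambiguation cue, might look roughly like the sketch below. The abstract does not describe the actual extraction procedure, so this is only an assumption-laden illustration: the regular expression is a simplified pattern that will miss grouped or obfuscated forms, and the names `extract_emails` and `share_email` are hypothetical.

```python
import re

# Simplified email pattern (an assumption, not the paper's extractor):
# a local part, "@", and a dotted domain with at least two labels.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(full_text):
    """Return the sorted, lowercased, de-duplicated addresses found in the text."""
    return sorted({match.lower() for match in EMAIL_PATTERN.findall(full_text)})


def share_email(text_a, text_b):
    """Treat two papers as evidence for the same author if they share an address."""
    return bool(set(extract_emails(text_a)) & set(extract_emails(text_b)))


# Made-up full-text snippets with placeholder addresses.
paper_1 = "Corresponding author: First.Author@Example.org, second@example.org."
paper_2 = "Contact: first.author@example.org"
paper_3 = "Contact: someone.else@example.net"

emails_1 = extract_emails(paper_1)
same_12 = share_email(paper_1, paper_2)
same_13 = share_email(paper_1, paper_3)
```

A shared address is strong evidence that two same-name records belong to one person, but its absence says little, since authors change addresses over time. That is why this cue is best combined with bibliographic features such as co-authors and titles.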