• Title/Summary/Keyword: Author Name Disambiguation

Search Result 15, Processing Time 0.022 seconds

The Impact of Name Ambiguity on Properties of Coauthorship Networks

  • Kim, Jinseok;Kim, Heejun;Diesner, Jana
    • Journal of Information Science Theory and Practice
    • /
    • v.2 no.2
    • /
    • pp.6-15
    • /
    • 2014
  • Initial based disambiguation of author names is a common data pre-processing step in bibliometrics. It is widely accepted that this procedure can introduce errors into network data and any subsequent analytical results. What is not sufficiently understood is the precise impact of this step on the data and findings. We present an empirical answer to this question by comparing the impact of two commonly used initial based disambiguation methods against a reasonable proxy for ground truth data. We use DBLP, a database covering major journals and conferences in computer science and information science, as a source. We find that initial based disambiguation induces strong distortions in network metrics on the graph and node level: Authors become embedded in ties for which there is no empirical support, thus increasing their sphere of influence and diversity of involvement. Consequently, networks generated with initial-based disambiguation are more coherent and interconnected than the actual underlying networks, and individual authors appear to be more productive and more strongly embedded than they actually are.

Name Disambiguation using Cycle Detection Algorithm Based on Social Networks (사회망 기반 순환 탐지 기법을 이용한 저자명 명확화 기법)

  • Shin, Dong-Wook;Kim, Tae-Hwan;Jeong, Ha-Na;Choi, Joong-Min
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.306-319
    • /
    • 2009
  • A name is a key feature for distinguishing people, but we often fail to discriminate people because an author may have multiple names or multiple authors may share the same name. Such name ambiguity problems affect the performance of document retrieval, web search and database integration. Especially, in bibliography information, a number of errors may be included since there are different authors with the same name or an author name may be misspelled or represented with an abbreviation. For solving these problems, it is necessary to disambiguate the names inputted into the database. In this paper, we propose a method to solve the name ambiguity by using social networks constructed based on the relations between authors. We evaluated the effectiveness of the proposed system based on DBLP data that offer computer science bibliographic information.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seung-Woo;Jung, Han-Min;Kim, Pyung;Koo, Hee-Kwan;Lee, Mi-Kyung;Sung, Won-Kyung;Park, Dong-In
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.2
    • /
    • pp.41-47
    • /
    • 2008
  • There exists a many-to-many mapping relationship between persons and their names. A person may have multiple names, and different persons may share the same name. These synonymous and homonymous names may severely deteriorate the recall and precision of the person search, respectively. This study addresses the characteristics of features for resolving homonymous author names appearing in citation data. As disambiguation features, previous works have employed citation-internal features such as co-authorship, titles of articles, titles of publications as well as citation-external features such as emails, affiliations, Web evidences. To the best of our knowledge, however, there has been no literature to deal with the influences of features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seungwoo;Jung, Hanmin;Kim, Pyung;Goo, HeeKwan;Lee, MiKyung;Sung, Won-Kyung;Park, DongIn
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.107-111
    • /
    • 2007
  • There exists a many-to-many mapping relationship between persons and their names. A person may have multiple names, and different persons may share the same name. These synonymous and homonymous names may severely deteriorate the recall and precision of the person search, respectively. This study addresses the characteristics of features for resolving homonymous author names appearing in citation data. As disambiguation features, previous works have employed citation-internal features such as co-authorship, titles of articles, titles of publications as well as citation-external features such as emails, affiliations, Web evidences. To the best of our knowledge, however, there has been no literature to deal with the influences of features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

  • PDF

A Comparison of Author Name Disambiguation Performance through Topic Modeling (토픽모델링을 통한 저자명 식별 성능 비교)

  • Kim, Ha Jin;Jung, Hyo-jung;Song, Min
    • Proceedings of the Korean Society for Information Management Conference
    • /
    • 2014.08a
    • /
    • pp.149-152
    • /
    • 2014
  • 본 연구에서는 저자명 모호성 해소를 위해 토픽모델링 기법을 사용하여 저자명을 식별 하였다. 기존의 토픽모델링은 용어 자질만을 고려하였지만 본 연구에서는 제 3의 메타데이터 자질을 활용하여 ACT(Author-Conference Topic Model) 모델과 DMR(Dirichlet-multinomial Regression) 토픽모델링을 대상으로 저자명 식별 성능을 평가, 비교하였다. 또한 수작업으로 저자 식별 작업을 한 데이터셋을 기반으로 저자 당 논문 수와 토픽 수에 차이를 두고 연구를 진행하였다. 그 결과 저자명 식별에 있어 ACT 모델보다 DMR 토픽모델링의 성능이 더 우수한 것을 알 수 있었다.

  • PDF