• Title/Summary/Keyword: 저자식별 평가셋

Search Result 9, Processing Time 0.024 seconds

A Large-scale Test Set for Author Disambiguation (저자 식별을 위한 대용량 평가셋 구축)

  • Kang, In-Su;Kim, Pyung;Lee, Seung-Woo;Jung, Han-Min;You, Beom-Jong
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.11
    • /
    • pp.455-464
    • /
    • 2009
  • To overcome article-oriented search functions and provide author-oriented ones, a namesake problem for author names should be solved. Author disambiguation, proposed as its solution, assigns identifiers of real individuals to author name entities. Although recent state-of-the-art approaches to author disambiguation have reported above 90% performance, there are few academic information services which adopt author-resolving functions. This paper describes a large-scale test set for author disambiguation which was created by KISTI to foster author resolution researches. The result of these researches can be applied to academic information systems and make better service. The test set was constructed from DBLP data through web searches and manual inspection, Currently it consists of 881 author names, 41,673 author name entities, and 6,921 person identifiers.

A Comparison of Author Name Disambiguation Performance through Topic Modeling (토픽모델링을 통한 저자명 식별 성능 비교)

  • Kim, Ha Jin;Jung, Hyo-jung;Song, Min
    • Proceedings of the Korean Society for Information Management Conference
    • /
    • 2014.08a
    • /
    • pp.149-152
    • /
    • 2014
  • 본 연구에서는 저자명 모호성 해소를 위해 토픽모델링 기법을 사용하여 저자명을 식별 하였다. 기존의 토픽모델링은 용어 자질만을 고려하였지만 본 연구에서는 제 3의 메타데이터 자질을 활용하여 ACT(Author-Conference Topic Model) 모델과 DMR(Dirichlet-multinomial Regression) 토픽모델링을 대상으로 저자명 식별 성능을 평가, 비교하였다. 또한 수작업으로 저자 식별 작업을 한 데이터셋을 기반으로 저자 당 논문 수와 토픽 수에 차이를 두고 연구를 진행하였다. 그 결과 저자명 식별에 있어 ACT 모델보다 DMR 토픽모델링의 성능이 더 우수한 것을 알 수 있었다.

  • PDF

Email Extraction and Utilization for Author Disambiguation (저자 식별을 위한 전자메일의 추출 및 활용)

  • Kang, In-Su
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.6
    • /
    • pp.261-268
    • /
    • 2008
  • An author of a paper is represented as his/her personal name in a bibliographic record. However, the use of names to indicate authors may deteriorate recall and precision of paper and/or author search, since the same name can be shared by many different individuals and a person can write his/her name in different forms. To solve this problem, it is required to disambiguate same-name author names into different persons. As features for author resolution, previous studies have exploited bibliographic attributes such as co-authors, titles, publication information, etc. This study attempts to apply email addresses of authors to disambiguate author names. For this, we first handle the extraction of email addresses from full-text papers, and then evaluate and analyze the effect of email addresses on author resolution using a large-scale test set.

Author Entity Identification using Representative Properties in Linked Data (대표 속성을 이용한 저자 개체 식별)

  • Kim, Tae-Hong;Jung, Han-Min;Sung, Won-Kyung;Kim, Pyung
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.1
    • /
    • pp.17-29
    • /
    • 2012
  • In recent years, Linked Data that is published under an open license shows increased growth rate and comes into the spotlight due to its interoperability and openness especially in government of developed countries. However there are relatively few out-links compared with its entire number of links and most of links refer a few hub dataset. These occur because of absence of technology that identifies entities in Linked data. In this paper, we present an improved author entity resolution method that using representative properties. To solve problems of previous methods that utilizes relation with other entities(owl:sameAs, owl:differentFrom and so on) or depends on Curation, we design and evaluate an automated realtime resolution process based on multi-ontologies that respects entity's type and its logical characteristics so as to verify entities consistency. The evaluation of author entity resolution shows positive results (The average of K measuring result is 0.8533.) with 29 author information that has obtained confirmation.

A Method for Same Author Name Disambiguation in Domestic Academic Papers (국내 학술논문의 동명이인 저자명 식별을 위한 방법)

  • Shin, Daye;Yang, Kiduk
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.28 no.4
    • /
    • pp.301-319
    • /
    • 2017
  • The task of author name disambiguation involves identifying an author with different names or different authors with the same name. The author name disambiguation is important for correctly assessing authors' research achievements and finding experts in given areas as well as for the effective operation of scholarly information services such as citation indexes. In the study, we performed error correction and normalization of data and applied rules-based author name disambiguation to compare with baseline machine learning disambiguation in order to see if human intervention could improve the machine learning performance. The improvement of over 0.1 in F-measure by the corrected and normalized email-based author name disambiguation over machine learning demonstrates the potential of human pattern identification and inference, which enabled data correction and normalization process as well as the formation of the rule-based diambiguation, to complement the machine learning's weaknesses to improve the author name disambiguation results.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seung-Woo;Jung, Han-Min;Kim, Pyung;Koo, Hee-Kwan;Lee, Mi-Kyung;Sung, Won-Kyung;Park, Dong-In
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.2
    • /
    • pp.41-47
    • /
    • 2008
  • There exists a many-to-many mapping relationship between persons and their names. A person may have multiple names, and different persons may share the same name. These synonymous and homonymous names may severely deteriorate the recall and precision of the person search, respectively. This study addresses the characteristics of features for resolving homonymous author names appearing in citation data. As disambiguation features, previous works have employed citation-internal features such as co-authorship, titles of articles, titles of publications as well as citation-external features such as emails, affiliations, Web evidences. To the best of our knowledge, however, there has been no literature to deal with the influences of features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seungwoo;Jung, Hanmin;Kim, Pyung;Goo, HeeKwan;Lee, MiKyung;Sung, Won-Kyung;Park, DongIn
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.107-111
    • /
    • 2007
  • There exists a many-to-many mapping relationship between persons and their names. A person may have multiple names, and different persons may share the same name. These synonymous and homonymous names may severely deteriorate the recall and precision of the person search, respectively. This study addresses the characteristics of features for resolving homonymous author names appearing in citation data. As disambiguation features, previous works have employed citation-internal features such as co-authorship, titles of articles, titles of publications as well as citation-external features such as emails, affiliations, Web evidences. To the best of our knowledge, however, there has been no literature to deal with the influences of features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

  • PDF

Performance Assessment of Machine Learning and Deep Learning in Regional Name Identification and Classification in Scientific Documents (머신러닝을 이용한 과학기술 문헌에서의 지역명 식별과 분류방법에 대한 성능 평가)

  • Jung-Woo Lee;Oh-Jin Kwon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.2
    • /
    • pp.389-396
    • /
    • 2024
  • Generative AI has recently been utilized across all fields, achieving expert-level advancements in deep data analysis. However, identifying regional names in scientific literature remains a challenge due to insufficient training data and limited AI application. This study developed a standardized dataset for effectively classifying regional names using address data from Korean institution-affiliated authors listed in the Web of Science. It tested and evaluated the applicability of machine learning and deep learning models in real-world problems. The BERT model showed superior performance, with a precision of 98.41%, recall of 98.2%, and F1 score of 98.31% for metropolitan areas, and a precision of 91.79%, recall of 88.32%, and F1 score of 89.54% for city classifications. These findings offer a valuable data foundation for future research on regional R&D status, researcher mobility, collaboration status, and so on.

A Study on the Design of Metadata Elements in Textbooks (교과서 메타데이터 요소 설계에 관한 연구)

  • Euikyung Oh
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.401-408
    • /
    • 2023
  • The purpose of this study is to design textbook metadata as a basic task for building a textbook database. To this end, reading textbooks were defined as a category of textbooks, and a metadata development methodology was established through previous research. In order to ensure that bibliographically essential elements are not omitted, the catalog description elements of institutions that collect, accumulate, and service textbooks such as the National Library of Korea were investigated. The elements of Dublin Core, MODS, and KEM were mapped to derive elements suitable for describing textbooks. Finally, a set of textbook metadata elements consisting of 14 elements in three categories - bibliography, context, and textbook characteristics were presented by adding publication type, genre, and curriculum period elements. The 14 elements are titles, authors, publications, formats, identification sign, languages, locations, subject names, annotation, genres, table of contents, subjects, curriculum period, and curriculum information. In this study, we contributed to this field by discussing how to organize textbook resources with national knowledge resources, and in future studies, we proposed to evaluate usability by applying metadata elements to actual textbooks and revise and supplement them according to the evaluation results.