• Title/Summary/Keyword: author identifier (저자 식별자)

A Large-scale Test Set for Author Disambiguation (저자 식별을 위한 대용량 평가셋 구축)

  • Kang, In-Su;Kim, Pyung;Lee, Seung-Woo;Jung, Han-Min;You, Beom-Jong
    • The Journal of the Korea Contents Association / v.9 no.11 / pp.455-464 / 2009
  • To move beyond article-oriented search and provide author-oriented search, the namesake problem in author names must be solved. Author disambiguation, proposed as a solution, assigns identifiers of real individuals to author name entities. Although recent state-of-the-art approaches to author disambiguation have reported performance above 90%, few academic information services have adopted author-resolving functions. This paper describes a large-scale test set for author disambiguation, created by KISTI to foster research on author resolution. The results of such research can be applied to academic information systems to improve their services. The test set was constructed from DBLP data through web searches and manual inspection. It currently consists of 881 author names, 41,673 author name entities, and 6,921 person identifiers.

Application of Machine Learning Techniques for Resolving Korean Author Names (한글 저자명 중의성 해소를 위한 기계학습기법의 적용)

  • Kang, In-Su
    • Journal of the Korean Society for Information Management / v.25 no.3 / pp.27-39 / 2008
  • In bibliographic data, the use of personal names to indicate authors makes it difficult to specify a particular author, since numerous authors share the same personal name. Resolving same-name author instances into different individuals is called author resolution, which consists of two steps: calculating author similarities and then clustering same-name author instances into different person groups. Author similarities are computed from similarities of author-related bibliographic features such as coauthors, titles of papers, and publication information, using supervised or unsupervised methods. Supervised approaches employ machine learning techniques to automatically learn the author similarity function from author-resolved training samples. So far, however, only a few machine learning methods have been investigated for author resolution. This paper provides a comparative evaluation of a variety of recent high-performing machine learning techniques on author disambiguation, and compares several methods of processing author disambiguation features such as coauthors and titles of papers.
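
The two steps named in this abstract (pairwise author similarity, then clustering) can be illustrated with a minimal unsupervised sketch. It is not the paper's method: the Jaccard similarity, the feature weights, the threshold, and the names `similarity` and `cluster` are all assumptions chosen for illustration, and the sample records are made up.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity(r1, r2, w_coauthor=0.7, w_title=0.3):
    """Step 1: weighted similarity of two same-name records (weights are assumed)."""
    coauthor_sim = jaccard(r1["coauthors"], r2["coauthors"])
    title_sim = jaccard(r1["title"].lower().split(), r2["title"].lower().split())
    return w_coauthor * coauthor_sim + w_title * title_sim

def cluster(records, threshold=0.3):
    """Step 2: single-link clustering; records at or above the threshold are merged."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical records that all carry the same author name.
records = [
    {"coauthors": ["Lee S", "Kim P"], "title": "author disambiguation test set"},
    {"coauthors": ["Lee S", "Jung H"], "title": "features for author disambiguation"},
    {"coauthors": ["Park J"], "title": "protein folding simulation"},
]
print(cluster(records))
```

With these sample records, the first two share a coauthor and two title words, so they fall into one person group while the third stays separate. A supervised variant would replace the fixed weights in `similarity` with a function learned from author-resolved training pairs.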

A Study on Utilization of ORCID based Author Identifier at National Level (국가 차원의 ORCID 기반 저자 식별자 활용에 관한 연구)

  • Kim, Eun-Jeong;Noh, Kyung-Ran
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.28 no.3 / pp.151-174 / 2017
  • The spread of the internet, advances in ICT, and digitization have streamlined and accelerated scholarly communication and research, and the paradigm of scholarly information dissemination is changing. This study introduces ORCID, a unique author identifier, and examines the ORCID organization's activities, the advantages it offers researchers and research institutes, and its membership status. In addition, this paper examines the adoption and use of ORCID in major countries including the USA, the UK, Italy, and China. On this basis, the paper suggests considerations for utilizing ORCID in terms of governance, system elements, and policy and institutional aspects, in an effort to identify authors at the national level.

Author Graph Generation based on Author Disambiguation (저자 식별에 기반한 저자 그래프 생성)

  • Kang, In-Su
    • Journal of Information Management / v.42 no.1 / pp.47-62 / 2011
  • While the nodes of an ideal author graph should represent authors, automatically generated author graphs mostly use author names as their nodes because of the difficulty of resolving author names into individuals. However, using author names as nodes merges namesakes, which should be separate nodes in the author graph, into the same node, and this may distort the characteristics of the graph. This study proposes an algorithm that resolves author ambiguities based on co-authorship and then yields an author graph consisting of author nodes rather than author name nodes. The scientific collaboration relationships on which this algorithm depends tend to produce clustering results that minimize the over-clustering error at the expense of the under-clustering error. In experiments, the algorithm is applied to real citation records in which Korean namesakes occur, and the results are discussed.
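
As a rough illustration of the idea in this abstract, the sketch below splits each author name into author nodes by co-authorship and then builds the graph over those nodes. It is a simplification under stated assumptions, not the paper's algorithm: two same-name instances are merged only when their papers share another co-author name, co-author names are compared as raw strings, and `resolve_authors`, `author_graph`, and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def resolve_authors(papers):
    """Map (paper_index, name) to an author id such as 'Kim J#0'.

    Assumption: same-name instances belong to one person only if their
    papers are linked, directly or transitively, by a shared co-author name.
    """
    by_name = defaultdict(list)
    for idx, authors in enumerate(papers):
        for name in authors:
            by_name[name].append(idx)

    ids = {}
    for name, idxs in by_name.items():
        parent = {i: i for i in idxs}

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for a, b in combinations(idxs, 2):
            shared = (set(papers[a]) & set(papers[b])) - {name}
            if shared:
                parent[find(a)] = find(b)

        roots = {}
        for i in idxs:
            k = roots.setdefault(find(i), len(roots))
            ids[(i, name)] = f"{name}#{k}"
    return ids

def author_graph(papers):
    """Return co-authorship edges between resolved author nodes."""
    ids = resolve_authors(papers)
    edges = set()
    for idx, authors in enumerate(papers):
        nodes = sorted(ids[(idx, name)] for name in authors)
        edges.update(combinations(nodes, 2))
    return edges

# Hypothetical citation records: each paper is a list of author names.
papers = [
    ["Kim J", "Lee S"],
    ["Kim J", "Lee S", "Park H"],
    ["Kim J", "Choi M"],
]
ids = resolve_authors(papers)
edges = author_graph(papers)
```

Here the "Kim J" of the third paper becomes a separate node because no co-author links it to the first two papers. If it is in fact the same person, the sketch has split one individual in order to avoid merging namesakes, which is the kind of trade-off the abstract attributes to co-authorship evidence.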

Email Extraction and Utilization for Author Disambiguation (저자 식별을 위한 전자메일의 추출 및 활용)

  • Kang, In-Su
    • The Journal of the Korea Contents Association / v.8 no.6 / pp.261-268 / 2008
  • An author of a paper is represented by his/her personal name in a bibliographic record. However, the use of names to indicate authors may reduce the recall and precision of paper and/or author search, since the same name can be shared by many different individuals and a person can write his/her name in different forms. To solve this problem, same-name author instances must be disambiguated into different persons. As features for author resolution, previous studies have exploited bibliographic attributes such as co-authors, titles, and publication information. This study attempts to apply the email addresses of authors to disambiguate author names. To this end, we first handle the extraction of email addresses from full-text papers, and then evaluate and analyze the effect of email addresses on author resolution using a large-scale test set.
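
A minimal sketch of the two ideas in this abstract, extracting email addresses from full text and using them as a disambiguation feature, might look as follows. The regular expression, the function names `extract_emails` and `share_email`, and the `example.org` addresses are assumptions for illustration; the paper's actual extraction procedure is not described here, and grouped notations such as `{a,b}@domain` are not handled.

```python
import re

# Simplified pattern: local part, '@', and a dotted domain (an assumption,
# not a full RFC 5322 address grammar).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return the distinct, lowercased email addresses found in the text."""
    return sorted({m.lower() for m in EMAIL_RE.findall(text)})

def share_email(text_a, text_b):
    """Treat a shared address as evidence that two same-name authors match."""
    return bool(set(extract_emails(text_a)) & set(extract_emails(text_b)))

# Hypothetical first-page snippets of two papers by a same-name author.
page_a = "Contact: Kang@Example.org, lee@lab.example.org."
page_b = "Corresponding author (kang@example.org)"
print(extract_emails(page_a))
print(share_email(page_a, page_b))
```

A shared address is strong evidence for a match, but a missing or different address says little, since authors change affiliations and many papers list no email at all.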

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seung-Woo;Jung, Han-Min;Kim, Pyung;Koo, Hee-Kwan;Lee, Mi-Kyung;Sung, Won-Kyung;Park, Dong-In
    • The Journal of the Korea Contents Association / v.8 no.2 / pp.41-47 / 2008
  • There exists a many-to-many mapping relationship between persons and their names. A person may have multiple names, and different persons may share the same name. These synonymous and homonymous names may severely reduce the recall and precision of person search, respectively. This study addresses the characteristics of features for resolving homonymous author names appearing in citation data. As disambiguation features, previous works have employed citation-internal features such as co-authorship, titles of articles, and titles of publications, as well as citation-external features such as emails, affiliations, and Web evidence. To the best of our knowledge, however, no prior study has dealt with the influence of individual features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

The attacker group feature extraction framework : Authorship Clustering based on Genetic Algorithm for Malware Authorship Group Identification (공격자 그룹 특징 추출 프레임워크 : 악성코드 저자 그룹 식별을 위한 유전 알고리즘 기반 저자 클러스터링)

  • Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.21 no.2 / pp.1-8 / 2020
  • Recently, the number of APT (Advanced Persistent Threat) attacks using malware has been increasing, and research is underway to prevent and detect them. While it is important to detect and block attacks before they occur, it is also important to respond effectively through accurate analysis of attack cases and attack types, and such responses can be determined by analyzing the group behind the attacks. Therefore, this paper proposes a genetic-algorithm-based framework for analyzing malware and understanding the features of attacker groups. The framework uses a decompiler and a disassembler to extract relevant code from collected malware, and analyzes author-related information through code analysis. Malware has unique characteristics, which can serve as features that identify its author or attacker group. We therefore select features specific to each attack group from among the various features extracted from binary and source code through the authorship clustering method, and apply a genetic algorithm to infer the specific features that yield accurate clustering. We also find features that express the characteristics of each group of malware authors, and create profiles to verify that the author groups are correctly clustered. In this paper, we conduct experiments on author classification using a genetic algorithm and on finding specific features that express author characteristics. In the experiments, we obtained an author classification accuracy of 86% and selected, through the genetic algorithm, the features to be used for authorship analysis from among the extracted information.

Extraction of Author Identification Elements of Overseas Academic Papers on Authority Data System for Science and Technology (과학기술 전거데이터 시스템에서의 해외 학술논문 저자 식별요소 추출)

  • Choi, Hyunmi;Lee, Seokhyoung;Kim, Kwangyoung;Kim, Hwanmin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.05a / pp.711-713 / 2013
  • Information on human resources around the world has become widely available with the spread of social networks such as Facebook and Twitter. There is a large amount of researcher information in the science and technology area, but it is difficult to find a suitable researcher, such as a research partner, for research or business, because researcher information is not systematically organized. To solve this problem, we are constructing an authority data system for science and technology based on authority information from overseas academic papers. In this paper, in order to construct the authority data, we extract author identification elements from millions of overseas academic papers published from 1994 to 2012. There are more than 50 author identification elements, such as author name, affiliation, paper title, publisher, year, keywords, co-author, and co-author's affiliation, in Korean, English, Chinese, and Japanese. We construct the element database by extracting and storing author identification information based on these elements from overseas academic papers. Future work includes constructing the authority database for overseas academic papers by storing the academic activities of researchers after author clustering with these extracted elements. The authority data is used to improve the utilization of researcher information and to help the community find a suitable research partner or business examiner.

A Study on Improvement for Identification of Original Authors in Online Academic Information Service (온라인 학술정보 서비스 상 원저작자 식별 개선 방안 연구)

  • Jung-Wan Yeom;Song-Hwa Hong;Sang-Hyun Joo;Sam-Hyun Chun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.3 / pp.133-138 / 2024
  • In the modern academic research environment, the advancement of digital technology provides researchers with increasingly diverse and rich access to information, but at the same time, the issue of author identification has emerged as a new challenge. The problem of author identification is a major factor that undermines the transparency and accuracy of academic communication, potentially causing confusion in the accurate attribution of research results and the construction of research networks. In response, identifier systems such as the International Standard Name Identifier (ISNI) and Open Researcher and Contributor ID (ORCID) have been introduced, but still face limitations due to low participation by authors and inaccurate entry of information. This study focuses on researching information management methods for identification from the moment author information is first entered into the system, proposing ways to improve the accuracy of author identification and maximize the efficiency of academic information services. Through this, it aims to renew awareness of the issue of author identification within the academic community and present concrete measures that related institutions and researchers can take to solve this problem.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seungwoo;Jung, Hanmin;Kim, Pyung;Goo, HeeKwan;Lee, MiKyung;Sung, Won-Kyung;Park, DongIn
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.107-111 / 2007
  • There exists a many-to-many mapping relationship between persons and their names. A person may have multiple names, and different persons may share the same name. These synonymous and homonymous names may severely reduce the recall and precision of person search, respectively. This study addresses the characteristics of features for resolving homonymous author names appearing in citation data. As disambiguation features, previous works have employed citation-internal features such as co-authorship, titles of articles, and titles of publications, as well as citation-external features such as emails, affiliations, and Web evidence. To the best of our knowledge, however, no prior study has dealt with the influence of individual features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.
