• Title/Summary/Keyword: 저자 식별 (author identification)

Search Result 95, Processing Time 0.025 seconds

Author Graph Generation based on Author Disambiguation (저자 식별에 기반한 저자 그래프 생성)

  • Kang, In-Su
    • Journal of Information Management / v.42 no.1 / pp.47-62 / 2011
  • While the nodes of an ideal author graph should represent individual authors, automatically generated author graphs mostly use author names as nodes because resolving author names into individuals is difficult. However, using author names as nodes merges namesakes, who should be separate nodes, into a single node, which may distort the characteristics of the author graph. This study proposes an algorithm that resolves author ambiguities based on co-authorship and then yields an author graph whose nodes are authors rather than author names. The scientific collaboration relationships on which this algorithm depends tend to produce clustering results that minimize over-clustering errors at the expense of under-clustering errors. In experiments, the algorithm is applied to real citation records in which Korean namesakes occur, and the results are discussed.
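
    The co-authorship idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes, for illustration only, that two citation records carrying the same ambiguous name belong to the same person whenever they share at least one other co-author name. The function name `cluster_by_coauthors` and the record layout (a dict from record ID to author-name list) are hypothetical.

```python
from collections import defaultdict


def cluster_by_coauthors(records, name):
    """Cluster the records that list `name`, linking records that share a co-author.

    `records` maps a record ID to its list of author names (hypothetical layout).
    Records with no shared co-author stay in separate clusters, so the sketch
    prefers splitting one person over merging two namesakes.
    """
    ids = [rid for rid, authors in records.items() if name in authors]
    parent = {rid: rid for rid in ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # co-author name -> first record in which it appeared
    for rid in ids:
        for coauthor in records[rid]:
            if coauthor == name:
                continue
            if coauthor in first_seen:
                union(first_seen[coauthor], rid)
            else:
                first_seen[coauthor] = rid

    clusters = defaultdict(list)
    for rid in ids:
        clusters[find(rid)].append(rid)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

    Each returned cluster would become one author node in the graph. Because co-author names are themselves ambiguous, a real system would need additional evidence before merging.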

A Survey on Machine Learning-Based Code Authorship Identification (머신 러닝 기반 코드 작성자 식별 기술에 대한 조망)

  • Kim, Hyun-Jun;Ahn, Sun-woo;Ahn, Seong-gwan;Nam, Kevin;Paek, Yun-Heung
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.128-131 / 2021
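  • This paper introduces machine learning-based code authorship identification techniques, which analyze a given piece of code to identify who wrote it. We first review techniques that identify the author by analyzing source code. We then examine techniques that identify the author by analyzing binary code, in which some of the author-identifying information has been lost, and finally explore future research directions for authorship identification.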

A Study on the Construction Methods for Author Identification System of Research Outcome based on ORCID (ORCID 기반의 학술 연구 결과물 저자명 식별 시스템 구축 방안에 관한 연구)

  • Cho, Jane
    • Journal of the Korean BIBLIA Society for library and Information Science / v.24 no.1 / pp.45-62 / 2013
  • Because research articles are not tied to any particular system, author names of articles have been managed by unique identifiers rather than by authority control. As unique identifiers move toward open, transparent global linking systems, ORCID has recently been launched. ORCID not only links diverse existing ID systems as partners (for example, publishers' ID systems, universities' research assessment systems, and manuscript tracking systems) but also lets researchers claim their own articles themselves. However, because ORCID operates mainly on overseas publications, it is not appropriate to adopt it directly for identifying Korean researchers. This study therefore suggests a direction for a domestic researcher identification system that applies ORCID.

Author Entity Identification using Representative Properties in Linked Data (대표 속성을 이용한 저자 개체 식별)

  • Kim, Tae-Hong;Jung, Han-Min;Sung, Won-Kyung;Kim, Pyung
    • The Journal of the Korea Contents Association / v.12 no.1 / pp.17-29 / 2012
  • In recent years, Linked Data published under open licenses has grown rapidly and has come into the spotlight for its interoperability and openness, especially among governments of developed countries. However, it has relatively few out-links compared with its total number of links, and most links point to a few hub datasets. This is due to the lack of technology for identifying entities in Linked Data. In this paper, we present an improved author entity resolution method that uses representative properties. To solve the problems of previous methods, which use relations with other entities (owl:sameAs, owl:differentFrom, and so on) or depend on curation, we design and evaluate an automated real-time resolution process based on multiple ontologies that respects an entity's type and logical characteristics so as to verify entity consistency. The evaluation of author entity resolution shows positive results (an average K measure of 0.8533) on the confirmed information of 29 authors.

Authorship Attribution Framework Using Survival Network Concept : Semantic Features and Tolerances (서바이벌 네트워크 개념을 이용한 저자 식별 프레임워크: 의미론적 특징과 특징 허용 범위)

  • Hwang, Cheol-Hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.6 / pp.1013-1021 / 2020
  • Malware authorship attribution is a research field that identifies malware by comparing the author characteristics of unknown malware with those of known malware authors. Binary-based authorship attribution has the advantage that targeted malicious code is easy to collect and analyze, but the range of usable features is limited compared with source-code-based methods. As a consequence, accuracy decreases when the number of authors is large. To address these limitations in identifying binary authors, this study proposes 'defining semantic features from binaries' and 'defining allowable ranges for redundant features using the concept of a survival network'. The proposed method defines opcode-based graph features from binary information and uses the survival network concept to define the allowable range for selecting features unique to each author. This makes it possible to combine feature definition and per-author feature selection in a single technique. Experiments confirmed an accuracy improvement of 5.0% over the previous study, reaching the same level of accuracy as source-code-based analysis.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seungwoo;Jung, Hanmin;Kim, Pyung;Goo, HeeKwan;Lee, MiKyung;Sung, Won-Kyung;Park, DongIn
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.107-111 / 2007
  • There is a many-to-many mapping between persons and their names. A person may have multiple names, and different persons may share the same name. These synonymous and homonymous names may severely degrade the recall and precision of person search, respectively. This study addresses the characteristics of features for resolving homonymous author names appearing in citation data. As disambiguation features, previous works have employed citation-internal features such as co-authorship, titles of articles, and titles of publications, as well as citation-external features such as emails, affiliations, and Web evidence. To the best of our knowledge, however, no literature has dealt with the influence of individual features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

A Study on the Construction for Name Authority Data of the Korean Academic Papers (국내 학술논문 저자명 전거데이터 구축 방안에 관한 연구)

  • Lee, Seok-Hyoung;Kwak, Seung-Jin
    • Journal of the Korean BIBLIA Society for library and Information Science / v.21 no.1 / pp.105-118 / 2010
  • In this paper, we propose an effective method for constructing name authority data for Korean academic papers and design an authority database system that applies the method. To this end, we analyze the requirements for identifying author names and suggest an author identification method. Because constructing name authority records costs time and effort, and because academic papers are frequently acquired on a large scale, our proposal includes a system that can construct and manage the name authority database and that is tightly connected with the academic paper management and service systems.

A Study on the Identification Algorithm for Organization's Name of Author of Korean Science & Technology Contents (국내 과학기술콘텐츠 저자의 소속기관명 식별을 위한 소속기관명 자동 식별 알고리즘에 관한 연구)

  • Kim, Jinyoung;Lee, Seok-Hyong;Suh, Dongjun;Kim, Kwang-Young;Yoon, Jungsun
    • Journal of Digital Contents Society / v.18 no.2 / pp.373-382 / 2017
  • As the volume of scientific and technical content increases, services that support efficient search of that content are required. When an author's affiliation is used as a keyword, not only can the content produced by that affiliation be searched, but the identification rate of search results that use the author and the term as keywords can also be improved. Because the data used as search keywords is ambiguous and vague, search results may include false negatives or false positives. However, previous research on controlling search keywords through identification has mainly focused on author data and terminology data. In this paper, we propose an algorithm to identify affiliations and report experiments with the scientific and technological content held by the Korea Institute of Science and Technology Information.

Extraction of Author Identification Elements of Overseas Academic Papers on Authority Data System for Science and Technology (과학기술 전거데이터 시스템에서의 해외 학술논문 저자 식별요소 추출)

  • Choi, Hyunmi;Lee, Seokhyoung;Kim, Kwangyoung;Kim, Hwanmin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.05a / pp.711-713 / 2013
  • With the spread of social networks such as Facebook and Twitter, a wide variety of information on people around the world can be found. There is a large amount of researcher information in science and technology, but because it is not systematically organized, it is difficult to find a suitable researcher, such as a research partner, for research or business. To solve this problem, we are constructing an authority data system for science and technology based on authority information from overseas academic papers. In this paper, to construct the authority data, we extract author identification elements from millions of overseas academic papers published from 1994 to 2012. There are more than 50 author identification elements, such as author name, affiliation, paper title, publisher, year, keywords, co-author, and co-author's affiliation, in Korean, English, Chinese, and Japanese. We construct the element database by extracting and storing author identification information based on these elements from overseas academic papers. In future work, the authority database for overseas academic papers will be constructed by storing researchers' academic activities after clustering authors with the extracted elements. The authority data will be used to improve the utilization of researcher information and to invigorate the community for finding a suitable research partner or business examiner.

Survival network based Android Authorship Attribution considering overlapping tolerance (중복 허용 범위를 고려한 서바이벌 네트워크 기반 안드로이드 저자 식별)

  • Hwang, Cheol-hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.21 no.6 / pp.13-21 / 2020
  • In a narrow sense, Android author identification can be interpreted as a method for revealing the source of an app; in a broad sense, it can be interpreted as a study that uses known works to gain insight for identifying similar works. A problem in Android author identification is that some code is essential to the Android system yet meaningless for attribution, which makes it difficult to find an author's important features. As a result, legitimate code or behavior has also been incorrectly defined as malicious. To solve this, we introduce the concept of a survival network, which removes features found across many Android apps so that only the unique features defined by each author survive. We conducted an experiment comparing the proposed framework with a previous study. In experiments on the identified apps of 440 authors, we obtained a classification accuracy of up to 92.10%, a difference of up to 3.47% from the previous study. Although only a small amount of training data was used, we attribute the difference to the use of features unique to each author, without duplicates. In addition, comparative experiments with previous studies by feature definition method showed that the same accuracy can be achieved with fewer features, indicating that continuously overlapping meaningless features can be managed through the survival network concept.
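
    The idea of dropping features shared across many apps and keeping those unique to an author can be sketched as follows. This is an illustrative simplification, not the survival network described in the paper: it assumes each author is represented by a plain set of feature strings and that a feature "survives" only if at most `tolerance` authors use it. The function name `surviving_features` and the `tolerance` parameter are hypothetical.

```python
from collections import defaultdict


def surviving_features(author_features, tolerance=1):
    """Keep, for each author, only the features used by at most `tolerance` authors.

    `author_features` maps an author label to a set of feature strings
    (an assumed representation). With tolerance=1, only features unique
    to a single author survive.
    """
    owners = defaultdict(set)  # feature -> set of authors that use it
    for author, features in author_features.items():
        for feature in features:
            owners[feature].add(author)

    return {
        author: {f for f in features if len(owners[f]) <= tolerance}
        for author, features in author_features.items()
    }
```

    Raising `tolerance` keeps more shared features per author. Under this simplified reading, the trade-off is between discarding common, uninformative code and leaving some authors with too few features.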