• Title/Summary/Keyword: bibliographic features (서지 자질)

Construction of Research Fronts Using Factor Graph Model in the Biomedical Literature (팩터그래프 모델을 이용한 연구전선 구축: 생의학 분야 문헌을 기반으로)

  • Kim, Hea-Jin;Song, Min
    • Journal of the Korean Society for information Management / v.34 no.1 / pp.177-195 / 2017
  • This study infers research fronts (RFs) using a factor graph (FG) model based on heterogeneous features. The proposed model infers research fronts made up of documents with the potential to be cited many times in the future. To this end, documents are represented by bibliographic, network, and content features. Bibliographic features comprise information such as the number of authors, the number of institutions to which the authors belong, proceedings, the number of author-provided keywords, funds, the number of references, the number of pages, and the journal impact factor. Network features include degree centrality, betweenness, and closeness in the document network. Content features include keywords extracted from the title and abstract using keyphrase extraction techniques. The model learns these features of a publication and infers whether the document will be an RF using the sum-product algorithm and the junction tree algorithm on a factor graph. We experimentally demonstrate that, when predicting RFs, the FG predicted more densely connected documents than the RFs constructed using a traditional bibliometric approach. Our results also indicate that FG-predicted documents exhibit stronger degrees of centrality and betweenness among RFs.
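
The network features named in this abstract can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: it computes degree and closeness centrality in plain Python (betweenness is left out for brevity), and the document IDs and edges are hypothetical.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def degree_centrality(graph):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def closeness_centrality(graph):
    """(Number of reachable nodes) / (sum of distances); 0.0 if isolated."""
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical undirected document network: an edge links two related
# documents. The IDs are made up for illustration only.
docs = {
    "d1": {"d2", "d3", "d4"},
    "d2": {"d1"},
    "d3": {"d1", "d4"},
    "d4": {"d1", "d3"},
}

if __name__ == "__main__":
    print(degree_centrality(docs))
    print(closeness_centrality(docs))
```

In this toy network, `d1` is linked to every other document, so it scores highest on both measures. A model like the one described would take such scores as input features alongside the bibliographic and content features.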

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seung-Woo;Jung, Han-Min;Kim, Pyung;Koo, Hee-Kwan;Lee, Mi-Kyung;Sung, Won-Kyung;Park, Dong-In
    • The Journal of the Korea Contents Association / v.8 no.2 / pp.41-47 / 2008
  • There is a many-to-many mapping between persons and their names: a person may have multiple names, and different persons may share the same name. Such synonymous and homonymous names may severely degrade the recall and precision of person search, respectively. This study examines the characteristics of features for resolving homonymous author names that appear in citation data. As disambiguation features, previous work has employed citation-internal features such as co-authorship, article titles, and publication titles, as well as citation-external features such as emails, affiliations, and Web evidence. To the best of our knowledge, however, no previous study has examined the influence of individual features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seungwoo;Jung, Hanmin;Kim, Pyung;Goo, HeeKwan;Lee, MiKyung;Sung, Won-Kyung;Park, DongIn
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.107-111 / 2007
  • There is a many-to-many mapping between persons and their names: a person may have multiple names, and different persons may share the same name. Such synonymous and homonymous names may severely degrade the recall and precision of person search, respectively. This study examines the characteristics of features for resolving homonymous author names that appear in citation data. As disambiguation features, previous work has employed citation-internal features such as co-authorship, article titles, and publication titles, as well as citation-external features such as emails, affiliations, and Web evidence. To the best of our knowledge, however, no previous study has examined the influence of individual features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

Library employment for catalogers in the United States during the period 1970-1995 (정리부서 사서직의 임용조건: 1970년부터 1995년까지 미국을 중심으로)

  • 정연경
    • Proceedings of the Korean Society for Information Management Conference / 1995.08a / pp.143-146 / 1995
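  • Although technological progress and standardization have not fundamentally changed classification and cataloging work in libraries, the methods of creating catalogs and of delivering them to users have clearly changed. As a result, the technical services librarians responsible for this work now need a wider range of qualifications than before. Based on librarian job advertisements in American Libraries, a U.S. library periodical, this study analyzes how the employment requirements for catalogers changed from 1970 to 1995 in terms of job title, cataloging experience, supervisory experience, bibliographic utilities, experience with machine-readable cataloging, classification systems, other experience, personal traits, languages, and subject areas, and aims to suggest directions for related education.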

Email Extraction and Utilization for Author Disambiguation (저자 식별을 위한 전자메일의 추출 및 활용)

  • Kang, In-Su
    • The Journal of the Korea Contents Association / v.8 no.6 / pp.261-268 / 2008
  • In a bibliographic record, the author of a paper is represented by his or her personal name. However, using names to indicate authors may degrade the recall and precision of paper and author search, since the same name can be shared by many different individuals and a person can write his or her name in different forms. To solve this problem, records with the same author name must be disambiguated into distinct persons. As features for author resolution, previous studies have exploited bibliographic attributes such as co-authors, titles, and publication information. This study applies authors' email addresses to disambiguate author names. To this end, we first extract email addresses from full-text papers, and then evaluate and analyze the effect of email addresses on author resolution using a large-scale test set.
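
The two steps described in this abstract, extracting email addresses from full text and using them as a disambiguation feature, might look roughly as follows. This is a sketch under stated assumptions: the regular expression is a deliberately simplified pattern (not a full RFC 5322 matcher), and `extract_emails`, `share_email`, and the sample text are hypothetical rather than taken from the paper.

```python
import re

# Simplified email pattern, assumed here for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return unique lowercase email addresses in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen

def share_email(emails_a, emails_b):
    """Treat two same-name records as one person if they share an address."""
    return bool(set(emails_a) & set(emails_b))

if __name__ == "__main__":
    # Hypothetical first-page text from two papers by a "Hong, Gildong".
    paper_a = "Gildong Hong (Hong@Example.ac.kr), Chulsoo Kim <cs.kim@example.org>."
    paper_b = "Gildong Hong, hong@example.ac.kr"
    print(extract_emails(paper_a))
    print(share_email(extract_emails(paper_a), extract_emails(paper_b)))
```

A shared address is strong evidence that two records belong to the same person. A missing match is weak evidence either way, since authors change addresses and many papers list none.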

Multi-class Classification System Based on Multi-loss Linear Combination for Word Spacing and Sentence Boundary Detection (띄어쓰기 및 문장 경계 인식을 위한 다중 손실 선형 결합 기반의 다중 클래스 분류 시스템)

  • Kim, GiHwan;Seo, Jisu;Lee, Kyungyeol;Ko, Youngjoong
    • Annual Conference on Human and Language Technology / 2018.10a / pp.185-188 / 2018
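  • Word spacing and sentence boundary detection are regarded as very important problems because, depending on their performance, they propagate errors heavily into later stages of natural language analysis. However, because the two tasks use different features, they have been solved sequentially with separate models. Word spacing and sentence boundary detection cannot be considered entirely different problems, and running two models in sequence not only propagates the errors of the first model to the next but also increases time complexity. This paper treats word spacing and sentence boundary detection as a single problem: it addresses the time complexity issue with a multi-class classification system that handles both at once, and uses a multi-loss linear combination to address the fact that the two tasks use different features. The final model improved on the baseline word spacing and sentence boundary detection models by 3.98%p and 0.34%p, respectively. In terms of time complexity, its running time was 38.7% lower than that of running the single models sequentially.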
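
A linear combination of task losses can be sketched as below. The abstract does not state which loss functions or weights the authors used, so everything here is an assumption: cross-entropy stands in for each task loss, and `alpha` is a hypothetical mixing weight.

```python
import math

def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold_index])

def combined_loss(spacing_probs, spacing_gold,
                  boundary_probs, boundary_gold, alpha=0.5):
    """Linear combination: alpha * L_spacing + (1 - alpha) * L_boundary.

    alpha is an assumed hyperparameter, not a value reported in the paper.
    """
    l_spacing = cross_entropy(spacing_probs, spacing_gold)
    l_boundary = cross_entropy(boundary_probs, boundary_gold)
    return alpha * l_spacing + (1.0 - alpha) * l_boundary

if __name__ == "__main__":
    # Hypothetical per-character predictions: [no-space, space] and
    # [not-boundary, boundary].
    print(combined_loss([0.5, 0.5], 0, [0.25, 0.75], 1, alpha=0.5))
```

Setting `alpha` to 1.0 or 0.0 reduces the objective to a single task, which makes the weight a convenient knob for checking how much each task contributes.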

Author Graph Generation based on Author Disambiguation (저자 식별에 기반한 저자 그래프 생성)

  • Kang, In-Su
    • Journal of Information Management / v.42 no.1 / pp.47-62 / 2011
  • While an ideal author graph should have nodes that represent authors, automatically generated author graphs mostly use author names as nodes because of the difficulty of resolving author names into individuals. However, using author names as nodes merges namesakes, who should be separate nodes, into a single node, which may distort the characteristics of the author graph. This study proposes an algorithm that resolves author ambiguities based on co-authorship and then yields an author graph consisting of author nodes rather than author-name nodes. The scientific collaboration relationships on which this algorithm depends tend to produce clustering results that minimize the over-clustering error at the expense of the under-clustering error. In experiments, the algorithm is applied to real citation records in which Korean namesakes occur, and the results are discussed.
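
The idea of resolving namesakes through co-authorship can be illustrated with a simple grouping rule: records that carry the same author name are merged only when they share at least one co-author. This is a minimal sketch of that rule using union-find, not the algorithm proposed in the paper, and the record data is hypothetical.

```python
def cluster_by_coauthors(records):
    """Group same-name records that share at least one co-author.

    records: list of co-author name sets, one per citation record.
    Returns a sorted list of record-index lists, one per inferred author.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # co-author name -> first record index that lists it
    for idx, coauthors in enumerate(records):
        for name in coauthors:
            if name in first_seen:
                # Merge this record with the earlier one sharing the name.
                parent[find(idx)] = find(first_seen[name])
            else:
                first_seen[name] = idx

    clusters = {}
    for idx in range(len(records)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())

if __name__ == "__main__":
    # Hypothetical records that all list an author named "Kim, J."
    records = [{"Lee S"}, {"Lee S", "Park D"}, {"Choi J"}, {"Park D"}, set()]
    print(cluster_by_coauthors(records))
```

Records 0, 1, and 3 end up in one cluster through shared co-authors, while the record with an unrelated co-author and the single-author record stay separate. This mirrors the trade-off noted in the abstract: such a rule rarely merges different people, but it can split one person's records across several nodes.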

A Study on Evaluating the Practicalness of Library and Information Courses in Korea (한국 문헌정보학 교과목의 실용성 평가에 관한 연구)

  • Noh, Young-He;Ahn, In-Ja;Choi, Sang-Ki
    • Journal of Korean Library and Information Science Society / v.42 no.4 / pp.5-29 / 2011
  • This study aimed to assess the courses currently offered in departments of Library and Information Science (LIS) and to explore directions for improvement. Based on field librarians' needs and opinions about the courses, we suggested separating them into required, core, and elective courses. We proposed six courses, including 'Internship', 'Introduction to Library and Information Science', 'Cataloging and Classification', 'Library Management', and 'Information Retrieval', as required courses, and 5 courses, including 'Practice in Cataloging and Classification', 'Information Resource and Service', 'Collection Development', 'Digital Library System', 'Introduction to Bibliography' and 'Records Management and Archives', as core courses. Finally, the remaining courses were recommended as elective courses that each department could select depending on its circumstances and faculty. The important components of substantial LIS courses are as follows: timeliness of training topics, expertise of educational content, professionalism and qualifications of faculty, specialized educational materials, and a stronger correlation between courses and professors' majors.

A Study on Automatic Classification of Subject Headings Using BERT Model (BERT 모형을 이용한 주제명 자동 분류 연구)

  • Yong-Gu Lee
    • Journal of the Korean Society for Library and Information Science / v.57 no.2 / pp.435-452 / 2023
  • This study experimented with the automatic classification of subject headings using a BERT-based transfer learning model and analyzed its performance. It analyzed classification performance by the main class of the KDC classification and by the category type of subject headings. Six datasets were constructed from Korean national bibliographies based on the assignment frequency of subject headings, and titles were used as classification features. On the dataset containing 3,506 subject headings (1,539,076 records), classification performance reached 0.6059 and 0.5626 in micro F1 and macro F1, respectively. By main class of the KDC classification, performance was good in General works, Natural science, Technology, and Language, and low in Religion and Arts. By category type of subject heading, the plant, legal name, and product name categories showed high performance, whereas the national treasure/treasure category showed low performance. In large datasets, the proportion of subject headings that cannot be assigned increases, which lowers final performance, so improvement is needed to raise classification performance for low-frequency subject headings.
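
The gap between the micro and macro F1 scores reported in this abstract follows from how the two averages are defined. The sketch below computes both for a single-label classifier; the labels and predictions are hypothetical and are not drawn from the study's data.

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1 from raw counts; 0.0 when the class never occurs or is never predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(gold, pred):
    """Return (micro F1, macro F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative
    labels = set(gold) | set(pred)
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro

if __name__ == "__main__":
    # Hypothetical subject headings: one frequent, two rare.
    gold = ["history", "history", "history", "history", "botany", "law"]
    pred = ["history", "history", "history", "history", "history", "law"]
    print(micro_macro_f1(gold, pred))
```

Because macro F1 weights every heading equally, a single missed rare heading pulls it well below micro F1 in this toy case. That is consistent with the abstract's point that low-frequency subject headings limit overall performance.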