• Title/Summary/Keyword: 문헌분류 (literature classification)

Search Result 1,231, Processing Time 0.026 seconds

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan; Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are rarely selected as classification features if they do not occur in the training document set. In addition, synonymous terms expressing the same concept are usually treated as different features. This study aims to improve text categorization performance by using Wikipedia to merge synonyms into a single feature and to replace input terms absent from the training document set with the most similar term that occurs in the training documents. For the selection of classification features, experiments were performed in settings that combined three conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The performance of a kNN classifier improved by 0.35 to 1.85% in $F_1$ value in all experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement is not as large as expected, several semantic and structural devices of Wikipedia could be used to select more effective classification features.
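
The replacement step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy similarity table (which stands in for a Wikipedia-based measure), the threshold of 0.5, and the choice to drop terms with no sufficiently similar training term are all hypothetical.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical toy scores standing in for a Wikipedia-based similarity measure.
TOY_SIMILARITY = {
    ("automobile", "car"): 0.9,
    ("automobile", "engine"): 0.4,
    ("zebra", "car"): 0.05,
    ("zebra", "engine"): 0.02,
}


def toy_similarity(unseen: str, known: str) -> float:
    """Look up a toy score; pairs missing from the table score 0.0."""
    return TOY_SIMILARITY.get((unseen, known), 0.0)


def replace_unseen_terms(
    doc_terms: Iterable[str],
    training_vocab: Set[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Replace each non-training term with its most similar training term."""
    vocab = sorted(training_vocab)  # fixed order keeps tie-breaking deterministic
    result = []
    for term in doc_terms:
        if term in training_vocab:
            result.append(term)  # already usable as a feature
            continue
        if not vocab:
            continue
        best = max(vocab, key=lambda known: similarity(term, known))
        if similarity(term, best) > threshold:
            result.append(best)  # substitute the closest training term
        # Assumption: terms with no match above the threshold are dropped.
    return result


features = replace_unseen_terms(
    ["automobile", "engine", "zebra"], {"car", "engine"}, toy_similarity, 0.5
)
print(features)
```

With this toy data, "automobile" is mapped to "car", "engine" is kept, and "zebra" is dropped, so the printed list is `['car', 'engine']`.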

Examining Suicide Tendency Social Media Texts by Deep Learning and Topic Modeling Techniques (딥러닝 및 토픽모델링 기법을 활용한 소셜 미디어의 자살 경향 문헌 판별 및 분석)

  • Ko, Young Soo; Lee, Ju Hee; Song, Min
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.32 no.3 / pp.247-264 / 2021
  • This study aims to build a deep learning-based model that classifies suicide tendency, using a suicide corpus constructed for the study. In addition, to analyze suicide factors, the study divided the suicide-tendency corpus into detailed topics using topic modeling, an analysis technique that automatically extracts topics. For this purpose, 2,011 documents of a suicide-related corpus collected from the social media service Naver Knowledge iN were manually annotated as suicide-tendency or non-suicide-tendency documents, based on the suicide prevention education manual issued by the Central Suicide Prevention Center. The performance of deep learning models (LSTM, BERT, ELECTRA) was then evaluated on the annotated corpus. In addition, LDA, one of the topic modeling techniques, was used to identify suicide factors by grouping the documents by theme, and co-word analysis and visualization were conducted to analyze the factors in depth.
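
The co-word analysis mentioned at the end of this abstract is, at its core, a count of how often two terms appear in the same document. A minimal sketch is given below, assuming documents are already tokenized. The function name and the toy tokens are invented for illustration and are not data from the study.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def coword_counts(documents: Iterable[List[str]]) -> Counter:
    """Count, for each unordered term pair, the documents containing both terms."""
    counts: Counter = Counter()
    for tokens in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


# Invented toy documents, already tokenized.
toy_docs = [
    ["stress", "school", "sleep"],
    ["stress", "school"],
    ["sleep", "stress"],
]
pair_counts = coword_counts(toy_docs)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

The resulting pair counts can serve as edge weights in a co-occurrence network, which is one common basis for the kind of visualization the abstract refers to.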

The Meanings of Genre Classification in Library Classification: The Case of American Public Libraries (장르 분류의 사례를 통해 본 도서관 분류의 의미 - 북미 공공도서관을 중심으로 -)

  • Rho, Jee-Hyun
    • Journal of Korean Library and Information Science Society / v.41 no.4 / pp.151-170 / 2010
  • There is growing interest in user-centered classification, or reader-interest classification, as questions have arisen about the meaning and effects of traditional library classification. American public libraries have used fiction genre classification, known as the bookstore model, as an alternative to traditional classification schemes. As a result, accessibility to the collection was promoted and library service for users was improved. This study makes a comprehensive inquiry into the philosophical background and functional features of genre classification. To this end, a literature survey and interviews or e-mail exchanges with librarians in American public libraries were conducted.

Classification Performance Analysis of Cross-Language Text Categorization using Machine Translation (기계번역을 이용한 교차언어 문서 범주화의 분류 성능 분석)

  • Lee, Yong-Gu
    • Journal of the Korean Society for Library and Information Science / v.43 no.1 / pp.313-332 / 2009
  • Cross-language text categorization (CLTC) can classify documents automatically using a training set in another language. In this study, collections appropriate for CLTC were extracted from KTSET. The classification performance of various CLTC methods was compared using an SVM classifier and machine translation. The results ranked the methods by classification performance in the order of the poly-lingual training method, training-set translation, and test-set translation. However, training-set translation could be regarded as the most useful CLTC method, because it was efficient for machine translation and easily adapted to a general environment. On the other hand, low performance was shown to result from feature reduction or from features with no subject characteristics, both of which arose during machine translation.

A Study on Time & Space Division in Literature Classification (문헌분류법의 시.공간 전개체계에 관한 연구)

  • Kim, Ja-Hoo
    • Journal of Korean Library and Information Science Society / v.42 no.3 / pp.5-24 / 2011
  • The purpose of this study is to provide suggestions for improving KDC 5th ed. as a system. The time and space devices of KDC 5th ed. (along with DDC 22nd ed. and NDC 9th ed.), such as main schedules, common auxiliary tables, internal tables, and notes, were analyzed and evaluated, and suggestions for improvement were proposed. If these suggestions are adopted, an effective literature classification scheme suited to domestic circumstances will certainly be prepared.

A Study on Collaboration in Classification System Development Practice (분류시스템 개발과정에서의 협력에 대한 연구)

  • Park, Ok-Nam
    • Journal of the Korean Society for Library and Information Science / v.42 no.4 / pp.181-199 / 2008
  • This study presents an empirical study of classification system design, focused on an image design team within an organizational setting. It aims to understand collaboration during design practice. Data were collected through on-site interviews, observations, and reviews of documents and e-mail. The study uses the social process model as a conceptual framework. It revealed the types of collaboration, the factors influencing collaboration, and the influences of collaboration on design practice.

Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences (기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구)

  • Kim, Seon-Wu; Ko, Gun-Woo; Choi, Won-Jun; Jeong, Hee-Seok; Yoon, Hwa-Mook; Choi, Sung-Pil
    • Journal of the Korean Society for Information Management / v.35 no.4 / pp.141-164 / 2018
  • Recently, as the amount of academic literature has increased rapidly and complex research has been actively conducted, researchers have difficulty analyzing trends in previous research. To solve this problem, information needs to be classified at the level of individual academic papers. However, in Korea, no academic database provides such information. In this paper, we propose an automatic classification system that can classify domestic academic literature into multiple classes. To this end, academic documents in the technical sciences written in Korean were first collected and mapped to class 600 of the DDC using the K-Means clustering technique, in order to construct a learning set capable of multiple classification. The resulting training set comprised 63,915 documents in the Korean technical sciences, excluding records with missing metadata. Using this training set, we implemented and trained a deep learning-based automatic classification engine for academic documents. Experiments on a hand-built experimental set showed 78.32% accuracy and 72.45% F1 for multiple classification.
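
The two reported measures, accuracy and F1, can diverge when classes are unevenly predicted. A minimal sketch of how each is computed is given below. The abstract does not say how F1 was averaged across classes, so macro-averaging over single-label predictions is assumed here. The function names and the toy labels are hypothetical.

```python
from typing import List


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of documents whose predicted class equals the gold class."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)  # class never predicted correctly
    return sum(scores) / len(scores)


# Toy labels using DDC-style class numbers; not data from the paper.
gold_labels = ["610", "620", "620", "630"]
pred_labels = ["610", "620", "630", "630"]
print(accuracy(gold_labels, pred_labels))
print(macro_f1(gold_labels, pred_labels))
```

In this toy case three of four predictions are correct, so accuracy is 0.75, while the per-class F1 values of 1, 2/3, and 2/3 average to 7/9, or about 0.778.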

A Study on the Main Classes of DDC (DDC 주류구분법에 관한 연구)

  • Nam, Tae-Woo
    • Journal of the Korean Society for Library and Information Science / v.43 no.1 / pp.27-56 / 2009
  • The purpose of this study is to analyze the main classes of the DDC. The DDC is a general classification system that aims to classify documents of all kinds in any knowledge domain. At best, the order of the main classes represents a mix of Baconian and Hegelian philosophy adulterated by the practical exigencies of organizing a collection of books. Each of the main classes has been subdivided further into what are technically known as divisions. This division of knowledge into the nine main classes mirrors the educational consensus of the late nineteenth-century Western academic world. The DDC thus scatters subjects by discipline, and subjects are subordinated to discipline. The DDC has been criticised for the rigidity of its division by ten at every step. Division by the decimal classification has been likened to the Procrustean bed.
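
The "division by ten at every step" can be read directly off the notation: the first digit of a DDC number gives the main class, the first two digits give the division, and the first three give the section. The sketch below illustrates only this arithmetic. The function name `ddc_hierarchy` is hypothetical, and the code does not look up class captions.

```python
from typing import Dict


def ddc_hierarchy(number: str) -> Dict[str, str]:
    """Split a DDC number into its main class, division, and section."""
    # Only the three digits before the decimal point determine these levels.
    digits = number.split(".")[0].zfill(3)
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"not a DDC number: {number!r}")
    return {
        "main_class": digits[0] + "00",  # first digit: one of ten main classes
        "division": digits[:2] + "0",    # second digit: one of ten divisions
        "section": digits,               # third digit: one of ten sections
    }


print(ddc_hierarchy("621.3"))
print(ddc_hierarchy("025"))
```

For example, 621.3 falls in main class 600, division 620, and section 621. Because every level has exactly ten slots, a subject with more than ten natural subdivisions must be squeezed or split, which is the rigidity the abstract compares to the Procrustean bed.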

A Research on Citation Order of Classification Scheme and Its' Application (분류체계 인용순 및 적용에 대한 연구)

  • Kim, Sungwon
    • Journal of the Korean Society for Library and Information Science / v.50 no.2 / pp.101-118 / 2016
  • To classify complex subjects effectively, a library classification scheme should adopt multiple division principles (or facets). Each of these principles is applied in turn at successive stages of division, and the order in which they are applied is called the citation order. For a classification scheme to be consistent and logical, the citation order of the division principles applied to complex subjects should be concrete and consistent. In particular, in an enumerative classification system, decisions on the citation order used to represent complex subjects significantly affect the structure and organization of the system. Classification theory offers basic principles and theoretical canons on citation order and its application, but for practical reasons they cannot be applied strictly when a classification system is developed. Therefore, this paper first reviews previous work on classification theory regarding citation order, then explores the conditions and circumstances for its application.

Comparative Analysis on the Classification of the Special Areas of Sociology in KDC4 and DDC21 (KDC 제4판과 DDC 제21판의 특수사회학 관련 주제에 관한 비교분석)

  • 배영활; 오동근
    • Journal of the Korean Society for Information Management / v.19 no.4 / pp.53-76 / 2002
  • This study compares and analyzes the classes in the major special areas of sociology, called “branch sociology,” in the Korean Decimal Classification 4th edition and the Dewey Decimal Classification 21st edition. In particular, it analyzes, class by class, the related classes of the specified areas (branch sociology), including those of arts and sports, sciences, languages, society, and region. The analysis shows that the two systems differ considerably in the classes included and in the locations of some classes. This analysis can be useful for the future revision of KDC.