• Title/Summary/Keyword: 주제분류 (subject classification)

Search Result 978, Processing Time 0.024 seconds

Document Embedding and Image Content Analysis for Improving News Clustering System (뉴스 클러스터링 개선을 위한 문서 임베딩 및 이미지 분석 자질의 활용)

  • Kim, Siyeon;Kim, Sang-Bum
    • Annual Conference on Human and Language Technology / 2015.10a / pp.104-108 / 2015
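  • As the volume of news grows, techniques for organizing it effectively have been actively studied. Among them, news clustering depends on the performance of a classifier that judges whether two news articles cover the same event, and in most cases this classifier uses BoW (Bag-of-Words) based vector similarity. This paper proposes a method that improves the classifier by measuring, in addition to BoW-based vector similarity, the similarity of the photos contained in the two documents and the relatedness of their topics, and adding both as classifier features. Photo similarity and topic relatedness were measured with a deep learning-based CNN and neural network-based document embedding, respectively. In experiments, using the two proposed features improved performance by 3.4% over a classifier based on BoW vector similarity alone.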
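
The pairwise features described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer and the function names (`bow_vector`, `cosine_similarity`, `pair_features`) are assumptions, and the image and topic similarity scores are taken as precomputed inputs because the abstract does not say how the CNN or document embedding outputs are compared.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words counts over lowercase whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def pair_features(text_a, text_b, image_similarity, topic_similarity):
    """Feature vector for one article pair.

    image_similarity and topic_similarity are hypothetical precomputed
    scores standing in for the CNN-based and embedding-based measures.
    """
    bow_sim = cosine_similarity(bow_vector(text_a), bow_vector(text_b))
    return [bow_sim, image_similarity, topic_similarity]
```

A vector like this would then be passed to any binary classifier that decides whether the two articles cover the same event.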

Dataset construction and Automatic classification of Department information appearing in Domestic journals (국내 학술지 출현 학과정보 데이터셋 구축 및 자동분류)

  • Byungkyu Kim;Beom-Jong You;Hyoung-Seop Shim
    • Proceedings of the Korean Society of Computer Information Conference / 2023.01a / pp.343-344 / 2023
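  • Department information is very useful in bibliometric analysis of scientific and technical literature. In this paper, we extracted the department information of university-affiliated authors appearing in domestic journal articles indexed in the Korea Science Citation Index database, and built a department information dataset through data cleaning and department-type classification. Using the dataset as training and validation data, we implemented a deep learning-based automatic classification model; in the performance evaluation, the model reached an accuracy (정확률) of 98.6% on Korean department information and 97.6% on English department information. The automatic department classifier is expected to be useful for analyzing intellectual relationships across science and technology fields and for subject classification of papers.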

Examining Suicide Tendency Social Media Texts by Deep Learning and Topic Modeling Techniques (딥러닝 및 토픽모델링 기법을 활용한 소셜 미디어의 자살 경향 문헌 판별 및 분석)

  • Ko, Young Soo;Lee, Ju Hee;Song, Min
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.32 no.3 / pp.247-264 / 2021
  • This study builds a deep learning-based classification model that detects suicidal tendency, trained on a suicide corpus constructed for the study. To analyze suicide factors, it also divides the suicide-tendency corpus into detailed topics using topic modeling, an analysis technique that automatically extracts topics. For this purpose, 2,011 suicide-related documents collected from the social media service Naver Knowledge iN were manually annotated as suicide-tendency or non-suicide-tendency documents based on the suicide prevention education manual issued by the Central Suicide Prevention Center, and the performance of deep learning models (LSTM, BERT, ELECTRA) was evaluated on the annotated corpus. In addition, LDA, a topic modeling technique, was used to identify suicide factors by grouping documents by theme, and co-word analysis and visualization were conducted to analyze the factors in depth.

A Study on Improving a Classification for Health Categories of Internet Bookstores in Korea (국내 인터넷 서점의 건강 분야 분류체계 개선 방안에 관한 연구)

  • Choi, Ye-Jin;Chung, Yeon-Kyoung
    • Journal of the Korean Society for Information Management / v.30 no.3 / pp.49-70 / 2013
  • This study analyzes and compares the current state of the health subject directories of eight internet bookstores in Korea and abroad. The names of subdivisions within the health field of the internet bookstores were compared against KDC and DDC. User interviews were also conducted to identify information needs regarding the health field in internet bookstores. Based on the findings, the study proposes a design principle and a new classification for the health field of internet bookstores. After evaluation by experts in the field, a final classification schedule (1 class, 11 divisions, 60 subdivisions, and 16 sections) is suggested. The results can serve as a foundation for classifying health resources efficiently in internet bookstores and other websites.

A Study for the Improvement of the Classification Number as the Search Device on the Library Homepage (도서관 홈페이지에서 분류기호 탐색장치의 개선방안 연구)

  • Kim, Ja-Hoo
    • Journal of Korean Library and Information Science Society / v.39 no.4 / pp.215-235 / 2008
  • This study offers suggestions for improving the literature classification number system as a search and browsing device on library homepages. The classification number systems used as search and browsing devices on the homepages of libraries that adopt DDC were analyzed and evaluated, and improvements were proposed. To maximize the effectiveness of the classification number system as a browsing device, a DDC third summary (the thousand sections) suited to domestic circumstances was prepared.

An Automated Topic Specific Web Crawler Calculating Degree of Relevance (연관도를 계산하는 자동화된 주제 기반 웹 수집기)

  • Seo Hae-Sung;Choi Young-Soo;Choi Kyung-Hee;Jung Gi-Hyun;Noh Sang-Uk
    • Journal of Internet Computing and Services / v.7 no.3 / pp.155-167 / 2006
  • Users surfing the Internet should be able to find Web pages that match their interests as closely as possible. Toward this end, this paper presents a topic-specific Web crawler that computes the degree of relevance, collects a cluster of pages for a given topic, and refines the preliminary set of related Web pages using term frequency/document frequency, entropy, and compiled rules. In the experiments, we tested the topic-specific crawler in terms of classification accuracy, crawling efficiency, and crawling consistency. First, the rule set compiled by CN2 gave the best classification accuracy, ahead of the C4.5 and back-propagation learning algorithms. Second, we measured the classification efficiency to determine the best threshold value for the degree of relevance. Third, the consistency of the crawler was measured by the number of resulting URLs that overlapped across different starting URLs. The experimental results imply that the crawler was fairly consistent regardless of the randomly chosen starting URLs.
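
The idea of thresholding a relevance score and measuring URL overlap between crawls can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's method: the score here is plain term frequency of topic terms, whereas the abstract also mentions document frequency, entropy, and compiled rules, and the names `relevance`, `select_pages`, and `overlap_count` are hypothetical.

```python
from collections import Counter


def relevance(page_text, topic_terms):
    """Share of a page's tokens that are topic terms, between 0.0 and 1.0.

    topic_terms is assumed to be a set of lowercase words.
    """
    tokens = page_text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[term] for term in topic_terms)
    return hits / len(tokens)


def select_pages(pages, topic_terms, threshold):
    """Keep the URLs whose relevance score reaches the threshold."""
    return [url for url, text in pages.items()
            if relevance(text, topic_terms) >= threshold]


def overlap_count(urls_a, urls_b):
    """Number of URLs found by both crawls (a simple consistency measure)."""
    return len(set(urls_a) & set(urls_b))
```

Running `select_pages` from two different starting points and passing the two results to `overlap_count` gives a rough analogue of the consistency experiment described in the abstract.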

Analysis of Research Trend by Technical Field of Construction Management Using Subject Classification Code (주제분류코드에 의한 국내외 건설사업관리(CM) 기술 분야별 연구 현황분석)

  • Kang, Leen-Seok;Park, Ho-Byung;Kim, Min-Ji;Moon, Hyoun-Seok
    • Korean Journal of Construction Engineering and Management / v.11 no.1 / pp.48-58 / 2010
  • Construction management systems are being applied more widely, and universities and research institutes are pursuing research in each specific field. Understanding the latest research trends in each subject is necessary to maintain excellent products in the CM field. This study proposes a subject breakdown structure for classifying the detailed technologies of the CM field. Over 2,000 domestic and international papers from the last five years are analyzed to infer research trends by subject. The results include research trends by year and the strong and weak fields for each research subject of CM technology. Finally, the study suggests measures for sustaining research activity in the CM field.

An Analysis on Curriculum of Library and Information Science in U.S. (미국 문헌정보학 교과과정 주제에 대한 분석 연구)

  • Choi, Sanghee;Ha, YooJin
    • Journal of the Korean Society for Information Management / v.36 no.1 / pp.53-71 / 2019
  • As new issues and topics emerge in library and information science, diverse needs to enhance the curriculum of library and information science education have been identified. This study investigated library and information science curricula in the US and identified the topics of classes from three aspects: competency areas, science and technology categories, and research fields. Topics related to information technology, including system design and implementation, were the most popular in all analyses. Library and information center management and user services were also major topics of the curriculum.

Towards the Development of a Reading Material Classification Scheme Based on a Combination of Book Use Facets (도서이용 속성 조합에 기반한 독서자료 분류체계 설계)

  • Jiyoung, Shim
    • Journal of the Korean Society for Information Management / v.39 no.4 / pp.347-373 / 2022
  • To expand the access points of reading materials, this study devised a reading material classification (RMC) system based on facets of book use. The facets of books that users may consider in a reading situation were identified through content analysis. In addition, through network analysis, the subject headings adjacent to a given subject heading were grouped as its related subject headings. The RMC developed in this study can serve as a tool that provides varied access points to help book users search in the library OPAC and other reading information systems.
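
Grouping the headings adjacent to each subject heading can be sketched as a neighbor lookup over an undirected graph. This is only an illustration: the abstract does not say how the network was built, so the edge list (pairs of headings assumed to be linked, for example by co-occurrence) and the function name `related_headings` are assumptions.

```python
from collections import defaultdict


def related_headings(edges):
    """Map each subject heading to the sorted list of headings adjacent to it.

    edges is an iterable of (heading_a, heading_b) pairs, treated as
    undirected links in a hypothetical subject heading network.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {heading: sorted(adjacent) for heading, adjacent in neighbors.items()}
```

For instance, with links between "cooking" and "health" and between "health" and "exercise", the related headings of "health" would be "cooking" and "exercise".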

A Study on the Development of an Integrated Classification System for Archives of May 18th Democratic Uprising (5·18민주화운동 기록물 통합분류체계 개발 연구)

  • Park, Seong-Woo;Jeong, Dae-Keun
    • Journal of Korean Library and Information Science Society / v.48 no.2 / pp.373-403 / 2017
  • This study establishes a classification principle for archives of the May 18th Democratic Uprising with respect to their preservation and use, and develops an integrated classification system for them. To this end, previous research on the classification of records was reviewed and institutional cases were analyzed. An integrated provenance-based classification system was then developed from a practical analysis of the data held by three representative institutions in Gwangju. The system is organized by facets of the type 'provenance-material-period-media-subject'. A collection-based integrated classification system that reflects the expanding role of archivists and current trends is also proposed.