• Title/Summary/Keyword: 주제분류 (subject classification)

A Method of Classifying Tweet by subject using features (특징추출을 이용한 트위터 메시지 주제 분류 방법)

  • Song, Ji-min;Kim, Han-woo;Kim, Dong-joo;Jung, Sung-hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.905-907 / 2014
  • Twitter is a distinctive space where people around the world can freely share information and opinions, and there have been many attempts to exploit the vast amount of information it generates; research on classifying tweets by subject is being actively conducted. Twitter is a service for sharing information through short text messages of at most 140 characters, and because such brief messages carry little content, extracting varied information from them is difficult. In this paper, we propose a method for classifying tweets by subject that uses both tweet features and subject features. To verify the proposed method experimentally, we collected 10,000 tweets through the Twitter API. The experimental results show that the proposed method outperforms previous methods.

A Study on the Improvement of the Classification System on Archives and Records Management Studies in KDC (한국십진분류법 기록관리학 분야 분류체계 개선에 관한 연구)

  • Park, Su-Hyun;Lee, Myoung-Gyu
    • Journal of the Korean BIBLIA Society for library and Information Science / v.27 no.3 / pp.25-50 / 2016
  • Archives and Records Management Studies is developing into an independent discipline with its own domains. However, existing library classification schemes such as the KDC do not properly reflect its characteristics: the arrangement of its subject items is partly irrational, and the subdivision of its subject areas needs to be rearranged. In this study, eight subject areas were established according to the characteristics of the field: records management (general); records management law and policy; collection and appraisal of records; documentary organization; records information services; preservation of records; archives management; and archives and records centers, etc. After analyzing the major contemporary library classification systems (KDC, DDC, NDC, UDC, and LCC), we suggest improvement measures based on an analysis of the classification status and keywords of Archives and Records Management materials listed in the Korean National Bibliography. The contents of the eight subject areas are revised so that they can be integrated under KDC 028.

A Genre-based Classification of Digital Documents by using Deviation Statistic of Genre-revealing Term and Subject-revealing Term (장르와 주제 범주간 용어 편차정보를 이용한 디지털 문서의 장르기반 분류)

  • 이용배 (Lee, Yong-Bae);맹성현 (Myaeng, Sung-Hyon)
    • Journal of KIISE:Software and Applications / v.30 no.11 / pp.1062-1071 / 2003
  • Genre-based classification means classifying documents by the purpose for which they were written rather than by their semantics or subject areas. Most earlier genre classification methods were built on existing document categorization algorithms and handled feature selection poorly, which led to low-quality classification results. In this research, we propose a new method for automatically classifying digital documents by genre. The genre classifier we developed uses deviation statistics computed between the frequencies of genre-revealing terms and between the frequencies of subject-revealing terms within a genre. We collected Web documents to evaluate the proposed method. The experimental results show that the proposed method outperforms a direct application of chi-square feature selection and a Bayesian classifier, as commonly used for subject classification, improving accuracy by about 30 percent.
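The baseline named in this abstract pairs chi-square feature selection with a Bayesian classifier. As a hedged illustration of that baseline's first step only (not the authors' deviation statistic, which the abstract does not define), the standard 2x2 chi-square score for a term t and a class c is N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), where A, B, C, D are document counts. The function names and toy data below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs: List[Tuple[Set[str], str]], label: str, k: int) -> List[str]:
    """Return the k terms with the highest chi-square score for `label` (ties broken alphabetically)."""
    n_in = sum(1 for _, y in docs if y == label)
    n_out = len(docs) - n_in
    df_in: Counter = Counter()   # document frequency inside the class
    df_out: Counter = Counter()  # document frequency outside the class
    for terms, y in docs:
        (df_in if y == label else df_out).update(terms)
    scores: Dict[str, float] = {}
    for term in set(df_in) | set(df_out):
        a, b = df_in[term], df_out[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Note that chi-square rewards strong negative association as well as positive association, so a term that never appears in a class can still rank highly for it; this is one reason the statistic alone can be a weak signal for genre.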

A Study on the Classification Scheme for the Design of Directory Kids Search Engines (초등학생용 주제별 검색을 위한 효율적인 카테고리 분류 방법)

  • Jeong, Boo-Hyun;Kim, Kap-Su
    • Korean Association of Information Education: Conference Proceedings (한국정보교육학회:학술대회논문집) / 2004.08a / pp.577-586 / 2004
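  • The volume of educational material delivered over the Internet grows every day, but because information producers classify and organize information by arbitrary criteria without any common standard, it is very difficult for users to find exactly the information they need. This study therefore compares and analyzes the classification schemes of the elementary-school search engines of the Korean subject-directory search engines Yahoo Korea, Naver, Hanmir, and Empas, and proposes an efficient category classification method for subject-based search. The aim is to present a classification scheme that lets elementary school students, who are not yet used to accessing information, reach educational materials easily, quickly, and accurately.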

Deep learning-based Answer Type Classifier Considering Topicality in Korean Question Answering (한국어 질의 응답에서의 화제성을 고려한 딥러닝 기반 정답 유형 분류기)

  • Cho, Seung Woo;Choi, DongHyun;Kim, EungGyun
    • Annual Conference on Human and Language Technology / 2019.10a / pp.103-108 / 2019
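  • This paper describes a method for binary classification of the expected answer type of an input question in Korean question answering as either short-answer or descriptive. To capture the topicality of a question's topic words, which ordinary named-entity recognition cannot determine, search engine queries are analyzed by frequency. Together with this topic-word information, attribute expressions that can constrain the scope of the answer and 5W1H (who, what, when, where, why, how) information are used as input features. In experiments against an existing neural classification model, the model with the added features improved classification performance by about 4%.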

Web Document Classification Based on Hangeul Morpheme and Keyword Analyses (한글 형태소 및 키워드 분석에 기반한 웹 문서 분류)

  • Park, Dan-Ho;Choi, Won-Sik;Kim, Hong-Jo;Lee, Seok-Lyong
    • The KIPS Transactions:PartD / v.19D no.4 / pp.263-270 / 2012
  • With the development of high-speed Internet and massive database technology, the number of web documents is increasing rapidly, so classifying them automatically is becoming important. In this study, we propose an effective method that extracts document features based on Hangeul morpheme and keyword analyses and classifies unstructured documents automatically by predicting their subjects. To extract document features, we first select terms using a morpheme analyzer, form a keyword set based on term frequency and subject-discriminating power, and score each keyword by its discriminating power. We then build classification models using commercial software that implements decision trees, neural networks, and SVMs (support vector machines). Experimental results show that the proposed feature extraction method performs well in classifying web documents by subject, achieving an average precision of 0.90 and recall of 0.84 with the decision tree.
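The abstract selects keywords by term frequency and "subject-discriminating power" but does not define that power. The sketch below assumes one plausible definition, 1 minus the normalized entropy of a term's distribution over subjects, and scores each kept term as frequency times that power. Both the measure and the helper names are hypothetical, and morpheme analysis is assumed to have already turned each document into a list of terms.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def discriminating_power(counts_by_subject: Dict[str, int], n_subjects: int) -> float:
    """Hypothetical measure: 1 - normalized entropy of a term's subject distribution.

    Returns 1.0 when the term occurs under a single subject and 0.0 when it is
    spread evenly over all subjects.
    """
    total = sum(counts_by_subject.values())
    if total == 0 or n_subjects < 2:
        return 0.0
    entropy = 0.0
    for count in counts_by_subject.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return 1.0 - entropy / math.log(n_subjects)


def score_keywords(docs: List[Tuple[List[str], str]], min_tf: int = 2) -> Dict[str, float]:
    """Keep terms with frequency >= min_tf and score them as tf * discriminating power.

    `docs` holds (morpheme-analyzed terms, subject label) pairs.
    """
    n_subjects = len({subject for _, subject in docs})
    tf: Counter = Counter()
    per_term: Dict[str, Counter] = defaultdict(Counter)  # term -> subject -> count
    for terms, subject in docs:
        for term in terms:
            tf[term] += 1
            per_term[term][subject] += 1
    return {
        term: tf[term] * discriminating_power(per_term[term], n_subjects)
        for term in tf
        if tf[term] >= min_tf
    }
```

With this definition a frequent term that appears evenly across subjects scores near zero, which matches the intuition that such a term says little about a document's subject.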

Semi-Automatic Management of Classification Scheme with Interoperability (상호운용적 분류체계 관리를 위한 반자동 분류체계 관리방안)

  • Lee, Won-Goo;Shin, Sung-Ho;Kim, Kwang-Young;Jeon, Do-Heon;Yoon, Hwa-Mook;Sung, Won-Kyung;Lee, Min-Ho
    • The Journal of the Korea Contents Association / v.11 no.12 / pp.466-474 / 2011
  • In the 21st-century knowledge-based economy, convergence and complexity in science and technology are increasing. As a result, it is not easy to classify science and technology properly or to build a classification scheme that accommodates new, next-generation areas. We therefore propose a systematic method that lets content management and service organizations extend their classification schemes flexibly. We expect this approach to minimize the difficulty of managing classification schemes and to reduce the associated cost.

Analysis of Municipal Ordinances for Smart Cities of Municipal Governments: Using Topic Modeling (지방자치단체의 스마트시티 조례 분석: 토픽모델링을 활용하여)

  • Hyungjun Seo
    • Informatization Policy / v.30 no.1 / pp.41-66 / 2023
  • This study aims to reveal the direction of municipal ordinances for smart cities by applying topic modeling to 74 ordinances from 72 municipal governments. The most frequent keywords relate to the establishment and operation of a Smart City Committee. Topic modeling with Latent Dirichlet Allocation (LDA) classifies the ordinances into eight topics: Topic 1 (security for smart city processes), Topic 2 (promotion of the smart city industry), Topic 3 (composition of a smart city consultative body for local residents), Topic 4 (support system for smart cities), Topic 5 (management of personal information), Topic 6 (use of smart city data), Topic 7 (implementation of intelligent public administration), and Topic 8 (smart city promotion). By region, Topics 5, 6, and 8, which relate mostly to the practical operation of smart cities, account for a significant share of ordinances in the Seoul metropolitan area, whereas Topics 2, 3, and 4, which relate mostly to the initial implementation of smart cities, account for a significant share in provincial areas.

A Study on Classification System for using internet information resources on Interior Design (인테리어 디자인 분야 인터넷 정보 자원 활용을 위한 분류체계 연구)

  • Lim, Kyung-Ran
    • Archives of design research / v.17 no.4 / pp.79-88 / 2004
  • This study aims to understand how Internet information resources are organized and to infer the characteristics of resource search engines, so that criteria can be established for classifying and evaluating Internet information resources. In addition, to build models of interior design classification systems for Web subject directories, the author compared and analyzed the interior design classification systems of subject directory sites that provide Internet-based classification systems, foreign sites used for information search, and domestic specialized information sites. The systems were evaluated against four measures: comprehensiveness of subject scope, logicality of the classification system, precision of subject terms, and search effectiveness. Because interior design information is mixed with information from related fields, its search and classification are not systematically organized. Having analyzed this problem, the author presents models of search engine classification systems for interior design information that consider both academic and practical aspects.

A Study on the Topical Associations of Simultaneously Borrowed Books in Public Libraries (공공도서관 동시 대출 도서의 주제 연관성 분석 연구)

  • Woojin Kang;In Yeong Jeong;Jongwook Lee
    • Journal of Korean Library and Information Science Society / v.54 no.3 / pp.33-55 / 2023
  • Previous research has used public library circulation data to understand users' information behavior. In this study, we examined the subject areas of books borrowed simultaneously by public library users and aimed to identify the relationships among those subject areas. To do so, we used the Korean Decimal Classification (KDC) codes of 984,790 books loaned in 2019 and applied the ITEM2VEC technique to transform the lists of books borrowed by the same user on the same day (22,443,699 records in total) into vectors. Next, for each of the 522 classification codes, we extracted the ten most closely related codes and used them to build a network. We identified 15 communities within this network and examined the characteristics of each. Communities spanning two or more main classes revealed meaningful thematic associations. Grounded in users' actual borrowing behavior, this study suggests topics of books that are likely to be borrowed together. The findings offer useful insights for library collection development and placement, recommendation of related subject materials, and revision of classification systems.
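The network-building step described in this abstract (link each classification code to its ten most related codes) can be sketched as below. This is a minimal sketch under stated assumptions: the code vectors are taken as given (ITEM2VEC training is not shown), relatedness is assumed to be cosine similarity, which the abstract does not state, and the KDC codes and vector values in the usage are made up.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_network(vectors: Dict[str, List[float]], k: int = 10) -> Set[Tuple[str, ...]]:
    """Undirected edge set linking each code to its k most similar codes.

    Each edge is stored as a sorted pair, so A->B and B->A collapse into one edge.
    """
    edges: Set[Tuple[str, ...]] = set()
    for code, vec in vectors.items():
        neighbours = sorted(
            (other for other in vectors if other != code),
            key=lambda other: (-cosine(vec, vectors[other]), other),
        )[:k]
        for other in neighbours:
            edges.add(tuple(sorted((code, other))))
    return edges


# Hypothetical 2-D vectors for four KDC codes, keeping only the single nearest neighbour.
example = {"813": [1.0, 0.0], "814": [0.9, 0.1], "510": [0.0, 1.0], "512": [0.1, 0.9]}
print(sorted(top_k_network(example, k=1)))
```

A community-detection algorithm would then run on this edge set; the abstract does not name one, and NetworkX's `louvain_communities` is only one possible choice.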