• Title/Summary/Keyword: Korean text classification


Improvement and Analysis for an Electrical Fire Cause Classification (전기화재원인분류의 문제점 분석 및 개선안 제시)

  • Lee, Jong-Ho; Kim, Doo-Hyun; Kim, Sung-Chul
    • Fire Science and Engineering / v.23 no.2 / pp.36-40 / 2009
  • This paper presents the development of an electrical fire cause classification intended to improve the reliability of electrical fire statistics and to collect electrical fire data efficiently. Incorrect and biased knowledge about electrical fires has caused certain types of fires to be reclassified from non-electrical to electrical. A standardized form is therefore needed that lets fire investigators assess the cause of an electrical fire either by ticking the appropriate box on the fire report form or by assessing a text description. This study proposes a newly developed electrical fire cause classification structure: a well-defined hierarchy with no relationship or overlap between cause categories. The proposed structure can be used for electrical fire investigation and statistics, and it minimizes the error of diagnosing non-electrical fires as electrical ones.

A New Approach to Statistical Analysis of Electrical Fire and Classification of Electrical Fire Causes

  • Kim, Doo-Hyun; Lee, Jong-Ho; Kim, Sung-Chul
    • International Journal of Safety / v.6 no.2 / pp.17-21 / 2007
  • This paper presents a statistical analysis of electrical fires and a classification of electrical fire causes, with the aim of collecting electrical fire data efficiently. Electrical fire statistics are produced to monitor the number and characteristics of fires attended by fire fighters, including the causes and effects of fire, so that action can be taken to reduce the human and financial cost of fire. Electrical fires account for a major share of fires in Korea (nearly 30% of all fires according to recent figures). Incorrect and biased knowledge about electrical fires has caused certain types of fires to be reclassified from non-electrical to electrical. A standardized form is therefore needed that lets fire fighters assess the cause of an electrical fire either by ticking the appropriate box on the fire report form or by assessing a text description. It is thus highly recommended to develop an electrical fire cause classification and an electrical fire assessment for the fire statistics, so that electrical fires can be categorized and assessed accurately. This paper proposes a newly developed electrical fire cause classification structure: a well-defined hierarchy with no relationship or overlap between cause categories. The fire statistics systems of other countries are also introduced and compared.

A Deep Learning-based Depression Trend Analysis of Korean on Social Media (딥러닝 기반 소셜미디어 한글 텍스트 우울 경향 분석)

  • Park, Seojeong; Lee, Soobin; Kim, Woo Jung; Song, Min
    • Journal of the Korean Society for Information Management / v.39 no.1 / pp.91-117 / 2022
  • The number of patients with depression in Korea and around the world is increasing rapidly every year. However, most patients with mental illness are not aware that they are suffering from the disease, so they do not receive adequate treatment. If depressive symptoms are neglected, they can lead to suicide, anxiety, and other psychological problems. Therefore, early detection and treatment of depression are very important for improving mental health. To address this problem, this study presents a deep learning-based depression tendency model using Korean social media text. After collecting data from Naver Knowledge iN, Naver Blog, Hidoc, and Twitter, the DSM-5 diagnostic criteria for major depressive disorder were used to classify and annotate classes according to the number of depressive symptoms. TF-IDF analysis and co-occurrence word analysis were then performed to examine the characteristics of each class of the constructed corpus. In addition, word embedding, dictionary-based sentiment analysis, and LDA topic modeling were performed to generate a depression tendency classification model using various text features. Through this, the embedded text, sentiment score, and topic number of each document were calculated and used as text features. The highest accuracy, 83.28%, was achieved when depression tendency was classified with the KorBERT algorithm by combining both the sentiment score and the topic of the document with the embedded text. This study establishes a classification model for Korean depression tendencies with improved performance using various text features, and it detects potentially depressed individuals early among Korean online community users, enabling rapid treatment and prevention. It is significant in that it can help promote the mental health of Korean society.

A Study on Automatic Keyword Classification (용어의 자동분류에 관한 연구)

  • Seo, Eun-Gyoung
    • Journal of the Korean Society for Information Management / v.1 no.1 / pp.78-99 / 1984
  • In this paper, automatic keyword classification, one of the methods for automatically constructing a retrieval thesaurus, is tested experimentally on the Korean language, on the premise that a retrieval thesaurus increases the efficiency of information retrieval in natural language retrieval systems that search machine-readable databases. The paper also proposes methods for applying it. In the experiment, automatic keyword classification was based on the assumption that semantic relationships between terms can be discovered from the statistical patterns of terms occurring in a text.
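
The assumption in this abstract, that term relationships can be read off statistical co-occurrence patterns, can be illustrated with a minimal sketch. The abstract does not say which statistic the paper used, so the Dice coefficient, the function name `dice_similarities`, and the toy documents below are assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations


def dice_similarities(documents):
    """Return {(term_a, term_b): Dice coefficient} from document-level co-occurrence.

    Assumption: each document is a list of already-extracted terms.
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms of a pair
    for doc in documents:
        terms = sorted(set(doc))
        doc_freq.update(terms)
        pair_freq.update(combinations(terms, 2))
    return {
        pair: 2 * n / (doc_freq[pair[0]] + doc_freq[pair[1]])
        for pair, n in pair_freq.items()
    }


# Hypothetical toy corpus.
docs = [
    ["thesaurus", "retrieval", "index"],
    ["thesaurus", "retrieval"],
    ["index", "database"],
]
sims = dice_similarities(docs)
```

Term pairs with a high coefficient (here `retrieval` and `thesaurus`, which always appear together) would be candidates for the same keyword class.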


Analyzing and classifying emotional flow of story in emotion dimension space (정서 차원 공간에서 소설의 지배 정서 분석 및 분류)

  • Rhee, Shin-Young; Ham, Jun-Seok; Ko, Il-Ju
    • Korean Journal of Cognitive Science / v.22 no.3 / pp.299-326 / 2011
  • Texts such as stories, blogs, chats, messages, and reviews have an overall emotional flow. Comparing the similarity between texts makes it possible to group texts with a similar emotional flow, which can be used for applications such as recommendation and opinion collection. In this paper, we extract emotion terms from the text sequentially and analyze them in the pleasantness-unpleasantness and activation dimensions in order to identify the emotional flow of the text. To analyze the 'dominant emotion', which is the overall emotional flow of the text, we add a time dimension representing the sequential flow of the text and analyze the emotional flow in a three-dimensional space: pleasantness-unpleasantness, activation, and time. We also propose a classification method that computes the similarity of emotional flows using the Euclidean distance in this three-dimensional space. With the proposed method, we analyze the dominant emotion in modern Korean short stories and classify them by similar dominant emotion.
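
The distance-based comparison described in this abstract might look roughly like the sketch below. It assumes each emotional flow is a list of (pleasantness, activation, time) points of equal length, and it averages the point-wise Euclidean distances; the abstract does not specify how the paper aligns or aggregates points, so the averaging, the function names, and the sample values are all hypothetical.

```python
import math


def flow_distance(flow_a, flow_b):
    """Mean Euclidean distance between corresponding 3-D points of two flows.

    Assumption: both flows hold the same number of
    (pleasantness, activation, time) points.
    """
    if len(flow_a) != len(flow_b):
        raise ValueError("this sketch assumes flows of equal length")
    return sum(math.dist(a, b) for a, b in zip(flow_a, flow_b)) / len(flow_a)


def nearest_flow(query, references):
    """Return the name of the reference flow closest to the query flow."""
    return min(references, key=lambda name: flow_distance(query, references[name]))


# Hypothetical reference flows: emotion rising vs. falling over the text.
references = {
    "rise": [(-0.5, 0.0, 0.0), (0.5, 0.0, 1.0)],
    "fall": [(0.5, 0.0, 0.0), (-0.5, 0.0, 1.0)],
}
story = [(0.4, 0.0, 0.0), (-0.6, 0.0, 1.0)]
label = nearest_flow(story, references)
```

A story is assigned to whichever reference flow lies closest in the three-dimensional space, which is one simple way to group texts by dominant emotion.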


An Analytical Study on Performance Factors of Automatic Classification based on Machine Learning (기계학습에 기초한 자동분류의 성능 요소에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.33 no.2 / pp.33-59 / 2016
  • This study examined the factors affecting the performance of machine-learning-based automatic classification of domestic conference papers. Specifically, using the Rocchio algorithm to assign class labels automatically to papers in the Proceedings of the Conference of Korean Society for Information Management, I investigated the characteristics of the key factors (classifier formation methods, training set size, weighting schemes, label assigning methods) through a range of experiments. The results show that it is more effective to apply proper parameters ($\beta$, $\lambda$) and training set size (more than 5 years) according to the classification environment and the properties of the document set. I also found that, when performance is equivalent, the simpler methods (single weighting schemes) are the more efficient choice. Finally, because the classification of domestic papers is a multi-label classification task in which more than one label is assigned to an article, it is necessary to develop an optimal classification model based on the characteristics of the key factors with this environment in mind.
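
For readers unfamiliar with the Rocchio algorithm named in this abstract, a minimal textbook-style sketch follows. The abstract does not define its $\beta$ and $\lambda$, so the sketch uses the common positive and negative centroid weights (called `beta` and `gamma` here); those names, the default values, the similarity threshold, and the toy vectors are assumptions, not the paper's settings.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse term-weight dicts."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}


def rocchio_prototype(positive, negative, beta=1.0, gamma=0.25):
    """Class prototype: beta * positive centroid - gamma * negative centroid."""
    pos, neg = centroid(positive), centroid(negative)
    proto = {t: beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
             for t in set(pos) | set(neg)}
    return {t: w for t, w in proto.items() if w > 0}  # drop non-positive weights


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def assign_labels(doc, prototypes, threshold=0.5):
    """Multi-label assignment: every class whose prototype is similar enough."""
    return sorted(label for label, proto in prototypes.items()
                  if cosine(doc, proto) >= threshold)


# Hypothetical training vectors for two classes.
train = {
    "retrieval": [{"query": 1.0, "index": 1.0}],
    "classification": [{"label": 1.0, "classifier": 1.0}],
}
prototypes = {
    name: rocchio_prototype(
        docs, [d for other, ds in train.items() if other != name for d in ds])
    for name, docs in train.items()
}
labels = assign_labels({"query": 1.0, "classifier": 0.2}, prototypes)
```

Because `assign_labels` returns every class above the threshold rather than only the best one, the same sketch covers the multi-label setting the abstract describes.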

Topic Classification for Suicidology

  • Read, Jonathon;Velldal, Erik;Ovrelid, Lilja
    • Journal of Computing Science and Engineering / v.6 no.2 / pp.143-150 / 2012
  • Computational techniques for topic classification can support qualitative research by automatically applying labels in preparation for qualitative analyses. This paper presents an evaluation of supervised learning techniques applied to one such use case, namely, that of labeling emotions, instructions and information in suicide notes. We train a collection of one-versus-all binary support vector machine classifiers, using cost-sensitive learning to deal with class imbalance. The features investigated range from a simple bag-of-words and n-grams over stems, to information drawn from syntactic dependency analysis and WordNet synonym sets. The experimental results are complemented by an analysis of systematic errors in both the output of our system and the gold-standard annotations.

Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.2 / pp.81-86 / 2016
  • Term weighting is a popular technique that effectively weighs the term features to improve accuracy in document classification. While several successful term weighting algorithms have been suggested, none of them appears to perform well consistently across different data domains. In this paper we propose several reasonable methods to combine different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically we suggest two approaches: i) learning a single weight vector that lies in a convex hull of the base vectors while minimizing the class prediction loss, and ii) a mini-max classifier that aims for robustness of the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, significantly outperforming the existing term weighting methods.
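
The first approach in this abstract, a single weight vector inside the convex hull of the base vectors, rests on a convex combination. The sketch below shows only that combination step with fixed coefficients; learning the coefficients by minimizing a prediction loss, as the paper does, is not reproduced. The function name `combine` and the sample weights are hypothetical.

```python
def combine(base_vectors, alphas):
    """Convex combination of sparse term-weight vectors.

    Assumption: alphas are non-negative and sum to 1, which keeps the result
    inside the convex hull of the base vectors.
    """
    if len(alphas) != len(base_vectors):
        raise ValueError("need one coefficient per base vector")
    if any(a < 0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("alphas must be non-negative and sum to 1")
    terms = set().union(*base_vectors)
    return {
        t: sum(a * v.get(t, 0.0) for a, v in zip(alphas, base_vectors))
        for t in terms
    }


# Hypothetical term weights from two different weighting schemes.
scheme_a = {"fire": 2.0, "the": 0.0}
scheme_b = {"fire": 1.0, "the": 1.0}
combined = combine([scheme_a, scheme_b], [0.75, 0.25])
```

In a full system the coefficients would be chosen on training data instead of being fixed by hand.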

EFTG: Efficient and Flexible Top-K Geo-textual Publish/Subscribe

  • Zhu, Hong; Li, Hongbo; Cui, Zongmin; Cao, Zhongsheng; Xie, Meiyi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.12 / pp.5877-5897 / 2018
  • With the popularity of mobile networks and smartphones, geo-textual publish/subscribe messaging has attracted wide attention. Unlike the traditional publish/subscribe format, geo-textual data is published and subscribed to as a dynamic data flow in the mobile network. This difference places greater demands on efficiency and flexibility. However, most existing Top-k geo-textual publish/subscribe schemes have the following deficiencies: (1) All publications have to be scored for each subscription, which is not efficient enough. (2) A user has to take time to set a threshold for each subscription, which is not flexible enough. Therefore, we propose an efficient and flexible Top-k geo-textual publish/subscribe scheme. First, our scheme groups publications and subscriptions based on text classification. Thus, only a small subset of related publications needs to be scored for each subscription, which significantly enhances efficiency. Second, our scheme proposes an adaptive publish/subscribe matching algorithm. The algorithm does not require the user to set a threshold. It can adaptively return Top-k results to the user for each subscription, which significantly enhances flexibility. Finally, theoretical analysis and experimental evaluation verify the efficiency and effectiveness of our scheme.
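
The efficiency idea in this abstract, scoring only publications in the same text category as the subscription, can be sketched as follows. This is not the EFTG scheme itself: the class `TopKBroker`, its scoring function (keyword overlap discounted by distance), and the sample data are assumptions, and the paper's adaptive matching algorithm is not reproduced.

```python
import heapq
import math
from collections import defaultdict


class TopKBroker:
    """Hypothetical broker that groups publications by text category."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def publish(self, category, keywords, location):
        self._by_category[category].append((frozenset(keywords), location))

    def top_k(self, category, keywords, location, k):
        """Score only publications in the subscription's category."""
        wanted = set(keywords)

        def score(pub):
            pub_keywords, pub_location = pub
            overlap = len(wanted & pub_keywords)
            # Assumed scoring: textual overlap discounted by spatial distance.
            return overlap / (1.0 + math.dist(location, pub_location))

        return heapq.nlargest(k, self._by_category.get(category, []), key=score)


broker = TopKBroker()
broker.publish("food", ["pizza", "cheap"], (0.0, 0.0))
broker.publish("food", ["pizza"], (3.0, 4.0))
broker.publish("traffic", ["pizza"], (0.0, 0.0))
best = broker.top_k("food", ["pizza", "cheap"], (0.0, 0.0), k=1)
```

The `traffic` publication is never scored for a `food` subscription, and the caller asks for the k best results without supplying a score threshold.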