• Title/Summary/Keyword: Multi-category classification


New Text Sentiment Classification Method (새로운 텍스트 감정 분류 방법)

  • Shin, Seong-Yoon;Lee, Hyun-Chang;Shin, Kwang-Seong;Kim, Hyung-Jin;Lee, Jae-Wan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.553-554 / 2021
  • This paper proposes a convergence model based on LSTM and CNN deep learning techniques, and obtains good results by applying it to multi-category news datasets. According to the experiment, the deep learning-based fusion model significantly improved the precision and accuracy of text sentiment classification.


Text Classification Method Using Deep Learning Model Fusion and Its Application

  • Shin, Seong-Yoon;Cho, Gwang-Hyun;Cho, Seung-Pyo;Lee, Hyun-Chang
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.409-410 / 2022
  • This paper proposes a fusion model based on Long Short-Term Memory (LSTM) networks and CNN deep learning methods, applies it to multi-category news datasets, and achieves good results. Experiments show that the deep learning-based fusion model greatly improves the precision and accuracy of text sentiment classification. Model fusion of this kind is expected to become an important way to optimize models and improve their performance.


A multi-dimensional crime spatial pattern analysis and prediction model based on classification

  • Hajela, Gaurav;Chawla, Meenu;Rasool, Akhtar
    • ETRI Journal / v.43 no.2 / pp.272-287 / 2021
  • This article presents a multi-dimensional spatial pattern analysis of crime events in San Francisco. Our analysis includes the impact of spatial resolution on hotspot identification, temporal effects in crime spatial patterns, and relationships between various crime categories. In this work, crime prediction is viewed as a classification problem. When predictions for a particular category are made, a binary classification-based model is framed, and when all categories are considered for analysis, a multiclass model is formulated. The proposed crime-prediction model (HotBlock) utilizes spatiotemporal analysis for predicting crime in a fixed spatial region over a period of time. It is robust under variation of model parameters. HotBlock's results are compared with baseline real-world crime datasets. It is found that the proposed model outperforms the standard DeepCrime model in most cases.

Extraction of Spatial Characteristics of Cadastral Land Category from RapidEye Satellite Images

  • La, Phu Hien;Huh, Yong;Eo, Yang Dam;Lee, Soo Bong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.32 no.6 / pp.581-590 / 2014
  • With rapid land development, land category should be updated on a regular basis. However, manual field surveys have certain limitations. In this study, attempts were made to extract a feature vector considering spectral signature by parcel, PIMP (Percent Imperviousness), texture, and VIs (Vegetation Indices) based on RapidEye satellite image and cadastral map. A total of nine land categories in which feature vectors were significantly extracted from the images were selected and classified using SVM (Support Vector Machine). According to accuracy assessment, by comparing the cadastral map and classification result, the overall accuracy was 0.74. In the paddy-field category, in particular, PO acc. (producer's accuracy) and US acc. (user's accuracy) were highest at 0.85 and 0.86, respectively.
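The accuracy measures named in this abstract (overall, producer's, and user's accuracy) can be illustrated with a small sketch. The confusion matrix below is invented for illustration and is not the study's data; the row/column convention (rows are reference classes from the cadastral map, columns are classified classes) is also an assumption.

```python
def accuracy_report(confusion):
    """Return (overall, producer's, user's) accuracy.

    confusion[i][j] = number of parcels whose reference class is i
    and whose classified class is j (assumed convention).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    row_sums = [sum(confusion[i]) for i in range(n)]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Producer's accuracy: share of each reference class that was found.
    producers = [confusion[i][i] / row_sums[i] for i in range(n)]
    # User's accuracy: share of each classified class that is correct.
    users = [confusion[j][j] / col_sums[j] for j in range(n)]
    return correct / total, producers, users


# Hypothetical 3-class confusion matrix (not from the paper).
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
overall, producers, users = accuracy_report(confusion)
```

Producer's accuracy answers "how much of the true class was recovered", while user's accuracy answers "how much of what was labeled as this class is right", which is why the abstract reports the two separately.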

Deep Learning Music genre automatic classification voting system using Softmax (소프트맥스를 이용한 딥러닝 음악장르 자동구분 투표 시스템)

  • Bae, June;Kim, Jangyoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.1 / pp.27-32 / 2019
  • Research that uses deep learning algorithms to implement classification, one of the outstanding human abilities, includes unimodal models, multi-modal models, and multi-modal methods using music videos. This study obtained better results by proposing a system that splits each song's spectrum into short samples, analyzes them, and votes on the results. Among deep learning algorithms, CNN showed superior performance to RNN in music genre classification, and performance improved further when CNN and RNN were applied together. The system that applies deep learning to short samples of music and votes on each CNN result showed better results than the previous model, and the model with an added Softmax layer performed best. With the explosive growth of digital media, the need for automatic classification of music genres in numerous streaming services is increasing. Future research will need to reduce the proportion of undifferentiated songs and develop algorithms for the final category classification of those songs.
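A minimal sketch of clip-level voting, under stated assumptions: the abstract does not give the exact voting rule, so both a majority ("hard") vote and an averaged-softmax ("soft") vote are shown. The genre names and per-clip scores are hypothetical stand-ins for the outputs of a CNN.

```python
import math
from collections import Counter


def softmax(scores):
    """Convert raw scores into probabilities (numerically stable form)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_vote(clip_scores):
    """Each clip votes for its top-scoring class; the majority wins."""
    votes = Counter(max(range(len(s)), key=s.__getitem__) for s in clip_scores)
    return votes.most_common(1)[0][0]


def soft_vote(clip_scores):
    """Average the per-clip softmax probabilities, then take the argmax."""
    probs = [softmax(s) for s in clip_scores]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean


# Hypothetical genres and raw per-clip scores for one song.
GENRES = ["rock", "jazz", "classical"]
clip_scores = [
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.3],
    [1.8, 1.2, 0.2],
    [2.2, 0.4, 0.1],
]

hard_winner = GENRES[hard_vote(clip_scores)]
soft_index, mean_probs = soft_vote(clip_scores)
soft_winner = GENRES[soft_index]
```

Soft voting keeps the confidence of each clip, so one very confident clip can outweigh several uncertain ones; hard voting ignores confidence entirely.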

An Analytical Study on Automatic Classification of Domestic Journal articles Based on Machine Learning (기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.37-62 / 2018
  • This study examined the factors affecting the performance of machine learning-based automatic classification of domestic journal articles in the field of LIS. In particular, with a view to the performance of automatically assigning class labels to articles in the "Journal of the Korean Society for Information Management", I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, label assignment methods) through a range of experiments. The results show that it is effective to apply each element appropriately according to the classification environment and the characteristics of the document set, and that fairly good performance can be obtained with a simpler model. In addition, the classification of domestic journal articles can be treated as multi-label classification, which assigns more than one category to a specific article. Therefore, I proposed an optimal classification model that uses a simple, fast classification algorithm and a small training set, taking this environment into account.

A Basal Cell Carcinoma Classifier with an Ambiguous Category (모호한 카테고리를 도입한 기저 세포암 검출기)

  • Park, Aa-Ron;Min, So-Hee;Baek, Seong-Joon;Na, Seung-Yu
    • Proceedings of the IEEK Conference / 2006.06a / pp.261-262 / 2006
  • In previous work, various well-known methods, including the maximum a posteriori probability (MAP) classifier and the multilayer perceptron (MLP) network classifier, showed competitive results. Since even small errors can lead to a fatal result, we investigated a method that can completely remove classification errors by screening out ambiguous patterns. These ambiguous patterns can then be examined by routine biopsy. We incorporated an ambiguous category into MAP and MLP. Classification results involving 216 spectra gave 100% sensitivity for MLP.
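One way an ambiguous category might be attached to a posterior-based classifier is a reject rule, sketched below. The abstract does not state the authors' screening rule, so the threshold test, the value 0.9, the label names, and the sample posteriors are all assumptions for illustration.

```python
def classify_with_reject(posteriors, labels, threshold=0.9):
    """Return the most probable label, or "ambiguous" when the classifier
    is not confident enough (hypothetical reject rule)."""
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    if posteriors[best] < threshold:
        return "ambiguous"  # defer to follow-up examination
    return labels[best]


# Hypothetical labels and posterior probabilities for three spectra.
LABELS = ["BCC", "normal"]
samples = [
    [0.97, 0.03],  # confident: BCC
    [0.55, 0.45],  # not confident: screened out
    [0.08, 0.92],  # confident: normal
]
decisions = [classify_with_reject(p, LABELS) for p in samples]
```

Raising the threshold screens out more patterns and lowers the error rate on the remaining ones, at the cost of sending more cases for follow-up.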


A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, manual categorization cannot guarantee accuracy, and it also requires a large amount of time and huge costs. Many studies have been conducted on the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because they assume that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the requirement of a multi-categorized training set in traditional multi-categorization algorithms, we propose a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. First, we attempt to find the relationship between documents and topics by using the result of topic analysis for single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching scores for each document to multiple categories.
The results imply that a document can be classified into a certain category if and only if the matching score is higher than the predefined threshold. For example, we can classify a certain document into three categories that have larger matching scores than the predefined threshold. The main contribution of our study is that our methodology can improve the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized by theme, and they contain less vulgar language and slang than other typical text documents. We collected news articles from July 2012 to June 2013. The number of articles varies widely across categories. This is because readers have different levels of interest in each category. Additionally, the result is also attributed to the differences in the frequency of the events in each category. In order to minimize the distortion of the result from the number of articles in different categories, we extracted 3,000 articles equally from each of the eight categories. Therefore, the total number of articles used in our experiments was 24,000. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." By using the news articles that we collected, we calculated the document/category correspondence scores by utilizing topic/category and document/topic correspondence scores. The document/category correspondence score indicates the degree of correspondence of each document to a certain category. As a result, we could present two additional categories for each of the 23,089 documents.
Precision, recall, and F-score were 0.605, 0.629, and 0.617, respectively, when only the top 1 predicted category was evaluated, whereas they were 0.838, 0.290, and 0.431 when the top 1 to 3 predicted categories were considered. It was very interesting to find a large variation among the eight categories in precision, recall, and F-score.
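The scoring and thresholding steps described in this abstract can be sketched as follows. The abstract does not give the exact formula, so combining the document/topic and topic/category scores as a weighted sum is an assumption, as are the weights, the three-category subset, and the threshold value. The last function checks that the reported F-scores are the harmonic mean of the reported precision and recall.

```python
def document_category_scores(doc_topics, topic_categories):
    """Combine document/topic and topic/category weights into
    document/category scores (assumed: weighted sum over topics)."""
    n_categories = len(topic_categories[0])
    return [
        sum(doc_topics[t] * topic_categories[t][c] for t in range(len(doc_topics)))
        for c in range(n_categories)
    ]


def assign_categories(scores, categories, threshold):
    """Keep every category whose matching score exceeds the threshold."""
    return [name for name, s in zip(categories, scores) if s > threshold]


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: one document, three topics, three categories.
CATEGORIES = ["Economy", "Politics", "Sports"]
doc_topics = [0.6, 0.3, 0.1]      # document/topic scores
topic_categories = [              # topic/category correspondence table
    [0.7, 0.3, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 0.1, 0.9],
]

scores = document_category_scores(doc_topics, topic_categories)
assigned = assign_categories(scores, CATEGORIES, threshold=0.3)

# Consistency check against the figures reported in the abstract.
f_top1 = round(f_score(0.605, 0.629), 3)
f_top3 = round(f_score(0.838, 0.290), 3)
```

With these made-up weights the document receives both "Economy" and "Politics", which is the kind of category extension the abstract describes for single-categorized documents.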

Land Cover Classification of Multi-functional Administrative City for Hazard Mitigation Precaution (행정중심복합도시 재해경감대책을 위한 토지피복분류)

  • Han, Seung-Hee
    • Journal of the Korean Society of Hazard Mitigation / v.8 no.5 / pp.77-83 / 2008
  • In this study, land cover classification and NDVI evaluation for hazard mitigation planning were carried out for the area surrounding Yeongi-gun, Chungcheongnam-do (132 km²), where the government is promoting a multi-functional administrative city project. Images acquired from KOMPSAT 2, LANDSAT, and ASTER were used, and the limits that resolution places on classification were comparatively evaluated. The area mainly consists of mountains and arable land, including rice fields and ordinary fields, so special attention was paid to the classification of rice fields and ordinary fields. For the KOMPSAT 2 image, a segmentation technique for classifying high-resolution images was applied. To evaluate the accuracy of the classification, a field investigation was conducted to examine samples, and the results were compared with the land use and land category classification in the land ledger of Korea. The results were made into thematic maps in shapefile format, which should be of great help in policy decisions on the future-oriented development plan of the multi-functional administrative city.

HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

  • Kim, Jin-Suk;Choe, Ho-Seop;You, Beom-Jong;Seo, Jeong-Hyun;Lee, Suk-Hoon;Ra, Dong-Yul
    • Journal of Computing Science and Engineering / v.3 no.3 / pp.165-180 / 2009
  • The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. First, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories with categories using a semi-hierarchical and balanced three-level classification scheme, where each news story has only one level-3 category (single-labeling). We refer to this original data set as the HKIB-40075 test collection. Then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, to rearrange the classification scheme to be fully hierarchical but unbalanced, and to assign one or more categories to each news story (multi-labeling). We refer to this modified data set as the HKIB-20000 test collection. We benchmark a k-NN categorization algorithm on both HKIB-20000 and HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on the Korean text categorization problem.