• Title/Summary/Keyword: 지지 벡터기계

Search Result 100, Processing Time 0.031 seconds

A Comparative Study on Optimal Feature Identification and Combination for Korean Dialogue Act Classification (한국어 화행 분류를 위한 최적의 자질 인식 및 조합의 비교 연구)

  • Kim, Min-Jeong;Park, Jae-Hyun;Kim, Sang-Bum;Rim, Hae-Chang;Lee, Do-Gil
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.11
    • /
    • pp.681-691
    • /
    • 2008
  • In this paper, we have evaluated and compared each feature and feature combinations necessary for statistical Korean dialogue act classification. We have implemented a Korean dialogue act classification system by using the Support Vector Machine method. The experimental results show that the POS bigram does not work well and the morpheme-POS pair and other features can be complementary to each other. In addition, a small number of features, which are selected by a feature selection technique such as chi-square, are enough to show steady performance of dialogue act classification. We also found that the last eojeol plays an important role in classifying an entire sentence, and that Korean characteristics such as free order and frequent subject ellipsis can affect the performance of dialogue act classification.

Binary Forecast of Asian Dust Days over South Korea in the Winter Season (남한지역 겨울철 황사출현일수에 대한 범주 예측모형 개발)

  • Sohn, Keon-Tae;Lee, Hyo-Jin;Kim, Seung-Bum
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.3
    • /
    • pp.535-546
    • /
    • 2011
  • This study develops statistical models for the binary forecast of Asian dust days over South Korea in the winter season. For this study, we used three kinds of data; the rst one is the observed Asian dust days for a period of 31 years (1980 to 2010) as target values, the second one is four meteorological factors(near surface temperature, precipitation, snowfall, ground wind speed) in the source regions of Asian dust based on the NCEP reanalysis data and the third one is the large-scale climate indices. Four kinds of statistical models(multiple regression models, logistic regression models, decision trees, and support vector machines) are applied and compared based on skill scores(hit rate, probability of detection and false alarm rate).

Classification of ratings in online reviews (온라인 리뷰에서 평점의 분류)

  • Choi, Dongjun;Choi, Hosik;Park, Changyi
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.4
    • /
    • pp.845-854
    • /
    • 2016
  • Sentiment analysis or opinion mining is a technique of text mining employed to identify subjective information or opinions of an individual from documents in blogs, reviews, articles, or social networks. In the literature, only a problem of binary classification of ratings based on review texts in an online review. However, because there can be positive or negative reviews as well as neutral reviews, a multi-class classification will be more appropriate than the binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related with ratings using chi-square statistic. Then the extracted words are used as input variables to multi-class classifiers such as support vector machines and proportional odds model to compare their predictive performances.

Signal Sequence Prediction Based on Hydrophobicity and Substitution Matrix (소수성과 치환행렬에 기반한 신호서열 예측)

  • Chi, Sang-Mun
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.7
    • /
    • pp.595-602
    • /
    • 2007
  • This paper proposes a method that discriminates signal peptide and predicts the cleavage site of the secretory proteins cleaved by the signal peptidase I. The preprocessing stage uses hydrophobicity scales of amino acids in order to predict the presence of signal sequence and the cleavage site. The preprocessing enhances the performance of the prediction method by eliminating the non-secretory proteins in the early stage of prediction. for the effective use of support vector machine for the signal sequence prediction, the biologically relevant distance between the amino acid sequences is defined by using the hydrophobicity and substitution matrix; the hydrophobicity can be used to Predict the location of amino acid in a cell and the substitution matrix represents the evolutionary relationships of amino acids. The proposed method showed 98.9% discrimination rates from signal sequences and 88% correct rate of the cleavage site prediction on Swiss-Prot release 50 protein database using the 5-fold-cross-validation. In the comparison tests, the proposed method has performed significantly better than other prediction methods.

A Study on Automatic Text Categorization of Web-Based Query Using Synonymy List (유사어 사전을 이용한 웹기반 질의문의 자동 범주화에 관한 연구)

  • Nam, Young-Joon;Kim, Gyu-Hwan
    • Journal of Information Management
    • /
    • v.35 no.4
    • /
    • pp.81-105
    • /
    • 2004
  • In this study, the way of the automatic text categorization on web-based query was implemented. X2 methods based on the Supported Vector Machine were used to test the efficiency of text categorization on queries. This test is carried out by the model using the Synonymy List. 713 synonyms were extracted manually from the tested documents. As the result of this test, the precision ratio and the recall ratio were decreased by -0.01% and by 8.53%, respectively whether the synonyms were assigned or not. It also shows that the Value of F1 Measure was increased by 4.58%. The standard deviation between the recall and precision ratio was improve by 18.39%.

Emotion Prediction System using Movie Script and Cinematography (영화 시나리오와 영화촬영기법을 이용한 감정 예측 시스템)

  • Kim, Jinsu
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.12
    • /
    • pp.33-38
    • /
    • 2018
  • Recently, we are trying to predict the emotion from various information and to convey the emotion information that the supervisor wants to inform the audience. In addition, audiences intend to understand the flow of emotions through various information of non-dialogue parts, such as cinematography, scene background, background sound and so on. In this paper, we propose to extract emotions by mixing not only the context of scripts but also the cinematography information such as color, background sound, composition, arrangement and so on. In other words, we propose an emotional prediction system that learns and distinguishes various emotional expression techniques into dialogue and non-dialogue regions, contributes to the completeness of the movie, and quickly applies them to new changes. The precision of the proposed system is improved by about 5.1% and 0.4%, and the recall is improved by about 4.3% and 1.6%, respectively, when compared with the modified n-gram and morphological analysis.

Utilizing Unlabeled Documents in Automatic Classification with Inter-document Similarities (문헌간 유사도를 이용한 자동분류에서 미분류 문헌의 활용에 관한 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for information Management
    • /
    • v.24 no.1 s.63
    • /
    • pp.251-271
    • /
    • 2007
  • This paper studies the problem of classifying documents with labeled and unlabeled learning data, especially with regards to using document similarity features. The problem of using unlabeled data is practically important because in many information systems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. There are two steps In general semi-supervised learning algorithm. First, it trains a classifier using the available labeled documents, and classifies the unlabeled documents. Then, it trains a new classifier using all the training documents which were labeled either manually or automatically. We suggested two types of semi-supervised learning algorithm with regards to using document similarity features. The one is one step semi-supervised learning which is using unlabeled documents only to generate document similarity features. And the other is two step semi-supervised learning which is using unlabeled documents as learning examples as well as similarity features. Experimental results, obtained using support vector machines and naive Bayes classifier, show that we can get improved performance with small labeled and large unlabeled documents then the performance of supervised learning which uses labeled-only data. When considering the efficiency of a classifier system, the one step semi-supervised learning algorithm which is suggested in this study could be a good solution for improving classification performance with unlabeled documents.

A Korean Document Sentiment Classification System based on Semantic Properties of Sentiment Words (감정 단어의 의미적 특성을 반영한 한국어 문서 감정분류 시스템)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.4
    • /
    • pp.317-322
    • /
    • 2010
  • This paper proposes how to improve performance of the Korean document sentiment-classification system using semantic properties of the sentiment words. A sentiment word means a word with sentiment, and sentiment features are defined by a set of the sentiment words which are important lexical resource for the sentiment classification. Sentiment feature represents different sentiment intensity in general field and in specific domain. In general field, we can estimate the sentiment intensity using a snippet from a search engine, while in specific domain, training data can be used for this estimation. When the sentiment intensity of the sentiment features are estimated, it is called semantic orientation and is used to estimate the sentiment intensity of the sentences in the text documents. After estimating sentiment intensity of the sentences, we apply that to the weights of sentiment features. In this paper, we evaluate our system in three different cases such as general, domain-specific, and general/domain-specific semantic orientation using support vector machine. Our experimental results show the improved performance in all cases, and, especially in general/domain-specific semantic orientation, our proposed method performs 3.1% better than a baseline system indexed by only content words.

A Method for Correcting Air-Pressure Data Collected by Mini-AWS (소형 자동기상관측장비(Mini-AWS) 기압자료 보정 기법)

  • Ha, Ji-Hun;Kim, Yong-Hyuk;Im, Hyo-Hyuc;Choi, Deokwhan;Lee, Yong Hee
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.26 no.3
    • /
    • pp.182-189
    • /
    • 2016
  • For high accuracy of forecast using numerical weather prediction models, we need to get weather observation data that are large and high dense. Korea Meteorological Administration (KMA) mantains Automatic Weather Stations (AWSs) to get weather observation data, but their installation and maintenance costs are high. Mini-AWS is a very compact automatic weather station that can measure and record temperature, humidity, and pressure. In contrast to AWS, costs of Mini-AWS's installation and maintenance are low. It also has a little space restraints for installing. So it is easier than AWS to install mini-AWS on places where we want to get weather observation data. But we cannot use the data observed from Mini-AWSs directly, because it can be affected by surrounding. In this paper, we suggest a correcting method for using pressure data observed from Mini-AWS as weather observation data. We carried out preconditioning process on pressure data from Mini-AWS. Then they were corrected by using machine learning methods with the aim of adjusting to pressure data of the AWS closest to them. Our experimental results showed that corrected pressure data are in regulation and our correcting method using SVR showed very good performance.

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.71-89
    • /
    • 2016
  • Sentiment analysis is used for identifying emotions or sentiments embedded in the user generated data such as customer reviews from blogs, social network services, and so on. Various research fields such as computer science and business management can take advantage of this feature to analyze customer-generated opinions. In previous studies, the star rating of a review is regarded as the same as sentiment embedded in the text. However, it does not always correspond to the sentiment polarity. Due to this supposition, previous studies have some limitations in their accuracy. To solve this issue, the present study uses a supervised sentiment classification model to measure a more accurate sentiment polarity. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, the Support Vector Machines (SVM) and Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier and are analyzed by statistical correlations between movie reviews and box-office success. Movie reviews are collected along with a star-rate. The dataset used in this study consists of 1,258,538 reviews from 175 films gathered from Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms Naive Bayes (NB) classifier as its accuracy is about 6% higher than NB. Furthermore, the results indicate that there are positive correlations between the star-rate and the number of audiences, which can be regarded as the box-office success of a movie. The study also shows that there is the mild, positive correlation between the sentiment scores estimated by the classifier and the number of audiences. To verify the applicability of the sentiment scores, an independent sample t-test was conducted. For this, the movies were divided into two groups using the average of sentiment scores. The two groups are significantly different in terms of the star-rated scores.