• Title/Summary/Keyword: 지지벡터기계학습 (support vector machine learning)

Search Result 64, Processing Time 0.023 seconds

Improving the Performance of SVM Text Categorization with Inter-document Similarities (문헌간 유사도를 이용한 SVM 분류기의 문헌분류성능 향상에 관한 연구)

  • Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.22 no.3 s.57 / pp.261-287 / 2005
  • The purpose of this paper is to explore ways to improve the performance of an SVM (Support Vector Machine) text classifier using inter-document similarities. SVMs are powerful machine learning systems that are considered the state-of-the-art technique for automatic document classification. This paper proposes an approach to SVM text categorization based on feature representation with document vectors. In this approach, document vectors are used as features instead of index terms, and vector similarities are used as feature values instead of term weights. Experiments show that an SVM classifier with document vector features can improve document classification performance. For run-time efficiency, two methods are developed: one selects document vector features, and the other uses category centroid vector features instead. Experiments on these two methods show that a small vector feature set can outperform conventional methods that use index term features.
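
The centroid variant described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes cosine similarity as the vector similarity and raw term counts as weights, and the helper names (`cosine`, `centroid`, `similarity_features`) and the toy data are hypothetical. The resulting feature list is what would be handed to an SVM in place of index-term weights.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(docs):
    """Average a list of sparse term-weight vectors."""
    total = {}
    for d in docs:
        for t, w in d.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(docs) for t, w in total.items()}

def similarity_features(doc, centroids):
    """Re-represent a document as its similarity to each category centroid."""
    return [cosine(doc, centroids[c]) for c in sorted(centroids)]

# Hypothetical toy data: term-count vectors grouped by category.
train = {
    "sports": [{"ball": 2, "team": 1}, {"team": 2, "goal": 1}],
    "finance": [{"stock": 2, "bank": 1}, {"bank": 2, "rate": 1}],
}
centroids = {c: centroid(docs) for c, docs in train.items()}

# One feature per category, in sorted order: ["finance", "sports"].
features = similarity_features({"team": 1, "goal": 1}, centroids)
```

With centroid features the dimensionality equals the number of categories rather than the vocabulary size, which is the run-time benefit the abstract points to.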

A Study on Low Power Design of SVM Algorithm for IoT Environment (IoT 환경을 위한 SVM 알고리즘 저전력화 방안 연구)

  • Song, Jun-Seok;Kim, Sang-Young;Song, Byung-Hoo;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.73-74 / 2017
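  • The SVM (Support Vector Machine) algorithm is a representative machine learning classification algorithm, used to solve problems in various fields such as sentiment analysis and gesture recognition. It classifies data using the separating hyperplane, from among the candidate hyperplanes, that maximizes the margin, that is, the distance between two groups as defined by particular points called support vectors. Although it offers high accuracy, it is slow and requires large amounts of data and memory for training, which makes it difficult to use in resource-constrained IoT environments. This paper studies a way to reduce the power consumption of the SVM algorithm using the K-means algorithm, so that data can be learned efficiently on resource-constrained IoT nodes.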
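
The abstract does not say how K-means is combined with the SVM. One common option, shown here purely as an assumption, is to replace each class's training samples with a few cluster centroids so that the SVM trains on far fewer points. The function names and toy data below are hypothetical, and the SVM training step itself is omitted.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples.
    Uses the first k points as initial centers to stay deterministic."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        new_centers = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centers.append(tuple(sum(x) / len(cluster) for x in zip(*cluster)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def compress_training_set(samples_by_label, k):
    """Assumed reduction step: replace each class's samples with k centroids."""
    reduced = []
    for label, pts in samples_by_label.items():
        for c in kmeans(pts, min(k, len(pts))):
            reduced.append((c, label))
    return reduced

# Hypothetical 2-D sensor readings for two classes.
data = {
    +1: [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9)],
    -1: [(-1.0, -1.0), (-5.0, -5.0), (-1.1, -0.9), (-4.8, -5.2)],
}
reduced = compress_training_set(data, k=2)  # 8 samples become 4 centroids
```

The trade-off is that centroids smooth away points near the class boundary, so accuracy may drop as `k` shrinks.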

Classification Accuracy by Deviation-based Classification Method with the Number of Training Documents (학습문서의 개수에 따른 편차기반 분류방법의 분류 정확도)

  • Lee, Yong-Bae
    • Journal of Digital Convergence / v.12 no.6 / pp.325-332 / 2014
  • It is generally accepted that classification accuracy is affected by the number of training documents, but few studies show how this number influences automatic text classification. This study evaluates the deviation-based classification model, recently developed for genre-based classification, and compares it with other classification algorithms as the number of training documents changes. Experimental results show that the deviation-based classification model achieves an accuracy of 0.8 when categorizing 7 genres with only 21 training documents, exceeding the accuracy of the Bayesian and SVM classifiers. The deviation-based classification model retains strong feature selection capability even with a small number of training documents because it learns subject information within each genre, whereas the other methods use a different learning process.

An Analytical Study on Automatic Classification of Domestic Journal articles Based on Machine Learning (기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.37-62 / 2018
  • This study examined the factors affecting the performance of machine learning-based automatic classification of domestic journal articles in the field of LIS. In particular, with respect to the performance of automatically assigning class labels to articles in the "Journal of the Korean Society for Information Management", I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, label assignment methods) through a variety of experiments. The results show that it is effective to apply each element appropriately according to the classification environment and the characteristics of the document set, and that fairly good performance can be obtained with a simpler model. In addition, the classification of domestic journal articles can be regarded as multi-label classification, which assigns more than one category to a given article. Therefore, considering this environment, I proposed an optimal classification model that uses a simple and fast classification algorithm and a small training set.

Text Categorization Based on Terminology and Information Extraction (전문용어 및 정보추출에 기반한 문서분류시스템)

  • Lee, Kyung-Soon;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 1999.10e / pp.79-84 / 1999
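  • In this study, a text categorization system represents features using domain information from specialized-domain dictionaries and entity information obtained through entity information extraction. To complement this knowledge, we also propose a method that statistically recognizes category-specific terminology and represents it as features. When many of the terms appearing in a document belong to a particular specialized domain, the document is likely to belong to that domain. In addition, recognizing through information extraction which entity a term denotes, and representing the document accordingly, better reflects the meaning the document carries. For terms whose domain or entity information is unknown, the domain is recognized automatically from the training documents, complementing the knowledge used in document representation. Using document features based on domain, entity information, and category terminology, binary classification was performed for each category with a text classifier based on support vector machine learning. The proposed feature representation performs better than a term-based feature representation.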

A Weight Boosting Method of Sentiment Features for Korean Document Sentiment Classification (한국어 문서 감정분류를 위한 감정 자질 가중치 강화 기법)

  • Hwang, Jaewon;Ko, Youngjoong
    • Annual Conference on Human and Language Technology / 2008.10a / pp.201-206 / 2008
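  • This paper proposes a technique that improves sentiment classification performance by boosting the weights of the sentiment features that underlie Korean document sentiment classification. First, sentiment features are obtained as a lexical resource, and we evaluate how much the expanded sentiment features contribute to sentiment classification. Next, the sentiment strength of each sentence is computed using the chi-square statistic ($\chi^2$ statistic) of the sentiment features, obtained from the training data. The resulting sentence sentiment strength is incorporated into the TF-IDF weighting scheme to boost the weights of the sentiment features. Finally, training boosts only positive sentiment features in positive documents and only negative sentiment features in negative documents. The proposed method is evaluated with a Support Vector Machine, which performs well in document classification. The evaluation shows a performance improvement of about 2.0% over using content-word features, as commonly used in information retrieval.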
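
The chi-square statistic mentioned in this abstract can be computed from a 2x2 contingency table of term and class counts, as in the sketch below. The abstract does not give the formula for sentence sentiment strength, so `sentence_strength` (a plain sum of chi-square scores) is an assumption made for illustration, and all counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a term for a class from a 2x2 contingency table.
    a: class docs containing the term, b: other docs containing it,
    c: class docs without the term,    d: other docs without it."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def sentence_strength(tokens, chi_scores):
    """Assumed aggregation: sum the chi-square scores of the sentiment
    features that occur in the sentence."""
    return sum(chi_scores.get(t, 0.0) for t in tokens)

# Hypothetical counts over 100 positive and 100 negative training documents.
chi_scores = {
    "good": chi_square(60, 10, 40, 90),   # strongly tied to the positive class
    "movie": chi_square(50, 50, 50, 50),  # evenly spread, carries no signal
}
strength = sentence_strength(["good", "movie"], chi_scores)
```

A term that is evenly distributed across classes scores zero, so only class-discriminating sentiment features contribute to the sentence strength.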

Composing Recommended Route through Machine Learning of Navigational Data (항적 데이터 학습을 통한 추천 항로 구성에 관한 연구)

  • Kim, Joo-Sung;Jeong, Jung Sik;Lee, Seong-Yong;Lee, Eun-seok
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2016.05a / pp.285-286 / 2016
  • We aim to propose a method for modeling the prediction of a ship's position by extracting a ship trajectory model through pattern recognition, based on data collected in real time at VTS centers. The Support Vector Machine algorithm was used for data modeling. The optimal parameters are calculated with k-fold cross-validation and grid search. We expect that the proposed modeling method could support VTS operators' decision making in complex encounter traffic situations.
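
The parameter selection step named in this abstract (k-fold cross-validation combined with grid search) has the general shape sketched below. The abstract does not list the parameters or their ranges, so the grid over `C` and `gamma` is an assumption, and `placeholder_score` is a hypothetical stand-in for "train an SVM on the training fold and score it on the test fold".

```python
from itertools import product

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train_idx, test_idx)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def grid_search_cv(fit_score, grid, n, k=5):
    """Return (best_params, best_mean_score) over the Cartesian product of the grid."""
    best_params, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_score(train, test, params) for train, test in k_fold_indices(n, k)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score

def placeholder_score(train_idx, test_idx, params):
    """Hypothetical scorer: a real one would fit a model on train_idx and
    evaluate it on test_idx. This one simply peaks at C=10, gamma=0.1."""
    return -(params["C"] - 10) ** 2 - (params["gamma"] - 0.1) ** 2

grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}  # assumed search ranges
best_params, best_score = grid_search_cv(placeholder_score, grid, n=10, k=3)
folds = list(k_fold_indices(10, 3))
```

For trajectory data that is ordered in time, contiguous folds like these avoid mixing neighbouring samples between training and test sets; shuffled folds would be the usual choice for unordered data.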

Defect Diagnostics of Gas Turbine Engine Using Support Vector Machine and Artificial Neural Network (Support Vector Machine과 인공신경망을 이용한 가스터빈 엔진의 결함 진단에 관한 연구)

  • Park Jun-Cheol;Roh Tae-Seong;Choi Dong-Whan;Lee Chang-Ho
    • Journal of the Korean Society of Propulsion Engineers / v.10 no.2 / pp.102-109 / 2006
  • In this paper, Support Vector Machine (SVM) and Artificial Neural Network (ANN) are used to develop a defect diagnostic algorithm for an aircraft turbo-shaft engine. A system that uses an ANN alone falls into local minima when it learns a large amount of nonlinear data, and its classification accuracy becomes low. To mitigate this risk, the Separate Learning Algorithm (SLA) of ANN using SVM has been proposed. In this method, the ANN learns selectively after the SVM discriminates the defect position, so that better performance estimation can be obtained than with the ANN only. The proposed SLA achieves higher classification accuracy by decreasing the nonlinearity of the massive data during the training procedure.

Effective Fingerprint Classification using Subsumed One-Vs-All Support Vector Machines and Naive Bayes Classifiers (포섭구조 일대다 지지벡터기계와 Naive Bayes 분류기를 이용한 효과적인 지문분류)

  • Hong, Jin-Hyuk;Min, Jun-Ki;Cho, Ung-Keun;Cho, Sung-Bae
    • Journal of KIISE:Software and Applications / v.33 no.10 / pp.886-895 / 2006
  • Fingerprint classification reduces the number of matches required in automated fingerprint identification systems by categorizing fingerprints into predefined classes. Support vector machines (SVMs), widely used in pattern classification, have produced high accuracy in fingerprint classification. To apply SVMs effectively to multi-class fingerprint classification, we propose a novel method in which SVMs are generated with the one-vs-all (OVA) scheme and dynamically ordered with naïve Bayes classifiers. More specifically, it uses representative fingerprint features such as the FingerCode, singularities, and pseudo ridges to train the OVA SVMs and naïve Bayes classifiers. The proposed method has been validated on the NIST-4 database and produced a classification accuracy of 90.8% for 5-class classification. In particular, it effectively handles the tie problems that usually occur when applying OVA SVMs to multi-class classification.
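
One way to read "dynamically ordered" is that the OVA SVMs are consulted in descending order of the naïve Bayes posterior, and the first SVM that accepts the sample decides the class. The sketch below follows that reading; it is an interpretation of the abstract, not the paper's confirmed procedure, and the class labels, scores, and the fallback to the naïve Bayes favourite are hypothetical.

```python
def classify(ova_outputs, nb_probs):
    """Pick a class from OVA SVM outputs, visiting classes in naive Bayes order.

    ova_outputs: class -> signed decision value (positive means "accept").
    nb_probs:    class -> naive Bayes posterior probability.
    """
    order = sorted(nb_probs, key=nb_probs.get, reverse=True)
    for label in order:
        if ova_outputs[label] > 0:
            return label  # first accepting SVM in naive Bayes order wins
    return order[0]  # assumed fallback: no SVM accepted, use the NB favourite

# Hypothetical outputs for a 5-class problem (labels are placeholders).
nb_probs = {"c1": 0.10, "c2": 0.45, "c3": 0.30, "c4": 0.10, "c5": 0.05}
tie = {"c1": -0.8, "c2": 0.2, "c3": 0.6, "c4": -1.1, "c5": -0.9}       # two SVMs accept
reject = {"c1": -0.8, "c2": -0.2, "c3": -0.6, "c4": -1.1, "c5": -0.9}  # no SVM accepts

tie_result = classify(tie, nb_probs)
reject_result = classify(reject, nb_probs)
```

In the `tie` case two SVMs accept the sample; the ordering resolves it in favour of the class the naïve Bayes classifier ranks first, rather than the class with the larger SVM output.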

Mortality Prediction of Older Adults Admitted to the Emergency Department (응급실 방문 노인 환자의 사망률 예측)

  • Park, Junhyeok;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering / v.7 no.7 / pp.275-280 / 2018
  • As the global population ages, the demand for health services for the elderly is expected to increase. In particular, elderly patients visiting the emergency department sometimes have complex medical, social, and physical problems, such as multiple illnesses or complaints of unusual symptoms. The proposed system is designed to predict the mortality of elderly patients over 65 years old who have been admitted to the emergency department. For mortality prediction, we compare support vector machines and a Feed Forward Neural Network (FFNN) trained with medical data such as age, sex, blood pressure, and body temperature. The FFNN with a hidden layer gives the best mortality prediction results, with an F1 score of 52.0% and an AUC of 88.6%. If we improve the performance of the proposed system by extracting better medical features, we will be able to provide better medical services through effective and quick allocation of medical resources for elderly patients visiting the emergency department.
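
The two metrics reported in this abstract measure different things, which is why they can diverge as widely as they do here. The sketch below computes both on a small, entirely hypothetical set of labels and scores (1 = died, 0 = survived); the 0.5 decision threshold is also an assumption.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical, imbalanced toy data: 2 positives among 8 patients.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1 = f1_score(y_true, y_pred)
roc_auc = auc(y_true, scores)
```

On this toy data the ranking is mostly right, so AUC is high, while the fixed threshold produces false positives and a miss, so F1 is low. A gap like this is typical when the positive class is rare.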