• Title/Summary/Keyword: support vector machine (SVM)


Comparative Study of Various Machine-learning Features for Tweets Sentiment Classification (트윗 감정 분류를 위한 다양한 기계학습 자질에 대한 비교 연구)

  • Hong, Cho-Hee;Kim, Hark-Soo
    • The Journal of the Korea Contents Association / v.12 no.12 / pp.471-478 / 2012
  • Various studies on sentiment classification of documents have been performed, and these methods have recently been applied to Twitter sentiment classification. However, they did not perform well because they did not consider the characteristics of tweets, such as tweet structure, emoticons, spelling errors, and newly coined words. In this paper, we experiment with various input features (emoticon polarity, retweet polarity, author polarity, and replacement words) that affect a machine-learning-based Twitter sentiment classification model. In experiments with a sentiment classification model based on a support vector machine, we found that the emoticon polarity and author polarity features help improve the performance of the Twitter sentiment classification model. Contrary to our expectations, however, the retweet polarity and replacement-word features did not affect its performance.
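
A minimal sketch of the feature setup described above, assuming a hypothetical emoticon lexicon and toy tweets (neither comes from the paper): each tweet becomes a binary bag-of-words vector plus an emoticon-polarity feature and a bias term, and a linear SVM is trained with the Pegasos subgradient method in plain Python. The lexicon, vocabulary, and function names are illustrative only.

```python
import re

# Hypothetical emoticon lexicon; the paper's actual polarity resources are not given.
EMOTICON_POLARITY = {":)": 1, ":-)": 1, ":D": 1, ":(": -1, ":-(": -1, "T_T": -1}


def emoticon_polarity(tweet):
    """Sum the polarity of every known emoticon token in the tweet."""
    return sum(EMOTICON_POLARITY.get(tok, 0) for tok in tweet.split())


def featurize(tweet, vocab):
    """Binary bag-of-words over `vocab`, then the emoticon-polarity feature, then a bias term."""
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    bow = [1.0 if w in words else 0.0 for w in vocab]
    return bow + [float(emoticon_polarity(tweet)), 1.0]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Linear SVM via Pegasos: stochastic subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = yi * dot(w, xi) < 1  # margin check uses the current weights
            w = [(1 - eta * lam) * wj for wj in w]  # regularization (shrink) step
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]  # hinge-loss subgradient step
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Toy usage with hypothetical data: +1 = positive, -1 = negative sentiment.
vocab = ["love", "great", "worst", "sad"]
tweets = ["love this phone :)", "great day :D", "worst service :(", "so sad today T_T"]
labels = [1, 1, -1, -1]
X = [featurize(t, vocab) for t in tweets]
w = train_linear_svm(X, labels)
```

Folding the bias into a constant feature keeps the trainer short, at the cost of also regularizing the bias; a full study would also add the retweet- and author-polarity features the abstract lists.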

SVM Kernel Design Using Local Feature Analysis (지역특징분석을 이용한 SVM 커널 디자인)

  • Lee, Il-Yong;Ahn, Jung-Ho
    • Journal of Digital Contents Society / v.11 no.1 / pp.17-24 / 2010
  • The purpose of this study is to design and implement a kernel for the support vector machine (SVM) that improves face recognition performance. Local feature analysis (LFA) is well known for its good performance. A conventional SVM kernel plays only the limited role of mapping low-dimensional face features into a high-dimensional feature space, whereas the proposed LFA-based kernel is designed specifically for face recognition. Because local face information is extracted from the training set and incorporated into the kernel, the method is expected to be applicable to various object recognition and detection tasks. The experimental results show improved performance.
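
The abstract does not specify the LFA-based kernel, so the following is only a sketch of the general idea of building local information into a kernel: a sum of RBF kernels evaluated over fixed regions of the feature vector. The region boundaries, `gamma`, and the toy vectors are hypothetical assumptions, not the authors' design.

```python
import math


def rbf(u, v, gamma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def local_region_kernel(x, z, regions, gamma=0.5):
    """Sum of RBF kernels over corresponding local regions of two face feature vectors.

    Each region is a (start, end) slice of the feature vector. A sum of positive
    semidefinite kernels is itself positive semidefinite, so the result is a valid SVM kernel.
    """
    return sum(rbf(x[s:e], z[s:e], gamma) for s, e in regions)


def gram_matrix(X, regions, gamma=0.5):
    """Precomputed kernel matrix, the form SVM libraries typically accept for custom kernels."""
    return [[local_region_kernel(a, b, regions, gamma) for b in X] for a in X]


# Hypothetical layout: features 0-2 describe one facial region, features 3-5 another.
REGIONS = [(0, 3), (3, 6)]
faces = [[0.1, 0.2, 0.3, 0.9, 0.8, 0.7], [0.1, 0.2, 0.4, 0.1, 0.1, 0.2]]
K = gram_matrix(faces, REGIONS)
```

Because each region contributes at most 1, `K[i][i]` equals the number of regions, and two faces that match in one region but differ in another still receive partial similarity.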

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment is a complicated process in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural networks (ANN), and multiclass support vector machines (MSVM), have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to optimize the kernel parameters and the feature subset selection simultaneously. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that simulates biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results.
In particular, the mutation operator keeps GA from falling into local optima, so the globally optimal or a near-optimal solution can be found. GA has been widely applied to search for the optimal parameters or feature subsets of AI techniques, including MSVM. For these reasons, we also adopt GA as the optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is bond rating, the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset, collected from a large credit rating company in South Korea, contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods, including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e., credit rating, was labeled as four classes: 1 (A1), 2 (A2), 3 (A3), and 4 (B and C). For each class, 80 percent of the data was used for training and the remaining 20 percent for validation. To compensate for the small sample size, we applied five-fold cross-validation to the dataset. To examine the competitiveness of the proposed model, we also tested several comparison models, including MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source library, and Evolver 5.5, a commercial software package for GA. The other comparison models were implemented using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models.
In addition, the model used fewer independent variables yet achieved higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years since founding), X15 (accumulated earnings to total assets), and X39 (an index related to cash flows from operating activities), were found to be the most important factors in predicting corporate credit ratings. However, the finally selected kernel parameter values were almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
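
A sketch of how a GA might encode kernel parameters and a feature subset in one chromosome, assuming a hypothetical 4-bit grid for each of `C` and `gamma` plus one mask bit per financial ratio. The fitness function is a caller-supplied stand-in: in the setting described above it would be the cross-validated accuracy of an MSVM (for example one trained with LIBSVM), which this sketch does not implement. Tournament selection, one-point crossover, bit-flip mutation, and elitism are illustrative choices, not necessarily those used with Evolver 5.5.

```python
import random

PARAM_BITS = 4  # bits per kernel parameter in this hypothetical encoding


def bits_to_int(bits):
    return int("".join(map(str, bits)), 2)


def decode(chrom, n_features):
    """Chromosome = [C bits | gamma bits | one mask bit per feature]."""
    C = 2.0 ** (bits_to_int(chrom[:PARAM_BITS]) - 5)  # C in [2^-5, 2^10]
    gamma = 2.0 ** (bits_to_int(chrom[PARAM_BITS:2 * PARAM_BITS]) - 10)  # gamma in [2^-10, 2^5]
    mask = chrom[2 * PARAM_BITS:]
    return C, gamma, [i for i in range(n_features) if mask[i]]


def evolve(fitness, n_features, pop_size=20, generations=30,
           p_crossover=0.8, p_mutation=0.02, seed=0):
    """Maximize fitness(C, gamma, features) with selection, crossover and mutation."""
    rng = random.Random(seed)
    length = 2 * PARAM_BITS + n_features
    cache = {}

    def score(ch):
        # Fitness is typically an expensive cross-validated SVM run, so memoize it.
        key = tuple(ch)
        if key not in cache:
            cache[key] = fitness(*decode(ch, n_features))
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=score)
        history.append(score(best))
        children = [best[:]]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if rng.random() < p_crossover:
                cut = rng.randrange(1, length)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation maintains diversity, which helps the search leave local optima.
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    history.append(score(best))
    return decode(best, n_features), history


# Hypothetical stand-in fitness: reward features 0 and 2, penalize subset size.
def toy_fitness(C, gamma, features):
    return len({0, 2} & set(features)) - 0.1 * len(features)


(best_C, best_gamma, best_features), history = evolve(toy_fitness, n_features=6)
```

Elitism makes the best fitness per generation non-decreasing, which is a convenient sanity check when the real fitness is noisy cross-validation accuracy.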

SVM-Based EEG Signal for Hand Gesture Classification (서포트 벡터 머신 기반 손동작 뇌전도 구분에 대한 연구)

  • Hong, Seok-min;Min, Chang-gi;Oh, Ha-Ryoung;Seong, Yeong-Rak;Park, Jun-Seok
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.29 no.7 / pp.508-514 / 2018
  • An electroencephalogram (EEG) measures the electrical activity generated by interactions among brain cells during brain activity, and it can capture the brain activity associated with hand movement. In this study, a 16-channel EEG was used to measure signals before and after hand movement. The measured data were classified with a supervised learning model, a support vector machine (SVM). To shorten the SVM's training time, we propose feature extraction and filter-based vector dimension reduction that compress the EEG information while minimizing the loss of motion-related information. Classification between the sitting position and hand movement achieved an average accuracy of 72.7% at the frontal-lobe electrodes.
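
The abstract does not say which filter was used, so this sketch assumes a simple moving-average low-pass filter followed by decimation to shrink each channel, after which all channels are concatenated into one SVM input vector. The window width and step are hypothetical parameters.

```python
def moving_average(signal, width):
    """FIR low-pass filter: mean over a sliding window (valid region only)."""
    return [sum(signal[i:i + width]) / width for i in range(len(signal) - width + 1)]


def reduce_channel(signal, width=4, step=4):
    """Low-pass filter one channel, then keep every `step`-th sample."""
    return moving_average(signal, width)[::step]


def eeg_feature_vector(channels, width=4, step=4):
    """Concatenate the reduced signals of all channels into one SVM input vector."""
    vector = []
    for channel in channels:
        vector.extend(reduce_channel(channel, width, step))
    return vector
```

For example, 16 channels of 256 samples each with `width=4` and `step=4` yield 16 × 64 = 1,024 values instead of 4,096, and low-pass filtering before decimation limits aliasing of high-frequency noise into the kept samples.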

Prediction of Soil Moisture with Open Source Weather Data and Machine Learning Algorithms (공공 기상데이터와 기계학습 모델을 이용한 토양수분 예측)

  • Jang, Young-bin;Jang, Ik-hoon;Choe, Young-chan
    • Korean Journal of Agricultural and Forest Meteorology / v.22 no.1 / pp.1-12 / 2020
  • As one of the essential resources in agriculture, soil moisture has been carefully managed by predicting future changes and deficits. In recent years, statistical and machine-learning-based approaches to predicting soil moisture have been preferred in academia for their generalizability and ease of use in the field. However, little is known about whether machine-learning-based soil moisture prediction is applicable in South Korea. This paper therefore examines 1) whether publicly available weather data generated in South Korea are of sufficient quality to predict soil moisture, 2) which machine learning algorithm performs best in South Korea, and 3) whether a single machine learning model is generally applicable across regions. We used various machine learning methods, namely Support Vector Machines (SVM), Random Forest (RF), Extremely Randomized Trees (ET), Gradient Boosting Machines (GBM), and Deep Feedforward Network (DFN), to predict future soil moisture in the Andong, Boseong, Cheolwon, and Suncheon regions with open-source weather data. The GBM model showed the lowest prediction error on every dataset used (R squared: 0.96, RMSE: 1.8). Furthermore, GBM showed the lowest variance of prediction error across regions, indicating the highest generalizability.
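
The comparison above rests on two error metrics and on how much the error varies between regions. A minimal sketch of those computations, assuming each region's observations and predictions are given as plain lists keyed by a region name (function and key names are illustrative):

```python
import math
from statistics import mean, pvariance


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def regional_summary(results):
    """results: {region: (y_true, y_pred)} -> (mean RMSE, variance of RMSE across regions)."""
    per_region = [rmse(t, p) for t, p in results.values()]
    return mean(per_region), pvariance(per_region)
```

A lower between-region variance of RMSE is what the abstract reads as higher generalizability: the same model errs by a similar amount wherever it is applied.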

A study on EPB shield TBM face pressure prediction using machine learning algorithms (머신러닝 기법을 활용한 토압식 쉴드TBM 막장압 예측에 관한 연구)

  • Kwon, Kibeom;Choi, Hangseok;Oh, Ju-Young;Kim, Dongku
    • Journal of Korean Tunnelling and Underground Space Association / v.24 no.2 / pp.217-230 / 2022
  • Adequate control of TBM face pressure is vital for maintaining face stability by preventing face collapse and surface settlement. An EPB shield TBM excavates the ground while applying face pressure with the excavated soil in the pressure chamber. One of the challenges in EPB shield TBM operation is controlling face pressure, because the excavated soil is difficult to manage. In this study, the face pressure of an EPB shield TBM was predicted using geological and operational data acquired from a domestic TBM tunnel site. Four machine learning algorithms, KNN (K-Nearest Neighbors), SVM (Support Vector Machine), RF (Random Forest), and XGB (eXtreme Gradient Boosting), were applied to predict the face pressure. The model comparison showed that the RF model yielded the lowest RMSE (Root Mean Square Error), 7.35 kPa, so RF was selected as the optimal machine learning algorithm. In addition, the feature importance of the RF model was analyzed to evaluate the influence of each feature on the face pressure. Water pressure had the greatest influence, and at the site considered, the geological conditions were generally more important than the operational features.
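
The abstract reports feature importance from the RF model, which is commonly impurity-based. As a model-agnostic alternative (a swapped-in technique, not the authors' method), permutation importance measures how much RMSE rises when one feature's values are shuffled. In this sketch, `predict` stands for any fitted regressor's prediction function and is assumed to take one row of features as a list.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean RMSE increase when each feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j; all other features keep their original values.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature such as water pressure, if the model relies on it heavily, would show a large RMSE increase in kPa.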

Classification of Ground-Glass Opacity Nodules with Small Solid Components using Multiview Images and Texture Analysis in Chest CT Images (흉부 CT 영상에서 다중 뷰 영상과 텍스처 분석을 통한 고형 성분이 작은 폐 간유리음영 결절 분류)

  • Lee, Seon Young;Jung, Julip;Lee, Han Sang;Hong, Helen
    • Journal of Korea Multimedia Society / v.20 no.7 / pp.994-1003 / 2017
  • Ground-glass opacity nodules (GGNs) in chest CT images are associated with lung cancer, and their malignancy rate differs depending on whether the nodule contains a solid component. In this paper, we propose a method that uses multiview images and texture analysis to classify pure GGNs and part-solid GGNs whose solid components are 5 mm or smaller. We extracted 1521 features from GGNs segmented from chest CT images, used a feature selection method to choose the features that best discriminate pure from part-solid GGNs, and classified the GGNs with an SVM model trained on the selected features. Our method achieved 85% accuracy using the SVM classifier with the top 10 features selected from the multiview images.
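
The abstract does not name its feature selection method. As one plausible filter, assumed here for illustration only, a two-class Fisher score ranks each texture feature by how well it separates pure GGNs (label 0) from part-solid GGNs (label 1), and the indices of the top k features are kept as SVM input.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Two-class Fisher score per feature: (mu1 - mu0)^2 / (var1 + var0)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        denom = pvariance(a) + pvariance(b)
        diff2 = (mean(b) - mean(a)) ** 2
        if denom > 0:
            scores.append(diff2 / denom)
        else:
            # Zero within-class spread: perfectly separating if the means differ, else useless.
            scores.append(float("inf") if diff2 > 0 else 0.0)
    return scores


def top_k_features(X, y, k=10):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

Filter scores like this are cheap to compute for all 1521 candidate features; they ignore feature interactions, which wrapper methods that retrain the SVM would capture at higher cost.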

An Optimized CLBP Descriptor Based on a Scalable Block Size for Texture Classification

  • Li, Jianjun;Fan, Susu;Wang, Zhihui;Li, Haojie;Chang, Chin-Chen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.1 / pp.288-301 / 2017
  • In this paper, we propose an optimized texture classification algorithm that computes a completed modeling of the local binary pattern (CLBP), instead of the traditional LBP, over blocks of scalable size in an image. First, we show that the CLBP descriptor is more representative than LBP because it extracts more information from an image. Second, CLBP features computed over scalable block sizes can adaptively represent both the gross and the detailed features of an image, which makes them suitable for image texture classification. We implement a machine learning scheme by feeding these scalable-size CLBP features to a Support Vector Machine (SVM) classifier. The proposed scheme was evaluated on the Outex and CUReT databases, and the results show that it achieves a higher recognition rate than previously reported results.
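
A simplified sketch of the three CLBP components (sign, magnitude, centre) and of block-wise histograms whose block size can be varied. Assumptions: 8 neighbours at radius 1 on the square grid without interpolation, no rotation-invariant uniform mapping, and a plain concatenation of the S, M, and C histograms per block; the paper's exact configuration may differ.

```python
# Eight neighbours at radius 1 on the square pixel grid (no circular interpolation).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def clbp_codes(img):
    """Map each interior pixel of a 2-D grey-level list (at least 3x3) to (S, M, C) codes."""
    h, w = len(img), len(img[0])
    diffs = {
        (r, c): [img[r + dr][c + dc] - img[r][c] for dr, dc in OFFSETS]
        for r in range(1, h - 1) for c in range(1, w - 1)
    }
    # Thresholds: mean |g_p - g_c| for the magnitude code, mean grey level for the centre code.
    mean_mag = sum(abs(d) for ds in diffs.values() for d in ds) / (8 * len(diffs))
    mean_grey = sum(map(sum, img)) / (h * w)
    codes = {}
    for (r, c), ds in diffs.items():
        s = sum(1 << p for p, d in enumerate(ds) if d >= 0)              # CLBP_S: sign
        m = sum(1 << p for p, d in enumerate(ds) if abs(d) >= mean_mag)  # CLBP_M: magnitude
        codes[(r, c)] = (s, m, 1 if img[r][c] >= mean_grey else 0)       # CLBP_C: centre
    return codes


def block_histograms(img, block):
    """Concatenate per-block histograms of S (256 bins), M (256 bins) and C (2 bins) codes."""
    codes = clbp_codes(img)
    feature = []
    for top in range(0, len(img), block):
        for left in range(0, len(img[0]), block):
            hs, hm, hc = [0] * 256, [0] * 256, [0] * 2
            for (r, c), (s, m, ctr) in codes.items():
                if top <= r < top + block and left <= c < left + block:
                    hs[s] += 1
                    hm[m] += 1
                    hc[ctr] += 1
            feature.extend(hs + hm + hc)
    return feature
```

Smaller blocks give longer, more local feature vectors and larger blocks give coarser summaries; varying `block` is one way to trade off the detailed and gross features the abstract mentions.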

Sentiment Analysis of COVID-19 Vaccination in Saudi Arabia

  • Sawsan Alowa;Lama Alzahrani;Noura Alhakbani;Hend Alrasheed
    • International Journal of Computer Science & Network Security / v.23 no.2 / pp.13-30 / 2023
  • Since the COVID-19 vaccine became available, people have been sharing their opinions about getting vaccinated on social media, and discussions of the vaccine have trended on Twitter alongside certain events, making the site a rich data source. This paper explores people's perceptions of the COVID-19 vaccine during certain events and how these events influenced public opinion about the vaccine. The data consisted of tweets sent during seven important events, gathered within 14 days of the first announcement of each event. These data capture people's reactions to the events while excluding irrelevant tweets. The study targeted Arabic tweets from users located in Saudi Arabia. The tweets were classified as positive, negative, or neutral in tone. Four classifiers, support vector machine (SVM), naïve Bayes (NB), logistic regression (LOGR), and random forest (RF), were used in addition to a deep learning model using BiLSTM. The SVM achieved the highest accuracy, at 91%. Overall perceptions of the COVID-19 vaccine were 54% negative, 36% neutral, and 10% positive.
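
A sketch of the bookkeeping behind figures like those reported above, assuming predictions are available as `(event, label)` pairs: classifier accuracy against annotated labels, and the percentage of positive, negative, and neutral tweets overall or per event. Event names and the helper functions are hypothetical placeholders.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "negative", "neutral")


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted sentiment matches the annotated one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def shares(labels):
    """Percentage of tweets per sentiment class."""
    counts = Counter(labels)
    return {lab: 100.0 * counts[lab] / len(labels) for lab in LABELS}


def shares_by_event(records):
    """records: iterable of (event, predicted_label) pairs -> {event: {label: percent}}."""
    grouped = defaultdict(list)
    for event, label in records:
        grouped[event].append(label)
    return {event: shares(labels) for event, labels in grouped.items()}
```

Grouping by event rather than pooling all tweets is what lets the shift in opinion around each of the seven events be compared.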

A Study on the Failure Diagnosis of Transfer Robot for Semiconductor Automation Based on Machine Learning Algorithm (머신러닝 알고리즘 기반 반도체 자동화를 위한 이송로봇 고장진단에 대한 연구)

  • Kim, Mi Jin;Ko, Kwang In;Ku, Kyo Mun;Shim, Jae Hong;Kim, Kihyun
    • Journal of the Semiconductor & Display Technology / v.21 no.4 / pp.65-70 / 2022
  • In the manufacturing and semiconductor industries, transfer robots increase productivity through accurate and continuous work. Because of the nature of the semiconductor process, there are clean-room environments in which humans cannot intervene, so that internal temperature and humidity can be maintained; transfer robots therefore take over this work from humans. In such environments, where process manpower is being reduced, a lack of machine maintenance and management technology may adversely affect production, which is why a machine failure diagnosis system needs to be developed. This paper therefore identifies various causes of failure in the transfer robots widely used in semiconductor automation and considers the Prognostics and Health Management (PHM) method for determining and predicting the failure process. The robot mainly fails in the driving unit because of long-term repetitive motion, and the core components of the driving unit are the motor and the gear reducer. A simulated drive unit built around these components was manufactured and tested, and the approach was then applied to 6-axis vertical multi-joint robots used at actual industrial sites. Vibration data were collected for each cause of robot failure and then processed through signal processing and frequency analysis. Robot faults can be determined from the processed data using machine learning algorithms such as SVM (Support Vector Machine) and KNN (K-Nearest Neighbor). As a result, a PHM environment was built on machine learning algorithms using SVM and KNN, confirming that failure prediction was partially possible.
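
A sketch of frequency-domain features for vibration signals followed by KNN, one of the two classifiers named above. It assumes each recording is a plain list of samples and uses the DFT magnitudes of the first few bins as features; the bin count, `k`, and class names are hypothetical, and a real system would likely use an FFT library rather than this naive DFT.

```python
import cmath
import math
from collections import Counter


def dft_magnitudes(signal, n_bins):
    """Normalized magnitudes of the first n_bins DFT coefficients (naive O(N * n_bins) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))) / n
        for k in range(n_bins)
    ]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

Using magnitudes discards phase, so two recordings of the same fault that start at different points in the motion cycle map to nearly the same feature vector.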