• Title/Summary/Keyword: Support Vector Machines(SVM)

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment is a complicated process that takes into account many factors describing a company. It is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural networks (ANN), and multiclass support vector machines (MSVM), have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Likewise, results from studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose applying GAMSVM to corporate credit rating prediction. As the tool for optimizing the kernel parameters and the feature subset selection, we use a genetic algorithm (GA). GA is known as an efficient and effective search method that simulates biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results.
In particular, the mutation operator helps prevent GA from becoming trapped in local optima, so it can find a globally optimal or near-optimal solution. GA has been widely applied to search for optimal parameters or feature subsets for AI techniques including MSVM. For these reasons, we also adopt GA as the optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is bond rating, the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, along with their credit ratings. Using statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e., credit rating, was labeled with four classes: 1 (A1), 2 (A2), 3 (A3), and 4 (B and C). Eighty percent of the data for each class was used for training, and the remaining 20 percent for validation. To overcome the small sample size, we applied five-fold cross-validation to our dataset. To examine the competitiveness of the proposed model, we also tested several comparative models: MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that provides GA. The other comparative models were run using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the comparative models.
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years since founding), X15 (accumulated earnings to total assets), and X39 (an index related to cash flows from operating activities), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. We found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
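
The joint search described in this abstract (kernel parameters and feature subset optimized together by a GA) can be illustrated with a minimal sketch. It is not the authors' implementation, which used LIBSVM and Evolver 5.5. The chromosome layout (14 feature bits followed by two real-valued genes for log2 C and log2 gamma), the parameter ranges, the tournament selection and elitism, and above all `toy_fitness`, which merely stands in for cross-validated MSVM accuracy, are assumptions made for illustration.

```python
import random

N_FEATURES = 14                    # candidate financial ratios in the abstract
PARAM_RANGES = ((-5, 15), (-15, 3))  # assumed ranges for log2(C), log2(gamma)


def decode(chrom):
    """Split a chromosome into (feature mask, C, gamma)."""
    mask = chrom[:N_FEATURES]
    return mask, 2.0 ** chrom[N_FEATURES], 2.0 ** chrom[N_FEATURES + 1]


def toy_fitness(chrom):
    """Hypothetical stand-in for cross-validated MSVM accuracy.

    Rewards an invented set of 'useful' features and exponents near (3, -5).
    """
    useful = {0, 3, 7, 9, 12}
    mask = chrom[:N_FEATURES]
    hits = sum(1 for i in useful if mask[i])
    extras = sum(mask) - hits
    penalty = abs(chrom[N_FEATURES] - 3) + abs(chrom[N_FEATURES + 1] + 5)
    return hits - 0.2 * extras - 0.05 * penalty


def random_chromosome(rng):
    mask = [rng.randint(0, 1) for _ in range(N_FEATURES)]
    return mask + [rng.uniform(lo, hi) for lo, hi in PARAM_RANGES]


def tournament(pop, fitness, rng, k=3):
    """Selection: best of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover over the whole chromosome."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(chrom, rng, rate=0.05):
    """Flip feature bits and perturb parameter genes with probability `rate`."""
    child = list(chrom)
    for i in range(N_FEATURES):
        if rng.random() < rate:
            child[i] = 1 - child[i]
    for offset, (lo, hi) in enumerate(PARAM_RANGES):
        i = N_FEATURES + offset
        if rng.random() < rate:
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0, 1)))
    return child


def run_ga(fitness, generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            parent_a = tournament(pop, fitness, rng)
            parent_b = tournament(pop, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga(toy_fitness)
mask, C, gamma = decode(best)
print("selected features:", [i for i, bit in enumerate(mask) if bit])
print("C = %.3g, gamma = %.3g" % (C, gamma))
```

In a real experiment, `toy_fitness` would be replaced by a function that trains an MSVM on the masked features with the decoded `C` and `gamma` and returns its validation accuracy. Encoding both parts in one chromosome is what lets the search trade a feature off against a kernel setting, which optimizing them separately cannot do.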

Impurity profiling and chemometric analysis of methamphetamine seizures in Korea

  • Shin, Dong Won;Ko, Beom Jun;Cheong, Jae Chul;Lee, Wonho;Kim, Suhkmann;Kim, Jin Young
    • Analytical Science and Technology / v.33 no.2 / pp.98-107 / 2020
  • Methamphetamine (MA) is currently the most abused illicit drug in Korea. MA is produced by chemical synthesis, and the final target drug that is produced contains small amounts of the precursor chemicals, intermediates, and by-products. To identify and quantify these trace compounds in MA seizures, a practical and feasible approach for conducting chromatographic fingerprinting with a suite of traditional chemometric methods and recently introduced machine learning approaches was examined. This was achieved using gas chromatography (GC) coupled with a flame ionization detector (FID) and mass spectrometry (MS). Following appropriate examination of all the peaks in 71 samples, 166 impurities were selected as the characteristic components. Unsupervised (principal component analysis (PCA), hierarchical cluster analysis (HCA), and K-means clustering) and supervised (partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), support vector machines (SVM), and deep neural network (DNN) with Keras) chemometric techniques were employed for classifying the 71 MA seizures. The results of the PCA, HCA, K-means clustering, PLS-DA, OPLS-DA, SVM, and DNN methods for quality evaluation were in good agreement. However, the tested MA seizures possessed distinct features, such as chirality, cutting agents, and boiling points. The study indicated that the established qualitative and semi-quantitative methods will be practical and useful analytical tools for characterizing trace compounds in illicit MA seizures. Moreover, they will provide a statistical basis for identifying the synthesis route, sources of supply, trafficking routes, and connections between seizures, which will support drug law enforcement agencies in their effort to eliminate organized MA crime.

A Predictive Model of the Generator Output Based on the Learning of Performance Data in Power Plant (발전플랜트 성능데이터 학습에 의한 발전기 출력 추정 모델)

  • Yang, HacJin;Kim, Seong Kun
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.12 / pp.8753-8759 / 2015
  • Established analysis procedures and validated performance measurements are required to maintain stable management of generator output in the turbine power generation cycle. We developed a turbine expansion model and a measurement validation model for calculating generator performance from turbine output, based on the ASME (American Society of Mechanical Engineers) PTC (Performance Test Code). We also developed a verification model for uncertain measurement data related to turbine and generator output. Whereas the models in previous research were developed using artificial neural networks and kernel regression, the verification model in this paper is based on a Support Vector Machine (SVM) to overcome the problem of unmeasured data. Procedures for selecting the related variables and the data window for verification learning were also developed. The model proved suitable for the estimation process, with a learning error of about 1%. The learning model can provide validated estimates for corrective performance analysis of turbine cycle output by predicting lost measurement data.

A Study of the Feature Classification and the Predictive Model of Main Feed-Water Flow for Turbine Cycle (주급수 유량의 형상 분류 및 추정 모델에 대한 연구)

  • Yang, Hac Jin;Kim, Seong Kun;Choi, Kwang Hee
    • Journal of Energy Engineering / v.23 no.4 / pp.263-271 / 2014
  • Corrective thermal performance analysis is required for thermal power plants to determine the performance status of the turbine cycle. We developed a classification method for main feed-water flow to make precise corrections for performance analysis based on the ASME (American Society of Mechanical Engineers) PTC (Performance Test Code). The classification is based on identifying features of the main feed-water flow status. We also developed predictive algorithms for the corrected main feed-water flow using a Support Vector Machine (SVM) model for each classified feature area. The results were compared with estimates obtained using a Neural Network (NN) and Kernel Regression (KR). The feature classification and predictive model of main feed-water flow provide a more practical method for corrective thermal performance analysis of the turbine cycle.

Shallow Parsing on Grammatical Relations in Korean Sentences (한국어 문법관계에 대한 부분구문 분석)

  • Lee, Song-Wook;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.32 no.10 / pp.984-989 / 2005
  • This study aims to identify grammatical relations (GRs) in Korean sentences. The key task is to find the GRs in sentences in terms of GR categories such as subject, object, and adverbial. In solving this problem, we are faced with many ambiguities. We propose a statistical model that first resolves the grammatical relational ambiguity and then finds the correct noun phrase (NP) arguments of given verb phrases (VPs) by using the probabilities of the GRs given the NPs and VPs in sentences. The proposed model uses characteristics of the Korean language such as distance, no-crossing, and case property. We estimate the probabilities of a GR given an NP and a VP with Support Vector Machines (SVM) classifiers. In an experiment using a tree- and GR-tagged corpus to train the model, we achieved accuracies of 84.8%, 94.1%, and 84.8% in identifying subject, object, and adverbial relations in sentences, respectively.

Learning-based approach for License Plate Recognition System (학습 기반의 자동차 번호판 인식 시스템)

  • 김종배;김갑기;김광인;박민호;김항준
    • Journal of the Institute of Convergence Signal Processing / v.2 no.1 / pp.1-11 / 2001
  • This paper presents a learning-based approach for constructing a license plate recognition system. The system consists of three modules: a car detection module, a license plate segmentation module, and a recognition module. The car detection module detects a car in the image sequence obtained from the camera using a simple color-based approach. The segmentation module extracts the license plate from the detected car image using neural networks as filters that analyze the color and texture properties of the license plate. The recognition module then reads the characters in the detected license plate with a support vector machine (SVM)-based character recognizer. The system was tested at sites such as a parking lot and a tollgate and showed the following average performance: a car detection rate of 100%, a segmentation rate of 97.5%, and a character recognition rate of about 97.2%. Overall system performance is 94.7%, and the processing time is one second. The proposed system therefore performs well in real-world settings.

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.
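
The outlier-removal half of this hybrid approach can be sketched in a few lines. The abstract does not give the exact k-RNN rule, so the sketch assumes one common reading: a sample is treated as an outlier when fewer than `min_count` other samples list it among their k nearest neighbours. The function names, the Euclidean distance, and the toy data are hypothetical, and the OCSVM selection step is not shown.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]


def reverse_neighbor_counts(points, k):
    """For each point, count how many other points have it in their k-NN list."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def drop_outliers(points, k=2, min_count=1):
    """Keep only points that are a k-nearest neighbour of at least min_count others."""
    counts = reverse_neighbor_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]


# Toy majority-class sample: five points on a line plus one distant point.
majority = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (50, 50)]
print(drop_outliers(majority, k=2))
```

On this toy data the distant point (50, 50) is nobody's nearest neighbour, so it is the only one removed. The brute-force neighbour search is quadratic in the number of samples, which is acceptable for a sketch but would need an index structure for a large majority class.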

Using Keystroke Dynamics for Implicit Authentication on Smartphone

  • Do, Son;Hoang, Thang;Luong, Chuyen;Choi, Seungchan;Lee, Dokyeong;Bang, Kihyun;Choi, Deokjai
    • Journal of Korea Multimedia Society / v.17 no.8 / pp.968-976 / 2014
  • Authentication methods on smartphones need to be implicit, requiring minimal user interaction. Existing authentication methods (e.g., PINs, passwords, and visual patterns) do not effectively address memorability and privacy issues. Behavioral biometrics such as keystroke dynamics and gait can be acquired easily and implicitly using the sensors integrated in a smartphone. We propose a biometric model involving keystroke dynamics for implicit authentication on smartphones. We first design a feature extraction method for keystroke dynamics. We then build a fusion model of keystroke dynamics and gait to improve on the authentication performance of a single behavioral biometric. We perform the fusion at both the feature extraction level and the matching score level. Experiments using a linear Support Vector Machines (SVM) classifier show that the best results are achieved with score fusion: a recognition rate of approximately 97.86% in identification mode and an error rate of approximately 1.11% in authentication mode.
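
Fusion at the matching score level can be illustrated with a short sketch. The abstract does not state the fusion rule, so the min-max normalization, the weighted sum, the weight `w_keystroke`, and the example scores below are all assumptions made for illustration. Each list holds one matching score per enrolled user, for example the decision value of that user's linear SVM.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so both modalities are on a comparable scale."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]  # no spread: every candidate ties
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(keystroke, gait, w_keystroke=0.5):
    """Weighted sum of normalized per-user scores from the two modalities."""
    k_norm, g_norm = min_max(keystroke), min_max(gait)
    return [
        w_keystroke * k + (1 - w_keystroke) * g for k, g in zip(k_norm, g_norm)
    ]


def identify(keystroke, gait, w_keystroke=0.5):
    """Identification mode: index of the enrolled user with the top fused score."""
    fused = fuse_scores(keystroke, gait, w_keystroke)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three enrolled users.
keystroke_scores = [0.2, 0.9, 0.4]
gait_scores = [1.0, 3.0, 5.0]
print(fuse_scores(keystroke_scores, gait_scores))
print(identify(keystroke_scores, gait_scores))
```

Normalizing before summing matters because the two classifiers may produce scores on different scales. Without it, the modality with the larger raw range would dominate the fused decision.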

Improving the Performance of a Fast Text Classifier with Document-side Feature Selection (문서측 자질선정을 이용한 고속 문서분류기의 성능향상에 관한 연구)

  • Lee, Jae-Yun
    • Journal of Information Management / v.36 no.4 / pp.51-69 / 2005
  • High-speed classification has become an important research issue in text categorization systems. A fast text categorization technique named feature value voting was recently introduced for text categorization problems, but its classification accuracy is not as good as its classification speed. We present a novel approach to feature selection, named document-side feature selection, and apply it to the feature value voting method. In this approach, there is no feature selection process in the learning phase; instead, real-time feature selection is executed in the classification phase. Our results show that feature value voting with document-side feature selection yields a fast and accurate text classification system whose classification performance appears competitive with Support Vector Machines, the state-of-the-art text categorization algorithm.

De-Noising and Contour Preserving Digit Enhancement for Meter Digit Recognition (계량기 숫자 인식을 위한 잡영 제거 및 윤곽보존 숫자강화)

  • Yi, Eun-Gyoo;Ko, Jae-Pil
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.515-520 / 2006
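  • Meter digit recognition is a technology in which a camera is attached to a commonly used analog meter, an image of the digit panel is transmitted at reading time, and the digits are extracted and recognized from that image. In meter digit recognition, the camera installation conditions and other environmental factors make it difficult to acquire digit panel images consistently. In this paper, we propose de-noising and contour-preserving digit enhancement to compensate for variations in the acquired images that adversely affect digit recognition. For de-noising, noise was divided into three types according to where it is distributed, and each type was removed separately. In the contour-preserving digit enhancement step, the brightness of the central part of each digit was strengthened while the intensity of the digit border was preserved, so as to minimize the loss of border information that common binarization techniques suffer from. SVM (Support Vector Machines) was used to compare recognition rates before and after preprocessing, with 1,409 training images and 1,782 test images acquired under different lighting conditions as the experimental data. The experiments confirmed a performance improvement of 81.09%, demonstrating that the proposed preprocessing technique contributes greatly to recognition performance by resolving the problem of data variation caused by lighting.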
