• Title/Summary/Keyword: discriminant feature

Search Result 200, Processing Time 0.02 seconds

Analysis and Implementation of Speech/Music Classification for 3GPP2 SMV Codec Employing SVM Based on Discriminative Weight Training (SMV코덱의 음성/음악 분류 성능 향상을 위한 최적화된 가중치를 적용한 입력벡터 기반의 SVM 구현)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk;Cho, Ki-Ho;Kim, Nam-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.471-476
    • /
    • 2009
  • In this paper, we apply a discriminative weight training to a support vector machine (SVM) based speech/music classification for the selectable mode vocoder (SMV) of 3GPP2. In our approach, the speech/music decision rule is expressed as the SVM discriminant function by incorporating optimally weighted features of the SMV based on a minimum classification error (MCE) method which is different from the previous work in that different weights are assigned to each the feature of SMV. The performance of the proposed approach is evaluated under various conditions and yields better results compared with the conventional scheme in the SVM.

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.139-157
    • /
    • 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. The k parameter of KNN base classifiers and selected feature subsets for base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance KNN ensemble model by optimizing both k parameters and feature subsets of base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve the prediction accuracy of the ensemble model. The proposed model was applied to a bankruptcy prediction problem by using a real dataset from Korean companies. The research data included 1800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected by using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for the training model and the other to avoid overfitting. The prediction accuracy against this dataset was used to determine the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, the classification accuracy of the proposed model was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated. The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review
    • /
    • v.16 no.3
    • /
    • pp.161-177
    • /
    • 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive since domain experts should be employed to assess the ratings. As a result, the data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM) have been applied to corporate credit rating.2) Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model, and appy it to corporate credit rating prediction in order to enhance the accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies like Lorena and de Carvalho (2008), and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results from the studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate the biological evolution phenomenon. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. Especially, mutation operator prevents GA from falling into the local optima, thus we can find the globally optimal or near-optimal solution using it. GA has popularly been applied to search optimal parameters or feature subset selections of AI techniques including MSVM. With these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, and their credit ratings. Using various statistical methods including the one-way ANOVA and the stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled as four classes: 1(A1); 2(A2); 3(A3); 4(B and C). 80 percent of total data for each class was used for training, and remaining 20 percent was used for validation. And, to overcome small sample size, we applied five-fold cross validation to our dataset. In order to examine the competitiveness of the proposed model, we also experimented several comparative models including MDA, MLOGIT, CBR, ANN and MSVM. In case of MSVM, we adopted One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate approaches among various MSVM approaches. GAMSVM was implemented using LIBSVM-an open-source software, and Evolver 5.5-a commercial software enables GA. Other comparative models were experimented using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model-GAMSVM-outperformed all the competitive models. In addition, the model was found to use less independent variables, but to show higher accuracy. In our experiments, five variables such as X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity) were found to be the most important factors in predicting the corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost same among the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than those of other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.

Real-Time Face Recognition Based on Subspace and LVQ Classifier (부분공간과 LVQ 분류기에 기반한 실시간 얼굴 인식)

  • Kwon, Oh-Ryun;Min, Kyong-Pil;Chun, Jun-Chul
    • Journal of Internet Computing and Services
    • /
    • v.8 no.3
    • /
    • pp.19-32
    • /
    • 2007
  • This paper present a new face recognition method based on LVQ neural net to construct a real time face recognition system. The previous researches which used PCA, LDA combined neural net usually need much time in training neural net. The supervised LVQ neural net needs much less time in training and can maximize the separability between the classes. In this paper, the proposed method transforms the input face image by PCA and LDA sequentially into low-dimension feature vectors and recognizes the face through LVQ neural net. In order to make the system robust to external light variation, light compensation is performed on the detected face by max-min normalization method as preprocessing. PCA and LDA transformations are applied to the normalized face image to produce low-level feature vectors of the image. In order to determine the initial centers of LVQ and speed up the convergency of the LVQ neural net, the K-Means clustering algorithm is adopted. Subsequently, the class representative vectors can be produced by LVQ2 training using initial center vectors. The face recognition is achieved by using the euclidean distance measure between the center vector of classes and the feature vector of input image. From the experiments, we can prove that the proposed method is more effective in the recognition ratio for the cases of still images from ORL database and sequential images rather than using conventional PCA of a hybrid method with PCA and LDA.

  • PDF

Fingerprint Recognition Algorithm using Clique (클릭 구조를 이용한 지문 인식 알고리즘)

  • Ahn, Do-Sung;Kim, Hak-Il
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.36S no.2
    • /
    • pp.69-80
    • /
    • 1999
  • Recently, social requirements of personal identification techniques are rapidly expanding in a number of new application ares. Especially fingerprint recognition is the most important technology. Fingerprint recognition technologies are well established, proven, cost and legally accepted. Therefore, it has more spot lighted among the any other biometrics technologies. In this paper we propose a new on-line fingerprint recognition algorithm for non-inked type live scanner to fit their increasing of security level under the computing environment. Fingerprint recognition system consists of two distinct structural blocks: feature extraction and feature matching. The main topic in this paper focuses on the feature matching using the fingerprint minutiae (ridge ending and bifurcation). Minutiae matching is composed in the alignment stage and matching stage. Success of optimizing the alignment stage is the key of real-time (on-line) fingerprint recognition. Proposed alignment algorithm using clique shows the strength in the search space optimization and partially incomplete image. We make our own database to get the generality. Using the traditional statistical discriminant analysis, 0.05% false acceptance rate (FAR) at 8.83% false rejection rate (FRR) in 1.55 second average matching speed on a Pentium system have been achieved. This makes it possible to construct high performance fingerprint recognition system.

  • PDF

A Method of Feature Extraction on Motor Imagery EEG Using FLD and PCA Based on Sub-Band CSP (서브 밴드 CSP기반 FLD 및 PCA를 이용한 동작 상상 EEG 특징 추출 방법 연구)

  • Park, Sang-Hoon;Lee, Sang-Goog
    • Journal of KIISE
    • /
    • v.42 no.12
    • /
    • pp.1535-1543
    • /
    • 2015
  • The brain-computer interface obtains a user's electroencephalogram as a replacement communication unit for the disabled such that the user is able to control machines by simply thinking instead of using hands or feet. In this paper, we propose a feature extraction method based on a non-selected filter by SBCSP to classify motor imagery EEG. First, we divide frequencies (4~40 Hz) into 4-Hz units and apply CSP to each Unit. Second, we obtain the FLD score vector by combining FLD results. Finally, the FLD score vector is projected onto the optimal plane for classification using PCA. We use BCI Competition III dataset IVa, and Extracted features are used as input for LS-SVM. The classification accuracy of the proposed method was evaluated using $10{\times}10$ fold cross-validation. For subjects 'aa', 'al', 'av', 'aw', and 'ay', results were $85.29{\pm}0.93%$, $95.43{\pm}0.57%$, $72.57{\pm}2.37%$, $91.82{\pm}1.38%$, and $93.50{\pm}0.69%$, respectively.

I-vector similarity based speech segmentation for interested speaker to speaker diarization system (화자 구분 시스템의 관심 화자 추출을 위한 i-vector 유사도 기반의 음성 분할 기법)

  • Bae, Ara;Yoon, Ki-mu;Jung, Jaehee;Chung, Bokyung;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.461-467
    • /
    • 2020
  • In noisy and multi-speaker environments, the performance of speech recognition is unavoidably lower than in a clean environment. To improve speech recognition, in this paper, the signal of the speaker of interest is extracted from the mixed speech signals with multiple speakers. The VoiceFilter model is used to effectively separate overlapped speech signals. In this work, clustering by Probabilistic Linear Discriminant Analysis (PLDA) similarity score was employed to detect the speech signal of the interested speaker, which is used as the reference speaker to VoiceFilter-based separation. Therefore, by utilizing the speaker feature extracted from the detected speech by the proposed clustering method, this paper propose a speaker diarization system using only the mixed speech without an explicit reference speaker signal. We use phone-dataset consisting of two speakers to evaluate the performance of the speaker diarization system. Source to Distortion Ratio (SDR) of the operator (Rx) speech and customer speech (Tx) are 5.22 dB and -5.22 dB respectively before separation, and the results of the proposed separation system show 11.26 dB and 8.53 dB respectively.

Machine Learning Based Structural Health Monitoring System using Classification and NCA (분류 알고리즘과 NCA를 활용한 기계학습 기반 구조건전성 모니터링 시스템)

  • Shin, Changkyo;Kwon, Hyunseok;Park, Yurim;Kim, Chun-Gon
    • Journal of Advanced Navigation Technology
    • /
    • v.23 no.1
    • /
    • pp.84-89
    • /
    • 2019
  • This is a pilot study of machine learning based structural health monitoring system using flight data of composite aircraft. In this study, the most suitable machine learning algorithm for structural health monitoring was selected and dimensionality reduction method for application on the actual flight data was conducted. For these tasks, impact test on the cantilever beam with added mass, which is the simulation of damage in the aircraft wing structure was conducted and classification model for damage states (damage location and level) was trained. Through vibration test of cantilever beam with fiber bragg grating (FBG) sensor, data of normal and 12 damaged states were acquired, and the most suitable algorithm was selected through comparison between algorithms like tree, discriminant, support vector machine (SVM), kNN, ensemble. Besides, through neighborhood component analysis (NCA) feature selection, dimensionality reduction which is necessary to deal with high dimensional flight data was conducted. As a result, quadratic SVMs performed best with 98.7% for without NCA and 95.9% for with NCA. It is also shown that the application of NCA improved prediction speed, training time, and model memory.

Improvement of generalization of linear model through data augmentation based on Central Limit Theorem (데이터 증가를 통한 선형 모델의 일반화 성능 개량 (중심극한정리를 기반으로))

  • Hwang, Doohwan
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.19-31
    • /
    • 2022
  • In Machine learning, we usually divide the entire data into training data and test data, train the model using training data, and use test data to determine the accuracy and generalization performance of the model. In the case of models with low generalization performance, the prediction accuracy of newly data is significantly reduced, and the model is said to be overfit. This study is about a method of generating training data based on central limit theorem and combining it with existed training data to increase normality and using this data to train models and increase generalization performance. To this, data were generated using sample mean and standard deviation for each feature of the data by utilizing the characteristic of central limit theorem, and new training data was constructed by combining them with existed training data. To determine the degree of increase in normality, the Kolmogorov-Smirnov normality test was conducted, and it was confirmed that the new training data showed increased normality compared to the existed data. Generalization performance was measured through differences in prediction accuracy for training data and test data. As a result of measuring the degree of increase in generalization performance by applying this to K-Nearest Neighbors (KNN), Logistic Regression, and Linear Discriminant Analysis (LDA), it was confirmed that generalization performance was improved for KNN, a non-parametric technique, and LDA, which assumes normality between model building.

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.147-168
    • /
    • 2017
  • There have been many studies on accurate stock market forecasting in academia for a long time, and now there are also various forecasting models using various techniques. Recently, many attempts have been made to predict the stock index using various machine learning methods including Deep Learning. Although the fundamental analysis and the technical analysis method are used for the analysis of the traditional stock investment transaction, the technical analysis method is more useful for the application of the short-term transaction prediction or statistical and mathematical techniques. Most of the studies that have been conducted using these technical indicators have studied the model of predicting stock prices by binary classification - rising or falling - of stock market fluctuations in the future market (usually next trading day). However, it is also true that this binary classification has many unfavorable aspects in predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we try to predict the stock index by expanding the stock index trend (upward trend, boxed, downward trend) to the multiple classification system in the existing binary index method. In order to solve this multi-classification problem, a technique such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA) or Artificial Neural Networks (ANN) we propose an optimization model using Genetic Algorithm as a wrapper for improving the performance of this model using Multi-classification Support Vector Machines (MSVM), which has proved to be superior in prediction performance. In particular, the proposed model named GA-MSVM is designed to maximize model performance by optimizing not only the kernel function parameters of MSVM, but also the optimal selection of input variables (feature selection) as well as instance selection. In order to verify the performance of the proposed model, we applied the proposed method to the real data. The results show that the proposed method is more effective than the conventional multivariate SVM, which has been known to show the best prediction performance up to now, as well as existing artificial intelligence / data mining techniques such as MDA, MLOGIT, CBR, and it is confirmed that the prediction performance is better than this. Especially, it has been confirmed that the 'instance selection' plays a very important role in predicting the stock index trend, and it is confirmed that the improvement effect of the model is more important than other factors. To verify the usefulness of GA-MSVM, we applied it to Korea's real KOSPI200 stock index trend forecast. Our research is primarily aimed at predicting trend segments to capture signal acquisition or short-term trend transition points. The experimental data set includes technical indicators such as the price and volatility index (2004 ~ 2017) and macroeconomic data (interest rate, exchange rate, S&P 500, etc.) of KOSPI200 stock index in Korea. Using a variety of statistical methods including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, was classified into three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). 70% of the total data for each class was used for training and the remaining 30% was used for verifying. To verify the performance of the proposed model, several comparative model experiments such as MDA, MLOGIT, CBR, ANN and MSVM were conducted. MSVM has adopted the One-Against-One (OAO) approach, which is known as the most accurate approach among the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.