• Title/Summary/Keyword: Ensemble Support Vector Machine

Search Result 82, Processing Time 0.023 seconds

Speech Query Recognition for Tamil Language Using Wavelet and Wavelet Packets

  • Iswarya, P.;Radha, V.
    • Journal of Information Processing Systems
    • /
    • v.13 no.5
    • /
    • pp.1135-1148
    • /
    • 2017
  • Speech recognition is one of the fascinating fields in the area of Computer science. Accuracy of speech recognition system may reduce due to the presence of noise present in speech signal. Therefore noise removal is an essential step in Automatic Speech Recognition (ASR) system and this paper proposes a new technique called combined thresholding for noise removal. Feature extraction is process of converting acoustic signal into most valuable set of parameters. This paper also concentrates on improving Mel Frequency Cepstral Coefficients (MFCC) features by introducing Discrete Wavelet Packet Transform (DWPT) in the place of Discrete Fourier Transformation (DFT) block to provide an efficient signal analysis. The feature vector is varied in size, for choosing the correct length of feature vector Self Organizing Map (SOM) is used. As a single classifier does not provide enough accuracy, so this research proposes an Ensemble Support Vector Machine (ESVM) classifier where the fixed length feature vector from SOM is given as input, termed as ESVM_SOM. The experimental results showed that the proposed methods provide better results than the existing methods.

Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers (앙상블 학습 기반 국내 도서의 해외 판매 굿셀러 예측 및 굿셀러 리뷰 키워드 분석)

  • Do Young Kim;Na Yeon Kim;Hyon Hee Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.4
    • /
    • pp.173-178
    • /
    • 2023
  • As Korean literature spreads around the world, its position in the overseas publishing market has become important. As demand in the overseas publishing market continues to grow, it is essential to predict future book sales and analyze the characteristics of books that have been highly favored by overseas readers in the past. In this study, we proposed ensemble learning based prediction model and analyzed characteristics of the cumulative sales of more than 5,000 copies classified as good sellers published overseas over the past 5 years. We applied the five ensemble learning models, i.e., XGBoost, Gradient Boosting, Adaboost, LightGBM, and Random Forest, and compared them with other machine learning algorithms, i.e., Support Vector Machine, Logistic Regression, and Deep Learning. Our experimental results showed that the ensemble algorithm outperforms other approaches in troubleshooting imbalanced data. In particular, the LightGBM model obtained an AUC value of 99.86% which is the best prediction performance. Among the features used for prediction, the most important feature is the author's number of overseas publications, and the second important feature is publication in countries with the largest publication market size. The number of evaluation participants is also an important feature. In addition, text mining was performed on the four book reviews that sold the most among good-selling books. Many reviews were interested in stories, characters, and writers and it seems that support for translation is needed as many of the keywords of "translation" appear in low-rated reviews.

Support vector ensemble for incipient fault diagnosis in nuclear plant components

  • Ayodeji, Abiodun;Liu, Yong-kuo
    • Nuclear Engineering and Technology
    • /
    • v.50 no.8
    • /
    • pp.1306-1313
    • /
    • 2018
  • The randomness and incipient nature of certain faults in reactor systems warrant a robust and dynamic detection mechanism. Existing models and methods for fault diagnosis using different mathematical/statistical inferences lack incipient and novel faults detection capability. To this end, we propose a fault diagnosis method that utilizes the flexibility of data-driven Support Vector Machine (SVM) for component-level fault diagnosis. The technique integrates separately-built, separately-trained, specialized SVM modules capable of component-level fault diagnosis into a coherent intelligent system, with each SVM module monitoring sub-units of the reactor coolant system. To evaluate the model, marginal faults selected from the failure mode and effect analysis (FMEA) are simulated in the steam generator and pressure boundary of the Chinese CNP300 PWR (Qinshan I NPP) reactor coolant system, using a best-estimate thermal-hydraulic code, RELAP5/SCDAP Mod4.0. Multiclass SVM model is trained with component level parameters that represent the steady state and selected faults in the components. For optimization purposes, we considered and compared the performances of different multiclass models in MATLAB, using different coding matrices, as well as different kernel functions on the representative data derived from the simulation of Qinshan I NPP. An optimum predictive model - the Error Correcting Output Code (ECOC) with TenaryComplete coding matrix - was obtained from experiments, and utilized to diagnose the incipient faults. Some of the important diagnostic results and heuristic model evaluation methods are presented in this paper.

Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease

  • Wiharto, Wiharto;Suryani, Esti;Setyawan, Sigit;Putra, Bintang PE
    • Journal of information and communication convergence engineering
    • /
    • v.20 no.1
    • /
    • pp.31-40
    • /
    • 2022
  • Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. A large number of examination attributes in the context of diagnosing CHD is a distinct obstacle during the pandemic when the number of health service users is significant. The development of a precise machine learning model for diagnosis with a minimum number of examination attributes can allow examinations and healthcare actions to be undertaken quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification methods. In the feature selection stage, a hybrid SVM-GA combined with fast correlation-based filter (FCBF) is used. The proposed system achieved an accuracy of 94.60% and area under the curve (AUC) of 97.5% when tested on the z-Alizadeh Sani dataset and used only 8 of 54 inspection attributes. In terms of performance, the proposed model can be placed in the very good category.

Improving SVM Classification by Constructing Ensemble (앙상블 구성을 이용한 SVM 분류성능의 향상)

  • 제홍모;방승양
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.3_4
    • /
    • pp.251-258
    • /
    • 2003
  • A support vector machine (SVM) is supposed to provide a good generalization performance, but the actual performance of a actually implemented SVM is often far from the theoretically expected level. This is largely because the implementation is based on an approximated algorithm, due to the high complexity of time and space. To improve this limitation, we propose ensemble of SVMs by using Bagging (bootstrap aggregating) and Boosting. By a Bagging stage each individual SVM is trained independently using randomly chosen training samples via a bootstrap technique. By a Boosting stage an individual SVM is trained by choosing training samples according to their probability distribution. The probability distribution is updated by the error of independent classifiers, and the process is iterated. After the training stage, they are aggregated to make a collective decision in several ways, such ai majority voting, the LSE(least squares estimation) -based weighting, and double layer hierarchical combining. The simulation results for IRIS data classification, the hand-written digit recognition and Face detection show that the proposed SVM ensembles greatly outperforms a single SVM in terms of classification accuracy.

Cancer Diagnosis System using Genetic Algorithm and Multi-boosting Classifier (Genetic Algorithm과 다중부스팅 Classifier를 이용한 암진단 시스템)

  • Ohn, Syng-Yup;Chi, Seung-Do
    • Journal of the Korea Society for Simulation
    • /
    • v.20 no.2
    • /
    • pp.77-85
    • /
    • 2011
  • It is believed that the anomalies or diseases of human organs are identified by the analysis of the patterns. This paper proposes a new classification technique for the identification of cancer disease using the proteome patterns obtained from two-dimensional polyacrylamide gel electrophoresis(2-D PAGE). In the new classification method, three different classification methods such as support vector machine(SVM), multi-layer perceptron(MLP) and k-nearest neighbor(k-NN) are extended by multi-boosting method in an array of subclassifiers and the results of each subclassifier are merged by ensemble method. Genetic algorithm was applied to obtain optimal feature set in each subclassifier. We applied our method to empirical data set from cancer research and the method showed the better accuracy and more stable performance than single classifier.

Prediction of Residual Resistance Coefficient of Low-Speed Full Ships Using Hull Form Variables and Machine Learning Approaches (선형변수 기계학습 기법을 활용한 저속비대선의 잉여저항계수 추정)

  • Kim, Yoo-Chul;Yang, Kyung-Kyu;Kim, Myung-Soo;Lee, Young-Yeon;Kim, Kwang-Soo
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.57 no.6
    • /
    • pp.312-321
    • /
    • 2020
  • In this study, machine learning techniques were applied to predict the residual resistance coefficient (Cr) of low-speed full ships. The used machine learning methods are Ridge regression, support vector regression, random forest, neural network and their ensemble model. 19 hull form variables were used as input variables for machine learning methods. The hull form variables and Cr data obtained from 139 hull forms of KRISO database were used in analysis. 80 % of the total data were used as training models and the rest as validation. Some non-linear models showed the overfitted results and the ensemble model showed better results than others.

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data (트래픽 데이터의 통계적 기반 특징과 앙상블 학습을 이용한 토르 네트워크 웹사이트 핑거프린팅)

  • Kim, Junho;Kim, Wongyum;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.6
    • /
    • pp.187-194
    • /
    • 2020
  • This paper proposes a website fingerprinting method using ensemble learning over a Tor network that guarantees client anonymity and personal information. We construct a training problem for website fingerprinting from the traffic packets collected in the Tor network, and compare the performance of the website fingerprinting system using tree-based ensemble models. A training feature vector is prepared from the general information, burst, cell sequence length, and cell order that are extracted from the traffic sequence, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to the use of website fingerprinting, and compare the performance with the support vector machine model using CUMUL feature vectors. In the experimental evaluation, the proposed statistical-based training feature representation is superior to the CUMUL feature representation except for the BW case.

Forecasting Day-ahead Electricity Price Using a Hybrid Improved Approach

  • Hu, Jian-Ming;Wang, Jian-Zhou
    • Journal of Electrical Engineering and Technology
    • /
    • v.12 no.6
    • /
    • pp.2166-2176
    • /
    • 2017
  • Electricity price prediction plays a crucial part in making the schedule and managing the risk to the competitive electricity market participants. However, it is a difficult and challenging task owing to the characteristics of the nonlinearity, non-stationarity and uncertainty of the price series. This study proposes a hybrid improved strategy which incorporates data preprocessor components and a forecasting engine component to enhance the forecasting accuracy of the electricity price. In the developed forecasting procedure, the Seasonal Adjustment (SA) method and the Ensemble Empirical Mode Decomposition (EEMD) technique are synthesized as the data preprocessing component; the Coupled Simulated Annealing (CSA) optimization method and the Least Square Support Vector Regression (LSSVR) algorithm construct the prediction engine. The proposed hybrid approach is verified with electricity price data sampled from the power market of New South Wales in Australia. The simulation outcome manifests that the proposed hybrid approach obtains the observable improvement in the forecasting accuracy compared with other approaches, which suggests that the proposed combinational approach occupies preferable predication ability and enough precision.

Sensor Fault Detection Scheme based on Deep Learning and Support Vector Machine (딥 러닝 및 서포트 벡터 머신기반 센서 고장 검출 기법)

  • Yang, Jae-Wan;Lee, Young-Doo;Koo, In-Soo
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.2
    • /
    • pp.185-195
    • /
    • 2018
  • As machines have been automated in the field of industries in recent years, it is a paramount importance to manage and maintain the automation machines. When a fault occurs in sensors attached to the machine, the machine may malfunction and further, a huge damage will be caused in the process line. To prevent the situation, the fault of sensors should be monitored, diagnosed and classified in a proper way. In the paper, we propose a sensor fault detection scheme based on SVM and CNN to detect and classify typical sensor errors such as erratic, drift, hard-over, spike, and stuck faults. Time-domain statistical features are utilized for the learning and testing in the proposed scheme, and the genetic algorithm is utilized to select the subset of optimal features. To classify multiple sensor faults, a multi-layer SVM is utilized, and ensemble technique is used for CNN. As a result, the SVM that utilizes a subset of features selected by the genetic algorithm provides better performance than the SVM that utilizes all the features. However, the performance of CNN is superior to that of the SVM.