• Title/Summary/Keyword: ensemble vector

Search Result 117, Processing Time 0.027 seconds

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.133-140
    • /
    • 2018
  • Accurate sentiment classification is an important research topic in sentiment analysis. This study suggests an efficient classification method of Korean sentiment using word2vec and ensemble methods which have been recently studied variously. For the 200,000 Korean movie review texts, we generate a POS-based BOW feature and a feature using word2vec, and integrated features of two feature representation. We used a single classifier of Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine and an ensemble classifier of Adaptive Boost, Bagging, Gradient Boosting, and Random Forest for sentiment classification. As a result of this study, the integrated feature representation composed of BOW feature including adjective and adverb and word2vec feature showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, has the highest performance but ensemble classifiers show similar or slightly lower performance than the single classifier.

Gaussian noise addition approaches for ensemble optimal interpolation implementation in a distributed hydrological model

  • Manoj Khaniya;Yasuto Tachikawa;Kodai Yamamoto;Takahiro Sayama;Sunmin Kim
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.25-25
    • /
    • 2023
  • The ensemble optimal interpolation (EnOI) scheme is a sub-optimal alternative to the ensemble Kalman filter (EnKF) with a reduced computational demand making it potentially more suitable for operational applications. Since only one model is integrated forward instead of an ensemble of model realizations, online estimation of the background error covariance matrix is not possible in the EnOI scheme. In this study, we investigate two Gaussian noise based ensemble generation strategies to produce dynamic covariance matrices for assimilation of water level observations into a distributed hydrological model. In the first approach, spatially correlated noise, sampled from a normal distribution with a fixed fractional error parameter (which controls its standard deviation), is added to the model forecast state vector to prepare the ensembles. In the second method, we use an adaptive error estimation technique based on the innovation diagnostics to estimate this error parameter within the assimilation framework. The results from a real and a set of synthetic experiments indicate that the EnOI scheme can provide better results when an optimal EnKF is not identified, but performs worse than the ensemble filter when the true error characteristics are known. Furthermore, while the adaptive approach is able to reduce the sensitivity to the fractional error parameter affecting the first (non-adaptive) approach, results are usually worse at ungauged locations with the former.

  • PDF

Developing an Ensemble Classifier for Bankruptcy Prediction (부도 예측을 위한 앙상블 분류기 개발)

  • Min, Sung-Hwan
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.17 no.7
    • /
    • pp.139-148
    • /
    • 2012
  • An ensemble of classifiers is to employ a set of individually trained classifiers and combine their predictions. It has been found that in most cases the ensembles produce more accurate predictions than the base classifiers. Combining outputs from multiple classifiers, known as ensemble learning, is one of the standard and most important techniques for improving classification accuracy in machine learning. An ensemble of classifiers is efficient only if the individual classifiers make decisions as diverse as possible. Bagging is the most popular method of ensemble learning to generate a diverse set of classifiers. Diversity in bagging is obtained by using different training sets. The different training data subsets are randomly drawn with replacement from the entire training dataset. The random subspace method is an ensemble construction technique using different attribute subsets. In the random subspace, the training dataset is also modified as in bagging. However, this modification is performed in the feature space. Bagging and random subspace are quite well known and popular ensemble algorithms. However, few studies have dealt with the integration of bagging and random subspace using SVM Classifiers, though there is a great potential for useful applications in this area. The focus of this paper is to propose methods for improving SVM performance using hybrid ensemble strategy for bankruptcy prediction. This paper applies the proposed ensemble model to the bankruptcy prediction problem using a real data set from Korean companies.

Estimation of lightweight aggregate concrete characteristics using a novel stacking ensemble approach

  • Kaloop, Mosbeh R.;Bardhan, Abidhan;Hu, Jong Wan;Abd-Elrahman, Mohamed
    • Advances in nano research
    • /
    • v.13 no.5
    • /
    • pp.499-512
    • /
    • 2022
  • This study investigates the efficiency of ensemble machine learning for predicting the lightweight-aggregate concrete (LWC) characteristics. A stacking ensemble (STEN) approach was proposed to estimate the dry density (DD) and 28 days compressive strength (Fc-28) of LWC using two meta-models called random forest regressor (RFR) and extra tree regressor (ETR), and two novel ensemble models called STEN-RFR and STEN-ETR, were constructed. Four standalone machine learning models including artificial neural network, gradient boosting regression, K neighbor regression, and support vector regression were used to compare the performance of the proposed models. For this purpose, a sum of 140 LWC mixtures with 21 influencing parameters for producing LWC with a density less than 1000 kg/m3, were used. Based on the experimental results with multiple performance criteria, it can be concluded that the proposed STEN-ETR model can be used to estimate the DD and Fc-28 of LWC. Moreover, the STEN-ETR approach was found to be a significant technique in prediction DD and Fc-28 of LWC with minimal prediction error. In the validation phase, the accuracy of the proposed STEN-ETR model in predicting DD and Fc-28 was found to be 96.79% and 81.50%, respectively. In addition, the significance of cement, water-cement ratio, silica fume, and aggregate with expanded glass variables is efficient in modeling DD and Fc-28 of LWC.

Ensemble Design of Machine Learning Technigues: Experimental Verification by Prediction of Drifter Trajectory (앙상블을 이용한 기계학습 기법의 설계: 뜰개 이동경로 예측을 통한 실험적 검증)

  • Lee, Chan-Jae;Kim, Yong-Hyuk
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.8 no.3
    • /
    • pp.57-67
    • /
    • 2018
  • The ensemble is a unified approach used for getting better performance by using multiple algorithms in machine learning. In this paper, we introduce boosting and bagging, which have been widely used in ensemble techniques, and design a method using support vector regression, radial basis function network, Gaussian process, and multilayer perceptron. In addition, our experiment was performed by adding a recurrent neural network and MOHID numerical model. The drifter data used for our experimental verification consist of 683 observations in seven regions. The performance of our ensemble technique is verified by comparison with four algorithms each. As verification, mean absolute error was adapted. The presented methods are based on ensemble models using bagging, boosting, and machine learning. The error rate was calculated by assigning the equal weight value and different weight value to each unit model in ensemble. The ensemble model using machine learning showed 61.7% improvement compared to the average of four machine learning technique.

Credit Risk Evaluations of Online Retail Enterprises Using Support Vector Machines Ensemble: An Empirical Study from China

  • LI, Xin;XIA, Han
    • The Journal of Asian Finance, Economics and Business
    • /
    • v.9 no.8
    • /
    • pp.89-97
    • /
    • 2022
  • The e-commerce market faces significant credit risks due to the complexity of the industry and information asymmetries. Therefore, credit risk has started to stymie the growth of e-commerce. However, there is no reliable system for evaluating the creditworthiness of e-commerce companies. Therefore, this paper constructs a credit risk evaluation index system that comprehensively considers the online and offline behavior of online retail enterprises, including 15 indicators that reflect online credit risk and 15 indicators that reflect offline credit risk. This paper establishes an integration method based on a fuzzy integral support vector machine, which takes the factor analysis results of the credit risk evaluation index system of online retail enterprises as the input and the credit risk evaluation results of online retail enterprises as the output. The classification results of each sub-classifier and the importance of each sub-classifier decision to the final decision have been taken into account in this method. Select the sample data of 1500 online retail loan customers from a bank to test the model. The empirical results demonstrate that the proposed method outperforms a single SVM and traditional SVMs aggregation technique via majority voting in terms of classification accuracy, which provides a basis for banks to establish a reliable evaluation system.

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data (트래픽 데이터의 통계적 기반 특징과 앙상블 학습을 이용한 토르 네트워크 웹사이트 핑거프린팅)

  • Kim, Junho;Kim, Wongyum;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.6
    • /
    • pp.187-194
    • /
    • 2020
  • This paper proposes a website fingerprinting method using ensemble learning over a Tor network that guarantees client anonymity and personal information. We construct a training problem for website fingerprinting from the traffic packets collected in the Tor network, and compare the performance of the website fingerprinting system using tree-based ensemble models. A training feature vector is prepared from the general information, burst, cell sequence length, and cell order that are extracted from the traffic sequence, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to the use of website fingerprinting, and compare the performance with the support vector machine model using CUMUL feature vectors. In the experimental evaluation, the proposed statistical-based training feature representation is superior to the CUMUL feature representation except for the BW case.

Context-Aware Fusion with Support Vector Machine (Support Vector Machine을 이용한 문맥 인지형 융합)

  • Heo, Gyeong-Yong;Kim, Seong-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.6
    • /
    • pp.19-26
    • /
    • 2014
  • An ensemble classifier system is a widely-used multi-classifier system, which combines the results from each classifier and, as a result, achieves better classification result than any single classifier used. Several methods have been used to build an ensemble classifier including boosting, which is a cascade method where misclassified examples in previous stage are used to boost the performance in current stage. Boosting is, however, a serial method which does not form a complete feedback loop. In this paper, proposed is context sensitive SVM ensemble (CASE) which adopts SVM, one of the best classifiers in term of classification rate, as a basic classifier and clustering method to divide feature space into contexts. As CASE divides feature space and trains SVMs simultaneously, the result from one component can be applied to the other and CASE achieves better result than boosting. Experimental results prove the usefulness of the proposed method.

An Ensemble Classifier using Two Dimensional LDA

  • Park, Cheong-Hee
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.6
    • /
    • pp.817-824
    • /
    • 2010
  • Linear Discriminant Analysis (LDA) has been successfully applied for dimension reduction in face recognition. However, LDA requires the transformation of a face image to a one-dimensional vector and this process can cause the correlation information among neighboring pixels to be disregarded. On the other hand, 2D-LDA uses 2D images directly without a transformation process and it has been shown to be superior to the traditional LDA. Nevertheless, there are some problems in 2D-LDA. First, it is difficult to determine the optimal number of feature vectors in a reduced dimensional space. Second, the size of rectangular windows used in 2D-LDA makes strong impacts on classification accuracies but there is no reliable way to determine an optimal window size. In this paper, we propose a new algorithm to overcome those problems in 2D-LDA. We adopt an ensemble approach which combines several classifiers obtained by utilizing various window sizes. And a practical method to determine the number of feature vectors is also presented. Experimental results demonstrate that the proposed method can overcome the difficulties with choosing an optimal window size and the number of feature vectors.

Ensemble of Degraded Artificial Intelligence Modules Against Adversarial Attacks on Neural Networks

  • Sutanto, Richard Evan;Lee, Sukho
    • Journal of information and communication convergence engineering
    • /
    • v.16 no.3
    • /
    • pp.148-152
    • /
    • 2018
  • Adversarial attacks on artificial intelligence (AI) systems use adversarial examples to achieve the attack objective. Adversarial examples consist of slightly changed test data, causing AI systems to make false decisions on these examples. When used as a tool for attacking AI systems, this can lead to disastrous results. In this paper, we propose an ensemble of degraded convolutional neural network (CNN) modules, which is more robust to adversarial attacks than conventional CNNs. Each module is trained on degraded images. During testing, images are degraded using various degradation methods, and a final decision is made utilizing a one-hot encoding vector that is obtained by summing up all the output vectors of the modules. Experimental results show that the proposed ensemble network is more resilient to adversarial attacks than conventional networks, while the accuracies for normal images are similar.