• Title/Summary/Keyword: Vector Machines

Developing an Ensemble Classifier for Bankruptcy Prediction (부도 예측을 위한 앙상블 분류기 개발)

  • Min, Sung-Hwan
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.17 no.7
    • /
    • pp.139-148
    • /
    • 2012
  • An ensemble of classifiers employs a set of individually trained classifiers and combines their predictions. It has been found that in most cases ensembles produce more accurate predictions than their base classifiers. Combining outputs from multiple classifiers, known as ensemble learning, is one of the standard and most important techniques for improving classification accuracy in machine learning. An ensemble is effective only if the individual classifiers make decisions that are as diverse as possible. Bagging is the most popular ensemble learning method for generating a diverse set of classifiers. Diversity in bagging is obtained by using different training sets: the training data subsets are randomly drawn with replacement from the entire training dataset. The random subspace method is an ensemble construction technique that uses different attribute subsets. As in bagging, the training dataset is modified, but the modification is performed in the feature space. Bagging and random subspace are well-known and popular ensemble algorithms. However, few studies have dealt with the integration of bagging and random subspace using SVM classifiers, though there is great potential for useful applications in this area. This paper proposes methods for improving SVM performance using a hybrid ensemble strategy for bankruptcy prediction, and applies the proposed ensemble model to the bankruptcy prediction problem using a real data set from Korean companies.
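
    The two sampling schemes described in this abstract can be sketched as follows. This is only an illustration of one common way to combine them (each ensemble member gets a bootstrap sample of rows and a random subset of features); the abstract does not state how the paper integrates them, and the function names and parameters here are hypothetical. Training the SVM base classifiers is left out.

```python
import random
from collections import Counter


def hybrid_subsets(n_samples, n_features, n_members, n_sub_features, seed=0):
    """Return one (row_indices, feature_indices) pair per ensemble member.

    Hypothetical helper: combining both schemes per member is an assumption,
    not the paper's documented procedure.
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        # Bagging: draw rows with replacement (a bootstrap sample).
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Random subspace: draw distinct features without replacement.
        feats = sorted(rng.sample(range(n_features), n_sub_features))
        members.append((rows, feats))
    return members


def majority_vote(predictions):
    """Combine the members' class labels for one sample by plurality."""
    return Counter(predictions).most_common(1)[0][0]
```

    Each base classifier would then be trained on the rows and columns selected for it, and `majority_vote` would combine the members' labels for a new company.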

Prediction of the Movement Directions of Index and Stock Prices Using Extreme Gradient Boosting (익스트림 그라디언트 부스팅을 이용한 지수/주가 이동 방향 예측)

  • Kim, HyoungDo
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.9
    • /
    • pp.623-632
    • /
    • 2018
  • Both investors and researchers pay close attention to predicting stock price movement directions, since accurate prediction plays an important role in strategic decision making in stock trading. Taken together, previous studies show that different factors are considered depending on the stock market and the prediction period. This paper aims to analyze which data mining techniques perform better on some representative index and stock price datasets from the Korean stock market. In particular, the extreme gradient boosting technique, which has proven itself a front-runner in recent open competitions, is applied to the prediction problem. Its performance is analyzed in comparison with other data mining techniques reported to perform well in predicting stock price movement directions, such as random forests, support vector machines, and artificial neural networks. Experiments with 12 years of index/price datasets show that the gradient boosting technique is the best at predicting movement directions 1 to 4 days ahead, with a few cases of partial equivalence to the other techniques.
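
    As a rough illustration of the prediction target, the sketch below builds binary direction labels for a given horizon. It assumes, without confirmation from the abstract, that "up" means the price `horizon` days ahead is strictly higher than the current price; the function name is hypothetical.

```python
def direction_labels(prices, horizon):
    """Label day t as 1 if the price `horizon` days later is higher, else 0.

    Assumption: ties count as "not up". The last `horizon` days get no label
    because their future price is unknown.
    """
    return [
        1 if prices[t + horizon] > prices[t] else 0
        for t in range(len(prices) - horizon)
    ]
```

    Labels for horizons 1 to 4 would come from calling the function once per horizon on the same price series.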

Automatic Email Multi-category Classification Using Dynamic Category Hierarchy and Non-negative Matrix Factorization (비음수 행렬 분해와 동적 분류 체계를 사용한 자동 이메일 다원 분류)

  • Park, Sun;An, Dong-Un
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.5
    • /
    • pp.378-385
    • /
    • 2010
  • The explosive increase in the use of email has created a need to classify email efficiently and accurately. Current work on email classification has mainly focused on binary classification that filters out spam. These methods are based on Support Vector Machines, Bayesian classifiers, and rule-based classifiers. Such methods are supervised, in the sense that the user is required to manually describe the rules and keyword lists used to recognize relevant email. Other, unsupervised methods use clustering techniques for multi-category classification and create category labels from a set of incoming messages. In this paper, we propose a new automatic email multi-category classification method that uses NMF for automatic category label construction and a dynamic category hierarchy for reorganizing email messages within the category labels. With the proposed method, a large number of emails are managed efficiently by automatic multi-category classification, and the email messages in each category are reorganized to enhance accuracy whenever users want to classify all their email messages.

Binary Forecast of Asian Dust Days over South Korea in the Winter Season (남한지역 겨울철 황사출현일수에 대한 범주 예측모형 개발)

  • Sohn, Keon-Tae;Lee, Hyo-Jin;Kim, Seung-Bum
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.3
    • /
    • pp.535-546
    • /
    • 2011
  • This study develops statistical models for the binary forecast of Asian dust days over South Korea in the winter season. We used three kinds of data: the first is the observed Asian dust days over a period of 31 years (1980 to 2010) as target values; the second is four meteorological factors (near-surface temperature, precipitation, snowfall, and ground wind speed) in the source regions of Asian dust, based on the NCEP reanalysis data; and the third is the large-scale climate indices. Four kinds of statistical models (multiple regression models, logistic regression models, decision trees, and support vector machines) are applied and compared based on skill scores (hit rate, probability of detection, and false alarm rate).
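
    The skill scores named here can be computed from a 2x2 contingency table of binary forecasts against observations. The abstract does not give its formulas, so the sketch below assumes common definitions: hit rate as the fraction of correct forecasts, probability of detection as hits / (hits + misses), and false alarm rate as false alarms / (hits + false alarms). The function name is hypothetical.

```python
def skill_scores(observed, forecast):
    """Compute skill scores for paired binary (0/1) observations and forecasts.

    The three definitions are assumptions; other conventions exist,
    especially for the false alarm rate.
    """
    pairs = list(zip(observed, forecast))
    hits = sum(1 for o, f in pairs if o and f)
    misses = sum(1 for o, f in pairs if o and not f)
    false_alarms = sum(1 for o, f in pairs if f and not o)
    correct_negatives = sum(1 for o, f in pairs if not o and not f)
    n = len(pairs)

    def ratio(num, den):
        # Return None instead of dividing by zero when a score is undefined.
        return num / den if den else None

    return {
        "hit_rate": ratio(hits + correct_negatives, n),
        "pod": ratio(hits, hits + misses),
        "far": ratio(false_alarms, hits + false_alarms),
    }
```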

The Measurement of 3-Phase Current with Single Current Sensor and the Compensation of Voltage Distortion in Carrier-Based PWM Technique (삼각파 비교 PWM 기법에 있어서 단일 전류센서에 의한 삼상 전류 측정 및 전압 왜곡 보상)

  • 김경서
    • The Transactions of the Korean Institute of Power Electronics
    • /
    • v.8 no.3
    • /
    • pp.292-298
    • /
    • 2003
  • Most three-phase inverters for adjustable-speed drives of AC machines are equipped with two or three current sensors for measuring the three-phase current. One method of reducing the number of current sensors is to measure the DC link current with a single current sensor and then reconstruct the three-phase current from the measured value and the switching status. To improve measurement accuracy, each switching state should be maintained for longer than a minimum switching time. Many papers have been published that focus on the readjustment of pulse width and the compensation of voltage distortion. Those methods are suitable for space vector modulation, but they are difficult to apply to carrier-based PWM, which is widely used in industry. In this paper, a new current measurement method and a voltage compensation method suitable for carrier-based PWM are proposed, and the validity of the proposed methods is confirmed through experiment.
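
    The general idea of reconstructing phase currents from DC link samples can be sketched as follows. This is the generic textbook relationship, not the measurement or compensation method proposed in the paper. It assumes a three-wire load (so the phase currents sum to zero), a switching state given as (a, b, c) with 1 meaning the upper switch is on, and currents counted positive when flowing from the inverter to the load. All names are hypothetical.

```python
def phase_from_dc_link(state, i_dc):
    """Map one DC link sample to (phase_index, phase_current).

    Returns None for the zero vectors (0,0,0) and (1,1,1), during which
    the DC link carries no phase current information.
    """
    on = [k for k, s in enumerate(state) if s == 1]
    if len(on) == 1:
        # Only this phase is tied to the positive rail.
        return on[0], i_dc
    if len(on) == 2:
        # The two "on" phases sum to minus the remaining phase.
        return state.index(0), -i_dc
    return None


def reconstruct(sample1, sample2):
    """Rebuild (ia, ib, ic) from two (state, i_dc) samples in one PWM period."""
    currents = [None, None, None]
    for state, i_dc in (sample1, sample2):
        result = phase_from_dc_link(state, i_dc)
        if result is None:
            raise ValueError("zero vector carries no phase information")
        index, value = result
        currents[index] = value
    missing = [k for k in range(3) if currents[k] is None]
    if len(missing) != 1:
        raise ValueError("samples must identify two different phases")
    # Assumed three-wire load: ia + ib + ic = 0.
    currents[missing[0]] = -sum(c for c in currents if c is not None)
    return tuple(currents)
```

    Both samples must come from active states that last long enough to be measured, which is the minimum switching time constraint mentioned in the abstract.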

Compressive strength prediction of CFRP confined concrete using data mining techniques

  • Camoes, Aires;Martins, Francisco F.
    • Computers and Concrete
    • /
    • v.19 no.3
    • /
    • pp.233-241
    • /
    • 2017
  • During the last two decades, CFRP has been extensively used for the repair and rehabilitation of existing structures as well as in new construction. For rehabilitation purposes, CFRP is currently used to increase the load and energy absorption capacities, as well as the shear strength, of concrete columns. Thus, the effect of CFRP confinement on the strength and deformation capacity of concrete columns has been extensively studied. However, the majority of such studies rely on empirical relationships based on correlation analysis, because to date there is no general law describing such a highly complex phenomenon. Moreover, these studies have focused on the performance of columns with circular cross sections, and the data available for square or rectangular cross sections are still scarce. Therefore, the existing relationships may not be sufficiently accurate to provide satisfactory results. That is why intelligent models with the ability to learn from examples can and must be tested to evaluate their accuracy for composite compressive strength prediction. In this study, different data mining techniques were used to predict the compressive strength of CFRP-wrapped confined concrete, taking into account the specimens' cross section: circular or rectangular. Based on the results obtained, CFRP confined concrete compressive strength can be accurately predicted for circular cross sections using SVM with five and six input parameters without spending too much time. The results for rectangular sections were not as good as those obtained for circular sections. It seems that the prediction can only be obtained with reasonable accuracy for certain values of the lateral confinement coefficient, due to the lower efficiency of lateral confinement for rectangular cross sections.

Predicting the Response of Segmented Customers for the Promotion Using Data Mining (데이터마이닝을 이용한 세분화된 고객집단의 프로모션 고객반응 예측)

  • Hong, Tae-Ho;Kim, Eun-Mi
    • Information Systems Review
    • /
    • v.12 no.2
    • /
    • pp.75-88
    • /
    • 2010
  • This paper proposes a method that segments customers using a SOM (Self-Organizing Map) and predicts customer response to a marketing promotion for each customer segment. Our method focuses on predicting customer response within each customer segment, whereas most studies have predicted the response of all customers at once. We deployed logistic regression, neural networks, and support vector machines to predict customer response, which is a dichotomous classification problem, while an integrated approach was used to improve the performance of the prediction model. Sample data including 45 variables on demographics, transactions, and promotion activities for about 600 customers were applied to the proposed method, and we present the classification matrix and comparative analyses of the data mining techniques. By applying the proposed method to the sample data, we were able to derive some significant promotion strategies for segmented customers.

Effective Mood Classification Method based on Music Segments (부분 정보에 기반한 효과적인 음악 무드 분류 방법)

  • Park, Gun-Han;Park, Sang-Yong;Kang, Seok-Joong
    • Journal of Korea Multimedia Society
    • /
    • v.10 no.3
    • /
    • pp.391-400
    • /
    • 2007
  • Recent advances in multimedia computing, storage, and searching technology have made large volumes of music content prevalent. There has also been an increasing need for research on efficient categorization and search techniques for music content management. In this paper, a new classification method using local information of the music content and a music tone feature is proposed. While conventional classification algorithms are based on the entire music content, the algorithm proposed in this paper focuses only on specific local information, which can drastically reduce computing time without losing classification accuracy. To improve classification accuracy, it uses a new classification feature based on music tone. The proposed method has been implemented as part of MuSE (Music Search/Classification Engine), which was installed on various systems including commercial PDAs and PCs.

Exploring the Sentiment Analysis of Electric Vehicles Social Media Data by Using Feature Selection Methods (속성선택방법을 이용한 전기자동차 소셜미디어 데이터의 감성분석 연구)

  • Costello, Francis Joseph;Lee, Kun Chang
    • Journal of Digital Convergence
    • /
    • v.18 no.2
    • /
    • pp.249-259
    • /
    • 2020
  • This study presents a recently obtained social media data set based on the case of Electric Vehicles (EV) and implements a sentiment analysis (SA) in order to gain insights. This study uses two methods to analyze the public's sentiment on EVs. First, we used an SA tool to extract the sentiment of comments. Next, we labeled the data with the sentiments obtained and classified them. While performing classification, we encountered the problem of high dimensionality and therefore explored feature selection (FS) models to reduce the data set's dimensionality. We found that three FS models (Chi Squared, Information Gain, and ReliefF) showed the most promising results when used alongside logistic and support vector machine classification algorithms. The contributions of this paper lie in providing a real-world example of social media text analytics that can be adopted in many other areas of research and business. Moving forward, researchers can use the methodological approach in this paper to further refine and improve their own use cases in text analytics.
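
    Of the three feature selection models named, Chi Squared is the simplest to illustrate. The sketch below scores one binary feature (a term is present or absent) against a binary sentiment label using the standard 2x2 chi-squared statistic. The binary setup and the function name are assumptions; the abstract does not describe how the study computed its scores.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a 2x2 table of counts.

    a: term present, positive class    b: term present, negative class
    c: term absent, positive class     d: term absent, negative class
    Returns 0.0 when any row or column total is zero (score undefined).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator
```

    Features would then be ranked by score and only the top-scoring ones kept, which is how this kind of filter reduces dimensionality before classification.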

Feature Vector Decision Method of Various Fault Signals for Neural-network-based Fault Diagnosis System (신경회로망 기반 고장 진단 시스템을 위한 고장 신호별 특징 벡터 결정 방법)

  • Han, Hyung-Seob;Cho, Sang-Jin;Chong, Ui-Pil
    • Transactions of the Korean Society for Noise and Vibration Engineering
    • /
    • v.20 no.11
    • /
    • pp.1009-1017
    • /
    • 2010
  • As rotating machines play an important role in industrial applications such as the aeronautical, naval, and automotive industries, many researchers have developed various condition monitoring and fault diagnosis systems by applying techniques such as signal processing and pattern recognition. Recently, fault diagnosis systems using artificial neural networks have been proposed. For effective fault diagnosis, this paper uses an MLP (multi-layer perceptron) network, which is widely used in pattern classification. Since feeding the obtained signals into a neural network without preprocessing can decrease fault classification performance, it is very important to extract significant features from the captured signals and to apply features suited to the kind of signal obtained. Therefore, this paper proposes a method for deciding the proper feature vectors for each fault signal in a neural-network-based fault diagnosis system. We applied LPC coefficients, the maximum magnitudes of each spectral section of the FFT, and the RMS (root mean square) and variance of wavelet coefficients as feature vectors, and selected appropriate feature vectors by comparing the error ratios of fault diagnosis for sound, vibration, and current fault signals. In the experimental results, LPC coefficients and the maximum magnitudes of each spectral section showed 100% diagnosis ratios for each fault, and the method using wavelet coefficients was robust to noise.
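
    Two of the feature types mentioned can be sketched in a few lines: the RMS and variance of a list of coefficients, and the maximum magnitude within each section of a magnitude spectrum. The sketch assumes equal-width sections and population variance, neither of which is stated in the abstract, and takes the wavelet coefficients and FFT magnitudes as already computed. Function names are hypothetical.

```python
import math


def rms(values):
    """Root mean square of a sequence of coefficients."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def variance(values):
    """Population variance (divides by N; an assumption)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def section_maxima(magnitudes, n_sections):
    """Maximum magnitude in each of n equal-width spectral sections.

    Bins left over when the length is not divisible by n_sections
    are ignored in this sketch.
    """
    size = len(magnitudes) // n_sections
    return [
        max(magnitudes[i * size:(i + 1) * size])
        for i in range(n_sections)
    ]
```

    Concatenating such values per signal gives a fixed-length feature vector that could serve as input to an MLP.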