Search | Korea Science

Ensemble Learning for Solving Data Imbalance in Bankruptcy Prediction (기업부실 예측 데이터의 불균형 문제 해결을 위한 앙상블 학습)

Kim, Myoung-Jong
- Journal of Intelligence and Information Systems / v.15 no.3 / pp.1-15 / 2009
In a classification problem, data imbalance occurs when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often yield a default classifier with a skewed decision boundary, which reduces classification accuracy. This paper proposes Geometric Mean-based Boosting (GM-Boost) to resolve the problem of data imbalance. Because GM-Boost introduces the notion of the geometric mean, its learning process can consider both the majority and minority classes and reinforce learning on misclassified data. An empirical study of bankruptcy prediction for Korean companies shows that GM-Boost achieves higher classification accuracy than previous methods used for imbalanced data, including under-sampling, over-sampling, and AdaBoost, and that its learning performance is robust regardless of the degree of data imbalance.
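
To illustrate why a geometric mean helps with imbalanced classes, the following minimal Python sketch compares ordinary accuracy with the geometric mean of per-class accuracies. It is not the GM-Boost algorithm from the paper; the function names, the choice of per-class accuracy (recall) as the quantity being averaged, and the 95/5 toy data are assumptions made for illustration.

```python
import math


def per_class_recall(y_true, y_pred):
    """Map each label to the fraction of its instances predicted correctly."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (1 if t == p else 0)
    return {label: hits[label] / totals[label] for label in totals}


def arithmetic_accuracy(y_true, y_pred):
    """Ordinary accuracy: share of all instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1 / len(recalls))


# Hypothetical toy data: 95 healthy firms (0) and 5 bankrupt firms (1).
y_true = [0] * 95 + [1] * 5
# A "default classifier" that always predicts the majority class.
y_pred = [0] * 100

acc = arithmetic_accuracy(y_true, y_pred)
gm = geometric_mean_accuracy(y_true, y_pred)
print(acc, gm)
```

On this toy data the default classifier scores 0.95 on ordinary accuracy but 0.0 on the geometric mean, because it never identifies a minority instance.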

AdaBoost Face Detection System Based on Skin-color Filter and Face Candidate Region Localization (피부색 필터와 얼굴 후보 영역 국소화에 기반한 AdaBoost 얼굴검출 시스템)

Kim Ik Hoon;Seo Hae Jong;Park Young Kyung;Kim Joong Kyu
- Proceedings of the Korean Institute of Communication Sciences Conference / 2004.11a / pp.119-119 / 2004

Estimation of compressive strength of BFS and WTRP blended cement mortars with machine learning models

Ozcan, Giyasettin;Kocak, Yilmaz;Gulbandilar, Eyyup
- Computers and Concrete / v.19 no.3 / pp.275-282 / 2017
The aim of this study is to build machine learning models to evaluate the effect of blast furnace slag (BFS) and waste tire rubber powder (WTRP) on the compressive strength of cement mortars. To develop these models, the 2-, 7-, 28-, and 90-day compressive strength results of 288 specimens from 12 different mixes of cement mortars containing BFS, WTRP, and BFS+WTRP were used in training and testing by Random Forest, AdaBoost, SVM, and Bayes classifier machine learning models, which implement standard cement tests. The models were trained on 288 data points acquired from the experimental results. The models had four input parameters: the amounts of Portland cement, BFS, and WTRP, and the sample age. They had one output parameter, the compressive strength of the cement mortar. Experimental observations from the compressive strength tests were compared with the predictions of the machine learning methods. The predictive experiments were carried out with the R programming language and corresponding packages. On this dataset, the Random Forest, AdaBoost, and SVM models produced notably good results in terms of the coefficient of determination (R2), RMS, and MAPE. Among the machine learning algorithms, AdaBoost presented the best R2, RMS, and MAPE values, which are 0.9831, 5.2425, and 0.1105, respectively. The testing results indicate that the model can estimate the experimental data to a notably close extent.
https://doi.org/10.12989/cac.2017.19.3.275 KSCI
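
The three measures reported in the abstract can be computed as in this Python sketch. It assumes that "RMS" denotes root-mean-square error and that MAPE is expressed as a fraction rather than a percentage; the abstract states neither. The strength values below are invented toy numbers, not data from the study, and the study itself used R rather than Python.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error (assumed meaning of 'RMS')."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; true values must be nonzero."""
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Hypothetical measured and predicted compressive strengths.
measured = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 41.0, 49.0]

print(r_squared(measured, predicted))
print(rmse(measured, predicted))
print(mape(measured, predicted))
```

R2 is better when closer to 1, while RMSE and MAPE are error measures and are better when smaller.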

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

Kim, Myoung-Jong
- Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions, including linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Moreover, SVM does not require many training samples, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance. 
First, SVM was originally proposed for solving binary classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does for binary classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) could be used to reduce computation time in multi-class problems, but they could deteriorate classification performance. Third, multi-class prediction is difficult because of the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often yield a default classifier with a skewed boundary, which reduces classification accuracy. SVM ensemble learning is one machine learning method for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. 
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can consider the geometric mean-based accuracy and errors of multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained results for the classifiers on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of the classifiers over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
https://doi.org/10.13088/jiis.2012.18.2.029 KSCI
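
The reweighting step that the abstract describes for AdaBoost can be sketched in Python as follows. This is the standard discrete AdaBoost update for two classes labelled -1 and +1, not MGM-Boost, whose geometric mean-based modification is not specified in the abstract; the function name and the toy data are illustrative assumptions.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost reweighting step.

    weights: current observation weights, assumed to sum to 1.
    y_true, y_pred: labels in {-1, +1} from the data and the weak classifier.
    Returns the classifier's vote weight (alpha) and the normalized new weights.
    """
    # Weighted error: total weight of the misclassified observations.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Clamp to avoid division by zero or log(0) for perfect or useless classifiers.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is +1 when correct (weight shrinks) and -1 when wrong (weight grows).
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Hypothetical toy round: five observations, the last one misclassified.
weights = [0.2] * 5
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

With five equally weighted observations and one mistake, the misclassified observation's weight rises from 0.2 to 0.5, so the next classifier concentrates on it.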

AdaBoost-based Real-Time Face Detection & Tracking System (AdaBoost 기반의 실시간 고속 얼굴검출 및 추적시스템의 개발)

Kim, Jeong-Hyun;Kim, Jin-Young;Hong, Young-Jin;Kwon, Jang-Woo;Kang, Dong-Joong;Lho, Tae-Jung
- Journal of Institute of Control, Robotics and Systems / v.13 no.11 / pp.1074-1081 / 2007
This paper presents a method for real-time face detection and tracking that combines the AdaBoost and CAMshift algorithms. The AdaBoost algorithm selects important features, called weak classifiers, from among many possible image features by tuning the weight of each feature from learning candidates. Although it performs excellently at extracting the object, its computing time is very high because image regions are searched with multi-scale windows. Direct application of the method is therefore not easy for real-time tasks in multi-task OS, robot, and mobile environments. The CAMshift method, in contrast, is an improvement of the Mean-shift algorithm for the video streaming environment and tracks the object of interest at high speed based on the hue value of the target region. However, its detection efficiency is not good in environments with dynamic illumination. We propose a combined method of AdaBoost and CAMshift to improve computing speed while keeping good face detection performance. The method was validated on real image sequences containing one or more faces.
https://doi.org/10.5302/J.ICROS.2007.13.11.1074 KSCI

Real-time Slant Face detection using improvement AdaBoost algorithm (개선한 아다부스트 알고리즘을 이용한 기울어진 얼굴 실시간 검출)

Na, Jong-Won
- Journal of Advanced Navigation Technology / v.12 no.3 / pp.280-285 / 2008
Traditional face detection methods use the difference-picture method to detect movement. However, most of them do not consider real-time operation, or the algorithm is too complicated to implement easily in real time. In this paper, to detect faces in real time, the input video is first converted between RGB and YCbCr. Next, the difference between two adjacent video frames is obtained, and Glassfire labeling is applied to it. The labeled values are compared with a threshold to recognize the motion area, and that area is extracted from the video. Face detection is then performed on the extracted area, and the AdaBoost algorithm is used to extract the facial features required for detection.

Design and Implementation of a Bimodal User Recognition System using Face and Audio (얼굴과 음성 정보를 이용한 바이모달 사용자 인식 시스템 설계 및 구현)

Kim Myung-Hun;Lee Chi-Geun;So In-Mi;Jung Sung-Tae
- Journal of the Korea Society of Computer and Information / v.10 no.5 s.37 / pp.353-362 / 2005
Recently, research on bimodal recognition has become very active. In this paper, we propose a bimodal user recognition system that uses face information and audio information. Face recognition consists of a face detection step and a face recognition step. Face detection uses AdaBoost to find face candidate areas. After the face candidates are found, PCA feature extraction is applied to reduce the dimension of the feature vector. SVM classifiers are then used to detect and recognize the face. Audio recognition uses MFCC for audio feature extraction and HMM for recognition. Experimental results show that bimodal recognition improves the user recognition rate considerably over audio-only recognition, especially in the presence of noise.

Extraction of the License Plate Region Using HoG and AdaBoost (HoG와 AdaBoost를 이용한 번호판 영역 추출)

Lew, Sheen;Yi, Cui-Sheng;Lee, Wan-Joo;Lee, Byeong-Rae;Min, Kyoung-Won;Kang, Hyun-Chul
- Journal of Digital Contents Society / v.10 no.4 / pp.597-604 / 2009
To improve a license plate recognition system, correct extraction of the license plate region is as important as character recognition. In this paper, by analyzing and classifying the error patterns in the process of plate region extraction, we tried to improve the extraction of the region using HoG (histogram of gradient) features and AdaBoost. The results show that the HoG feature is robust to noise and to various types of plates, and is also very effective in extracting regions that previously failed to be extracted.

Design and Implementation of a Real-Time Face Detection System (실시간 얼굴 검출 시스템 설계 및 구현)

Jung Sung-Tae;Lee Ho-Geun
- Journal of Korea Multimedia Society / v.8 no.8 / pp.1057-1068 / 2005
This paper proposes a real-time face detection system that detects multiple faces in low-resolution video such as web-camera video. First, it finds face region candidates using an AdaBoost-based object detection method, which selects a small number of critical features from a larger set. Next, it generates a reduced feature vector for each face region candidate using principal component analysis (PCA). Finally, it classifies each candidate as face or non-face using SVM (Support Vector Machine) based binary classification. According to the experimental results, the proposed method achieves real-time face detection in low-resolution video. In addition, its PCA and SVM based face classification step yields a lower false detection rate than existing methods.

Improvement of Face Recognition Speed Using Pose Estimation (얼굴의 자세추정을 이용한 얼굴인식 속도 향상)

Choi, Sun-Hyung;Cho, Seong-Won;Chung, Sun-Tae
- Journal of the Korean Institute of Intelligent Systems / v.20 no.5 / pp.677-682 / 2010
This paper addresses a method of roughly estimating human pose by comparing Haar-wavelet values learned in face detection technology that uses the AdaBoost algorithm. We also present its application to face recognition. The learned weak classifiers are used to select a Haar-wavelet that is robust to each pose's features by comparing the coefficients during the process of face detection. The Mahalanobis distance is used to measure the degree of matching in Haar-wavelet selection. When a facial image is detected using the selected Haar-wavelet, the pose is estimated. The proposed pose estimation can be used to improve face recognition speed. Experiments are conducted to evaluate the performance of the proposed method for pose estimation.
https://doi.org/10.5391/JKIIS.2010.20.5.677 KSCI
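
For readers unfamiliar with the Mahalanobis distance mentioned in the abstract, this Python sketch computes it for two-dimensional feature vectors. How the paper applies the distance to Haar-wavelet coefficients is not described here, so the function name, the restriction to two dimensions, and the example mean and covariance are all assumptions for illustration.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point x from a distribution.

    mean: (m0, m1); cov: ((a, b), (c, d)), a 2x2 covariance matrix.
    Computes sqrt((x - mean)^T * cov^-1 * (x - mean)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)


# Hypothetical distribution: variance 4 along the first axis, 1 along the second.
mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))

d_wide = mahalanobis_2d((2.0, 0.0), mean, cov)
d_narrow = mahalanobis_2d((0.0, 2.0), mean, cov)
print(d_wide, d_narrow)
```

The same Euclidean offset of 2 gives a distance of 1.0 along the high-variance axis and 2.0 along the low-variance axis, which is why the measure suits features with different spreads.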
