• Title/Summary/Keyword: 부스팅 (boosting)

Search Result 135, Processing Time 0.03 seconds

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (데이터 전처리와 앙상블 기법을 통한 불균형 데이터의 분류모형 비교 연구)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics / v.27 no.3 / pp.357-371 / 2014
  • Many studies address imbalanced data, in which the class distribution is highly skewed. To address this problem, previous studies have used resampling techniques that correct the skewness of the class distribution in each sampled subset through under-sampling, over-sampling, or hybrid sampling such as SMOTE. Ensemble methods have also alleviated the class imbalance problem. In this paper, we compare around a dozen algorithms that combine ensemble methods with resampling techniques, using simulated data sets generated by the Backbone model, which can handle the imbalance rate. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. Based on these results, we strongly recommend combining resampling techniques with ensemble methods for imbalanced data in which the proportion of the minority class is less than 10%. We also find that each ensemble method has a well-matched sampling technique: algorithms that combine bagging or random forest ensembles with random under-sampling tend to perform well, whereas the boosting ensemble appears to perform better with over-sampling. All ensemble methods combined with SMOTE perform well in most situations.
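
The pairing of random under-sampling with bagging mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `undersampled_bags`, its parameters, and the binary-label assumption are all hypothetical choices made for illustration. Each bag keeps every minority-class index and draws an equally sized random sample from the majority class, so a base learner trained on any one bag sees balanced classes.

```python
import random
from collections import Counter


def undersampled_bags(labels, n_bags=5, seed=0):
    """Return n_bags index lists, each balanced by random under-sampling.

    Assumes binary labels (hypothetical helper, not from the paper).
    Each bag keeps all minority-class indices and adds an equally sized
    random sample, drawn without replacement, from the majority class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)  # label with the fewest samples
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    bags = []
    for _ in range(n_bags):
        sampled = rng.sample(majority_idx, len(minority_idx))
        bags.append(sorted(minority_idx + sampled))
    return bags


# Toy data with a 5% minority class.
labels = [1] * 5 + [0] * 95
for bag in undersampled_bags(labels, n_bags=3):
    # Each bag holds five samples of each class.
    print(Counter(labels[i] for i in bag))
```

In a full ensemble, one base classifier would be fit on each bag and the predictions combined by voting; that step is omitted here to keep the sketch focused on the resampling.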

Performance Comparison of Machine Learning Based on Neural Networks and Statistical Methods for Prediction of Drifter Movement (뜰개 이동 예측을 위한 신경망 및 통계 기반 기계학습 기법의 성능 비교)

  • Lee, Chan-Jae;Kim, Gyoung-Do;Kim, Yong-Hyuk
    • Journal of the Korea Convergence Society / v.8 no.10 / pp.45-52 / 2017
  • A drifter is a device for observing the characteristics of seawater in the ocean; it can be used to predict the diffusion of effluent oil and to observe ocean currents. In this paper, we design models for predicting drifter trajectories using machine learning. We propose methods for estimating the trajectory of a drifter using support vector regression, a radial basis function network, a Gaussian process, a multilayer perceptron, and a recurrent neural network. When the proposed methods were compared with the existing MOHID numerical model, performance improved in three of the four cases. In particular, LSTM, the best-performing method, showed an improvement of 47.59%. Future work will improve the accuracy by weighting using bagging and boosting.

Design of Small-Area and High-Reliability 512-Bit EEPROM IP for UHF RFID Tag Chips (UHF RFID Tag Chip용 저면적·고신뢰성 512bit EEPROM IP 설계)

  • Lee, Dong-Hoon;Jin, Liyan;Jang, Ji-Hye;Ha, Pan-Bong;Kim, Young-Hee
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.2 / pp.302-312 / 2012
  • In this paper, small-area and high-reliability design techniques are proposed for a 512-bit EEPROM for UHF RFID tag chips. The small-area techniques are a WL driver circuit with simplified decoding logic and a VREF generator that uses a resistor divider instead of a BGR. The layout size of the 512-bit EEPROM IP designed with MagnaChip's $0.18{\mu}m$ EEPROM process is $59.465{\mu}m{\times}366.76{\mu}m$, which is 16.7% smaller than its conventional counterpart. We also solve the problem of 5V devices breaking down by keeping the VDDP voltage constant: on exiting the write mode, the boosted output of the DC-DC converter is discharged to the common ground VSS instead of to VDDP (=3.15V).

A Study on Recognition of Moving Object Crowdedness Based on Ensemble Classifiers in a Sequence (혼합분류기 기반 영상내 움직이는 객체의 혼잡도 인식에 관한 연구)

  • An, Tae-Ki;Ahn, Seong-Je;Park, Kwang-Young;Park, Goo-Man
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.2A / pp.95-104 / 2012
  • In pattern recognition with ensemble classifiers, a strong classifier is built from many weak classifiers. In this paper, we use feature extraction to build a strong classifier from static-camera sequences. The strong classifier is made of weak classifiers that take environmental factors into account, so it overcomes environmental effects. The proposed method obtains a binary foreground image by the frame difference method and uses boosting to train a crowdedness model and to recognize crowdedness from the extracted features. Combining the weak classifiers yields a strong ensemble classifier. The classifier can make use of potential features from the environment, such as shadows and reflections. We tested the proposed system on the road and subway platform sequences included in the "AVSS 2007" sequences. The results show good accuracy and efficiency in complex environments.

An Improved AdaBoost Algorithm by Clustering Samples (샘플 군집화를 이용한 개선된 아다부스트 알고리즘)

  • Baek, Yeul-Min;Kim, Joong-Geun;Kim, Whoi-Yul
    • Journal of Broadcast Engineering / v.18 no.4 / pp.643-646 / 2013
  • We present an improved AdaBoost algorithm that avoids overfitting. AdaBoost is widely known as one of the best solutions for object detection. However, AdaBoost tends to overfit when the training dataset contains noisy samples. To avoid this, the proposed method divides the positive samples into K clusters using the k-means algorithm and then uses only one cluster to minimize the training error at each iteration of weak learning. This prevents excessive partitioning of the samples. In addition, noisy samples are excluded from the training of weak learners, so overfitting is effectively reduced. In our experiments on various real-world datasets, the proposed method shows better classification and generalization ability than conventional boosting algorithms.
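
To see why noisy samples can drive AdaBoost toward overfitting, it helps to look at the sample reweighting step of standard discrete AdaBoost. The sketch below shows that textbook step only, not the clustering variant proposed in the paper; the function name `adaboost_reweight` is hypothetical, and labels are assumed to be -1 or +1. Misclassified samples gain weight after every round, so a mislabeled (noisy) sample that no weak learner can fit keeps attracting more attention.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One reweighting round of standard discrete AdaBoost.

    weights: current sample weights, summing to 1.
    y_true, y_pred: labels and weak-learner predictions in {-1, +1}.
    Requires a weighted error strictly between 0 and 0.5.
    Returns (alpha, new_weights).
    """
    # Weighted error of the weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    # Vote of this weak learner in the final ensemble.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink weights of correct samples, grow weights of misclassified ones.
    raw = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]  # the last sample is misclassified
alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
print(alpha, new_weights)
```

After one round, the single misclassified sample carries half of the total weight, which is a general property of this update: the misclassified samples always end up with combined weight 0.5.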

Enhanced Method for Person Name Retrieval in Academic Information Service (학술정보서비스에서 인명검색 고도화 방법)

  • Han, Hee-Jun;Yae, Yong-Hee;You, Beom-Jong
    • The Journal of the Korea Contents Association / v.10 no.2 / pp.490-498 / 2010
  • Whether on the web or not, every piece of academic information has a creator who produced it. The creator can be an individual, organization, institution, or country. Most information consists of a title, an author, and content. An article is described by its title, author, keywords, abstract, publisher, ISSN (International Standard Serial Number), etc., and patent information consists of metadata such as the invention title, applicant, inventors, agents, application number, and claim items. Most web-based academic information services provide search functions to users by processing these metadata, and search on the author field is important. In this paper, we propose effective index management for person name search, and search techniques that use a boosting factor and a near operation based on phrase search to improve the precision of search results. We also describe person name retrieval results that include alternative name expressions, co-authors, and persons in the same research field. The approach presented in this paper efficiently provides users with accurate data and additional search results.

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.681-690 / 2017
  • Various imbalanced binary classification problems exist, such as fraud detection in banking operations, spam mail detection, and defective product prediction. Several sampling methods, such as over-sampling, under-sampling, and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when the proportion of one group is dominant. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting, and support vector machines in combination with these sampling methods for imbalanced binary data. Four real data sets are analyzed to see whether there is a substantial improvement in prediction performance. We also emphasize some precautions to take when the sampling methods are implemented.

Improving Weak Classifiers by Using Discriminant Function in Selecting Threshold Values (판별 함수를 이용한 문턱치 선정에 의한 약분류기 개선)

  • Shyam, Adhikari;Yoo, Hyeon-Joong;Kim, Hyong-Suk
    • The Journal of the Korea Contents Association / v.10 no.12 / pp.84-90 / 2010
  • In this paper, we propose a quadratic discriminant analysis based approach for improving the discriminating strength of weak classifiers based on the simple Haar-like features used in the Viola-Jones object detection framework. Viola and Jones built a strong classifier using a boosted ensemble of weak classifiers. However, their weak classifier, which is based on a single threshold (or decision boundary), is sub-optimal and too weak for efficient discrimination between the object class and the background. The quadratic discriminant analysis based approach presented here leads to a hyper-quadric boundary between the object class and the background class, thus realizing weak classifiers based on multiple thresholds. Experiments on car detection, using 1000 positive and 3000 negative images for training and 500 positive and 500 negative images for testing, show that our method yields higher classification performance with fewer classifiers than weak classifiers based on a single threshold.

An Energy Consumption Prediction Model for Smart Factory Using Data Mining Algorithms (데이터 마이닝 기반 스마트 공장 에너지 소모 예측 모델)

  • Sathishkumar, VE;Lee, Myeongbae;Lim, Jonghyun;Kim, Yubin;Shin, Changsun;Park, Jangwoo;Cho, Yongyun
    • KIPS Transactions on Software and Data Engineering / v.9 no.5 / pp.153-160 / 2020
  • Energy consumption prediction for industry plays a prominent role in energy management and control systems, as dynamic and seasonal changes occur in energy demand and supply. This paper introduces and explores predictive models of energy consumption for the steel industry. The data used include lagging and leading reactive power, a lagging and leading current variable, carbon dioxide emissions (tCO2), and load type. Four statistical models are trained and then evaluated on the test set: (a) Linear Regression (LR), (b) Radial Kernel Support Vector Machine (SVM RBF), (c) Gradient Boosting Machine (GBM), and (d) Random Forest (RF). Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are used to measure the predictive performance of the regression models. When using all the predictors, the best model, RF, achieves an RMSE of 7.33 on the test set.
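
The three error measures named in this abstract follow standard definitions, sketched below in plain Python. The function names and the toy values are illustrative assumptions and are not taken from the paper; MAPE is reported in percent and assumes that no true value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Assumes every true value is non-zero, since each residual is
    divided by the corresponding true value.
    """
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Toy values for illustration only.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 380.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

RMSE penalizes large residuals more heavily than MAE, while MAPE expresses the error relative to the size of each true value, which is why the three measures are often reported together.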

A Capacitorless Low-Dropout Regulator With Enhanced Response Time (응답 시간을 향상 시킨 외부 커패시터가 없는 Low-Dropout 레귤레이터 회로)

  • Yeo, Jae-Jin;Roh, Jeong-Jin
    • Journal of IKEEE / v.19 no.4 / pp.506-513 / 2015
  • In this paper, an output-capacitorless low-dropout (LDO) regulator that consumes $4.5{\mu}A$ of quiescent current is designed. The proposed LDO regulator uses two amplifiers, which provide high gain, high bandwidth, and a high slew rate, to achieve good load regulation and a fast response time. In addition, a one-shot current boosting circuit is added to control the current that charges and discharges the parasitic capacitance at the pass transistor gate. As a result, the response time during load-current transitions is improved. The designed circuit is implemented in a $0.11-{\mu}m$ CMOS process. We experimentally verify an output voltage fluctuation of 260mV and a recovery time of $0.8{\mu}s$ at a maximum load current of 200mA.