• Title/Summary/Keyword: Boosting algorithm


An Ensemble Classifier Based Method to Select Optimal Image Features for License Plate Recognition

  • Jo, Jae-Ho; Kang, Dong-Joong
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.1 / pp.142-149 / 2016
  • This paper proposes a method to detect the license plates (LPs) of vehicles in indoor and outdoor parking lots. Many conventional methods can detect LPs in restricted environments, but detection is difficult in natural, complex scenes with background clutter, because such backgrounds often contain patterns similar to text or LPs. To verify the performance of LP text detection in natural images, we combine the MB-LGP feature with an ensemble machine learning algorithm in order to select a small number of optimal features from a huge pool. Feature selection is performed by the adaptive boosting (AdaBoost) algorithm, which, when combined with a cascade approach, achieves a minimal false-positive detection rate and short computation time. MSER is used to provide the initial text regions of the vehicle LP. In experiments on real images, the proposed method robustly extracts LPs in natural scenes as well as in controlled environments.
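The abstract says feature selection is done by adaptive boosting. A minimal sketch of that general idea, under stated assumptions: discrete AdaBoost with single-feature decision stumps, where the features picked by the stumps form the selected subset. This is not the authors' pipeline; the MB-LGP features are replaced by generic numeric vectors, and the cascade (per-stage threshold tuning) is omitted. The names `best_stump`, `adaboost_select`, and `predict` are hypothetical.

```python
import math

def best_stump(X, y, w):
    """Search every (feature, threshold, polarity) stump for the lowest weighted error.
    X: list of feature vectors, y: labels in {-1, +1}, w: sample weights summing to 1."""
    best = None  # (error, feature, threshold, polarity)
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost_select(X, y, rounds):
    """Discrete AdaBoost: each round adds one stump, i.e. selects one feature.
    Returns a list of (feature, threshold, polarity, alpha)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, thr, pol, alpha))
        # Misclassified samples gain weight, correctly classified ones lose weight
        w = [wi * math.exp(-alpha * yi * (pol if xi[f] >= thr else -pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(a * (p if x[f] >= t else -p) for f, t, p, a in model)
    return 1 if score >= 0 else -1
```

The selected feature indices are simply `sorted({f for f, _, _, _ in model})`; in a detection cascade, each stage would keep only a few such stumps and reject most background windows early, which is where the speed benefit mentioned in the abstract comes from.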

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction

  • Park, Jungsu
    • Journal of Korean Society on Water Environment / v.37 no.5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems, such as increased treatment costs; consequently, there have been various efforts to develop models for predicting SSC. However, SSC is affected by both the natural and the anthropogenic environment, which makes it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups, with low and high ranges of Q, using the k-means clustering algorithm. Each group of data was then used separately to optimize an XGB model (Model 1). Its performance was compared with that of an XGB model trained on the entire dataset (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. For Model 2, the RSR values at the two monitoring stations were 0.51 and 0.57, respectively; Model 1 improved these to 0.46 and 0.55, respectively.
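Two steps of this study can be sketched without the XGBoost dependency: splitting samples into low- and high-discharge groups with k-means, and scoring predictions with RSR. This is a minimal sketch under assumptions: a plain 1-D Lloyd's k-means on Q only, with an initialisation chosen here for simplicity (the paper's settings are not given), and the function names `kmeans_1d` and `rsr` are hypothetical.

```python
import math
import statistics

def kmeans_1d(values, k=2, iters=100):
    """Lloyd's k-means on a single variable (here: discharge Q).
    Returns (centers, labels); labels[i] is the cluster index of values[i]."""
    if k < 2:
        raise ValueError("k must be at least 2")
    s = sorted(values)
    # Assumed initialisation: k evenly spaced order statistics of the data
    centers = [s[i * (len(s) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: mean of each cluster (keep the old center if a cluster is empty)
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def rsr(observed, predicted):
    """RMSE divided by the population standard deviation of the observations,
    i.e. sqrt(sum((o - p)^2)) / sqrt(sum((o - mean)^2))."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / statistics.pstdev(observed)
```

In the Model 1 setup, the samples of each cluster would then be fitted with their own regressor (XGBoost in the paper) and compared via `rsr` against a single model fitted on all data. Lower RSR is better: 0 is a perfect fit, and 1 is no better than always predicting the observed mean.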

Boosting the Face Recognition Performance of Ensemble Based LDA for Pose, Non-uniform Illuminations, and Low-Resolution Images

  • Haq, Mahmood Ul; Shahzad, Aamir; Mahmood, Zahid; Shah, Ayaz Ali; Muhammad, Nazeer; Akram, Tallha
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.6 / pp.3144-3164 / 2019
  • Face recognition systems have several potential applications, such as security and biometric access control. Ongoing research focuses on developing a robust face recognition algorithm that can mimic the human vision system. Face pose, non-uniform illumination, and low resolution are the main factors that influence the performance of face recognition algorithms. This paper proposes a novel method to handle these factors. The proposed algorithm first uses 68 points to locate a face in the input image and then partially uses PCA to extract the mean image. AdaBoost and LDA are used to extract face features. In the final stage, a classic nearest-centre classifier is used for face classification. The proposed method outperforms recent state-of-the-art face recognition algorithms, producing a high recognition rate and a much lower error rate in very challenging situations, such as when only a frontal ($0^{\circ}$) face sample is available in the gallery and seven poses ($0^{\circ}$, $\pm 30^{\circ}$, $\pm 35^{\circ}$, and $\pm 45^{\circ}$) are used as probes, on the LFW and CMU Multi-PIE databases.
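The final classification stage named in this abstract, a nearest-centre classifier, is simple enough to sketch. This assumes the upstream AdaBoost + LDA stage has already produced one numeric feature vector per face, and it uses Euclidean distance, which the abstract does not specify; `fit_centres` and `classify` are hypothetical names.

```python
import math

def fit_centres(features, labels):
    """Average the gallery feature vectors of each identity into one centre."""
    sums, counts = {}, {}
    for x, c in zip(features, labels):
        if c not in sums:
            sums[c] = [0.0] * len(x)
            counts[c] = 0
        sums[c] = [s + xi for s, xi in zip(sums[c], x)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def classify(centres, x):
    """Assign a probe vector to the identity whose centre is nearest (Euclidean)."""
    return min(centres, key=lambda c: math.dist(x, centres[c]))
```

With a single frontal gallery image per person, as in the hardest setting described above, each centre is just that image's feature vector, so the classifier reduces to 1-nearest-neighbour.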

Comparing the Performance of 17 Machine Learning Models in Predicting Human Population Growth of Countries

  • Otoom, Mohammad Mahmood
    • International Journal of Computer Science & Network Security / v.21 no.1 / pp.220-225 / 2021
  • The human population growth rate is an important parameter for real-world planning. Common approaches rely on fixed parameters such as human population, mortality rate, and fertility rate, collected historically, to determine a region's population growth rate. The literature does not provide a solution for areas with no historical knowledge. In such areas, machine learning can solve the problem, but the multitude of machine learning algorithms makes it difficult to determine the best approach. Furthermore, missing features are a common real-world problem. It is therefore essential to compare machine learning techniques and select those that perform best and most robustly in the presence of missing features. This study compares the performance of 17 machine learning techniques (base learners and ensemble learners) in predicting countries' human population growth rates. Among the 17 techniques, random forest outperformed all others in both predictive performance and robustness to missing features. The study thus demonstrates and compares machine learning techniques for predicting the human population growth rate in settings where historical data and feature information are not available, and it identifies the best machine learning algorithm for population growth rate prediction.

A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

  • Tama, Bayu Adhi; Rhee, Kyung-Hyune
    • Journal of Korea Multimedia Society / v.21 no.5 / pp.617-625 / 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing works by fraudulently deceiving users in order to obtain sensitive information such as bank account information, credit card details, usernames, and passwords. The threat has caused huge losses to online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare classifier ensemble approaches, namely random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, namely decision tree, classification and regression tree, and credal decision tree, for website phishing detection. The area under the ROC curve (AUC) is used as the performance metric, while statistical tests are used to assess whether differences among classifiers are significant. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
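Because AUC is the benchmark's performance metric, a short sketch of how it can be computed may help. This uses the standard equivalence between ROC AUC and the probability that a randomly chosen positive (phishing) site scores higher than a randomly chosen negative one, with ties counted as half. It is a quadratic-time version written for clarity, not the paper's evaluation code; the function name `auc` and the 1/0 label convention are assumptions.

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney U / (P * N)).
    scores: classifier outputs, higher = more likely phishing; labels: 1 or 0."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every phishing site outranks every legitimate one, and 0.5 is chance level; for large test sets, a rank-based O(n log n) formulation gives the same value.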

Disguised-Face Discriminator for Embedded Systems

  • Yun, Woo-Han; Kim, Do-Hyung; Yoon, Ho-Sub; Lee, Jae-Yeon
    • ETRI Journal / v.32 no.5 / pp.761-765 / 2010
  • In this paper, we introduce an improved adaptive boosting (AdaBoost) classifier and its application: a disguised-face discriminator that distinguishes between bare and disguised faces. The proposed classifier is based on the AdaBoost learning algorithm and a regression technique, and it uses the lookup table produced by AdaBoost learning. The proposed method is verified on images captured in several real environments. Experimental results and analysis show that the proposed method performs better and runs faster than other well-known methods.
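The abstract mentions the lookup table of AdaBoost learning. One common form of such a table is the binned, confidence-rated weak learner of Real AdaBoost, where each bin of a feature outputs half the log-ratio of positive to negative sample weight. The sketch below shows that generic weak learner only; it assumes features scaled to [0, 1] and labels in {-1, +1}, and it does not reproduce the authors' regression-based improvement. The names `fit_lut`, `lut_predict`, and `reweight` are hypothetical.

```python
import math

def _bin(v, n_bins):
    """Map a feature value in [0, 1] to a bin index (v == 1.0 goes to the last bin)."""
    return min(int(v * n_bins), n_bins - 1)

def fit_lut(values, labels, weights, n_bins=8, eps=1e-6):
    """Lookup-table weak classifier for one feature.
    Each bin stores 0.5 * ln(W+ / W-): the confidence-rated output of Real AdaBoost."""
    w_pos = [0.0] * n_bins
    w_neg = [0.0] * n_bins
    for v, y, w in zip(values, labels, weights):
        if y == 1:
            w_pos[_bin(v, n_bins)] += w
        else:
            w_neg[_bin(v, n_bins)] += w
    # eps keeps the log finite for bins that contain only one class
    return [0.5 * math.log((p + eps) / (n + eps)) for p, n in zip(w_pos, w_neg)]

def lut_predict(table, v):
    """Real-valued weak prediction; its sign is the class, its size the confidence."""
    return table[_bin(v, len(table))]

def reweight(values, labels, weights, table):
    """Real AdaBoost update: w_i <- w_i * exp(-y_i * h(x_i)), then renormalise."""
    new = [w * math.exp(-y * lut_predict(table, v))
           for v, y, w in zip(values, labels, weights)]
    total = sum(new)
    return [w / total for w in new]
```

Evaluating such a weak learner is a single array lookup, which is one reason table-based boosting suits embedded targets like the one in this paper.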

Scaling Reuse Detection in the Web through Two-way Boosting with Signatures and LSH

  • Kim, Jong Wook
    • Journal of Korea Multimedia Society / v.16 no.6 / pp.735-745 / 2013
  • The emergence of Web 2.0 technologies, such as blogs and wikis, enables even naive users to easily create and share content on the Web using freely available content-sharing tools. The wide availability of almost free data and the promiscuous sharing of content through social networking platforms have created a content-borrowing phenomenon, in which the same content appears (often as extensive quotations) in different outlets. An immediate side effect of this phenomenon is that identifying which content is reused by whom is becoming a critical tool in social network analysis, including expert identification and analysis of information flow. Internet-scale reuse detection, however, poses extremely challenging scalability issues: given the large volume of user-created data on the Web, techniques for content-reuse detection must be fast and scalable. In this paper, we therefore propose the $qSign_{lsh}$ algorithm, a mechanism for identifying multi-sentence content reuse among documents by efficiently combining sentence-level evidence. Experimental results show that $qSign_{lsh}$ significantly improves reuse detection speed and provides high recall.
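The entry's title pairs signatures with LSH. As background only, here is a generic MinHash plus banded-LSH sketch for finding candidate pairs of reused sentences; it is not the $qSign_{lsh}$ algorithm, and the shingle size, number of bands, rows per band, and the use of seeded BLAKE2 hashes are all assumptions chosen for illustration. Function names are hypothetical.

```python
import hashlib
from collections import defaultdict

def shingles(sentence, k=3):
    """Set of lower-cased word k-grams of a sentence."""
    words = sentence.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def _h(seed, item):
    """Seeded 64-bit hash of a string (one member of the MinHash hash family)."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash(items, num_hashes):
    """MinHash signature: per hash function, the minimum hash over the set.
    Two sets agree in one position with probability equal to their Jaccard similarity."""
    return [min(_h(seed, it) for it in items) for seed in range(num_hashes)]

def lsh_candidates(sentences, bands=8, rows=4):
    """Banded LSH: sentences whose signatures match on all rows of any band
    land in the same bucket and become candidate reuse pairs."""
    buckets = defaultdict(set)
    for idx, s in enumerate(sentences):
        sig = minhash(shingles(s), bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(idx)
    pairs = set()
    for members in buckets.values():
        ms = sorted(members)
        for i in range(len(ms)):
            for j in range(i + 1, len(ms)):
                pairs.add((ms[i], ms[j]))
    return pairs
```

Only the candidate pairs need an exact comparison, which is what makes this family of methods scale; combining sentence-level matches into multi-sentence reuse evidence, as the abstract describes, would be a further step on top of these candidates.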

Optimization of Domain-Independent Classification Framework for Mood Classification

  • Choi, Sung-Pil; Jung, Yu-Chul; Myaeng, Sung-Hyon
    • Journal of Information Processing Systems / v.3 no.2 / pp.73-81 / 2007
  • In this paper, we introduce a domain-independent classification framework based on both k-nearest neighbor and Naive Bayes classification algorithms. The system architecture is simple and modular, so each sub-module can be changed or improved efficiently. Moreover, it provides various feature selection mechanisms that can be applied to optimize the general-purpose classifiers for a specific domain. To enhance classification performance, the system provides a conditional probability boosting (CPB) mechanism that can be used in various domains. In the mood classification domain, our optimized framework using the CPB algorithm improved precision by 1% and recall by 2% compared with the baseline.
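One of the two base classifiers this framework builds on is Naive Bayes. A minimal multinomial Naive Bayes with add-one smoothing is sketched below for orientation; the tokenised-document input format and function names are assumptions, and the paper's conditional probability boosting (CPB) mechanism is not reproduced, since the abstract does not define it.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count class frequencies and per-class token frequencies.
    docs: list of token lists; labels: list of class names (e.g. moods)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify_nb(model, tokens):
    """Return the class maximising log P(c) + sum_t log P(t | c), with Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[c].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[c][t] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best
```

Because the model is just a set of conditional probability tables, a domain-specific adjustment of those probabilities (the role the abstract assigns to CPB) can be applied without changing the rest of the framework.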

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.543-559 / 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable for a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.

A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

  • Tama, Bayu Adhi; Rhee, Kyung-Hyune
    • Journal of Multimedia Information System / v.5 no.2 / pp.99-104 / 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing works by fraudulently deceiving users in order to obtain sensitive information such as bank account information, credit card details, usernames, and passwords. The threat has caused huge losses to online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare classifier ensemble approaches, namely random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, namely decision tree, classification and regression tree, and credal decision tree, for website phishing detection. The area under the ROC curve (AUC) is used as the performance metric, while statistical tests are used to assess whether differences among classifiers are significant. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.