• Title/Summary/Keyword: boosting algorithm

Search Result 167, Processing Time 0.022 seconds

An Improved AdaBoost Algorithm by Clustering Samples (샘플 군집화를 이용한 개선된 아다부스트 알고리즘)

  • Baek, Yeul-Min;Kim, Joong-Geun;Kim, Whoi-Yul
    • Journal of Broadcast Engineering
    • /
    • v.18 no.4
    • /
    • pp.643-646
    • /
    • 2013
  • We present an improved AdaBoost algorithm to avoid overfitting phenomenon. AdaBoost is widely known as one of the best solutions for object detection. However, AdaBoost tends to be overfitting when a training dataset has noisy samples. To avoid the overfitting phenomenon of AdaBoost, the proposed method divides positive samples into K clusters using k-means algorithm, and then uses only one cluster to minimize the training error at each iteration of weak learning. Through this, excessive partitions of samples are prevented. Also, noisy samples are excluded for the training of weak learners so that the overfitting phenomenon is effectively reduced. In our experiment, the proposed method shows better classification and generalization ability than conventional boosting algorithms with various real world datasets.

Smarter Classification for Imbalanced Data Set and Its Application to Patent Evaluation (불균형 데이터 집합에 대한 스마트 분류방법과 특허 평가에의 응용)

  • Kwon, Ohbyung;Lee, Jonathan Sangyun
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.15-34
    • /
    • 2014
  • Overall, accuracy as a performance measure does not fully consider modular accuracy: the accuracy of classifying 1 (or true) as 1 is not same as classifying 0 (or false) as 0. A smarter classification algorithm would optimize the classification rules to match the modular accuracies' goals according to the nature of problem. Correspondingly, smarter algorithms must be both more generalized with respect to the nature of problems, and free from decretization, which may cause distortion of the real performance. Hence, in this paper, we propose a novel vertical boosting algorithm that improves modular accuracies. Rather than decretizing items, we use simple classifiers such as a regression model that accepts continuous data types. To improve the generalization, and to select a classification model that is well-suited to the nature of the problem domain, we developed a model selection algorithm with smartness. To show the soundness of the proposed method, we performed an experiment with a real-world application: predicting the intellectual properties of e-transaction technology, which had a 47,000+ record data set.

Real-time prediction on the slurry concentration of cutter suction dredgers using an ensemble learning algorithm

  • Han, Shuai;Li, Mingchao;Li, Heng;Tian, Huijing;Qin, Liang;Li, Jinfeng
    • International conference on construction engineering and project management
    • /
    • 2020.12a
    • /
    • pp.463-481
    • /
    • 2020
  • Cutter suction dredgers (CSDs) are widely used in various dredging constructions such as channel excavation, wharf construction, and reef construction. During a CSD construction, the main operation is to control the swing speed of cutter to keep the slurry concentration in a proper range. However, the slurry concentration cannot be monitored in real-time, i.e., there is a "time-lag effect" in the log of slurry concentration, making it difficult for operators to make the optimal decision on controlling. Concerning this issue, a solution scheme that using real-time monitored indicators to predict current slurry concentration is proposed in this research. The characteristics of the CSD monitoring data are first studied, and a set of preprocessing methods are presented. Then we put forward the concept of "index class" to select the important indices. Finally, an ensemble learning algorithm is set up to fit the relationship between the slurry concentration and the indices of the index classes. In the experiment, log data over seven days of a practical dredging construction is collected. For comparison, the Deep Neural Network (DNN), Long Short Time Memory (LSTM), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and the Bayesian Ridge algorithm are tried. The results show that our method has the best performance with an R2 of 0.886 and a mean square error (MSE) of 5.538. This research provides an effective way for real-time predicting the slurry concentration of CSDs and can help to improve the stationarity and production efficiency of dredging construction.

  • PDF

Estimation of Ground-level PM10 and PM2.5 Concentrations Using Boosting-based Machine Learning from Satellite and Numerical Weather Prediction Data (부스팅 기반 기계학습기법을 이용한 지상 미세먼지 농도 산출)

  • Park, Seohui;Kim, Miae;Im, Jungho
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.2
    • /
    • pp.321-335
    • /
    • 2021
  • Particulate matter (PM10 and PM2.5 with a diameter less than 10 and 2.5 ㎛, respectively) can be absorbed by the human body and adversely affect human health. Although most of the PM monitoring are based on ground-based observations, they are limited to point-based measurement sites, which leads to uncertainty in PM estimation for regions without observation sites. It is possible to overcome their spatial limitation by using satellite data. In this study, we developed machine learning-based retrieval algorithm for ground-level PM10 and PM2.5 concentrations using aerosol parameters from Geostationary Ocean Color Imager (GOCI) satellite and various meteorological parameters from a numerical weather prediction model during January to December of 2019. Gradient Boosted Regression Trees (GBRT) and Light Gradient Boosting Machine (LightGBM) were used to estimate PM concentrations. The model performances were examined for two types of feature sets-all input parameters (Feature set 1) and a subset of input parameters without meteorological and land-cover parameters (Feature set 2). Both models showed higher accuracy (about 10 % higher in R2) by using the Feature set 1 than the Feature set 2. The GBRT model using Feature set 1 was chosen as the final model for further analysis(PM10: R2 = 0.82, nRMSE = 34.9 %, PM2.5: R2 = 0.75, nRMSE = 35.6 %). The spatial distribution of the seasonal and annual-averaged PM concentrations was similar with in-situ observations, except for the northeastern part of China with bright surface reflectance. Their spatial distribution and seasonal changes were well matched with in-situ measurements.

Method of Improving Personal Name Search in Academic Information Service

  • Han, Heejun;Lee, Seok-Hyoung
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.2 no.2
    • /
    • pp.17-29
    • /
    • 2012
  • All academic information on the web or elsewhere has its creator, that is, a subject who has created the information. The subject can be an individual, a group, or an institution, and can be a nation depending on the nature of the relevant information. Most information is composed of a title, an author, and contents. An essay which is under the academic information category has metadata including a title, an author, keyword, abstract, data about publication, place of publication, ISSN, and the like. A patent has metadata including the title, an applicant, an inventor, an attorney, IPC, number of application, and claims of the invention. Most web-based academic information services enable users to search the information by processing the meta-information. An important element is to search information by using the author field which corresponds to a personal name. This study suggests a method of efficient indexing and using the adjacent operation result ranking algorithm to which phrase search-based boosting elements are applied, and thus improving the accuracy of the search results of personal names. It also describes a method for providing the results of searching co-authors and related researchers in searching personal names. This method can be effectively applied to providing accurate and additional search results in the academic information services.

Cost-Effective APF/UPS System with Seamless Mode Transfer

  • Lee, Woo-Cheol
    • Journal of Electrical Engineering and Technology
    • /
    • v.10 no.1
    • /
    • pp.195-204
    • /
    • 2015
  • In this paper, the development of a cost-effective active power filter/uninterruptible power supply (APF/UPS) system with seamless mode transfer is described. The proposed scheme employs a pulse-width-modulation (PWM) voltage-source inverter and has two operational modes. First, when the source voltage is normal, the system operates as an APF, which compensates for the harmonics and power factor while boosting the DC-link voltage to be ready for the disturbance, without an additional DC charging circuit. A simple algorithm to detect the load current harmonics is also proposed. Second, when the source voltage is out of the normal range (owing to sag, swell, or outage), it operates a UPS, which controls the output voltage constantly by discharging the DC-link capacitor. Furthermore, a seamless transfer method for the single-phase inverter between the APF mode and the UPS mode is also proposed, in which an IGBT switch with diodes is used as a static bypass switch. Dissimilar to a conventional SCR switch, the IGBT switch can implement a seamless mode transfer. During the UPS operation, when the source voltage returns to the normal range, the system operates as an APF. The proposed system has good transient and steady-state response characteristics. The APF, charging circuit, and UPS systems are implemented in one inverter system. Finally, the validity of the proposed scheme is investigated with simulated and experimental results for a prototype APF/UPS system rated at 3 kVA.

An Efficient Pedestrian Detection Approach Using a Novel Split Function of Hough Forests

  • Do, Trung Dung;Vu, Thi Ly;Nguyen, Van Huan;Kim, Hakil;Lee, Chongho
    • Journal of Computing Science and Engineering
    • /
    • v.8 no.4
    • /
    • pp.207-214
    • /
    • 2014
  • In pedestrian detection applications, one of the most popular frameworks that has received extensive attention in recent years is widely known as a 'Hough forest' (HF). To improve the accuracy of detection, this paper proposes a novel split function to exploit the statistical information of the training set stored in each node during the construction of the forest. The proposed split function makes the trees in the forest more robust to noise and illumination changes. Moreover, the errors of each stage in the training forest are minimized using a global loss function to support trees to track harder training samples. After having the forest trained, the standard HF detector follows up to search for and localize instances in the image. Experimental results showed that the detection performance of the proposed framework was improved significantly with respect to the standard HF and alternating decision forest (ADF) in some public datasets.

Available Transfer Capability Enhancement with FACTS Devices in the Deregulated Electricity Market

  • Manikandan, B.V.;Raja, S. Charles;Venkatesh, P.
    • Journal of Electrical Engineering and Technology
    • /
    • v.6 no.1
    • /
    • pp.14-24
    • /
    • 2011
  • In order to facilitate the electricity market operation and trade in the restructured environment, ample transmission capability should be provided to satisfy the demand of increasing power transactions. The conflict of this requirement and the restrictions on the transmission expansion in the restructured electricity market has motivated the development of methodologies to enhance the available transfer capability (ATC) of existing transmission grids. The insertion of flexible AC transmission System (FACTS) devices in electrical systems seems to be a promising strategy to enhance single area ATC and multi-area ATC. In this paper, the viability and technical merits of boosting single area ATC and multi-area ATC using Thyristor controlled series compensator (TCSC), static VAR compensator (SVC) and unified power flow controller (UPFC) in single device and multi-type three similar and different device combinations are analyzed. Particle swarm optimization (PSO) algorithm is employed to obtain the optimal settings of FACTS devices. The installation cost is also calculated. The study has been carried out on IEEE 30 bus and IEEE 118 bus systems for the selected bilateral, multilateral and area wise transactions.

Semi-Supervised Recursive Learning of Discriminative Mixture Models for Time-Series Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.13 no.3
    • /
    • pp.186-199
    • /
    • 2013
  • We pose pattern classification as a density estimation problem where we consider mixtures of generative models under partially labeled data setups. Unlike traditional approaches that estimate density everywhere in data space, we focus on the density along the decision boundary that can yield more discriminative models with superior classification performance. We extend our earlier work on the recursive estimation method for discriminative mixture models to semi-supervised learning setups where some of the data points lack class labels. Our model exploits the mixture structure in the functional gradient framework: it searches for the base mixture component model in a greedy fashion, maximizing the conditional class likelihoods for the labeled data and at the same time minimizing the uncertainty of class label prediction for unlabeled data points. The objective can be effectively imposed as individual mixture component learning on weighted data, hence our mixture learning typically becomes highly efficient for popular base generative models like Gaussians or hidden Markov models. Moreover, apart from the expectation-maximization algorithm, the proposed recursive estimation has several advantages including the lack of need for a pre-determined mixture order and robustness to the choice of initial parameters. We demonstrate the benefits of the proposed approach on a comprehensive set of evaluations consisting of diverse time-series classification problems in semi-supervised scenarios.

Two-Level Tries: A General Acceleration Structure for Parallel Routing Table Accesses

  • Mingche, Lai;Lei, Gao
    • Journal of Communications and Networks
    • /
    • v.13 no.4
    • /
    • pp.408-417
    • /
    • 2011
  • The stringent performance requirement for the high efficiency of routing protocols on the Internet can be satisfied by exploiting the threaded border gateway protocol (TBGP) on multi-cores, but the state-of-the-art TBGP performance is restricted by a mass of contentions when racing to access the routing table. To this end, the highly-efficient parallel access approach appears to be a promising solution to achieve ultra-high route processing speed. This study proposes a general routing table structure consisting of two-level tries for fast parallel access, and it presents a heuristic-based divide-and-recombine algorithm to solve a mass of contentions, thereby accelerating the parallel route updates of multi-threading and boosting the TBGP performance. As a projected TBGP, this study also modifies the table operations such as insert and lookup, and validates their correctness according to the behaviors of the traditional routing table. Our evaluations on a dual quad-core Xeon server show that the parallel access contentions decrease sharply by 92.5% versus the traditional routing table, and the maximal update time of a thread is reduced by 56.8 % on average with little overhead. The convergence time of update messages are improved by 49.7%.