• Title/Summary/Keyword: Boosting algorithm

Estimating Farmland Prices Using Distance Metrics and an Ensemble Technique (거리척도와 앙상블 기법을 활용한 지가 추정)

  • Lee, Chang-Ro;Park, Key-Ho
    • Journal of Cadastre & Land InformatiX, v.46 no.2, pp.43-55, 2016
  • This study estimated land prices using instance-based learning. Among the various instance-based learning methods, the k-nearest neighbor method was used, and 10 distance metrics, including Euclidean distance, were calculated for the k-nearest neighbor estimation. Normally, the single distance-metric prediction with the best predictive performance would be chosen from the 10 predictions as the final estimate. In contrast to this practice, this study applied an ensemble technique that combines multiple predictions to obtain better performance. We applied the gradient boosting algorithm, a kind of residual-fitting model, to combine the predictions. Sales price data for farmland in Haenam-gun, Jeolla Province, were used to demonstrate the advantages of instance-based learning and of the ensemble technique. The results showed that the ensemble prediction was more accurate than the 10 individual distance-metric predictions.
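
The combination step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes three distance metrics (Euclidean, Manhattan, Chebyshev) as stand-ins for the ten used in the study, one-split regression stumps as base learners, leave-one-out k-nearest neighbor predictions as the inputs to the ensemble, and a small hypothetical data set. All function names are illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = [euclidean, manhattan, chebyshev]  # assumption: stand-ins for the 10 metrics

def knn_predict(train_X, train_y, query, k, dist):
    """Mean target of the k training points closest to `query` under `dist`."""
    ranked = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return sum(train_y[i] for i in ranked[:k]) / k

def loo_features(X, y, k=2):
    """One leave-one-out k-NN prediction per metric for every sample."""
    rows = []
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        rows.append([knn_predict(rest_X, rest_y, X[i], k, d) for d in METRICS])
    return rows

def fit_stump(X, residuals):
    """Least-squares regression stump (feature, threshold, left mean, right mean)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]

def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def gradient_boost(X, y, rounds=50, lr=0.1):
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + lr * stump_predict(stump, row) for p, row in zip(pred, X)]
    return base, stumps, lr

def boost_predict(model, row):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: parcel coordinates and prices, not taken from the study.
X = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 3.5),
     (4.0, 2.0), (5.0, 4.5), (6.0, 3.0), (7.0, 6.5)]
y = [10.0, 12.0, 15.0, 21.0, 22.0, 30.0, 29.0, 41.0]

features = loo_features(X, y)
model = gradient_boost(features, y)
ensemble = [boost_predict(model, row) for row in features]
baseline = [sum(y) / len(y)] * len(y)
print("ensemble MSE:", mse(y, ensemble), "mean-only MSE:", mse(y, baseline))
```

The errors printed here are in-sample; a fair comparison with any single metric would need held-out parcels.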

Anomalous Trajectory Detection in Surveillance Systems Using Pedestrian and Surrounding Information

  • Doan, Trung Nghia;Kim, Sunwoong;Vo, Le Cuong;Lee, Hyuk-Jae
    • IEIE Transactions on Smart Processing and Computing, v.5 no.4, pp.256-266, 2016
  • Concurrently detected and annotated abnormal events can have a significant impact on surveillance systems. Focusing on the specific domain of pedestrian trajectories, this paper presents two main contributions. First, much of the work on trajectory-based anomaly detection in the literature considers only information about pedestrian paths, such as direction and speed. In contrast, this paper proposes a framework that deals with additional types of trajectory-based anomalies. These abnormal events take place when a person enters prohibited areas. Those restricted regions are constructed by an online learning algorithm that uses surrounding information, including detected pedestrians and background scenes. Second, a simple data-boosting technique is introduced to overcome a lack of training data; this problem particularly challenges all previous work, owing to the very low frequency of abnormal events. The technique requires only normal trajectories and fundamental information about scenes to increase the amount of training data for both normal and abnormal trajectories. With the increased amount of training data, the conventional abnormal trajectory classifier achieves better prediction accuracy without falling into the over-fitting problem caused by complex learning models. Finally, the proposed framework (which annotates tracks that enter prohibited areas) and a conventional abnormal trajectory detector (using the data-boosting technique) are integrated to form a unified detector. This detector deals with different types of anomalous trajectories in a hierarchical order. The experimental results show that all proposed detectors can effectively detect anomalous trajectories in the test phase.

Store Sales Prediction Using Gradient Boosting Model (그래디언트 부스팅 모델을 활용한 상점 매출 예측)

  • Choi, Jaeyoung;Yang, Heeyoon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.2, pp.171-177, 2021
  • Through the rapid development of machine learning, diverse applications have emerged not only in industry but also in daily life. Applications of machine learning to financial data have also attracted interest. Herein, we apply machine learning algorithms to store sales data and present future applications for fintech enterprises. We use several missing-data processing methods and apply the gradient boosting algorithms XGBoost, LightGBM, and CatBoost to predict the future revenue of individual stores. We found that median imputation of missing data combined with the XGBoost algorithm gave the best accuracy. By employing the proposed method, both fintech enterprises and their customers can benefit: stores can receive financial assistance in advance from fintech companies, while these companies can offer financial support to the stores at low risk.
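
Median imputation, the preprocessing step the abstract reports as most accurate, can be sketched in a few lines. The table layout, the use of `None` for a missing value, and the function name are assumptions for illustration; the paper's actual pipeline and its XGBoost settings are not given in the abstract.

```python
from statistics import median

def impute_median(rows):
    """Replace None in each column with the median of that column's observed values.

    Raises statistics.StatisticsError if a column has no observed value at all.
    """
    n_cols = len(rows[0])
    medians = [median([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

# Hypothetical store records: [monthly sales, number of card terminals]
stores = [[100.0, 3], [None, 5], [140.0, None], [120.0, 4]]
filled = impute_median(stores)
print(filled)
```

The imputed table can then be passed to any gradient boosting regressor. The median is less sensitive than the mean to a few stores with unusually large sales.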

Super-resolution Algorithm using Local Structure Analysis and Scene Adaptive Dictionary (국부 구조 분석과 장면 적응 사전을 이용한 초고해상도 알고리즘)

  • Choi, Ik Hyun;Lim, Kyoung Won;Song, Byung Cheol
    • Journal of the Institute of Electronics and Information Engineers, v.50 no.4, pp.144-154, 2013
  • This paper proposes a new super-resolution algorithm that incorporates sharpness enhancement to improve the overall visual quality of up-scaled images. In the learning stage, multiple dictionaries are generated according to sharpness strength, and in the inference stage a suitable dictionary is selected from among them for each patch. In addition, post-processing suppresses the boosting of artifacts in the input low-resolution images during the inference stage. Experimental results show that the proposed algorithm provides a CPBD 0.3 higher than bi-cubic interpolation and 0.1 higher than Song's and Fan's algorithms. The proposed algorithm also shows better quality in textures and edges than the previous works. Finally, the proposed algorithm has an advantage in computational complexity because it requires only 17% of the memory of the previous work.

Prediction of Soil Moisture with Open Source Weather Data and Machine Learning Algorithms (공공 기상데이터와 기계학습 모델을 이용한 토양수분 예측)

  • Jang, Young-bin;Jang, Ik-hoon;Choe, Young-chan
    • Korean Journal of Agricultural and Forest Meteorology, v.22 no.1, pp.1-12, 2020
  • As one of the essential resources in the agricultural process, soil moisture has been carefully managed by predicting future changes and deficits. In recent years, statistical and machine learning based approaches to predicting soil moisture have been preferred in academia for their generalizability and ease of use in the field. However, little is known about whether machine learning based soil moisture prediction is applicable to conditions in South Korea. This paper therefore examines 1) whether publicly available weather data generated in South Korea is of sufficient quality to predict soil moisture, 2) which machine learning algorithm performs best under conditions in South Korea, and 3) whether a single machine learning model can be applied generally across regions. We used machine learning methods including Support Vector Machines (SVM), Random Forest (RF), Extremely Randomized Trees (ET), Gradient Boosting Machines (GBM), and Deep Feedforward Network (DFN) to predict future soil moisture in the Andong, Boseong, Cheolwon, and Suncheon regions with open source weather data. The GBM model showed the lowest prediction error on every data set we used (R squared: 0.96, RMSE: 1.8). Furthermore, GBM showed the lowest variance of prediction error between regions, which indicates that it has the highest generalizability.
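
The two evaluation ideas in this abstract, error per region and the variance of that error between regions, can be sketched as below. The soil moisture values are hypothetical and the function names are illustrative; the abstract does not state exactly how the between-region variance was computed, so the population variance of per-region RMSE is an assumption.

```python
import math
from statistics import pvariance

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def regional_spread(results):
    """results: {region: (observed, predicted)} -> (RMSE per region, variance of those RMSEs)."""
    per_region = {name: rmse(obs, pred) for name, (obs, pred) in results.items()}
    return per_region, pvariance(list(per_region.values()))

# Hypothetical observed and predicted soil moisture (%), not from the study.
results = {
    "Andong": ([20.0, 22.0, 24.0], [21.0, 22.0, 23.0]),
    "Boseong": ([30.0, 32.0, 34.0], [30.0, 32.0, 34.0]),
}
per_region, spread = regional_spread(results)
print(per_region, spread)
```

A model with a small spread makes errors of similar size in every region, which is the sense of generalizability used in the abstract.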

Effective Harmony Search-Based Optimization of Cost-Sensitive Boosting for Improving the Performance of Cross-Project Defect Prediction (교차 프로젝트 결함 예측 성능 향상을 위한 효과적인 하모니 검색 기반 비용 민감 부스팅 최적화)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering, v.7 no.3, pp.77-90, 2018
  • Software Defect Prediction (SDP) is a field of study that identifies defective modules. When local data are insufficient, a company can use Cross-Project Defect Prediction (CPDP), a way to build a classifier from datasets collected from other companies. Most machine learning algorithms for SDP have more than one parameter whose value significantly affects prediction performance. The objective of this study is to propose a parameter selection technique that enhances the performance of CPDP. Using a Harmony Search (HS) algorithm, our approach tunes the parameters of cost-sensitive boosting, a method for tackling the class imbalance that makes prediction difficult. Parameter ranges and constraint rules between parameters are defined according to distributional characteristics and applied to HS. The proposed approach is compared with three CPDP methods and a Within-Project Defect Prediction (WPDP) method over fifteen target projects. The experimental results indicate that the proposed model outperforms the other CPDP methods in the context of class imbalance. Unlike previous research, which showed a high probability of false alarm (PF) or a low probability of detection (PD), our approach provides acceptably high PD and low PF along with high overall performance. It also performs comparably to WPDP.
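
A generic Harmony Search loop for bounded parameters can be sketched as follows. This is the textbook form of the algorithm (memory consideration, pitch adjustment, random selection), not the authors' tuned variant: the constraint rules from the paper are omitted, the default settings are arbitrary, and `toy_objective` is a hypothetical stand-in for a real objective such as a validation score of cost-sensitive boosting built from PD and PF.

```python
import random

def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   iterations=500, seed=0):
    """Minimize `objective` over box `bounds` with a basic Harmony Search."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = rng.choice(memory)[d]      # memory consideration
                if rng.random() < par:             # pitch adjustment
                    value += rng.uniform(-1.0, 1.0) * bw * (hi - lo)
            else:
                value = rng.uniform(lo, hi)        # random selection
            new.append(min(hi, max(lo, value)))    # keep inside the bounds
        new_score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:              # replace the worst harmony
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

def toy_objective(params):
    """Hypothetical stand-in: a real objective would train and score a classifier."""
    cost_ratio, threshold = params
    return (cost_ratio - 2.0) ** 2 + (threshold - 0.5) ** 2

BOUNDS = [(0.0, 5.0), (0.0, 1.0)]  # assumed ranges for two illustrative parameters
best_params, best_score = harmony_search(toy_objective, BOUNDS)
print(best_params, best_score)
```

Because a new harmony only ever replaces the worst member of the memory, the best score in memory never gets worse as the iterations proceed.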

Effect of input variable characteristics on the performance of an ensemble machine learning model for algal bloom prediction (앙상블 머신러닝 모형을 이용한 하천 녹조발생 예측모형의 입력변수 특성에 따른 성능 영향)

  • Kang, Byeong-Koo;Park, Jungsu
    • Journal of Korean Society of Water and Wastewater, v.35 no.6, pp.417-424, 2021
  • Algal bloom is an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal bloom. Thus, the prediction of chlorophyll-a concentration is essential for the proper management of water quality. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so predicting it is not an easy task. In recent years, advanced machine learning algorithms have increasingly been used to develop surrogate models that predict the chlorophyll-a concentration in freshwater systems such as rivers or reservoirs. This study used a light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. The field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used to develop the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration using the other seven items as independent input variables. Second, the time-lagged values of all the input variables were added as input variables to understand the effect of the time lag of input variables on model performance. The time lag (i) ranges from 1 to 50 days. The model performance was evaluated using three indices: the root mean squared error-observation standard deviation ratio (RSR), the Nash-Sutcliffe coefficient of efficiency (NSE), and the mean absolute error (MAE). The model showed the best performance when a dataset with a one-day time lag (i=1) was added, where RSR, NSE, and MAE were 0.359, 0.871, and 1.510, respectively. An improvement in model performance was observed when datasets with a time lag of up to about 15 days (i=15) were added.
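
The time-lag construction and the three indices named in this abstract can be sketched as below, using their standard definitions. The series and function names are hypothetical. Under these definitions RSR equals the square root of (1 - NSE), which agrees with the reported pair 0.359 and 0.871.

```python
import math

def add_lag(series, lag):
    """Pair each value with the value `lag` steps earlier; the first `lag` rows are dropped."""
    return [(series[t], series[t - lag]) for t in range(lag, len(series))]

def mae(obs, sim):
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus residual sum of squares over observed variability."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return math.sqrt(ss_res) / math.sqrt(ss_tot)

# Hypothetical observed and simulated chlorophyll-a values, not from the study.
observed = [10.0, 12.0, 14.0, 16.0]
simulated = [11.0, 12.0, 13.0, 17.0]
lagged = add_lag(observed, 1)
print(lagged, mae(observed, simulated), nse(observed, simulated), rsr(observed, simulated))
```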

The Coverage Area for Extended Delivery Service in Eastern Economic Corridor (EEC): A Case of Thailand Post Co., Ltd

  • AMCHANG, Chompoonut
    • Journal of Distribution Science, v.18 no.4, pp.39-50, 2020
  • Purpose: This paper studied the current locations of post offices to analyze the service coverage area for parcel delivery in the Eastern Economic Corridor (EEC), which must be considered in the last mile to extend delivery service for e-commerce growth. Thailand Post was the case study in this paper. Research design, data and methodology: To determine the delivery service area under the last mile condition, the authors proposed a network analysis that determines the service radius using a Geographic Information System (GIS). Furthermore, this paper applied Dijkstra's algorithm as a network analysis tool from GIS to analyze the last mile service coverage area in a new economic zone. The authors also suggested an approach for locating last mile delivery centers in the EEC. Results: The results of the study indicated that Thailand Post should consider more last mile delivery centers in the EEC to support its express service in urban areas, improve the efficiency of service coverage for parcel delivery, and gain more advantages over competitors. Conclusions: This paper proposes a network analysis to extend the last mile service for parcel delivery using Dijkstra's algorithm from GIS, together with a solution approach for adding more last mile delivery centers. The results of the research will contribute to boosting customer satisfaction with last mile delivery service and to enabling easy access to a service center in the EEC.
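
The coverage analysis described here amounts to a shortest-path search on the road network followed by a distance cutoff. The sketch below uses Dijkstra's algorithm with a binary heap; the road graph, node names, and radius are hypothetical, and the GIS tooling used in the paper is not reproduced.

```python
import heapq

def dijkstra(graph, source):
    """Shortest network distance from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def coverage(graph, post_office, radius):
    """Nodes whose network distance from the post office is within the service radius."""
    return {n for n, d in dijkstra(graph, post_office).items() if d <= radius}

# Hypothetical road network: node -> [(neighbor, distance in km)]
roads = {
    "PO": [("A", 2.0), ("B", 5.0)],
    "A": [("PO", 2.0), ("B", 1.0), ("C", 4.0)],
    "B": [("PO", 5.0), ("A", 1.0), ("C", 1.0)],
    "C": [("A", 4.0), ("B", 1.0), ("D", 3.0)],
    "D": [("C", 3.0)],
}
distances = dijkstra(roads, "PO")
served = coverage(roads, "PO", radius=4.0)
print(distances, served)
```

Nodes left outside every post office's coverage set are natural candidates when siting additional last mile delivery centers.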

Real-time Smoke Detection Based on Colour Information, Morphological and Dynamic Features of the Smoke (연기의 색 정보, 형태학적 및 동적 특징 기반의 실시간 연기 검출)

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea institute of electronic communication sciences, v.10 no.1, pp.21-26, 2015
  • In this paper, we propose a system that detects smoke in real time from a high-quality IP camera. For real-time processing, the RTSP streams transmitted from the IP camera are opened directly with the FFmpeg library, in the same way as opening a video file. To recognize smoke, the color information and morphological characteristics of smoke, as well as its dynamic characteristics, are considered for candidate regions. Finally, the AdaBoost algorithm was used as the boosting algorithm to combine the various smoke characteristics effectively. In experiments with input videos from an IP camera, the proposed algorithms proved useful for detecting smoke.
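
How AdaBoost combines several weak cues into one decision can be sketched with threshold stumps over three per-region scores. The scores, labels, and function names are hypothetical, and discrete AdaBoost with decision stumps is an assumption: the abstract does not say which weak learners or features were actually used.

```python
import math

def stump_predict(j, t, sign, row):
    """Weak classifier: `sign` if feature j exceeds threshold t, otherwise -sign."""
    return sign if row[j] > t else -sign

def best_stump(X, y, w):
    """Threshold stump with the lowest weighted error (first found wins ties)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(j, t, sign, row) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, j, t, sign))
        # Raise the weight of misclassified samples, lower the rest, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(j, t, sign, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def classify(ensemble, row):
    score = sum(alpha * stump_predict(j, t, sign, row) for alpha, j, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical candidate regions: [color score, shape score, motion score]
X = [[0.9, 0.70, 0.60], [0.8, 0.72, 0.50], [0.2, 0.80, 0.70],
     [0.3, 0.75, 0.20], [0.1, 0.20, 0.65], [0.4, 0.10, 0.10]]
y = [1, 1, 1, -1, -1, -1]  # 1 = smoke, -1 = not smoke

ensemble = adaboost(X, y)
predictions = [classify(ensemble, row) for row in X]
print(ensemble, predictions)
```

On this toy set no single threshold on one feature separates the classes, while the weighted vote of three stumps does, which is the reason for combining the cues.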

DR Image Enhancement Using Multiscale Non-Linear Gain Control For Laplacian Pyramid Transformation (라플라시안 피라미드에서의 다중스케일 비선형 이득 조절을 이용한 DR 영상 개선)

  • Shin, Dong-Kyu;Lee, Jin-Su;Kim, Sung-Hee;Park, In-Sung;Kim, Dong-Youn
    • Journal of Biomedical Engineering Research, v.28 no.2, pp.199-204, 2007
  • In digital radiography, the multi-scale nonlinear amplification algorithm based on unsharp masking is one of the major image enhancement algorithms for improving image contrast. In this paper, we used the Laplacian pyramid to decompose a digital radiography (DR) image. In our simulation, the DR image was decomposed into seven layers, and the coefficients of each layer were amplified with a nonlinear function. We also incorporated a noise containment algorithm to limit noise amplification. To enhance image contrast, we proposed new adaptive non-linear gain amplification coefficients. When applied to some clinical data, detail visibility improved significantly without unacceptable noise boosting. Images acquired with the proposed adaptive non-linear gain coefficients showed superior quality to those processed with a similar gain control method and are expected to be accepted in clinical applications.