Search | Korea Science

Comparison of tree-based ensemble models for regression

Park, Sangho;Kim, Chanmin
- Communications for Statistical Applications and Methods / v.29 no.5 / pp.561-589 / 2022
When multiple classification and regression trees are combined, tree-based ensemble models, such as random forest (RF) and Bayesian additive regression trees (BART), are produced. In this study, we compare the model structures and performances of various ensemble models in regression settings. RF learns from bootstrapped samples and selects a splitting variable from a random subset of predictors gathered at each node. The BART model is specified as a sum of trees and is estimated using the Bayesian backfitting algorithm. Through extensive simulation studies, the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, or highly correlated data are investigated. In the presence of missing data, BART performs well in general, whereas RF provides adequate coverage. BART outperforms RF on high-dimensional, highly correlated data. However, in all of the scenarios considered, RF has a shorter computation time. The performance of the two methods is also compared using two real data sets that represent the aforementioned situations, and the same conclusion is reached.
https://doi.org/10.29220/CSAM.2022.29.5.561
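
The two sources of randomness that the abstract attributes to RF (bootstrapped samples, and a splitting variable chosen from predictors gathered at each node) can be sketched as follows. This is only a minimal illustration, not the authors' code: the function names, the parameter `mtry`, and the sizes used are hypothetical, and tree growing itself is omitted.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_predictors(n_predictors, mtry, rng):
    """Pick mtry distinct predictor indices to consider at a single node."""
    return sorted(rng.sample(range(n_predictors), mtry))


# Hypothetical sizes: 10 observations, 9 predictors, 3 candidates per node.
rng = random.Random(0)
rows = bootstrap_sample(10, rng)        # rows used to grow one tree
cands = candidate_predictors(9, 3, rng)  # predictors tried at one split
```

Each tree in the forest would get its own `rows`, and each node in each tree its own `cands`; the split is then chosen only among those candidates.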

URL Phishing Detection System Utilizing Catboost Machine Learning Approach

Fang, Lim Chian;Ayop, Zakiah;Anawar, Syarulnaziah;Othman, Nur Fadzilah;Harum, Norharyati;Abdullah, Raihana Syahirah
- International Journal of Computer Science & Network Security / v.21 no.9 / pp.297-302 / 2021
The development of various phishing websites enables hackers to access confidential personal or financial data, thus decreasing trust in e-business. This paper compared detection techniques that use URL-based features. To analyze and compare the performance of supervised machine learning classifiers, the classifiers were trained on more than 11,005 phishing and legitimate URLs. Thirty features were extracted from the URLs to classify each URL as phishing or legitimate. Logistic Regression, Random Forest, and CatBoost classifiers were then analyzed and their performance was evaluated. The results showed that CatBoost was a much better classifier than Random Forest and Logistic Regression, with up to 96% detection accuracy.
https://doi.org/10.22937/IJCSNS.2021.21.9.39

Variable Selection with Regression Trees

Chang, Young-Jae
- The Korean Journal of Applied Statistics / v.23 no.2 / pp.357-366 / 2010
Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy when there are many noise variables. To handle this problem, we propose the multi-step GUIDE, which is a regression tree algorithm with a variable selection process. The multi-step GUIDE performs better than some well-known algorithms such as Random Forest and MARS. The simulation results show that the multi-step GUIDE outperforms other algorithms in terms of variable selection and prediction accuracy. It generally selects the important variables correctly along with relatively few noise variables and eventually gives good prediction accuracy.
https://doi.org/10.5351/KJAS.2010.23.2.357

Convergence study to detect metabolic syndrome risk factors by gender difference (성별에 따른 대사증후군의 위험요인 탐색을 위한 융복합 연구)

Lee, So-Eun;Rhee, Hyun-Sill
- Journal of Digital Convergence / v.19 no.12 / pp.477-486 / 2021
This study was conducted to identify risk factors for metabolic syndrome and their gender differences in adults. Data on 18,616 adults were collected from the Korea Health and Nutrition Examination Study from 2016 to 2019. Four machine learning methods (Logistic Regression, Decision Tree, Naïve Bayes, Random Forest) were used to predict metabolic syndrome. The results showed that Random Forest was superior to the other methods for both men and women. For both groups, BMI, diet (fat, vitamin C, vitamin A, protein, energy intake), number of underlying chronic diseases, and age ranked highest in importance. In women, education level, age at menarche, and menopause were additional highly ranked factors, and age and number of underlying chronic diseases carried more importance than in men. Future studies should verify various strategies to prevent metabolic syndrome.
https://doi.org/10.14400/JDC.2021.19.12.477

Predicting Gross Box Office Revenue for Domestic Films

Song, Jongwoo;Han, Suji
- Communications for Statistical Applications and Methods / v.20 no.4 / pp.301-309 / 2013
This paper predicts gross box office revenue for domestic films using Korean film data from 2008 to 2011. We use three regression methods (Linear Regression, Random Forest, and Gradient Boosting) to predict the gross box office revenue. We only consider domestic films with a revenue of at least KRW 500 million; relevant explanatory variables are chosen by data visualization and variable selection techniques. The key idea in analyzing this data is to construct meaningful explanatory variables from publicly available data sources. Some variables must be categorized to allow more effective analysis, and clustering methods are applied to achieve this. We choose the best model based on performance on the test set, and important explanatory variables are discussed.
https://doi.org/10.5351/CSAM.2013.20.4.301

A development of the gas pipeline risk prediction models (도시가스 배관 위험 예측 모델 개발)

Park, Giljoo;Kim, Young-Chan;Lee, ChangYeol;Jo, Young-do;Chung, Won Hee
- Proceedings of the Korean Society of Disaster Information Conference / 2017.11a / pp.360-361 / 2017
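Various systems are in operation to keep city gas pipelines safe, but most are limited by their reliance on on-site inspection. In this study, we analyze real-time pipeline operation data from Jungbu City Gas Company, one of the domestic city gas suppliers, to predict pipeline risk. We combine pipeline pressure, output voltage, output current, cathodic protection potential, and potential value data with data on other city-gas-related factors and carry out a correlation analysis. We then analyze real-time pipeline pressure data for a specific supply district to predict pressure values. Models were built with the random forest regression and support vector regression (SVR) algorithms; the model that used random forest regression on a dataset augmented with time-series information from the pipeline data showed the best prediction performance.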

Ensemble approach for improving prediction in kernel regression and classification

Han, Sunwoo;Hwang, Seongyun;Lee, Seokho
- Communications for Statistical Applications and Methods / v.23 no.4 / pp.355-362 / 2016
Ensemble methods often help increase prediction ability in various predictive models by combining multiple weak learners and reducing the variability of the final predictive model. In this work, we demonstrate that ensemble methods also enhance prediction accuracy under kernel ridge regression and kernel logistic regression classification. We apply bagging and random forests to these two kernel-based predictive models and present the procedure by which bagging and random forests can be embedded in them. Our proposals are tested on numerous synthetic and real datasets and are then compared with plain kernel-based predictive models and their subsampling approach. Numerical studies demonstrate that the ensemble approach outperforms plain kernel-based predictive models.
https://doi.org/10.5351/CSAM.2016.23.4.355
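
As a rough illustration of what bagging a kernel ridge regression could look like, the sketch below fits one kernel ridge model per bootstrap sample and averages their predictions. It is not the authors' procedure: the RBF kernel, the one-dimensional inputs, the function names, and the values of `lam`, `gamma`, and `n_models` are all assumptions made for the example.

```python
import math
import random


def rbf(x, z, gamma):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-gamma * (x - z) ** 2)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_krr(xs, ys, lam, gamma):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    n = len(xs)
    k = [[rbf(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(k, ys)
    return lambda x: sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, xs))


def bagged_krr(xs, ys, n_models, lam, gamma, rng):
    """Fit one kernel ridge model per bootstrap sample; average predictions."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_krr([xs[i] for i in idx], [ys[i] for i in idx],
                              lam, gamma))
    return lambda x: sum(model(x) for model in models) / len(models)


# Hypothetical toy data: noiseless sine curve on [0, 1.9].
rng = random.Random(0)
xs = [i / 10 for i in range(20)]
ys = [math.sin(x) for x in xs]
predict = bagged_krr(xs, ys, n_models=25, lam=0.1, gamma=1.0, rng=rng)
estimate = predict(1.0)
```

Averaging over bootstrap fits is what reduces the variability of the final predictor; the ridge term `lam` also keeps each linear system solvable when a bootstrap sample repeats observations.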

Study on the ensemble methods with kernel ridge regression

Kim, Sun-Hwa;Cho, Dae-Hyeon;Seok, Kyung-Ha
- Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.375-383 / 2012
The purpose of ensemble methods is to increase prediction accuracy by combining many classifiers. Recent studies have shown that random forests and forward stagewise regression achieve good accuracy in classification problems. However, they have large prediction errors near separation boundaries because they use decision trees as base learners. In this study, we use kernel ridge regression instead of decision trees in random forests and boosting. The usefulness of our proposed ensemble methods was shown by the simulation results on the prostate cancer and Boston housing data.
https://doi.org/10.7465/jkdi.2012.23.2.375

Crop Yield and Crop Production Predictions using Machine Learning

Divya Goel;Payal Gulati
- International Journal of Computer Science & Network Security / v.23 no.9 / pp.17-28 / 2023
Today the agriculture sector is a significant contributor to the Indian economy: it represents 18% of India's Gross Domestic Product (GDP) and employs half of the nation's workforce. The farming sector is required to satisfy a growing demand for food due to an increasing population. Therefore, to meet the ever-increasing needs of the nation's people, yield prediction is carried out in advance. Farmers also benefit from yield prediction, as it helps them predict crop yield prior to cultivation. Various parameters affect crop yield, such as rainfall, temperature, fertilizers, pH level, and other atmospheric conditions. Given these factors, crop yield is hard to predict, which makes prediction a challenging task. This motivated the present work, in which a dataset of different states producing different crops in different seasons is prepared and pre-processed; the machine learning techniques Gradient Boosting Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge Regression, Polynomial Regression, and Linear Regression are then applied, and their results are compared using Python.
https://doi.org/10.22937/IJCSNS.2023.23.9.3

City Gas Pipeline Pressure Prediction Model (도시가스 배관압력 예측모델)

Chung, Won Hee;Park, Giljoo;Gu, Yeong Hyeon;Kim, Sunghyun;Yoo, Seong Joon;Jo, Young-do
- The Journal of Society for e-Business Studies / v.23 no.2 / pp.33-47 / 2018
City gas pipelines are buried underground. Because of this, pipelines are hard to manage and can be easily damaged. This research proposes a real-time prediction system that helps experts make decisions about pressure anomalies. The gas pipeline pressure data of Jungbu City Gas Company, one of the domestic city gas suppliers, are analysed together with time variables and environment variables. In this research, regression models that predict pipeline pressure in minutes are proposed. Random forest, support vector regression (SVR), and long short-term memory (LSTM) algorithms are used to build pressure prediction models. A comparison of the models' performance shows that the LSTM model was the best. The LSTM model for Asan-si has a root mean square error (RMSE) of 0.011 and a mean absolute percentage error (MAPE) of 0.494. The LSTM model for Cheonan-si has an RMSE of 0.015 and a MAPE of 0.668.
https://doi.org/10.7838/jsebs.2018.23.2.033
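
The two error measures reported in the abstract, RMSE and MAPE, can be computed as in the sketch below. The function names and sample values are hypothetical, and the abstract does not say whether its MAPE figures are percentages or fractions; this sketch assumes percentages.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical readings, not data from the paper.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
print(rmse(actual, predicted))  # 1.0
print(mape(actual, predicted))  # 37.5
```

RMSE is expressed in the units of the target (here, pressure), while MAPE is scale-free, which is why both are often reported together.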

