• Title/Summary/Keyword: Random Forest


Performance of Random Forest Classifier for Flood Mapping Using Sentinel-1 SAR Images

  • Chu, Yongjae;Lee, Hoonyol
    • Korean Journal of Remote Sensing / v.38 no.4 / pp.375-386 / 2022
  • The city of Khartoum, the capital of Sudan, was heavily damaged by flooding of the Nile in 2020. Classification of satellite images can delineate the damaged area and support emergency response. Because Synthetic Aperture Radar (SAR) uses microwaves that can penetrate clouds, it is well suited to flood studies. In this study, the Random Forest classifier, a supervised classification algorithm, was applied to the flood event in Khartoum using Sentinel-1 SAR images, with various training dataset sizes and numbers of images. To create the training dataset, we used unsupervised classification and visual inspection. First, Random Forest was run while reducing the size of each class in the training dataset, but no notable difference was found. Next, we ran Random Forest with varying numbers of images. Accuracy improved as the number of images increased, but converged to a maximum once the dataset covered the period from the flood to the completion of drainage.

Study on the Effect of Training Data Sampling Strategy on the Accuracy of the Landslide Susceptibility Analysis Using Random Forest Method (Random Forest 기법을 이용한 산사태 취약성 평가 시 훈련 데이터 선택이 결과 정확도에 미치는 영향)

  • Kang, Kyoung-Hee;Park, Hyuck-Jin
    • Economic and Environmental Geology / v.52 no.2 / pp.199-212 / 2019
  • In machine learning, the sampling strategy for the training data affects the performance of the prediction model, including its generalization ability as well as its prediction accuracy. In landslide susceptibility analysis in particular, data sampling is an essential step in preparing the training data because the number of non-landslide points is much larger than the number of landslide points. However, previous studies did not consider different sampling methods for the training data; they selected the training data randomly. Therefore, in this study the authors proposed several sampling methods and assessed the effect of the training data sampling strategy on landslide susceptibility analysis. To this end, a total of six scenarios were set up based on the sampling strategies for landslide points and non-landslide points. The Random Forest technique was then trained on each of the six scenarios, and the attribute importance of each input variable was evaluated. Subsequently, landslide susceptibility maps were produced using the input variables and their attribute importances. The AUC values of the landslide susceptibility maps obtained from the six sampling strategies showed high prediction rates, ranging from 70% to 80%. This means that the Random Forest technique shows appropriate predictive performance and that the attribute importances obtained from Random Forest can be used as weights for landslide conditioning factors in susceptibility analysis. In addition, the results obtained using specific sampling strategies for the training data show higher prediction accuracy than those obtained using the conventional random sampling method.
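The AUC figures quoted in this abstract can be read as the probability that a randomly chosen landslide point receives a higher susceptibility score than a randomly chosen non-landslide point. The following is a minimal sketch of that pairwise definition only. The function name `auc` and the scores are hypothetical and are not taken from the paper, which does not describe how its AUC values were computed.

```python
def auc(scores_pos, scores_neg):
    """Pairwise AUC: fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical susceptibility scores (illustrative values, not study data).
landslide = [0.9, 0.7, 0.6, 0.4]
non_landslide = [0.8, 0.5, 0.3, 0.2, 0.1]

print(auc(landslide, non_landslide))
```

With these made-up scores, 16 of the 20 pairs are ordered correctly, so the sketch gives an AUC of 0.8, the upper end of the range reported in the abstract.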

A measure of discrepancy based on margin of victory useful for the determination of random forest size (랜덤포레스트의 크기 결정에 유용한 승리표차에 기반한 불일치 측도)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.515-524 / 2017
  • In this study, a measure of discrepancy based on MV (margin of victory) is suggested that might be useful in determining the size of a random forest for classification. Here MV is a scaled difference in the votes, at the infinite random forest, of the two most popular classes of the current random forest. More specifically, max(-MV,0) is proposed as a reasonable measure of discrepancy, noting that negative MV values indicate a discrepancy in the two most popular classes between the current and infinite random forests. We propose an appropriate diagnostic statistic based on this measure that might be useful for determining random forest size, and then derive its asymptotic distribution. Finally, a simulation study is conducted to compare the finite-sample performance of the proposed statistic with that of other recently proposed diagnostic statistics.

Research on improving correctness of cardiac disorder data classifier by applying Best-First decision tree method (Best-First decision tree 기법을 적용한 심전도 데이터 분류기의 정확도 향상에 관한 연구)

  • Lee, Hyun-Ju;Shin, Dong-Kyoo;Park, Hee-Won;Kim, Soo-Han;Shin, Dong-Il
    • Journal of Internet Computing and Services / v.12 no.6 / pp.63-71 / 2011
  • Cardiac disorder data are generally tested using a classifier, and the QRS complex and R-R interval used in this experiment are commonly extracted from ECG (electrocardiogram) signals. Experiments on ECG data are generally performed with SVM (Support Vector Machine) and MLP (Multilayer Perceptron) classifiers, but this study experimented with the Best-First Decision Tree (B-F Tree), derived from the Decision Tree among Random Forest classifier algorithms, to improve accuracy. To compare and analyze accuracy, experiments with SVM, MLP, RBF (Radial Basis Function) Network, and Decision Tree classifiers were performed, and the results were also compared with those of published papers that used the same interval and data. Compared with the above four classifiers, the Random Forest classifier achieved the best accuracy. Although the R-R interval was extracted using a band-pass filter in the pre-processing stage of this experiment, further study of filters is needed to extract the interval accurately.

A simple diagnostic statistic for determining the size of random forest (랜덤포레스트의 크기 결정을 위한 간편 진단통계량)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.855-863 / 2016
  • In this study, a simple diagnostic statistic for determining the size of a random forest is proposed. The method is based on MV (margin of victory), a scaled difference in the votes at the infinite forest between the first and second most popular categories of the current random forest. If MV is negative, there is a discrepancy between the current and infinite forests. More precisely, our method is based on the proportion of cases in which -MV is greater than a fixed small positive number (say, 0.03). We derive an appropriate diagnostic statistic for our method and then calculate the distribution of the statistic. A simulation study is performed to compare our method with a recently proposed diagnostic statistic.
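The thresholding idea in this abstract can be illustrated with a rough sketch. Two assumptions are made here that do not come from the paper: the unobservable infinite forest is replaced by a vector of reference vote proportions (for example, from a much larger forest), and MV is taken as the plain difference of those proportions, since the abstract does not state the scaling. The function names and the example cases are hypothetical; this is not the author's diagnostic statistic.

```python
def margin_of_victory(current_votes, reference_props):
    """Difference in reference vote proportions between the two classes
    that are most popular in the current forest (unscaled: an assumption)."""
    order = sorted(range(len(current_votes)),
                   key=lambda k: current_votes[k], reverse=True)
    first, second = order[0], order[1]
    return reference_props[first] - reference_props[second]


def discrepancy_proportion(cases, threshold=0.03):
    """Share of cases in which -MV exceeds the threshold."""
    flagged = sum(1 for votes, props in cases
                  if -margin_of_victory(votes, props) > threshold)
    return flagged / len(cases)


# Each case: (vote counts in the current forest, reference proportions).
# All numbers are made up for illustration.
cases = [
    ([60, 40], [0.58, 0.42]),            # top class agrees, MV > 0
    ([51, 49], [0.47, 0.53]),            # top class flips, -MV is about 0.06
    ([50, 30, 20], [0.45, 0.35, 0.20]),  # top class agrees, MV > 0
    ([34, 33, 33], [0.33, 0.35, 0.32]),  # -MV is about 0.02, below threshold
]

print(discrepancy_proportion(cases))
```

In this toy input only the second case exceeds the 0.03 threshold, so one case in four is flagged. A large flagged proportion would suggest that the current forest is still too small for its top two classes to be stable.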

The Development of Biomass Model for Pinus densiflora in Chungnam Region Using Random Effect (임의효과를 이용한 충남지역 소나무림의 바이오매스 모형 개발)

  • Pyo, Jungkee;Son, Yeong Mo
    • Journal of Korean Society of Forest Science / v.106 no.2 / pp.213-218 / 2017
  • The purpose of this study was to develop an age-biomass model containing a random effect for the Chungnam region. To develop the biomass model by species and tree component, data for Pinus densiflora in the central region were collected from 30 plots (150 trees). A mixed model was used, with a fixed effect for the age-biomass relation of Pinus densiflora and a random effect representing the correlation within survey areas. To evaluate the random-effect model, the Akaike information criterion (AIC) was used, and the variance-covariance matrix and the residual of the repeated data were calculated. The estimated variance-covariance matrix and residual were -1.0022 and 0.6240, respectively. The model with the random effect (AIC = 377.2) has a low AIC value compared with other studies of random effects. Because random effects associated with categorical data were used in the data fitting process, the model can be calibrated to fit the Chungnam region by obtaining measurements. Therefore, the results of this study could be a useful method for developing regional biomass models using random effects.

Machine Learning Based Domain Classification for Korean Dialog System (기계학습을 이용한 한국어 대화시스템 도메인 분류)

  • Jeong, Young-Seob
    • Journal of Convergence for Information Technology / v.9 no.8 / pp.1-8 / 2019
  • Dialog systems are becoming a dominant new mode of interaction between humans and computers. They allow people to access various services through natural language. A dialog system typically has a pipeline structure consisting of several modules (e.g., speech recognition, natural language understanding, and dialog management). In this paper, we tackle the task of domain classification for the natural language understanding module by employing machine learning models such as a convolutional neural network and a random forest. On our dataset of seven service domains, the random forest model achieved the best performance (F1 score 0.97). As future work, we will continue to search for a better approach to domain classification by investigating other machine learning models.

Developing a Pedestrian Satisfaction Prediction Model Based on Machine Learning Algorithms (기계학습 알고리즘을 이용한 보행만족도 예측모형 개발)

  • Lee, Jae Seung;Lee, Hyunhee
    • Journal of Korea Planning Association / v.54 no.3 / pp.106-118 / 2019
  • To develop a pedestrian navigation service that provides optimal routes based on pedestrian satisfaction levels, a prediction model is required that can estimate a pedestrian's satisfaction level under given conditions. Thus, the aim of the present study is to develop a pedestrian satisfaction prediction model based on three machine learning algorithms: Logistic Regression, Random Forest, and Artificial Neural Network models. The 2009, 2012, 2013, 2014, and 2015 Pedestrian Satisfaction Survey Data for Seoul, Korea are used to train and test the machine learning models. The Random Forest model shows the best prediction performance among the three (Accuracy: 0.798, Recall: 0.906, Precision: 0.842, F1 Score: 0.873, AUC: 0.795). The Artificial Neural Network ranks second (Accuracy: 0.773, Recall: 0.917, Precision: 0.811, F1 Score: 0.868, AUC: 0.738), and the Logistic Regression model ranks third (Accuracy: 0.764, Recall: 1.000, Precision: 0.764, F1 Score: 0.868, AUC: 0.575). The precision score of the Random Forest model implies that approximately 84.2% of pedestrians may be satisfied if they walk in the areas suggested by the Random Forest model.
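The F1 score listed for each model is the harmonic mean of precision and recall. The short sketch below applies that standard formula to the Random Forest figures quoted in the abstract. The helper name `f1_score` is introduced here for illustration, and because the inputs are rounded to three decimals, recomputed values for the other models need not match the reported ones exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract for the Random Forest model.
rf_precision, rf_recall = 0.842, 0.906

rf_f1 = f1_score(rf_precision, rf_recall)
print(round(rf_f1, 3))
```

Rounded to three decimals, the recomputed value agrees with the reported Random Forest F1 score of 0.873.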

Classification Model and Crime Occurrence City Forecasting Based on Random Forest Algorithm

  • KANG, Sea-Am;CHOI, Jeong-Hyun;KANG, Min-soo
    • Korean Journal of Artificial Intelligence / v.10 no.1 / pp.21-25 / 2022
  • Korea has relatively less crime than other countries. However, the crime rate is steadily increasing. Many people think the crime rate is decreasing, but the crime arrest rate has increased. The goal is to examine the relationship between CCTV and the crime rate as a way to lower the crime rate, and to identify the correlation between areas with CCTV and areas without CCTV. Because crime can happen at any time, we consider a random forest algorithm appropriate. We also plan to use the random forest machine learning algorithm to reduce the risk of overfitting, reduce the required training time, and verify high-level accuracy. The goal is to identify the relationship between CCTV and crime occurrence by creating a crime prevention algorithm using random forest machine learning techniques. Assuming that no crime occurs without CCTV, the study compares the crime rate between the areas where the most crimes occur and the areas where there are no crimes, and predicts areas where there are many crimes. The impact of CCTV on crime prevention and arrest can be interpreted in part as a comprehensive effect, and the purpose is to identify the areas and frequency of frequent crimes by comparing the times with and without CCTV.

Performance Comparison Analysis of Artificial Intelligence Models for Estimating Remaining Capacity of Lithium-Ion Batteries

  • Kyu-Ha Kim;Byeong-Soo Jung;Sang-Hyun Lee
    • International Journal of Advanced Culture Technology / v.11 no.3 / pp.310-314 / 2023
  • The purpose of this study is to predict the remaining capacity of lithium-ion batteries and to evaluate the performance of five artificial intelligence models: linear regression, decision tree, random forest, neural network, and ensemble model. Measured data (in Excel format) from the CS2 lithium-ion battery were used, and the prediction accuracy of each model was measured using evaluation indicators such as mean squared error, mean absolute error, coefficient of determination, and root mean square error. The Root Mean Square Error (RMSE) was 0.045 for the linear regression model, 0.038 for the decision tree model, 0.034 for the random forest model, 0.032 for the neural network model, and 0.030 for the ensemble model. The ensemble model had the best prediction performance, with the neural network model in second place. The decision tree and random forest models also performed quite well, while the linear regression model showed poor prediction performance compared with the other models. It was therefore concluded that ensemble and neural network models are the most suitable for predicting the remaining capacity of lithium-ion batteries and should be prioritized in order to improve the efficiency of battery management and energy systems.
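The four evaluation indicators named in this abstract follow standard definitions, sketched below in plain Python. The function name `regression_metrics` and the capacity values are hypothetical and are not taken from the CS2 dataset. Only the `reported_rmse` dictionary repeats figures from the abstract.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and the coefficient of determination (R^2)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)              # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "r2": 1 - sse / sst}


# Hypothetical measured and predicted capacities (illustrative only).
measured = [1.10, 1.05, 1.00, 0.95]
predicted = [1.08, 1.06, 0.97, 0.97]
metrics = regression_metrics(measured, predicted)
print(metrics)

# RMSE values as reported in the abstract, ranked from best to worst.
reported_rmse = {
    "linear regression": 0.045,
    "decision tree": 0.038,
    "random forest": 0.034,
    "neural network": 0.032,
    "ensemble": 0.030,
}
ranking = sorted(reported_rmse, key=reported_rmse.get)
print(ranking)
```

Because RMSE is the square root of MSE, ranking the models by either indicator gives the same order. Lower values indicate better predictions, which is why the ensemble model comes first in the ranking.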