• Title/Abstract/Keywords: Random Forest (RF)

Search results: 182 items (processing time: 0.03 s)

A measure of discrepancy based on margin of victory useful for the determination of random forest size

  • 박철용
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 3 / pp. 515-524 / 2017
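  • This study proposes a measure of discrepancy, based on the margin of victory (MV), that is useful for determining the size of a random forest (RF) for classification. Here MV is the margin of victory, in the infinite RF, between the classes ranked first and second in the current RF. Noting that a positive -MV indicates a disagreement between the current and infinite RFs over the first- and second-ranked classes, we propose max(-MV, 0) as a measure of discrepancy. Based on this measure, we propose a diagnostic statistic suited to determining RF size and derive its theoretical asymptotic distribution. Finally, we run a simulation study comparing the small-sample performance of this statistic with that of recently proposed diagnostic statistics.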
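
The discrepancy measure described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's implementation: the infinite RF is unobservable, so the sketch assumes its class vote proportions are supplied by the caller (for instance, approximated by a much larger forest). The function name `mv_discrepancy` and the example numbers are hypothetical.

```python
def mv_discrepancy(current_votes, limit_props):
    """Return (MV, max(-MV, 0)) for one case.

    current_votes: class -> vote count (or proportion) in the current RF.
    limit_props:   class -> vote proportion in the infinite RF (assumed known
                   here; in practice it would have to be estimated).
    """
    # Classes ranked first and second by the current forest.
    ranked = sorted(current_votes, key=current_votes.get, reverse=True)
    first, second = ranked[0], ranked[1]
    # Margin of victory of those two classes in the infinite forest.
    mv = limit_props[first] - limit_props[second]
    return mv, max(-mv, 0.0)


# Hypothetical case: the current forest ranks "a" first, but the limiting
# proportions favor "b", so MV is negative and the discrepancy is positive.
mv, d = mv_discrepancy({"a": 26, "b": 24, "c": 0},
                       {"a": 0.46, "b": 0.50, "c": 0.04})
print(mv, d)
```

When the current and limiting forests agree on the ranking, MV is positive and the measure is zero.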

A simple diagnostic statistic for determining the size of random forest

  • 박철용
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 4 / pp. 855-863 / 2016
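  • This study proposes a simple diagnostic statistic for determining the size of a random forest (RF). The method is based on the margin of victory (MV): the margin, among infinitely many generated decision trees, between the classes ranked first and second among the decision trees generated so far. A negative MV therefore indicates a gap between the current RF and the infinite RF. The proposed method is based on the proportion of cases for which -MV exceeds a fixed small positive number (for example, 0.03). We derive an appropriate statistic from this method together with its theoretical distribution. We also conduct a simulation study comparing its performance with that of a recently proposed diagnostic statistic.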
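
The proportion on which this method is based can be illustrated as follows. This is a sketch only: it assumes per-case MV values are already available (the abstract does not say how they are estimated), and the name `exceedance_proportion` and the sample values are hypothetical. The default threshold of 0.03 is the example value given in the abstract.

```python
def exceedance_proportion(mv_values, threshold=0.03):
    """Proportion of cases whose -MV is larger than `threshold`."""
    if not mv_values:
        raise ValueError("mv_values must not be empty")
    exceed = sum(1 for mv in mv_values if -mv > threshold)
    return exceed / len(mv_values)


# Hypothetical MV values for five cases; only -0.05 and -0.10 exceed 0.03.
print(exceedance_proportion([0.20, -0.01, -0.05, 0.12, -0.10]))  # 0.4
```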

An Analytical Study on Automatic Classification of Domestic Journal Articles Using Random Forest

  • 김판준
    • 정보관리학회지 / Vol. 36, No. 2 / pp. 57-77 / 2019
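  • Random forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in library and information science. In particular, we ran experiments from several angles on the main factors (number of trees, feature selection, training-set size) that affect classification performance when subject categories are automatically assigned to domestic journal articles. In this way we looked for ways to optimize RF performance on a real-world imbalanced dataset. The results indicate that, for the automatic classification of domestic journal articles, RF can be expected to perform best with a number of trees in the range 100~1000 (C), a small feature set (10%) selected by the chi-square statistic (CHI), and most of the training set (9~10 years).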
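
Chi-square feature selection of the kind mentioned in this abstract can be sketched as follows. This is an illustration under assumptions rather than the paper's procedure: it uses the common 2x2 term/category chi-square statistic from text classification, and the function names, the document counts, and the `keep_ratio` parameter are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category without the term
    d: documents outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or the category is constant
    return n * (a * d - c * b) ** 2 / denom


def select_top_features(scores, keep_ratio=0.1):
    """Keep the highest-scoring fraction of features (at least one)."""
    k = max(1, int(len(scores) * keep_ratio))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical counts: "forest" is concentrated in the category, "the" is not.
scores = {
    "forest": chi_square(40, 5, 10, 45),
    "the": chi_square(25, 25, 25, 25),
}
print(select_top_features(scores, keep_ratio=0.5))  # ['forest']
```

A feature set chosen this way would then be passed to the classifier; a 10% subset corresponds to `keep_ratio=0.1`.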

Application of machine learning for merging multiple satellite precipitation products

  • Van, Giang Nguyen;Jung, Sungho;Lee, Giha
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2021 Conference / p. 134 / 2021
  • Precipitation is a crucial component of the water cycle and plays a key role in hydrological processes. Traditionally, gauge-based measurement has been the main method for obtaining accurate rainfall estimates, but gauges are sparsely distributed in mountainous areas. Recently, satellite-based precipitation products (SPPs) have provided grid-based precipitation with spatio-temporal variability, but their estimates carry considerable uncertainty and their spatial resolution is quite coarse. To overcome these limitations, this study generates new grid-based daily precipitation using Automatic Weather System (AWS) data in Korea and multiple SPPs (i.e., CHIRPSv2, CMORPH, GSMaP, TRMMv7) for the period 2003-2017. A machine-learning-based Random Forest (RF) model was used to generate the merged precipitation, and several statistical linear merging methods were used for comparison with the RF results. To assess the performance of RF, observed data from 64 Automated Synoptic Observation System (ASOS) stations were collected, and the accuracy of the products was evaluated using the Kling-Gupta efficiency (KGE), probability of detection (POD), false alarm rate (FAR), and critical success index (CSI). The precipitation generated by the RF model was more accurate than each individual satellite rainfall product and reflected spatio-temporal variability better than the other statistical merging methods. Therefore, an RF-based ensemble satellite precipitation product can be used efficiently for hydrological simulations in ungauged basins such as the Mekong River.
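
The four evaluation scores named in this abstract can be computed as in the sketch below. Several choices are assumptions, since the abstract does not give them: KGE follows the original three-component form (correlation, variability ratio, bias ratio), FAR is computed as false alarms / (hits + false alarms), and a wet day is any value at or above a hypothetical 1.0 mm threshold. The function names and sample series are illustrative only.

```python
import math


def kge(obs, sim):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def detection_scores(obs, sim, threshold=1.0):
    """POD, FAR and CSI for events at or above `threshold`.

    Assumes at least one observed event and one estimated event.
    """
    pairs = list(zip(obs, sim))
    hits = sum(o >= threshold and s >= threshold for o, s in pairs)
    misses = sum(o >= threshold and s < threshold for o, s in pairs)
    false_alarms = sum(o < threshold and s >= threshold for o, s in pairs)
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi


# Hypothetical daily series (mm): gauge observations and a merged product.
obs = [0.0, 5.0, 12.0, 0.0, 3.0, 0.0]
sim = [0.5, 4.0, 10.0, 2.0, 0.0, 0.0]
print(kge(obs, sim))
print(detection_scores(obs, sim))
```

A perfect product gives KGE = 1, POD = 1, FAR = 0 and CSI = 1.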

COSMO-SkyMed 2 Image Color Mapping Using Random Forest Regression

  • Seo, Dae Kyo;Kim, Yong Hyun;Eo, Yang Dam;Park, Wan Yong
    • 한국측량학회지 / Vol. 35, No. 4 / pp. 319-326 / 2017
  • SAR (synthetic aperture radar) images are less affected by weather than optical images and can be acquired at any time of day. SAR images are therefore actively used in military applications and natural-disaster response. However, because SAR data are grayscale, visual analysis and the interpretation of details are difficult. In this study, we propose a color mapping method that uses RF (random forest) regression to improve the visual interpretability of SAR images. COSMO-SkyMed 2 and WorldView-3 images were acquired for the same area, and RF regression was used to establish the color configuration for color mapping. The results were compared with image fusion, a traditional color mapping method. The UIQI (universal image quality index), the SSIM (structural similarity) index, and the CC (correlation coefficient) were used to evaluate image quality. The color-mapped image based on RF regression had significantly higher quality than the images produced by the other methods. The experimental results confirm the usefulness of RF-regression-based color mapping for SAR images.

An Assessment of a Random Forest Classifier for a Crop Classification Using Airborne Hyperspectral Imagery

  • Jeon, Woohyun;Kim, Yongil
    • 대한원격탐사학회지 / Vol. 34, No. 1 / pp. 141-150 / 2018
  • Crop type classification is essential for supporting agricultural decisions and resource monitoring. Remote sensing techniques, especially those using hyperspectral imagery, have been effective in agricultural applications. Hyperspectral imagery acquires contiguous, narrow spectral bands over a wide range. However, high dimensionality leads to unreliable classifier estimates and a heavy computational burden, so reducing the dimensionality of hyperspectral imagery is necessary. In this study, the Random Forest (RF) classifier was used for both dimensionality reduction and classification. RF is an ensemble-learning algorithm based on the Classification and Regression Tree (CART) and has gained attention for its high classification accuracy and fast processing speed. RF performance for crop classification with airborne hyperspectral imagery was assessed. The study area was the cultivated area in Chogye-myeon, Habcheon-gun, Gyeongsangnam-do, South Korea, where the main crops are garlic, onion, and wheat. Parameter optimization was conducted to maximize classification accuracy, and dimensionality reduction was then performed based on RF variable importance. The results show that the selected bands yield excellent classification accuracy without using the whole dataset. Moreover, the majority of the selected bands are concentrated in the visible (VIS) region, especially the part related to chlorophyll content. It can therefore be inferred that the phenological status after the mature stage influences red-edge spectral reflectance.

Machine-learning Approaches with Multi-temporal Remotely Sensed Data for Estimation of Forest Biomass and Forest Reference Emission Levels

  • 이용규;이정수
    • 한국산림과학회지 / Vol. 111, No. 4 / pp. 603-612 / 2022
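  • This study estimated time-series forest biomass at the sub-national level using multi-temporal satellite imagery and machine-learning algorithms, and on that basis set and compared forest reference emission levels. To build machine-learning-based forest biomass estimation models, we used Landsat TM satellite imagery and Biomass Climate Change Initiative information provided by the European Space Agency. The machine-learning algorithms applied were k-Nearest Neighbor (kNN), a nonparametric learning model, and the decision-tree-based Random Forest (RF). The estimated forest biomass was also compared with Forest reference emission levels (FREL) data. Comparing the biomass estimation models by algorithm, the Root Mean Square Error (RMSE) of the best kNN and RF models was 35.9 and 34.41, respectively, so the RF model performed relatively better than the kNN model. In addition, the slopes of the forest reference emission levels for the FREL, kNN, and RF models were set at approximately -33 thousand tons, -253 thousand tons, and -92 thousand tons, respectively.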

Comparison of tree-based ensemble models for regression

  • Park, Sangho;Kim, Chanmin
    • Communications for Statistical Applications and Methods / Vol. 29, No. 5 / pp. 561-589 / 2022
  • Combining multiple classification and regression trees produces tree-based ensemble models such as random forest (RF) and Bayesian additive regression trees (BART). In this study, we compare the model structures and performance of various ensemble models in regression settings. RF learns from bootstrapped samples and, at each node, selects a splitting variable from a subset of the predictors. The BART model is specified as a sum of trees and is fitted using the Bayesian backfitting algorithm. Through extensive simulation studies, we investigate the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, or highly correlated data. In the presence of missing data, BART performs well in general, whereas RF provides adequate coverage. BART performs better on high-dimensional, highly correlated data. However, RF has a shorter computation time in all of the scenarios considered. The performance of the two methods is also compared on two real data sets that represent these situations, and the same conclusion is reached.

Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier

  • Lee, Jinlee;Park, Dooho;Lee, Changhoon
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 10 / pp. 5132-5148 / 2017
  • Cyber attacks are evolving in step with recent developments in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze attack information. The large amount of data examined leads to a heavy computational load and low detection rates. Feature selection is expected to improve classification performance and provide faster, more cost-effective results. Despite the various feature selection studies conducted for intrusion detection systems, feature selection is difficult to automate because it relies on the knowledge of security experts. This paper proposes a feature selection technique to overcome the performance problems of intrusion detection systems. The first phase of the proposed system constructs a feature subset using a sequential forward floating search (SFFS) to reduce the dimension of the variables. The second phase builds a classification model from the selected feature subset using a random forest classifier (RFC) and evaluates its classification accuracy. Experiments with SFFS-RF on the NSL-KDD dataset indicated that feature selection is a necessary preprocessing step for improving overall performance in systems that handle large datasets. They also verified that SFFS-RF can be used for data classification. In conclusion, SFFS-RF could be a key to improving classification model performance in machine learning.
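
The first phase described in this abstract, sequential forward floating search, can be sketched as below. This is a generic textbook-style version written under assumptions, not the paper's code: the `score` callable stands in for whatever criterion is used (for example, the cross-validated accuracy of a random forest), and `sffs`, `toy_score`, and the feature names are hypothetical.

```python
def sffs(features, score, k):
    """Sequential forward floating search for a subset of size k.

    features: iterable of candidate feature names
    score:    deterministic callable on a frozenset of features, higher is better
    """
    features = list(features)
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    selected = frozenset()
    best = {}  # subset size -> (score, subset) of the best subset seen so far
    while len(selected) < k:
        # Forward step: add the single feature that helps most.
        candidates = [selected | {f} for f in features if f not in selected]
        selected = max(candidates, key=score)
        s, size = score(selected), len(selected)
        if size not in best or s > best[size][0]:
            best[size] = (s, selected)
        # Floating step: drop a feature while that beats the best subset
        # previously recorded at the smaller size.
        while len(selected) > 2:
            reduced = max((selected - {f} for f in selected), key=score)
            r = score(reduced)
            if r <= best[len(reduced)][0]:
                break
            best[len(reduced)] = (r, reduced)
            selected = reduced
    return best[k][1]


def toy_score(subset):
    """Hypothetical criterion: 'a' is strong alone but redundant in a group."""
    weights = {"a": 5, "b": 4, "c": 3, "d": 2}
    total = sum(weights[f] for f in subset)
    return total - 4 if "a" in subset and len(subset) > 1 else total


print(sorted(sffs(["a", "b", "c", "d"], toy_score, 3)))  # ['b', 'c', 'd']
```

Plain forward selection would stay with the early pick `a` and end at `{a, b, c}`; the floating step lets the search discard it once a better pair is found.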

An Automatic Algorithm for Vessel Segmentation in X-Ray Angiogram using Random Forest

  • 정성희;이수찬;심학준;정호엽;허용석;장혁재
    • 대한의용생체공학회:의공학회지 / Vol. 36, No. 4 / pp. 79-85 / 2015
  • The purpose of this study is to develop an automatic algorithm for vessel segmentation in X-ray angiograms using Random Forest (RF). The proposed algorithm consists of the following steps. First, multiscale Hessian-based filtering is performed to enhance the vessel structure. Second, the eigenvalues and eigenvectors of the Hessian matrix are used as feature vectors to train the RF classifier. Finally, the segmentation result is obtained from the trained RF. We evaluated the similarity between the result of the proposed algorithm and the manual segmentation on 349 frames, and compared it with the results of two other methods, those of Frangi et al. and Krissian et al. According to the experimental results, the proposed algorithm showed high similarity compared with the other two methods.
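
The Hessian-based features described in this abstract can be illustrated with a single-pixel sketch. It is deliberately simplified and is not the authors' pipeline: it works at one scale with plain finite differences (no Gaussian smoothing, so it is not multiscale), returns eigenvalues only, and the name `hessian_eigen_features` and the tiny test image are hypothetical.

```python
import math


def hessian_eigen_features(img, y, x):
    """Eigenvalues of the 2x2 Hessian at interior pixel (y, x), sorted by |value|."""
    # Central finite differences for the second derivatives.
    ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    # Closed-form eigenvalues of the symmetric matrix [[ixx, ixy], [ixy, iyy]].
    mean = (ixx + iyy) / 2.0
    radius = math.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
    return tuple(sorted((mean - radius, mean + radius), key=abs))


# Hypothetical image with a dark vertical line in the middle column.
image = [[1, 1, 0, 1, 1] for _ in range(3)]
print(hessian_eigen_features(image, 1, 2))  # (0.0, 2.0)
```

On a line-like structure one eigenvalue is near zero (along the line) and the other is large (across it). Values like these, computed per pixel, would form part of the feature vector passed to the classifier.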