• Title/Abstract/Keywords: random forest (RF) analysis

60 search results, processing time 0.023 s

COSMO-SkyMed 2 Image Color Mapping Using Random Forest Regression

  • Seo, Dae Kyo;Kim, Yong Hyun;Eo, Yang Dam;Park, Wan Yong
    • 한국측량학회지
    • /
    • Vol. 35, No. 4
    • /
    • pp.319-326
    • /
    • 2017
  • SAR (synthetic aperture radar) images are less affected by the weather than optical images and can be obtained at any time of the day. Therefore, SAR images are being actively utilized for military applications and natural disasters. However, because SAR data are in grayscale, it is difficult to perform visual analysis and to decipher details. In this study, we propose a color mapping method using RF (random forest) regression for enhancing the visual decipherability of SAR images. COSMO-SkyMed 2 and WorldView-3 images were obtained for the same area, and RF regression was used to establish color configurations for performing color mapping. The results were compared with image fusion, a traditional color mapping method. The UIQI (universal image quality index), the SSIM (structural similarity) index, and CC (correlation coefficients) were used to evaluate the image quality. The color-mapped image based on RF regression had a significantly higher quality than the images derived from the other methods. The experimental results confirm that color mapping based on RF regression is applicable to SAR images.
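
Two of the quality metrics named in the abstract above, CC and UIQI, can be made concrete with a small sketch. This is an illustration only: it computes both metrics globally over two flat lists of pixel values, whereas UIQI is commonly computed over sliding windows and averaged, and the abstract does not say which procedure the authors used. The sample pixel values and the function names are hypothetical.

```python
from statistics import fmean


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def uiqi(x, y):
    """Universal image quality index, computed once over the whole input.

    Q = 4 * cov(x, y) * mean(x) * mean(y)
        / ((var(x) + var(y)) * (mean(x)^2 + mean(y)^2))
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return 4 * cov * mx * my / ((var_x + var_y) * (mx ** 2 + my ** 2))


# Hypothetical pixel intensities: a reference band and a color-mapped band.
reference = [52, 60, 71, 80, 95, 110]
mapped = [50, 63, 70, 84, 92, 108]

print("CC  :", pearson_cc(reference, mapped))
print("UIQI:", uiqi(reference, mapped))
```

Both values approach 1 as the mapped band approaches the reference; UIQI additionally penalizes differences in mean brightness and contrast, which CC alone does not.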

A Random Forest Model Based Pollution Severity Classification Scheme of High Voltage Transmission Line Insulators

  • Kannan, K.;Shivakumar, R.;Chandrasekar, S.
    • Journal of Electrical Engineering and Technology
    • /
    • Vol. 11, No. 4
    • /
    • pp.951-960
    • /
    • 2016
  • Tower insulators in electric power transmission networks play a crucial role in preserving the reliability of the system. Electrical utilities frequently face the problem of insulator flashover due to pollution deposited on the insulator surface. Several research works based on leakage current (LC) measurement have already been carried out to develop diagnostic techniques for these insulators. Since the LC signal is highly intermittent in nature, estimating pollution severity from LC measurements over a short period of time will not produce accurate results. Reports on the measurement and analysis of LC signals over a long period of time are scanty. This paper attempts to use a Random Forest (RF) classifier, which produces accurate results on large databases, to analyze the pollution severity of high voltage tower insulators. Leakage current characteristics over a long period of time were measured in the laboratory on a porcelain insulator. Pollution experiments were conducted at 11 kV AC voltage. Time domain analysis and the wavelet transform technique were used to extract both basic features and histogram features of the LC signal. The RF model was trained and tested with a variety of LC signals measured over a lengthy period of time, and the proposed RF-model-based pollution severity classifier was found to be efficient and will be helpful to electrical utilities for real-time implementation.

Comparison of Machine Learning Analysis on Predictive Factors of Children's Planning-Organizing Executive Function by Income Level: Through Home Environment Quality and Wealth Factors

  • Lim, Hye-Kyung;Kim, Hyun-Ok;Park, Hae-Seon
    • 인간식물환경학회지
    • /
    • Vol. 24, No. 6
    • /
    • pp.651-662
    • /
    • 2021
  • Background and objective: This study identifies whether children's planning-organizing executive function can be significantly classified and predicted by home environment quality and wealth factors. Methods: For empirical analysis, we used the data collected from the 10th Panel Study on Korean Children in 2017. Using machine learning tools such as support vector machine (SVM) and random forest (RF), we evaluated the accuracy of the model in which home environment factors classify and predict children's planning-organizing executive functions, and extracted the relative importance of the variables that determine these executive functions by income group. Results: First, SVM analysis shows that home environment quality and wealth factors achieve high accuracy in classification and prediction in all three groups. Second, RF analysis shows that estate had the highest predictive power in the high-income group, followed by income, asset, learning, reinforcement, and emotional environment. In the middle-income group, emotional environment showed the highest score, followed by estate, asset, reinforcement, and income. In the low-income group, estate showed the highest score, followed by income, asset, learning, reinforcement, and emotional environment. Conclusion: This study confirmed that home environment quality and wealth factors are significant factors in predicting children's planning-organizing executive functions.

머신러닝을 이용한 경기도 화재위험요인 예측분석 (Predictive Analysis of Fire Risk Factors in Gyeonggi-do Using Machine Learning)

  • 서민송;에베르 엔리케 카스티요 오소리오;유환희
    • 한국측량학회지
    • /
    • Vol. 39, No. 6
    • /
    • pp.351-361
    • /
    • 2021
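  • Fires cause enormous property damage and loss of life, and fires large and small continue to occur. This study therefore aims to predict, by fire type, the various risk factors that influence fires. A predictive analysis of fire risk factors was conducted for Gyeonggi-do, which has the largest number of fire incidents in the country. The machine learning methods SVM, RF, and GBRT were applied, the accuracy of each model was assessed using MAE and RMSE to identify the best-fitting model, and the predictive analysis of fire occurrence factors in Gyeonggi-do was carried out on that basis. In the comparison of the three methods, RF yielded an MAE of 1.517 and an RMSE of 1.820, and its MAE and RMSE on the validation and test data differed by only 0.024 and 0.12, respectively, so the two sets of results were very similar and RF showed the best predictive power. The analysis using RF showed that, across fire types, the place of ignition was the risk factor with the greatest influence on fire occurrence. These findings are expected to serve as useful data for fire safety management by identifying the risk ranking of the factors that influence fire occurrence.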
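
The two error measures used above to rank the models, MAE and RMSE, can be sketched as follows. The data values are hypothetical and are not taken from the study; the sketch only shows how the two measures are computed and why RMSE is never smaller than MAE on the same errors.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, sign ignored."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: squaring weights large errors more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical observed and predicted values.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

A small gap between the validation and test values of these measures, as reported for RF above, suggests that the model generalizes consistently across the two data sets.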

Predicting the CPT-based pile set-up parameters using HHO-RF and PSO-RF hybrid models

  • Yun Dawei;Zheng Bing;Gu Bingbing;Gao Xibo;Behnaz Razzaghzadeh
    • Structural Engineering and Mechanics
    • /
    • Vol. 86, No. 5
    • /
    • pp.673-686
    • /
    • 2023
  • Determining pile properties from the cone penetration test (CPT) is costly and needs several in-situ tests. In the present study, two novel hybrid learning models, PSO-RF and HHO-RF, which combine random forest (RF) with particle swarm optimization (PSO) and Harris hawks optimization (HHO), respectively, were developed and applied to predict the pile set-up parameter "A" from CPT data for project design. To forecast "A," CPT data were collected from different sites in Louisiana, with plasticity index (PI), undrained shear strength (Su), and over consolidation ratio (OCR) selected as input variables. Results show that both the PSO-RF and HHO-RF models perform acceptably in predicting the set-up parameter "A," with R² larger than 0.9094, indicating an admissible correlation between observed and predicted values. HHO-RF is more proficient than PSO-RF, with R² and RMSE equal to 0.9328 and 0.0292 for the training phase and 0.9729 and 0.024 for the testing data, respectively. Moreover, the PI and OBJ indices were considered; the HHO-RF model has lower values of both, so this hybrid algorithm outperforms PSO-RF in predicting the pile set-up parameter "A" and is specified as the proposed model. The results therefore demonstrate that the HHO algorithm is better able than PSO to determine the optimal values of the RF hyperparameters.

Prediction of Global Industrial Water Demand using Machine Learning

  • Panda, Manas Ranjan;Kim, Yeonjoo
    • 한국수자원학회:학술대회논문집
    • /
    • 한국수자원학회 2022 Annual Conference
    • /
    • pp.156-156
    • /
    • 2022
  • Explicit, spatially distributed, and reliable data on industrial water demand are important for both policy makers and researchers carrying out region-specific analyses of water resources management. However, such data remain scarce, particularly in underdeveloped and developing countries. Current research makes limited use of the spatially available socio-economic, climate, and geographical data from different sources to predict industrial water demand at finer resolution. This study proposes a random forest regression (RFR) model to predict industrial water demand at 0.5° × 0.5° spatial resolution by combining various features extracted from multiple data sources. The datasets used here include National Polar-orbiting Partnership (NPP)/Visible Infrared Imaging Radiometer Suite (VIIRS) night-time light (NTL), the Global Power Plant database, AQUASTAT country-wise industrial water use data, elevation data, Gross Domestic Product (GDP), road density, crop land, population, precipitation, temperature, and aridity. Compared with traditional regression algorithms, RF offers high prediction accuracy, requires no assumption of a prior probability distribution, and can analyze variable importance. The final RF model was fitted with the parameter settings ntree = 300 and mtry = 2, achieving a coefficient of determination of 0.547. According to the variable importance, independent variables such as night light, elevation, GDP, and population play the major role in predicting industrial water demand.
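
The coefficient of determination reported above can be sketched as follows, assuming the usual definition R² = 1 − SS_res / SS_tot; the abstract does not state which variant the authors used. The data values and the function name are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

print("R^2:", r_squared(actual, predicted))
```

Under this definition, a value of 0.547 means the model accounts for a little over half of the variance in the observed values.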

땅밀림 위험지 평가를 위한 기계학습 분류모델 비교 (A Performance Comparison of Machine Learning Classification Methods for Soil Creep Susceptibility Assessment)

  • 이제만;서정일;이진호;임상준
    • 한국산림과학회지
    • /
    • Vol. 110, No. 4
    • /
    • pp.610-621
    • /
    • 2021
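  • Soil creep, which is classified as a creep-type landslide, is occurring widely across the country as a result of earthquakes and heavy rainfall. To prevent casualties and property damage from soil creep, the Korea Forest Service identifies at-risk sites in advance using a field survey assessment sheet for sites of concern. Meanwhile, with recent advances in computing, machine learning classification methods, a branch of artificial intelligence, have been used to assess vulnerability to mountain disasters and to predict natural disasters. In this study, the k-Nearest Neighbor (k-NN), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM) classification models were therefore used to classify soil creep risk grades. From the 4,618 records surveyed by 한국치산기술협회 in 2018 to 2020, 146 sites where soil creep had occurred and 146 sites where it had not were randomly sampled, giving 292 records, and 204 of these (70%) were used as training data to build the models. When the models were evaluated on the 88 validation records (30% of the data), the classification accuracy was 0.727 for k-NN, 0.750 for NB, 0.807 for RF, and 0.750 for SVM. The Kappa coefficients were 0.534, 0.580, 0.673, and 0.585, and the AUC values were 0.872, 0.912, 0.943, and 0.834, respectively. The machine learning classification models for identifying soil creep risk areas thus ranked RF, NB, SVM, and k-NN in descending order of performance. Machine learning classification models can serve as baseline data for preventing and responding to sediment disasters in mountain areas, and will provide information needed to develop policies for managing soil creep disasters and reducing damage.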
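
The 70/30 split and two of the measures reported above, classification accuracy and the Kappa coefficient, can be sketched as follows. The split sizes follow the abstract (292 records, 204 for training, 88 for validation); the confusion-matrix counts are hypothetical and do not reproduce the study's results. Kappa is assumed here to be Cohen's kappa for a two-class problem.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Return (accuracy, Cohen's kappa) for a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n  # observed agreement, i.e. accuracy
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Split sizes as described in the abstract: 70% of 292 records for training.
n_total = 292
n_train = int(n_total * 0.7)
n_test = n_total - n_train

# Hypothetical confusion-matrix counts (not the study's data).
acc, kappa = accuracy_and_kappa(tp=40, fn=10, fp=5, tn=45)
print("train/test:", n_train, n_test)
print("accuracy  :", acc)
print("kappa     :", kappa)
```

Kappa corrects accuracy for the agreement expected by chance, which is why it is lower than the accuracy for every model in the abstract.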

시간단위 전력사용량 시계열 패턴의 군집 및 분류분석 (Clustering and classification to characterize daily electricity demand)

  • 박다인;윤상후
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 28, No. 2
    • /
    • pp.395-406
    • /
    • 2017
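  • Electricity demand forecasting is essential for the efficient operation of power supply systems. This study uses cluster analysis and classification analysis to examine the types of daily time series patterns of hourly electricity demand. Hourly electricity demand data for each day from January 1, 2008 to December 31, 2012, collected from the Korea Power Exchange, were converted into time series composed of trend, seasonal, and error components. To distinguish the patterns in the detrended time series, k-means clustering, Gaussian mixture model clustering, and functional clustering were considered. Principal component analysis was used to reduce the 24-hour data to two factors, after which k-means clustering, the Gaussian mixture model, and functional clustering were applied. Based on the clustering results, four classification methods, namely decision tree, RF (random forest), Naive Bayes, and SVM (support vector machine), were trained on the four years of data from 2008 to 2011 to predict the clusters for 2012. Clustering based on the Gaussian mixture distribution combined with cluster prediction using RF performed best.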

인공지능 기법을 활용한 한반도 해역의 수질평가지수 예측모델 개발 (Development of a Water Quality Indicator Prediction Model for the Korean Peninsula Seas using Artificial Intelligence)

  • 김성수;손규희;김도연;허장무;김성은
    • 해양환경안전학회지
    • /
    • Vol. 29, No. 1
    • /
    • pp.24-35
    • /
    • 2023
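  • Rapid industrialization and urbanization are making marine pollution more severe, and the Water Quality Index (WQI) has been established and used to manage this pollution effectively. However, the WQI carries uncertainty arising from information loss due to its somewhat complex calculation process, changes in reference values, calculation errors by practitioners, and statistical errors. Accordingly, research on predicting the WQI with artificial intelligence techniques is being actively pursued in Korea and abroad. In this study, data from the marine environment monitoring network (2000 to 2020) were used to test six techniques (RF, XGBoost, KNN, Ext, SVM, LR) in order to identify the most suitable artificial intelligence technique for estimating the WQI across all Korean waters, that is, the five ecoregions. Random Forest outperformed the other techniques. Residual analysis of the predicted and actual WQI scores from Random Forest showed good temporal and spatial predictive performance in all ecoregions. The Random Forest model developed in this study is therefore considered capable of predicting the WQI for all Korean waters with high accuracy.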

Machine learning-based analysis and prediction model on the strengthening mechanism of biopolymer-based soil treatment

  • Haejin Lee;Jaemin Lee;Seunghwa Ryu;Ilhan Chang
    • Geomechanics and Engineering
    • /
    • Vol. 36, No. 4
    • /
    • pp.381-390
    • /
    • 2024
  • The introduction of bio-based materials has been recommended in the geotechnical engineering field to reduce environmental pollutants such as heavy metals and greenhouse gases. However, bio-treated soil methods face limitations in field application due to short research periods and insufficient verification of engineering performance, especially when compared to conventional materials like cement. Therefore, this study aimed to develop a machine learning model for predicting the unconfined compressive strength, a representative soil property, of biopolymer-based soil treatment (BPST). Four machine learning algorithms were compared to determine a suitable model: linear regression (LR), support vector regression (SVR), random forest (RF), and neural network (NN). Except for LR, the algorithms (SVR, RF, and NN) exhibited high predictive performance, with an R² value of 0.98 or higher. The permutation feature importance technique was used to identify the main factors affecting the strength enhancement of BPST. The results indicated that the unconfined compressive strength of BPST is affected most by mean particle size, followed by biopolymer content and water content. As a reliable prediction model, the proposed model can provide guidelines prior to laboratory testing and field application, thereby saving a significant amount of time and money.
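
The permutation feature importance technique mentioned above can be sketched in a few lines. The idea is to shuffle one input column at a time and measure how much the model's error grows. Everything here is a hypothetical illustration: `toy_model` stands in for a trained model, the three features are random numbers rather than soil properties, and mean squared error is assumed as the score.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a model over a list of feature tuples."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, repeats=20, seed=0):
    """Average increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        increase = 0.0
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(model, shuffled, targets) - baseline
        importances.append(increase / repeats)
    return importances


def toy_model(row):
    """Hypothetical model: strong use of feature 0, weak use of 1, none of 2."""
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random(), data_rng.random()) for _ in range(200)]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets, n_features=3)
print(scores)
```

A feature the model ignores scores exactly zero, and features the model relies on more heavily score higher, which is the kind of ranking the abstract reports for mean particle size, biopolymer content, and water content.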