• Title/Summary/Keyword: Random Forest, RF

Estimation of Surface fCO2 in the Southwest East Sea using Machine Learning Techniques (기계학습법을 이용한 동해 남서부해역의 표층 이산화탄소분압(fCO2) 추정)

  • HAHM, DOSHIK;PARK, SOYEONA;CHOI, SANG-HWA;KANG, DONG-JIN;RHO, TAEKEUN;LEE, TONGSUP
    • The Sea: JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.24 no.3 / pp.375-388 / 2019
  • Accurate evaluation of the sea-to-air $CO_2$ flux and its variability is crucial for understanding the global carbon cycle and predicting atmospheric $CO_2$ concentration. However, $fCO_2$ observations in the East Sea are sparse in space and time. In this study, we derived a high-resolution time series of surface $fCO_2$ in the southwest East Sea by feeding sea surface temperature (SST), sea surface salinity (SSS), chlorophyll-a (CHL), and mixed layer depth (MLD) values, taken from either satellite observations or numerical model outputs, to three machine learning models. The root mean square error of the best-performing model, a Random Forest (RF) model, was $7.1\,\mu\mathrm{atm}$. The most important predictors of $fCO_2$ in the RF model were SST and SSS, along with time information; CHL and MLD were much less important. The net $CO_2$ flux in the southwest East Sea, calculated from the $fCO_2$ predicted by the RF model, was $-0.76 \pm 1.15\ \mathrm{mol\,m^{-2}\,yr^{-1}}$, close to the lower bound of previous estimates, which range from $-0.66$ to $-2.47\ \mathrm{mol\,m^{-2}\,yr^{-1}}$. The $fCO_2$ time series predicted by the RF model varied significantly even over an interval as short as a week. For accurate evaluation of the $CO_2$ flux in the Ulleung Basin, high-resolution in situ observations are needed in spring, when $fCO_2$ changes rapidly.

Predicting the mortality of pneumonia patients visiting the emergency department through machine learning (기계학습모델을 통한 응급실 폐렴환자의 사망예측 모델과 기존 예측 모델의 비교)

  • Bae, Yeol;Moon, Hyung Ki;Kim, Soo Hyun
    • Journal of The Korean Society of Emergency Medicine / v.29 no.5 / pp.455-464 / 2018
  • Objective: Machine learning is not yet widely used in the medical field. This study therefore compared the performance of preexisting severity prediction models with that of machine learning based models (random forest [RF], gradient boosting [GB]) for mortality prediction in pneumonia patients. Methods: We retrospectively collected data from patients who visited the emergency department of a tertiary training hospital in Seoul, Korea, from January to March of 2015. The Pneumonia Severity Index (PSI) and Sequential Organ Failure Assessment (SOFA) scores were calculated for both groups, and the area under the curve (AUC) for mortality prediction was computed. For the RF and GB models, the data were divided into a training set and a validation set by the random split method. The RF and GB models were trained on the training set, and the AUC was obtained from the validation set. The mean AUC was compared with the other two AUCs. Results: Of the 536 investigated patients, 395 were enrolled and 41 of them died. The AUC values of the PSI and SOFA scores were 0.799 (0.737-0.862) and 0.865 (0.811-0.918), respectively. The mean AUC values obtained by the RF and GB models were 0.928 (0.899-0.957) and 0.919 (0.886-0.952), respectively. There were significant differences between the preexisting severity prediction models and the machine learning based models (P<0.001). Conclusion: Classification through machine learning may help predict the mortality of pneumonia patients visiting the emergency department.
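
The AUC used above to compare the scores and models can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The following is a minimal sketch of that rank-based definition in plain Python; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died, 0 = survived; scores are predicted risks.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a sort-based formulation would be preferable for larger data sets.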

Prediction of Hardness for Cold Forging Manufacturing through Machine Learning (기계학습을 활용한 냉간단조 부품 제조 경도 예측 연구)

  • K. Kim;J-.G. Park;U. R. Heo;Y. H. Lee;D. H. Chang;H. W. Yang
    • Transactions of Materials Processing / v.32 no.6 / pp.329-334 / 2023
  • Heat treatment plays an essential role in enhancing mechanical properties in cold forging. However, it relies heavily on the experience and skill of individuals. The aim of this study is to predict hardness using machine learning in order to optimize production efficiency in cold forging manufacturing. Random Forest (RF), Gradient Boosting Regressor (GBR), Extra Trees (ET), and AdaBoost (ADA) models were used. The RF, GBR, and ET models showed excellent performance. However, the GBR and ET models leaned heavily on the influence of temperature, unlike the RF model. We suggest that the RF model is more reliable for predicting hardness because it can take into account the various variables that arise during the cold forging process.

Development of a Classification Method for Forest Vegetation on the Stand Level, Using KOMPSAT-3A Imagery and Land Coverage Map (KOMPSAT-3A 위성영상과 토지피복도를 활용한 산림식생의 임상 분류법 개발)

  • Song, Ji-Yong;Jeong, Jong-Chul;Lee, Peter Sang-Hoon
    • Korean Journal of Environment and Ecology / v.32 no.6 / pp.686-697 / 2018
  • Advances in remote sensing technology have made it easier to frequently obtain high-resolution imagery for detecting subtle changes over extensive areas, particularly forest, which is not readily sub-classified. Time-series analysis of high-resolution images requires collecting an extensive amount of ground truth data. In this study, the potential of a land coverage map as ground truth data was tested in classifying high-resolution imagery. The study site was Wonju-si in Gangwon-do, South Korea, which has a mix of urban and natural areas. KOMPSAT-3A imagery taken in March 2015 and a land coverage map published in 2017 were used as source data. Two pixel-based classification algorithms, Support Vector Machine (SVM) and Random Forest (RF), were selected for the analysis. Forest-only classification was compared with classification of the whole study area except wetland. Confusion matrices from the classification showed that overall accuracy for both targets was higher with the RF algorithm than with SVM. The overall accuracy of RF was 18.3% higher than that of SVM in the forest-only analysis, whereas in the whole-region analysis the difference was smaller, at 5.5%. For the SVM algorithm, adding the Majority analysis process gave a marginal improvement of about 1% over the plain SVM analysis. The RF algorithm was more effective at identifying broad-leaved forest within the forest, but for the other classes the SVM algorithm was more effective. Since only two pixel-based classification algorithms were tested here, future classification is expected to improve overall accuracy and reliability by introducing time-series analysis and an object-based algorithm. This approach is expected to contribute to large-scale land planning by providing an effective land classification method at higher spatial and temporal scales.
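
The overall accuracies compared above are derived from confusion matrices. As a minimal sketch of that calculation, the snippet below computes overall accuracy and per-class (producer's) accuracy in plain Python. The row/column convention (rows are reference classes, columns are predicted classes), the function names, and the toy matrix are assumptions made for illustration, not values from the study.

```python
def overall_accuracy(matrix):
    """Fraction of all samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def producers_accuracy(matrix):
    """Per-class accuracy, assuming rows are reference classes (an assumption)."""
    result = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        result.append(row[i] / row_total if row_total else 0.0)
    return result


# Hypothetical 2-class matrix: rows = reference, columns = predicted.
toy_matrix = [
    [50, 10],  # reference class 0: 50 correct, 10 confused with class 1
    [5, 35],   # reference class 1: 5 confused with class 0, 35 correct
]
print(overall_accuracy(toy_matrix))    # 85 of 100 samples are on the diagonal
print(producers_accuracy(toy_matrix))
```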

Vulnerability Assessment for Fine Particulate Matter (PM2.5) in the Schools of the Seoul Metropolitan Area, Korea: Part II - Vulnerability Assessment for PM2.5 in the Schools (인공지능을 이용한 수도권 학교 미세먼지 취약성 평가: Part II - 학교 미세먼지 범주화)

  • Son, Sanghun;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.37 no.6_2 / pp.1891-1900 / 2021
  • Fine particulate matter (FPM; diameter ≤ 2.5 μm) is frequently found in metropolitan areas due to activities associated with rapid urbanization and population growth. Many adolescents spend a substantial amount of time at school where, for various reasons, FPM generated outdoors may flow into indoor areas. The aims of this study were to estimate FPM concentrations and categorize types of FPM in schools. Meteorological and chemical variables as well as satellite-based aerosol optical depth were analyzed as input data in a random forest model, which applied 10-fold cross validation and a grid-search method, to estimate school FPM concentrations, with four statistical indicators used to evaluate accuracy. Loose and strict standards were established to categorize types of FPM in schools. Under the loose standards, FPM in most schools was classified as type 2 or 3, whereas under the strict standards, school FPM was mostly classified as type 3 or 4.

Prediction of harmful algal cell density in Lake Paldang using machine learning (머신러닝을 활용한 팔당호 유해남조 세포수 예측)

  • Seohyun Byeon;Hankyu Lee;Jin Hwi Kim;Jae-Ki Shin;Yongeun Park
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.234-234 / 2023
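  • When harmful algal blooms (HABs) of cyanobacteria occur in freshwater lakes, they produce toxins such as microcystin as well as taste-and-odor compounds, interfering with the use of drinking water sources and with water recreation. It is therefore important to predict harmful cyanobacterial cell counts before a bloom occurs so that preemptive action can be taken. This study develops a machine-learning model based on Random Forest (RF) to predict harmful cyanobacterial cell counts in front of Paldang Dam and evaluates its performance. To build the model, algae, water quality, hydraulic/hydrological, and meteorological data from April 2012 to December 2021 were collected for Lake Paldang (Sambong-ri, Gyeongan-cheon) and the South and North Han River region (Uiam Dam to Ipo Weir) and used as input and output data. Because the collected data contain many input variables, models were built both with all 26 variables and with the 12 variables that were statistically highly correlated, in order to compare cell count prediction performance. The harmful cyanobacterial cell counts used as input and output data were log-transformed, and because the usual algae sampling interval is 7 days, the models were built to predict 7 days ahead. Model performance was evaluated using the $R^2$ between observed and predicted data. The model built with all 26 input variables gave an $R^2$ of 0.803 for training and 0.729 for validation, and the model built with the 12 input variables significantly correlated with harmful cyanobacterial cell counts gave an $R^2$ of 0.784 for training and 0.731 for validation. Comparing the two models showed that the difference in performance due to the number of input variables was small and that the model can be used to predict cyanobacterial cell counts. Future work should build models with higher prediction performance using machine learning models other than Random Forest as well as deep learning models.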
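
The evaluation setup in this abstract (log-transformed cell counts, a 7-day-ahead target, and R2 between observed and predicted values) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the abstract does not give the exact transform, so `log10(x + 1)` is assumed; R2 is assumed to be the coefficient of determination; and the function names and all numbers are hypothetical.

```python
import math


def log_transform(cell_counts):
    # Assumed transform: log10(x + 1), so that zero counts stay defined.
    return [math.log10(c + 1) for c in cell_counts]


def lagged_pairs(series, steps_ahead=1):
    """Pair each sample with the value observed `steps_ahead` samples later.

    With weekly sampling, steps_ahead=1 corresponds to a 7-day horizon.
    """
    if steps_ahead < 1:
        raise ValueError("steps_ahead must be at least 1")
    return list(zip(series[:-steps_ahead], series[steps_ahead:]))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical weekly cell counts (cells/mL), not data from the study.
weekly_counts = [0, 9, 99, 999, 99]
log_counts = log_transform(weekly_counts)

# (value this week, value one week later): the shape of a 7-day-ahead target.
print(lagged_pairs(log_counts))

# Hypothetical model predictions on the log scale, used only to exercise r_squared.
predicted = [0.2, 1.1, 1.8, 2.7, 2.2]
print(round(r_squared(log_counts, predicted), 3))
```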

Estimation of Fractional Vegetation Cover in Sand Dunes Using Multi-spectral Images from Fixed-wing UAV

  • Choi, Seok Keun;Lee, Soung Ki;Jung, Sung Heuk;Choi, Jae Wan;Choi, Do Yoen;Chun, Sook Jin
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.34 no.4 / pp.431-441 / 2016
  • Because UAVs (Unmanned Aerial Vehicles) make it convenient to acquire data over broad or inaccessible regions, they are now used to build spatial information in various fields, such as the environment, ecosystems, forestry, and the military. This study proposes a process for estimating FVC (Fractional Vegetation Cover) from multi-spectral UAV imagery to overcome the limitations of conventional methods. First, two classification results were obtained with RF (Random Forest), one using RGB images only and one using RGB images together with NDVI (Normalized Difference Vegetation Index). The resulting maps were then reclassified into vegetation and non-vegetation. Finally, an RF-based FVC map was generated by pixel calculation, and a GI (Gutman and Ignatov) model-based FVC map was derived indirectly using fixed parameters. Adding NDVI gave relatively higher accuracy than using RGB alone, and the GI model in particular showed a lower RMSE (Root Mean Square Error), 0.182, than RF. In this regard, the GI model, which uses only NDVI values, is more widely applicable than RF, whose accuracy varies with the classification results. Our results showed that the GI model ensures the quality of the FVC if the NDVI is maintained at a uniform level. This can be easily achieved with a UAV, which can provide vegetation data to improve FVC estimation.
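
As a rough illustration of the NDVI-based route described above, the sketch below computes NDVI from near-infrared and red reflectance and then applies the linear NDVI scaling commonly attributed to Gutman and Ignatov, FVC = (NDVI - NDVI_soil) / (NDVI_veg - NDVI_soil). The abstract does not give the fixed parameters, so the bare-soil and full-vegetation endpoints (`ndvi_soil=0.05`, `ndvi_veg=0.85`), the function names, and the sample values are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def fvc_gi(ndvi_value, ndvi_soil=0.05, ndvi_veg=0.85):
    """Linear NDVI scaling to fractional cover, clipped to [0, 1].

    ndvi_soil and ndvi_veg are assumed endpoint values, not the study's parameters.
    """
    frac = (ndvi_value - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return min(1.0, max(0.0, frac))


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


# Hypothetical per-pixel reflectances (NIR, red) and reference cover fractions.
pixels = [(0.50, 0.10), (0.30, 0.25), (0.20, 0.20)]
reference_fvc = [0.80, 0.10, 0.00]

estimated_fvc = [fvc_gi(ndvi(nir, red)) for nir, red in pixels]
print(estimated_fvc)
print(rmse(estimated_fvc, reference_fvc))
```

Clipping to [0, 1] keeps pixels darker than the soil endpoint or greener than the vegetation endpoint from producing cover fractions outside the physical range.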

Defect Severity-based Ensemble Model using FCM (FCM을 적용한 결함심각도 기반 앙상블 모델)

  • Lee, Na-Young;Kwon, Ki-Tae
    • KIISE Transactions on Computing Practices / v.22 no.12 / pp.681-686 / 2016
  • Software defect prediction is an important factor in efficient project management and project success. The severity of a defect usually determines the degree to which the project is affected. However, existing studies focus only on the presence or absence of a defect and not on its severity. In this study, we propose an ensemble model using FCM based on defect severity. The defect severities in PC4 of the NASA data set were reclassified. To select the input columns that affect defect severity, we extracted the important defect factors of the data set using Random Forest (RF). We evaluated the performance of the model by varying the parameters under 10-fold cross-validation. The evaluation results were as follows. First, defect severities were reclassified from 58, 40, 80 to 30, 20, 128. Second, BRANCH_COUNT was an important input column for severity in terms of both accuracy and node impurity. Third, a smaller number of trees required more variables for good performance.

Analysis of Dimensionality Reduction Methods Through Epileptic EEG Feature Selection for Machine Learning in BCI (BCI에서 기계 학습을 위한 간질 뇌파 특징 선택을 통한 차원 감소 방법 분석)

  • Tong, Yang;Aliyu, Ibrahim;Lim, Chang-Gyoon
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.6 / pp.1333-1342 / 2018
  • Electroencephalography (EEG) has so far been the most important and convenient method for the diagnosis and treatment of epilepsy. However, it is difficult to identify the wave characteristics of an epileptic EEG signal because it is very weak, non-stationary, and has strong background noise. In this paper, we analyse the effect of dimensionality reduction methods on epileptic EEG feature selection and classification. Three dimensionality reduction methods were investigated: Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Linear Discriminant Analysis (LDA). The performance of each method was evaluated using Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Decision Tree (DT), and Random Forest (RF) classifiers. In the experiments, PCA reached its highest accuracy of 75% with SVM, LR, and K-NN. KPCA reached its best accuracy of 85% with SVM and K-NN, while LDA achieved 100% accuracy with K-NN. Thus, LDA dimensionality reduction was found to provide the best classification results for epileptic EEG signals.

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose predicting natural gas (NG) leakage levels through feature selection based on factor analysis (FA) of an integrated data set that combines Korean Meteorological Agency data with natural gas leakage data, in order to account for complex factors. The method is divided into three modules. First, we filled in missing data in the integrated data set by linear interpolation and selected essential features using FA with OrdinalEncoder (OE)-based normalization. Second, the data set was labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method was evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-Factor analysis (OE-F)-based classification method successfully improved performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
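
The first module above fills missing values by linear interpolation. The snippet below is a minimal plain-Python sketch of that step for a single evenly spaced series, with missing readings marked as `None`. How the original work handles leading or trailing gaps and uneven timestamps is not stated, so leaving edge gaps unfilled is an assumption here, and the function name and sample readings are hypothetical.

```python
def interpolate_linear(values):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Assumes evenly spaced samples; leading and trailing gaps are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span  # fractional position inside the gap
            filled[i] = filled[left] + t * (filled[right] - filled[left])
    return filled


# Hypothetical sensor readings with missing entries.
readings = [None, 1.0, None, None, 4.0, None, 6.0, None]
print(interpolate_linear(readings))
```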