• Title/Summary/Keyword: Random Forest (RF)


Comparative Study of Data Preprocessing and ML&DL Model Combination for Daily Dam Inflow Prediction (댐 일유입량 예측을 위한 데이터 전처리와 머신러닝&딥러닝 모델 조합의 비교연구)

  • Youngsik Jo;Kwansue Jung
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.358-358 / 2023
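  • This study used representative machine learning and deep learning (ML&DL) models that have been applied to rainfall-runoff analysis in the water resources field. Beyond hyperparameter tuning, it combined and preprocessed meteorological and hydrological data (lag-time, moving average, etc.) with each model's characteristics in mind, and compared daily inflow prediction performance across scenarios that pair data characteristics with ML&DL models. For the Soyanggang Dam basin, meteorological and hydrological data accumulated from 1974 to 2021 were used, with 1) rainfall, 2) inflow, and 3) meteorological data as the main influencing (independent) variables. Applying a) lag-time, b) moving average, and c) component separation of inflow to these variables gave a total of 36 scenario combinations used as ML&DL input. Ten ML&DL models were compared: seven ML methods, namely 1) Linear Regression (LR), 2) Lasso, 3) Ridge, 4) SVR (Support Vector Regression), 5) Random Forest (RF), 6) LGBM (Light Gradient Boosting Model), and 7) XGBoost, and three DL methods, namely 8) LSTM (Long Short-Term Memory models), 9) TCN (Temporal Convolutional Network), and 10) LSTM-TCN. The most suitable data combination and ML&DL model for daily inflow prediction are presented together with a performance evaluation. Comparing the inflow predictions of the trained models for the Soyanggang Dam basin, TCN performed best among the deep learning models (TCN>TCN-LSTM>LSTM), Random Forest and LGBM performed well among the tree-based machine learning models (RF, LGBM>XGB), and SVR performed at a level comparable to LGBM. The three regression models (LR, Lasso, and Ridge) performed relatively poorly. Across the 36 combinations of rainfall, inflow, and meteorological series, the combinations using the rainfall series with lag-time applied gave an NSE (Nash-Sutcliffe Efficiency) of 0.8 or higher (maximum 0.867) for all models except the three regression models, and combining the lag-time rainfall and inflow series gave an even better NSE of 0.85 or higher (maximum 0.901).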
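The lag-time and moving-average preprocessing and the NSE score mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the window and lag sizes, and the sample rainfall and inflow values are all hypothetical and chosen only for illustration.

```python
from statistics import mean

def lag(series, k):
    """Shift a series by k steps; the first k entries have no history (None)."""
    return [None] * k + series[:len(series) - k]

def moving_average(series, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(series[i + 1 - window:i + 1]))
    return out

def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - (squared error) / (spread of observations)."""
    obs_mean = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical daily rainfall (mm) and observed inflow, for illustration only.
rainfall = [0.0, 12.0, 30.0, 6.0, 0.0, 0.0]
inflow_obs = [10.0, 14.0, 40.0, 55.0, 30.0, 18.0]

rain_lag1 = lag(rainfall, 1)           # yesterday's rainfall as a feature
rain_ma3 = moving_average(rainfall, 3) # 3-day trailing mean as a feature
```

An NSE of 1 means the simulation matches the observations exactly, while an NSE of 0 means it is no better than always predicting the observed mean.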


Risk Factor Analysis of Cryopreserved Autologous Bone Flap Resorption in Adult Patients Undergoing Cranioplasty with Volumetry Measurement Using Conventional Statistics and Machine-Learning Technique

  • Yohan Son;Jaewoo Chung
    • Journal of Korean Neurosurgical Society / v.67 no.1 / pp.103-114 / 2024
  • Objective : Decompressive craniectomy (DC) with duroplasty is a common surgical treatment for life-threatening increased intracranial pressure (ICP). Once ICP is controlled, cranioplasty (CP) with reinsertion of the cryopreserved autologous bone flap or a synthetic implant is considered for protection and esthetics. Despite the risk of autologous bone flap resorption (BFR), the cryopreserved autologous bone flap remains an important material for CP because of its cost-effectiveness. In this article, we used conventional statistical analysis and a machine learning technique to understand the risk factors for BFR. Methods : Patients aged >18 years who underwent autologous bone CP between January 2015 and December 2021 were reviewed. Demographic data, medical records, and volumetric measurements of the autologous bone flap volume from 94 patients were collected. BFR was defined with an absolute quantitative method (BFR-A) and a relative quantitative method (BFR%). Conventional statistical analysis and random forest with a hyper-ensemble approach (RF with HEA) were performed, and overlapped partial dependence plots (PDP) were generated. Results : Conventional statistical analysis showed that only the initial autologous bone flap volume was statistically significant for BFR-A. RF with HEA showed that the initial autologous bone flap volume, the interval between DC and CP, and bone quality contributed most to BFR-A, while trauma, bone quality, and initial autologous bone flap volume contributed most to BFR%. Overlapped PDPs of the initial autologous bone flap volume on BFR-A crossed at approximately 60 mL, and a relatively clear separation was found between the non-BFR and BFR groups. Therefore, an initial autologous bone flap volume of over 60 mL could be a possible risk factor for BFR. Conclusion : The present study suggests that BFR in patients who underwent CP with an autologous bone flap might be inevitable. However, the degree of BFR may differ from one patient to another. Therefore, considering artificial bone flaps as implants for patients with large DC could be reasonable. Still, the risk factors for BFR are not clearly understood, so chronological analysis and pathophysiologic studies are needed.

Land Cover Classification with High Spatial Resolution Using Orthoimage and DSM Based on Fixed-Wing UAV

  • Kim, Gu Hyeok;Choi, Jae Wan
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.1 / pp.1-10 / 2017
  • A UAV (Unmanned Aerial Vehicle) is a flight system designed to conduct missions without a pilot. Compared to traditional airborne photogrammetry, UAV-based photogrammetry is inexpensive and can obtain high-spatial-resolution data quickly. In this study, we aimed to classify land cover using high-spatial-resolution images obtained with a UAV. An RGB camera was used to obtain a high-spatial-resolution orthoimage. For accurate classification, multispectral images of the same areas were obtained using a multispectral sensor. A DSM (Digital Surface Model) and a modified NDVI (Normalized Difference Vegetation Index) were generated from the images obtained with the RGB camera and the multispectral sensor. Pixel-based classification was performed for twelve classes using the RF (Random Forest) method. The classification accuracy was evaluated based on the error matrix, and it was confirmed that the proposed method classified the area more effectively than supervised classification using only the RGB image.

Performance Analysis of Opinion Mining using Word2vec (Word2vec을 이용한 오피니언 마이닝 성과분석 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.7-8 / 2018
  • This study analyzes Word2vec-based machine learning classifiers for opinion mining tasks. BOW (Bag-of-Words) was adopted as the benchmark method. Using Word2vec and BOW as feature extraction methods, we applied LR, DT, SVM, and RF classifiers to the Laptop and Restaurant datasets. The results showed that Word2vec feature extraction yields better performance than BOW.
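As a rough illustration of the BOW baseline named in this abstract, the sketch below turns short texts into token-count vectors that a classifier could consume. It covers only the bag-of-words side; Word2vec features would require a trained embedding model and are not shown. The function names and the two sample sentences are hypothetical, and whitespace tokenization is an assumption rather than the authors' preprocessing.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bow_vector(document, vocabulary):
    """Count how often each vocabulary token occurs in the document."""
    counts = Counter(document.lower().split())
    # Dict iteration follows insertion order, which here is sorted token order.
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical review snippets, for illustration only.
reviews = ["battery life is great", "screen is dim"]
vocab = build_vocabulary(reviews)
features = [bow_vector(r, vocab) for r in reviews]
```

Each row of `features` has one column per vocabulary token, so word order is discarded and only counts remain, which is the main limitation that embedding-based features aim to address.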


UAV-based Land Cover Mapping Technique for Monitoring Coastal Sand Dunes

  • Choi, Seok Keun;Kim, Gu Hyeok;Choi, Jae Wan;Lee, Soung Ki;Choi, Do Yoen;Jung, Sung Heuk;Chun, Sook Jin
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.1 / pp.11-22 / 2017
  • In recent years, coastal dune erosion has accelerated as various structures have been developed around the coastal dunes. A land cover map should be developed to identify the characteristics of sand dunes and to monitor their condition. The Korean Ministry of Environment's land cover maps suffer from problems such as limited classes, target areas, and durations. Thus, this study conducted experiments using UAV (Unmanned Aerial Vehicle)-based RGB and multispectral images over an approximately one-year cycle to create a land cover map of coastal dunes. An RF (Random Forest) classifier was used for the analysis in accordance with the experimental region's characteristics. The pixel- and object-based classification results obtained with the RGB and multispectral cameras were each evaluated. The study results showed that object-based classification using multispectral images had the highest accuracy. Our results suggest that constant monitoring of coastal dunes can be performed effectively.

Comparison of machine learning algorithms for regression and classification of ultimate load-carrying capacity of steel frames

  • Kim, Seung-Eock;Vu, Quang-Viet;Papazafeiropoulos, George;Kong, Zhengyi;Truong, Viet-Hung
    • Steel and Composite Structures / v.37 no.2 / pp.193-209 / 2020
  • In this paper, the efficiency of five Machine Learning (ML) methods, namely Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Tree Boosting (GTB), is compared for regression and classification of the Ultimate Load Factor (ULF) of nonlinear inelastic steel frames. For this purpose, a two-story, a six-story, and a twenty-story space frame are considered. An advanced nonlinear inelastic analysis is carried out for the steel frames to generate datasets for training the considered ML methods. In each dataset, the input variables are the geometric features of W-sections and the output variable is the ULF of the frame. The five ML methods are compared in terms of the mean-squared-error (MSE) for the regression models and the accuracy for the classification models. Moreover, the ULF distribution curve is calculated for each frame and the strength failure probability is estimated. It is found that the GTB method has the best efficiency in both regression and classification of ULF regardless of the number of training samples and the space frames considered.

DLDW: Deep Learning and Dynamic Weighing-based Method for Predicting COVID-19 Cases in Saudi Arabia

  • Albeshri, Aiiad
    • International Journal of Computer Science & Network Security / v.21 no.9 / pp.212-222 / 2021
  • Multiple waves of COVID-19 highlighted one crucial aspect of this pandemic worldwide: the factors affecting the spread of COVID-19 infection evolve with regional and local practices and events. The introduction of vaccines in early 2021 was expected to significantly control and reduce the cases. However, virus mutations and new variants have challenged these expectations. Several countries that contained the COVID-19 pandemic successfully in the first wave failed to repeat the same in the second and third waves. This work focuses on COVID-19 pandemic control and management in Saudi Arabia and aims to predict new cases with deep learning using various important factors. The proposed method is called the Deep Learning and Dynamic Weighing-based (DLDW) COVID-19 cases prediction method. Special consideration has been given to the evolving factors that are responsible for recent surges in the pandemic. For this purpose, two weights are assigned to each data instance, one based on feature importance and one a dynamic time-based weight. Older data is given lower weight and newer data higher weight. Feature selection identifies how the factors affecting the rate of new cases evolved over the period. The DLDW method produced 80.39% prediction accuracy, which is 6.54%, 9.15%, and 7.19% higher than the three other classifiers, Deep Learning (DL), Random Forest (RF), and Gradient Boosting Machine (GBM), respectively. Further, for Saudi Arabia, our study implicitly concluded that lockdowns, vaccination, and self-aware restricted mobility of residents are effective tools in controlling and managing the COVID-19 pandemic.

Combined effect of glass and carbon fiber in asphalt concrete mix using computing techniques

  • Upadhya, Ankita;Thakur, M.S.;Sharma, Nitisha;Almohammed, Fadi H.;Sihag, Parveen
    • Advances in Computational Design / v.7 no.3 / pp.253-279 / 2022
  • This study investigated and predicted the Marshall stability of glass-fiber asphalt mix, carbon-fiber asphalt mix, and glass-carbon-fiber asphalt (hybrid) mix using machine learning techniques: Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Forest (RF). The data was obtained from experiments and research articles. Assessment of the results indicated that the ANN-based model outperformed the other applied models on the training and testing datasets, with the following index values for the training and testing stages respectively: coefficient of correlation (CC) 0.8492 and 0.8234, mean absolute error (MAE) 2.0999 and 2.5408, root mean squared error (RMSE) 2.8541 and 3.3165, relative absolute error (RAE) 48.16% and 54.05%, root relative squared error (RRSE) 53.14% and 57.39%, Willmott's index (WI) 0.7490 and 0.7011, scattering index (SI) 0.4134 and 0.3702, and BIAS 0.3020 and 0.4300. The Taylor diagram also confirms that the ANN-based model outperforms the other models. Results of the sensitivity analysis show that carbon fiber has a major influence in predicting the Marshall stability. However, the carbon fiber (CF) followed by glass-carbon fiber (50GF:50CF) and the optimal combination CF + (50GF:50CF) are found to be most sensitive in predicting the Marshall stability of fibrous asphalt concrete.
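Several of the error indices listed in this abstract have simple closed forms. The sketch below computes MAE, RMSE, RAE, and RRSE using their commonly used definitions, which are assumed here because the abstract does not state the exact formulas the authors applied. The sample values are hypothetical, and the remaining indices (CC, WI, SI, BIAS) are not covered.

```python
from math import sqrt
from statistics import mean

def mae(observed, predicted):
    """Mean absolute error."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))

def rmse(observed, predicted):
    """Root mean squared error."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))

def rae(observed, predicted):
    """Relative absolute error: absolute error relative to a mean-only predictor."""
    obs_mean = mean(observed)
    num = sum(abs(o - p) for o, p in zip(observed, predicted))
    den = sum(abs(o - obs_mean) for o in observed)
    return num / den

def rrse(observed, predicted):
    """Root relative squared error: squared error relative to a mean-only predictor."""
    obs_mean = mean(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - obs_mean) ** 2 for o in observed)
    return sqrt(num / den)

# Hypothetical observed and predicted values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]
```

RAE and RRSE are ratios, so multiplying them by 100 gives percentages of the kind reported in the abstract; values below 100% mean the model beats a predictor that always returns the observed mean.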

A Machine Learning-Driven Approach for Wildfire Detection Using Hybrid-Sentinel Data: A Case Study of the 2022 Uljin Wildfire, South Korea

  • Linh Nguyen Van;Min Ho Yeon;Jin Hyeong Lee;Gi Ha Lee
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.175-175 / 2023
  • Detection and monitoring of wildfires are essential for limiting their harmful effects on ecosystems, human lives, and property. In this research, we propose a novel method running on the Google Earth Engine platform for identifying and characterizing burnt regions using a hybrid of Sentinel-1 (C-band synthetic aperture radar) and Sentinel-2 (multispectral imagery) images. The 2022 Uljin wildfire, the most severe event in South Korean history, is the primary area of our investigation. Given its documented success in remote sensing and land cover categorization applications, we select the Random Forest (RF) method as our primary classifier. Next, we evaluate the performance of our model using multiple accuracy measures, including overall accuracy (OA), Kappa coefficient, and area under the curve (AUC). The proposed method identifies wildfires more accurately and robustly than traditional methods that depend on survey data. These results have significant implications for the development of efficient and dependable wildfire monitoring systems and add to our knowledge of how machine learning and remote sensing-based approaches may be combined to improve environmental monitoring and management applications.
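Two of the accuracy measures named in this abstract, overall accuracy (OA) and the Kappa coefficient, can be computed directly from a confusion matrix. The sketch below is a generic illustration using the standard Cohen's kappa definition; the function names and the burnt/unburnt counts are hypothetical and are not results from the study. AUC is not shown because it needs per-pixel scores rather than a confusion matrix.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def cohen_kappa(matrix):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Expected agreement if predictions were independent of the reference labels.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical confusion matrix: rows = reference (burnt, unburnt),
# columns = predicted (burnt, unburnt).
confusion = [[40, 10],
             [5, 45]]

oa = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
```

Kappa is lower than OA whenever some agreement would be expected by chance alone, which is why the two measures are usually reported together.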


Analysis of Online Behavior and Prediction of Learning Performance in Blended Learning Environments

  • JO, Il-Hyun;PARK, Yeonjeong;KIM, Jeonghyun;SONG, Jongwoo
    • Educational Technology International / v.15 no.2 / pp.71-88 / 2014
  • A variety of studies to predict students' performance have been conducted as educational data, such as web-log files traced from a Learning Management System (LMS), are increasingly used to analyze students' learning behaviors. However, it is still challenging to predict students' learning achievement in a blended learning environment where online and offline learning are combined. In higher education, blended learning takes diverse forms, from simple use of an LMS for administrative purposes to full use of LMS functions for online distance learning classes. As a result, a generalized model to predict students' academic success does not fit the diverse cases of blended learning. This study compares two blended learning classes, each with its own prediction model. The first blended class, which involves online discussion-based learning, yielded a linear regression model that explained 70% of the variance in total score through six variables: total log-in time, log-in frequencies, log-in regularities, visits on boards, visits on repositories, and the number of postings. However, the second case, a lecture-based class regularly providing online lecture notes in Moodle, showed weaker results from the same linear regression model, mainly due to non-linearity of variables. To investigate the non-linear relations between online activities and total score, RF (Random Forest) was utilized. The results indicate that the two distinctive types of blended learning cases have different sets of important variables. Results suggest that prediction models and data-mining techniques should be based on consideration of the diverse pedagogical characteristics of blended learning classes.