• Title/Abstract/Keywords: Random forest model

542 results, processing time 0.033 seconds

Investment, Export, and Exchange Rate on Prediction of Employment with Decision Tree, Random Forest, and Gradient Boosting Machine Learning Models

  • 이재득
    • 무역학회지 / Vol. 46, No. 2 / pp. 281-299 / 2021
  • This paper examines the feasibility of using machine learning methods to forecast employment. Decision tree and artificial neural network models, together with ensemble models such as random forest and gradient boosting regression trees, were used to forecast employment in the Busan regional economy. A comparison of their predictive abilities yielded four main findings. First, machine learning methods can predict employment well. Second, the employment forecasts of the decision tree models varied somewhat with tree depth. Third, the artificial neural network model did not show high predictive power. Fourth, the ensemble models, random forest and gradient boosting regression trees, showed higher predictive power. Because machine learning methods can predict employment accurately, they should be used to improve the accuracy of employment forecasts.

Study on the Effect of Training Data Sampling Strategy on the Accuracy of the Landslide Susceptibility Analysis Using Random Forest Method

  • 강경희;박혁진
    • 자원환경지질 / Vol. 52, No. 2 / pp. 199-212 / 2019
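  • In analyses that use machine learning, the sampling strategy for training data strongly affects not only prediction accuracy but also generalization ability. In landslide susceptibility analysis in particular, data on non-landslide areas far outnumber data on landslide areas, creating a data imbalance, so data sampling is essential when designing the training data for the model. Most previous studies, however, simply selected non-landslide samples at random in a 1:1 ratio with landslide samples, without applying any specific selection criterion. To examine how the training data sampling strategy affects predictive performance, this study built six scenarios that differ in the sampling criteria for landslide and non-landslide areas and used them to train Random Forest models. The variable importance produced by Random Forest was multiplied as a weight with each landslide-inducing factor to compute a landslide susceptibility index, the index was used to produce landslide susceptibility maps, and the accuracy of the resulting maps was compared. Regardless of the sampling method, the landslide susceptibility results for both study areas showed 70~80% accuracy. This confirms that Random Forest is applicable to landslide susceptibility analysis and that the input-variable importance it provides can be used as weights for landslide-inducing factors. A comparison of accuracy across the training scenarios also showed that designing training data according to specific criteria can be expected to yield higher prediction accuracy than the conventional random selection.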
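
The importance-weighted index described in this abstract can be illustrated with a minimal Python sketch. The factor names, importance values, factor scores, and the normalization of the weights are all assumptions made for illustration; none of them come from the paper.

```python
def susceptibility_index(factor_scores, importances):
    """Weighted sum of factor scores, using variable importances as weights.

    The importances are normalized to sum to 1 (an assumption of this sketch).
    """
    total = sum(importances.values())
    return sum(importances[name] / total * factor_scores[name]
               for name in importances)

# Hypothetical variable importances, e.g. as reported by a trained random forest.
importances = {"slope": 0.5, "aspect": 0.3, "soil_depth": 0.2}

# Hypothetical factor scores (scaled to 0..1) for two map cells.
cell_a = {"slope": 0.9, "aspect": 0.6, "soil_depth": 0.2}
cell_b = {"slope": 0.1, "aspect": 0.4, "soil_depth": 0.8}

index_a = susceptibility_index(cell_a, importances)
index_b = susceptibility_index(cell_b, importances)
print(index_a, index_b)
```

With these made-up numbers, the cell with the steeper slope score receives the higher index because slope carries the largest weight.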

A Study of Machine Learning Model for Prediction of Swelling Waves Occurrence on East Sea

  • 강동훈;오세종
    • 한국정보기술학회논문지 / Vol. 17, No. 9 / pp. 11-17 / 2019
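  • Losses caused by swell waves have recently become frequent along the East Sea coast. Swell waves are difficult to predict because they arise from a combination of various factors. This study proposes a machine learning model that predicts the occurrence of swell waves on the East Sea coast. To develop the model, cargo-handling suspension data from Pohang New Port and weather data from the vicinity of the port, including air pressure, wind speed, wind direction, and water temperature, were collected. Variables with an important influence on swell occurrence were selected from the collected data, and various machine learning prediction algorithms were tested. As a result, tide level, water temperature, and air pressure were identified as the main variables for predicting swell occurrence, and the Random Forest model performed best, with a prediction accuracy of 88.6%.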

Machine Learning Model for Reduction Deformation of Plastic Motor Housing for Automobiles

  • Seong-Yeol Han
    • Design & Manufacturing / Vol. 18, No. 2 / pp. 64-73 / 2024
  • The purpose of this paper is to introduce a hybrid method that combines design of experiments (DOE) and machine learning to minimize the deformation of plastic products. The study focuses on a plastic motor housing used in automobiles, which is manufactured by plastic injection molding. Achieving optimal molding of the motor housing requires optimizing various molding conditions, including injection pressure, injection time, holding pressure, mold temperature, and cooling time. Failure to optimize these conditions can increase product deformation. To minimize the deformation of the motor housing, the widely used Taguchi method, a design-of-experiments technique, was employed to identify the injection molding conditions that affect deformation. Several machine learning models were then trained on the identified molding conditions. Among them, the Random Forest model was the most effective in predicting the amount of deformation. The validity of the Random Forest model was confirmed through verification, which demonstrated the excellent prediction accuracy of the trained model. Using the validated model, the molding conditions that minimize deformation were determined. Applying these optimal conditions reduced deformation by approximately 5.3% compared with the conditions before optimization. All injection molding results presented in this paper were obtained through robust injection molding simulations, ensuring both research objectivity and speed.

GeoAI-Based Forest Fire Susceptibility Assessment with Integration of Forest and Soil Digital Map Data

  • Kounghoon Nam;Jong-Tae Kim;Chang-Ju Lee;Gyo-Cheol Jeong
    • 지질공학 / Vol. 34, No. 1 / pp. 107-115 / 2024
  • This study assesses forest fire susceptibility in Gangwon-do, South Korea, which hosts the largest forested area in the nation and constitutes ~21% of the country's forested land. With 81% of its terrain forested, Gangwon-do is particularly susceptible to wildfires, as evidenced by the fact that seven out of the ten most extensive wildfires in Korea have occurred in this region, with significant ecological and economic implications. Here, we analyze 480 historical wildfire occurrences in Gangwon-do between 2003 and 2019 using 17 predictor variables of wildfire occurrence. We utilized three machine learning algorithms—random forest, logistic regression, and support vector machine—to construct wildfire susceptibility prediction models and identify the best-performing model for Gangwon-do. Forest and soil map data were integrated as important indicators of wildfire susceptibility and enhanced the precision of the three models in identifying areas at high risk of wildfires. Of the three models examined, the random forest model showed the best predictive performance, with an area-under-the-curve value of 0.936. The findings of this study, especially the maps generated by the models, are expected to offer important guidance to local governments in formulating effective management and conservation strategies. These strategies aim to ensure the sustainable preservation of forest resources and to enhance the well-being of communities situated in areas adjacent to forests. Furthermore, the outcomes of this study are anticipated to contribute to the safeguarding of forest resources and biodiversity and to the development of comprehensive plans for forest resource protection, biodiversity conservation, and environmental management.
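
For readers unfamiliar with the area-under-the-curve metric reported here, the following Python sketch computes AUC as the fraction of (positive, negative) pairs that a model ranks correctly, counting ties as half. The labels and scores are invented for illustration and are unrelated to the study's data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied scores count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Hypothetical labels (1 = fire occurred) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

result = auc(labels, scores)
print(round(result, 3))
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the sketch yields an AUC of 8/9; a value of 0.5 would correspond to random ranking and 1.0 to perfect ranking.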

Evaluation of the Importance of Variables When Using a Random Forest Technique to Assess Landslide Damage: Focusing on Chungju Landslides

  • 이재호;정유진;최정해
    • 지질공학 / Vol. 34, No. 1 / pp. 51-65 / 2024
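  • Landslides are known as natural disasters that cause heavy property damage worldwide every year. In Korea as well, landslide damage is tending to increase under the influence of climate change, and identifying the factors that increase landslides is important for reducing this damage. This study therefore used a random forest model to analyze the relative importance of 14 factors in order to evaluate the variables that affect landslide damage in Chungju, Chungcheongbuk-do. The model showed high accuracy, with an AUC of 0.87, and variable importance ranked in the order of slope aspect, slope, straight-line distance to the valley, and elevation. This suggests that topographic factors such as slope aspect and slope have a greater influence on landslide damage than geological and soil factors such as rock type and effective soil depth. The results are expected to serve as basic data for producing landslide damage prediction maps and for research focused on reducing landslide damage.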

An Improved Approach for 3D Hand Pose Estimation Based on a Single Depth Image and Haar Random Forest

  • Kim, Wonggi;Chun, Junchul
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 8 / pp. 3136-3150 / 2015
  • Vision-based 3D tracking of the articulated human hand is a major issue in human-computer interaction and in understanding the control of robot hands. This paper presents an improved approach for tracking and recovering the 3D position and orientation of a human hand using the Kinect sensor. The basic idea of the proposed method is to solve an optimization problem that minimizes the discrepancy in 3D shape between the actual hand observed by Kinect and a hypothesized 3D hand model. Because each 3D hand pose has 23 degrees of freedom, minimizing this discrepancy imposes an excessive computational burden on hand articulation tracking. To address this, we first created a 3D hand model that represents the hand with 17 different parts. Second, a Random Forest classifier was trained on synthetic depth images generated by animating the 3D hand model and was then used for Haar-like feature-based classification rather than per-pixel classification. The classification results were used to estimate the joint positions of the hand skeleton. Experiments showed that the proposed method improved hand part recognition rates and ran at 20-30 fps. The results confirmed its practical use in classifying the hand area, and it successfully tracked and recovered the 3D hand pose in real time.

A Random Forest Model Based Pollution Severity Classification Scheme of High Voltage Transmission Line Insulators

  • Kannan, K.;Shivakumar, R.;Chandrasekar, S.
    • Journal of Electrical Engineering and Technology / Vol. 11, No. 4 / pp. 951-960 / 2016
  • Tower insulators in electric power transmission networks play a crucial role in preserving system reliability. Electrical utilities frequently face insulator flashover caused by pollution deposits on the insulator surface. Several studies based on leakage current (LC) measurement have already been carried out to develop diagnostic techniques for these insulators. Because the LC signal is highly intermittent, estimating pollution severity from LC measurements over a short period does not produce accurate results, and reports on the measurement and analysis of LC signals over a long period are scarce. This paper uses a Random Forest (RF) classifier, which produces accurate results on large databases, to analyze the pollution severity of high voltage tower insulators. Leakage current characteristics over a long period were measured in the laboratory on a porcelain insulator. Pollution experiments were conducted at 11 kV AC voltage. Time domain analysis and the wavelet transform were used to extract both basic features and histogram features of the LC signal. The RF model was trained and tested with a variety of LC signals measured over a long period. The proposed RF-based pollution severity classifier was found to be efficient and should help electrical utilities with real-time implementation.
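
As a rough illustration of what "basic features and histogram features" of a signal might look like, here is a minimal Python sketch. The specific features (mean, RMS, peak), the bin layout, and the sample values are assumptions chosen for illustration; the paper's actual feature set and its wavelet-based step are not reproduced here.

```python
import math

def basic_features(signal):
    """Simple time-domain summary of a signal: mean, RMS, and absolute peak."""
    n = len(signal)
    return {
        "mean": sum(signal) / n,
        "rms": math.sqrt(sum(x * x for x in signal) / n),
        "peak": max(abs(x) for x in signal),
    }

def histogram_features(signal, bins, low, high):
    """Fraction of samples falling in each of `bins` equal-width bins on [low, high]."""
    counts = [0] * bins
    width = (high - low) / bins
    for x in signal:
        if low <= x <= high:
            # Samples equal to `high` are placed in the last bin.
            index = min(int((x - low) / width), bins - 1)
            counts[index] += 1
    return [c / len(signal) for c in counts]

# Hypothetical leakage-current samples (arbitrary units).
lc_signal = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 0.0]

features = basic_features(lc_signal)
histogram = histogram_features(lc_signal, bins=4, low=-2.0, high=2.0)
print(features, histogram)
```

A feature vector built this way (the summary values followed by the histogram fractions) is the kind of fixed-length input a random forest classifier can be trained on.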

Default Prediction of Automobile Credit Based on Support Vector Machine

  • Chen, Ying;Zhang, Ruirui
    • Journal of Information Processing Systems / Vol. 17, No. 1 / pp. 75-88 / 2021
  • The automobile credit business has developed rapidly in recent years, and defaults occur frequently. Credit default brings great losses to automobile financial institutions, so successfully predicting automobile credit default is of great significance. In this study, missing values are first deleted, random forest is then used for feature selection, and the sample data are randomly grouped. Finally, six prediction models are constructed: support vector machine (SVM), random forest, k-nearest neighbor (KNN), logistic regression, decision tree, and artificial neural network (ANN). The results show that all six machine learning models can be used to predict automobile credit default. Among them, the decision tree has the highest accuracy, 0.79, but SVM has the best comprehensive performance. Random grouping can also improve the efficiency of model operation to a certain extent, especially for SVM.
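
The first and third preprocessing steps named in this abstract, deleting missing values and randomly grouping the samples, can be sketched in plain Python as follows. The abstract does not say how either step was implemented, so treating `None` as "missing", dropping whole rows, dealing shuffled rows into equal groups, and all field names and values below are assumptions of this sketch. The random forest feature selection step is omitted.

```python
import random

def drop_missing(rows):
    """Keep only rows in which no field is None (treated here as 'missing')."""
    return [row for row in rows
            if all(value is not None for value in row.values())]

def random_groups(rows, n_groups, seed=0):
    """Shuffle a copy of the rows and deal them into n_groups groups."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical loan records; field names and values are illustrative only.
records = [
    {"income": 52, "loan": 20, "default": 0},
    {"income": None, "loan": 15, "default": 1},
    {"income": 31, "loan": 18, "default": 1},
    {"income": 75, "loan": 25, "default": 0},
    {"income": 44, "loan": 12, "default": 0},
    {"income": 28, "loan": 22, "default": 1},
    {"income": 60, "loan": 10, "default": 0},
]

clean = drop_missing(records)
groups = random_groups(clean, n_groups=3)
print(len(clean), [len(g) for g in groups])
```

Fixing the seed keeps the grouping reproducible, which matters when the same groups are reused to compare several models.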

Analysis of the Feature Importance of Occupational Accidents Occurring at Construction Sites on the Severity of Lost Workdays

  • 강경수;최재현;류한국
    • 한국건축시공학회지 / Vol. 21, No. 2 / pp. 165-174 / 2021
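  • The construction industry produces the most accidents and fatalities of all industrial sectors. Great efforts have been made to reduce construction safety accidents, but apart from fatal accidents, there has been very little research on lost workdays, the time a worker needs to recover before returning to work. This study therefore defines lost workdays as severity, proposes a model that classifies it, and seeks to derive feature importance from the trained model and analyze the important features. The learning process of random forest, a black-box model, was interpreted, and the extracted feature importance was used to identify the key variables that influence the severity of lost workdays. The underlying factors were then analyzed through the extracted features. The purpose of this study is to analyze accident case data from construction sites using a random forest model. If the important features that affect the severity of lost workdays are derived and managed systematically, construction accidents can be prevented.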