• Title/Summary/Keyword: 랜덤 포레스트 모형

Search Result 100, Processing Time 0.028 seconds

Prediction of electricity consumption in A hotel using ensemble learning with temperature (앙상블 학습과 온도 변수를 이용한 A 호텔의 전력소모량 예측)

  • Kim, Jaehwi;Kim, Jaehee
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.2
    • /
    • pp.319-330
    • /
    • 2019
  • Forecasting the electricity consumption through analyzing the past electricity consumption a advantageous for energy planing and policy. Machine learning is widely used as a method to predict electricity consumption. Among them, ensemble learning is a method to avoid the overfitting of models and reduce variance to improve prediction accuracy. However, ensemble learning applied to daily data shows the disadvantages of predicting a center value without showing a peak due to the characteristics of ensemble learning. In this study, we overcome the shortcomings of ensemble learning by considering the temperature trend. We compare nine models and propose a model using random forest with the linear trend of temperature.

Predicting and Reviewing the Amount of Snow Damage in Korea using Statistical and Machine Learning Techniques (통계기법 및 기계학습 기법을 이용한 우리나라 대설피해액 예측 및 적용성 검토)

  • Lee, Hyeong Joo;Lee, Keun Woo;Jang, Hyeon Bin;Chung, Gun Hui
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2022.05a
    • /
    • pp.384-384
    • /
    • 2022
  • 과거의 우리나라 대설피해 양상을 살펴보면 지역적으로 집중되어 피해가 발생하는 것이 특징이다. 그러나 현재는 전국적으로 대설피해가 가중되는 추세이며, 이에 따라 대설피해에 대비 가능한 대책의 강구가 필요한 실정이다. 그러나 피해 발생 시 정확한 피해 예측으로 사전에 재난을 대비가 가능한 수준의 연구는 미흡한 실정이다. 따라서 본 연구에서는 다양한 통계기법과 기계학습 기법을 이용하여 대설로 인해 발생한 피해액을 개략적으로 예측이 가능한 모형을 개발하고자 하였다. 대설피해액 예측 모형은 다중회귀분석, 서포트 벡터 머신, 인공신경망 기법, 랜덤포레스트 기법을 이용하여 총 4가지 기법으로 개발하였으며, 독립변수로 사회·경제적 요소, 기상요소를 사용하였고, 종속변수로는 1994년부터 2020년까지 발생한 대설피해 이력의 대설피해액을 사용하였다. 결과적으로 4가지 예측 모형의 예측력 검증 및 기법 간의 예측력을 비교하여 개발한 모형의 적용성을 검토하였다. 본 연구 결과에서 제시한 모형의 개선방안 및 업데이트 방안을 참고하여 후속 연구가 진행된다면 미래에 전국적으로 확대될 대설피해에 대한 대비가 가능할 것으로 기대되며 복구비 및 예방비 투자의 지역적 우선순위를 분석하여 선제적인 대비가 가능할 것으로 판단된다.

  • PDF

Real-time flood prediction applying random forest regression model in urban areas (랜덤포레스트 회귀모형을 적용한 도시지역에서의 실시간 침수 예측)

  • Kim, Hyun Il;Lee, Yeon Su;Kim, Byunghyun
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.spc1
    • /
    • pp.1119-1130
    • /
    • 2021
  • Urban flooding caused by localized heavy rainfall with unstable climate is constantly occurring, but a system that can predict spatial flood information with weather forecast has not been prepared yet. The worst flood situation in urban area can be occurred with difficulties of structural measures such as river levees, discharge capacity of urban sewage, storage basin of storm water, and pump facilities. However, identifying in advance the spatial flood information can have a decisive effect on minimizing flood damage. Therefore, this study presents a methodology that can predict the urban flood map in real-time by using rainfall data of the Korea Meteorological Administration (KMA), the results of two-dimensional flood analysis and random forest (RF) regression model. The Ujeong district in Ulsan metropolitan city, which the flood is frequently occurred, was selected for the study area. The RF regression model predicted the flood map corresponding to the 50 mm, 80 mm, and 110 mm rainfall events with 6-hours duration. And, the predicted results showed 63%, 80%, and 67% goodness of fit compared to the results of two-dimensional flood analysis model. It is judged that the suggested results of this study can be utilized as basic data for evacuation and response to urban flooding that occurs suddenly.

A study on entertainment TV show ratings and the number of episodes prediction (국내 예능 시청률과 회차 예측 및 영향요인 분석)

  • Kim, Milim;Lim, Soyeon;Jang, Chohee;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.6
    • /
    • pp.809-825
    • /
    • 2017
  • The number of TV entertainment shows is increasing. Competition among programs in the entertainment market is intensifying since cable channels air many entertainment TV shows. There is now a need for research on program ratings and the number of episodes. This study presents predictive models for entertainment TV show ratings and number of episodes. We use various data mining techniques such as linear regression, logistic regression, LASSO, random forests, gradient boosting, and support vector machine. The analysis results show that the average program ratings before the first broadcast is affected by broadcasting company, average ratings of the previous season, starting year and number of articles. The average program ratings after the first broadcast is influenced by the rating of the first broadcast, broadcasting company and program type. We also found that the predicted average ratings, starting year, type and broadcasting company are important variables in predicting of the number of episodes.

Comparative study of prediction models for corporate bond rating (국내 회사채 신용 등급 예측 모형의 비교 연구)

  • Park, Hyeongkwon;Kang, Junyoung;Heo, Sungwook;Yu, Donghyeon
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.3
    • /
    • pp.367-382
    • /
    • 2018
  • Prediction models for a corporate bond rating in existing studies have been developed using various models such as linear regression, ordered logit, and random forest. Financial characteristics help build prediction models that are expected to be contained in the assigning model of the bond rating agencies. However, the ranges of bond ratings in existing studies vary from 5 to 20 and the prediction models were developed with samples in which the target companies and the observation periods are different. Thus, a simple comparison of the prediction accuracies in each study cannot determine the best prediction model. In order to conduct a fair comparison, this study has collected corporate bond ratings and financial characteristics from 2013 to 2017 and applied prediction models to them. In addition, we applied the elastic-net penalty for the linear regression, the ordered logit, and the ordered probit. Our comparison shows that data-driven variable selection using the elastic-net improves prediction accuracy in each corresponding model, and that the random forest is the most appropriate model in terms of prediction accuracy, which obtains 69.6% accuracy of the exact rating prediction on average from the 5-fold cross validation.

Novel two-stage hybrid paradigm combining data pre-processing approaches to predict biochemical oxygen demand concentration (생물화학적 산소요구량 농도예측을 위하여 데이터 전처리 접근법을 결합한 새로운 이단계 하이브리드 패러다임)

  • Kim, Sungwon;Seo, Youngmin;Zakhrouf, Mousaab;Malik, Anurag
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.spc1
    • /
    • pp.1037-1051
    • /
    • 2021
  • Biochemical oxygen demand (BOD) concentration, one of important water quality indicators, is treated as the measuring item for the ecological chapter in lakes and rivers. This investigation employed novel two-stage hybrid paradigm (i.e., wavelet-based gated recurrent unit, wavelet-based generalized regression neural networks, and wavelet-based random forests) to predict BOD concentration in the Dosan and Hwangji stations, South Korea. These models were assessed with the corresponding independent models (i.e., gated recurrent unit, generalized regression neural networks, and random forests). Diverse water quality and quantity indicators were implemented for developing independent and two-stage hybrid models based on several input combinations (i.e., Divisions 1-5). The addressed models were evaluated using three statistical indices including the root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and correlation coefficient (CC). It can be found from results that the two-stage hybrid models cannot always enhance the predictive precision of independent models confidently. Results showed that the DWT-RF5 (RMSE = 0.108 mg/L) model provided more accurate prediction of BOD concentration compared to other optimal models in Dosan station, and the DWT-GRNN4 (RMSE = 0.132 mg/L) model was the best for predicting BOD concentration in Hwangji station, South Korea.

Basic Study on Safety Accident Prediction Model Using Random Forest in Construction Field (랜덤 포레스트 기법을 이용한 건설현장 안전재해 예측 모형 기초 연구)

  • Kang, Kyung-Su;Ryu, Han-Guk
    • Proceedings of the Korean Institute of Building Construction Conference
    • /
    • 2018.11a
    • /
    • pp.59-60
    • /
    • 2018
  • The purpose of this study is to predict and classify the accident types based on the KOSHA (Korea Occupational Safety & Health Agency) and weather data. We also have an effort to suggest an important management method according to accident types by deriving feature importance. We designed two models based on accident data and weather data (model(a)) and only weather data (model(b)). As a result of random forest method, the model(b) showed a lack of accuracy in prediction. However, the model(a) presented more accurate prediction results than the model(b). Thus we presented safety management plan based on the results. In the future, this study will continue to carry out real time prediction to occurrence types to prevent safety accidents by supplementing the real time accident data and weather data.

  • PDF

Analysis of Factors Related To Elderly Pedestrian Traffic Accients : Centered on Seoul Metropolitan City (노인보행자교통사고 요인 분석 : 서울특별시 중심으로)

  • Seong, Je Min;Yoon, Byoung-Jo
    • Proceedings of the Korean Society of Disaster Information Conference
    • /
    • 2023.11a
    • /
    • pp.261-262
    • /
    • 2023
  • 보행자 교통사고는 보행자와 운행 중인 차량 간 발생한 충돌사고로 도로 및 주변 환경 등에 영항을 받는다. 이 연구에서는 2018년부터 2022년까지 서울특별시에서 발생한 노인 보행자 교통사고 자료를 수집하여 보행자 교통사고의 사고 요인을 분석하였다. 분석에 있어서 고려된 연구모형은 랜덤포레스트, Gradient Boosting regression(GBR)이다. 분석 결과 서울특별시의 지리적 특성과 교통 통행 패턴을 반영하여 교통약자를 대상으로 하는 교통정책을 보완하고, 보행 안전을 강화하는 것이 필요하다.

  • PDF

Development of prediction model identifying high-risk older persons in need of long-term care (장기요양 필요 발생의 고위험 대상자 발굴을 위한 예측모형 개발)

  • Song, Mi Kyung;Park, Yeongwoo;Han, Eun-Jeong
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.4
    • /
    • pp.457-468
    • /
    • 2022
  • In aged society, it is important to prevent older people from being disability needing long-term care. The purpose of this study is to develop a prediction model to discover high-risk groups who are likely to be beneficiaries of Long-Term Care Insurance. This study is a retrospective study using database of National Health Insurance Service (NHIS) collected in the past of the study subjects. The study subjects are 7,724,101, the population over 65 years of age registered for medical insurance. To develop the prediction model, we used logistic regression, decision tree, random forest, and multi-layer perceptron neural network. Finally, random forest was selected as the prediction model based on the performances of models obtained through internal and external validation. Random forest could predict about 90% of the older people in need of long-term care using DB without any information from the assessment of eligibility for long-term care. The findings might be useful in evidencebased health management for prevention services and can contribute to preemptively discovering those who need preventive services in older people.

Analysis of Horse Races: Prediction of Winning Horses in Horse Races Using Statistical Models (서울 경마 경기 우승마 예측 모형 연구)

  • Choe, Hyemin;Hwang, Nayoung;Hwang, Chankyoung;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.6
    • /
    • pp.1133-1146
    • /
    • 2015
  • The Horse race industry has the largest proportion of the domestic legal gambling industry. However, there is limited statistical analysis on horse races versus other sports. We propose prediction models for winning horses in horse races using data mining techniques such as logistic regression, linear regression, and random forest. Horse races data are from the Korea Racing Authority and we use horse racing reports, information of racehorses, jockeys, and horse trainers. We consider two models based on ranks and time records. The analysis results show that prediction of ranks is affected by information on racehorses, number of wins of racehorses and jockeys. We place wagers for the last month of races based on our prediction models that produce serious profits.