• Title/Abstract/Keywords: Random Forest Regression

271 search results (processing time: 0.027 s)

성별에 따른 대사증후군의 위험요인 탐색을 위한 융복합 연구 (Convergence study to detect metabolic syndrome risk factors by gender difference)

  • 이소은;이현실
    • 디지털융복합연구 (Journal of Digital Convergence) / Vol. 19, No. 12 / pp. 477-486 / 2021
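  • The purpose of this study is to identify risk factors for metabolic syndrome among adults using 2016-2019 data from the Korea National Health and Nutrition Examination Survey, and to clarify how those risk factors differ by gender, in order to provide basic data for the prevention and treatment of metabolic syndrome. Risk factors for metabolic syndrome were collected from a range of previous studies and analyzed with four machine learning methods (Logistic Regression, Decision Tree, Naïve Bayes, Random Forest). For both men and women, Random Forest predicted metabolic syndrome with high accuracy. The top risk factors affecting the prevalence of metabolic syndrome in both women and men were BMI, diet (intake of fat, vitamin C, vitamin A, protein, and energy), the number of underlying diseases, and age. For women, education level, age at menarche, and menopausal status were additional major risk factors, and age and the number of underlying diseases had a greater influence than in men. Preventing metabolic syndrome requires an approach that takes into account BMI, diet, disease morbidity, and menarche and menopausal status, and follow-up studies should establish and validate a variety of intervention strategies.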

Predicting Gross Box Office Revenue for Domestic Films

  • Song, Jongwoo;Han, Suji
    • Communications for Statistical Applications and Methods / Vol. 20, No. 4 / pp. 301-309 / 2013
  • This paper predicts gross box office revenue for domestic films using the Korean film data from 2008-2011. We use three regression methods, Linear Regression, Random Forest and Gradient Boosting to predict the gross box office revenue. We only consider domestic films with a revenue size of at least KRW 500 million; relevant explanatory variables are chosen by data visualization and variable selection techniques. The key idea of analyzing this data is to construct the meaningful explanatory variables from the data sources available to the public. Some variables must be categorized to conduct more effective analysis and clustering methods are applied to achieve this task. We choose the best model based on performance in the test set and important explanatory variables are discussed.

도시가스 배관 위험 예측 모델 개발 (A development of the gas pipeline risk prediction models)

  • 박길주;김영찬;이창열;조영도;정원희
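    • 한국재난정보학회:학술대회논문집 (Korean Society of Disaster Information: Conference Proceedings) / 한국재난정보학회 2017년 정기학술대회 (2017 Regular Conference of the Korean Society of Disaster Information) / pp. 360-361 / 2017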
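  • Various systems are in operation to keep city gas pipelines safe, but most of them rely on on-site inspection, which is a limitation. This study predicts pipeline risk by analyzing real-time pipeline operation data from 중부도시가스, one of Korea's city gas suppliers. Data on pipeline pressure, output voltage, output current, cathodic protection potential, and potential values are combined with data on other city-gas-related factors for correlation analysis. Real-time pipeline pressure data for a specific supply area are then analyzed to predict pressure values. Models were built with the random forest regression and support vector regression (SVR) algorithms; the model that used random forest regression on a dataset augmented with time-series information from the pipeline data showed the best predictive performance.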
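
The abstract does not say how the time-series information was added to the dataset. As one possible reading, the sketch below turns a pressure series into lagged features, so that each row holds the preceding readings and the target is the next reading. The function name `make_lag_features`, the lag count, and the sample readings are illustrative assumptions, not details from the paper.

```python
from typing import List, Tuple


def make_lag_features(series: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) where each row of X holds the n_lags readings that precede y."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # previous n_lags values, oldest first
        y.append(series[t])             # value to predict
    return X, y


# Hypothetical pressure readings, made up for illustration only.
pressure = [2.10, 2.12, 2.11, 2.15, 2.14, 2.13]
X, y = make_lag_features(pressure, n_lags=3)
```

The resulting `X` and `y` could then be passed to any regressor, including a random forest.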


Ensemble approach for improving prediction in kernel regression and classification

  • Han, Sunwoo;Hwang, Seongyun;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 23, No. 4 / pp. 355-362 / 2016
  • Ensemble methods often help increase prediction ability in various predictive models by combining multiple weak learners and reducing the variability of the final predictive model. In this work, we demonstrate that ensemble methods also enhance the accuracy of prediction under kernel ridge regression and kernel logistic regression classification. We apply bagging and random forests to two kernel-based predictive models and present the procedure by which bagging and random forests can be embedded in kernel-based predictive models. Our proposals are tested on numerous synthetic and real datasets and compared with plain kernel-based predictive models and their subsampling approach. Numerical studies demonstrate that the ensemble approach outperforms plain kernel-based predictive models.
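
As a rough illustration of the bagging half of this idea, the sketch below fits one kernel ridge regression model per bootstrap resample and averages their predictions. It is not the authors' procedure: the RBF kernel, the values of `gamma` and `lam`, the synthetic sine data, and all function names are assumptions chosen for the example, and the random-forest variant is not covered.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def rbf(a: float, b: float, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-gamma * (a - b) ** 2)


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_krr(xs: Vector, ys: Vector, lam: float = 0.1) -> Callable[[float], float]:
    """Kernel ridge regression: alpha solves (K + lam * I) alpha = y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, ys)
    return lambda x: sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))


def bagged_krr(xs: Vector, ys: Vector, n_models: int = 10, seed: int = 0):
    """Fit one KRR model per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_krr([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x: float) -> float:
    """Average the predictions of all ensemble members."""
    return sum(m(x) for m in models) / len(models)


# Synthetic data (assumption): noisy samples of sin(x) on [0, 2*pi].
data_rng = random.Random(1)
train_x = [i * 2 * math.pi / 39 for i in range(40)]
train_y = [math.sin(x) + data_rng.gauss(0.0, 0.1) for x in train_x]
models = bagged_krr(train_x, train_y)
```

Calling `predict_bagged(models, 1.0)` returns the ensemble's estimate of the underlying function at `x = 1.0`. Averaging over resamples is what reduces the variability the abstract refers to.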

Study on the ensemble methods with kernel ridge regression

  • Kim, Sun-Hwa;Cho, Dae-Hyeon;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / Vol. 23, No. 2 / pp. 375-383 / 2012
  • The purpose of ensemble methods is to increase prediction accuracy by combining many classifiers. Recent studies have shown that random forests and forward stagewise regression achieve good accuracy in classification problems. However, they have large prediction errors near separation boundaries because they use decision trees as base learners. In this study, we use kernel ridge regression instead of decision trees in random forests and boosting. The usefulness of the proposed ensemble methods is shown by simulation results on the prostate cancer and Boston housing data.

Crop Yield and Crop Production Predictions using Machine Learning

  • Divya Goel;Payal Gulati
    • International Journal of Computer Science & Network Security / Vol. 23, No. 9 / pp. 17-28 / 2023
  • Today the agriculture sector is a significant contributor to the Indian economy: it represents 18% of India's Gross Domestic Product (GDP) and employs half of the nation's workforce. The farming sector is required to satisfy the growing demand for food caused by an increasing population. To meet the ever-increasing needs of the nation's people, yield prediction is therefore done in advance. Farmers also benefit from yield prediction, since it helps them estimate crop yield before cultivation. Various parameters affect crop yield, such as rainfall, temperature, fertilizers, pH level, and other atmospheric conditions. Given these factors, crop yield is hard to predict, which makes it a challenging task. This motivated the present work, in which a dataset of different states producing different crops in different seasons was prepared and pre-processed. The machine learning techniques Gradient Boosting Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge Regression, Polynomial Regression, and Linear Regression were then applied, and their results were compared using Python.

도시가스 배관압력 예측모델 (City Gas Pipeline Pressure Prediction Model)

  • 정원희;박길주;구영현;김성현;유성준;조영도
    • 한국전자거래학회지 (The Journal of Society for e-Business Studies) / Vol. 23, No. 2 / pp. 33-47 / 2018
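  • City gas pipelines are buried underground, which makes detailed management difficult and exposes them to a variety of risks. This study proposes a model that analyzes real-time city gas pipeline pressure data to predict pipeline pressure anomalies and support expert decision-making. Real-time pipeline pressure data collected from the pressure regulators of 중부도시가스, one of Korea's city gas suppliers, are combined with time variables and external environment variables to form the analysis data. The analysis covers 11 pressure regulators located in Asan and Cheonan, and minute-level pipeline pressure prediction models are implemented. Regression models were implemented with the random forest, support vector regression (SVR), and long short-term memory (LSTM) algorithms, and the LSTM model showed strong performance. For the Asan pipeline pressure model, the LSTM model had a root mean square error (RMSE) of 0.011 and a mean absolute percentage error (MAPE) of 0.494; for the Cheonan pipeline pressure model, the LSTM model had an RMSE of 0.015 and a MAPE of 0.668, the lowest error rates.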
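
For readers unfamiliar with the two error measures, the following is a minimal sketch of how RMSE and MAPE are commonly computed. The abstract does not state whether MAPE was reported as a percentage or as a fraction; the percentage form below is an assumption, and the sample values are made up.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical values for illustration only.
actual = [2.0, 4.0]
predicted = [2.0, 3.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE is expressed in the units of the target (here, pressure), while MAPE is scale-free, which is why the two are often reported together.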

기계학습을 이용한 유동가속부식 모델링: 랜덤 포레스트와 비선형 회귀분석과의 비교 (Modeling of Flow-Accelerated Corrosion using Machine Learning: Comparison between Random Forest and Non-linear Regression)

  • 이경근;이은희;김성우;김경모;김동진
    • Corrosion Science and Technology / Vol. 18, No. 2 / pp. 61-71 / 2019
  • Flow-Accelerated Corrosion (FAC) is a phenomenon in which a protective coating on a metal surface is dissolved by the flow of fluid in a metal pipe, leading to continuous wall-thinning. Recently, many countries have developed computer codes to manage FAC in power plants, and the FAC prediction model in these codes plays an important role in predictive performance. Herein, FAC prediction models were developed by applying a machine learning method and the conventional nonlinear regression method. The random forest, a widely used machine learning technique in predictive modeling, made it easy to calculate the FAC tendency for five input variables: flow rate, temperature, pH, Cr content, and dissolved oxygen (DO) concentration. However, the model showed significant errors under some input conditions, and it was difficult to obtain proper regression results without additional data points. In contrast, nonlinear regression analysis, which assumes an empirical equation, produced robust estimates even with relatively insufficient data, and the model showed better predictive power when the interaction between DO and pH was considered. The comparative analysis in this study is believed to provide important insights for developing a more sophisticated FAC prediction model.

심층 신경망모형을 사용한 미세먼지 PM10의 예측 (Prediction of fine dust PM10 using a deep neural network model)

  • 전성현;손영숙
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 31, No. 2 / pp. 265-285 / 2018
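  • This study used a deep neural network model to predict the four-grade classification of fine dust $PM_{10}$ ('good, moderate, bad, very bad') and the two-grade classification ('good or moderate, bad or very bad'). On daily fine dust data observed in six major metropolitan areas in Korea from 2010 to 2015, the deep neural network model achieved higher accuracy than the existing classification methods: a neural network model, a multinomial logistic regression model, Support Vector Machine, and Random Forest.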

Prediction of Academic Performance of College Students with Bipolar Disorder using different Deep learning and Machine learning algorithms

  • Peerbasha, S.;Surputheen, M. Mohamed
    • International Journal of Computer Science & Network Security / Vol. 21, No. 7 / pp. 350-358 / 2021
  • In recent years, analyzing student performance has proven difficult, and it is an important problem in all academic institutions. The main idea of this paper is to analyze and evaluate the academic performance of college students with bipolar disorder by applying data mining classification algorithms using Jupyter Notebook, a Python tool. This tool has been widely used as a decision-making tool with respect to students' academic performance. The classifiers used are logistic regression, random forest classifier (gini), random forest classifier (entropy), decision tree classifier, K-Neighbours classifier, Ada Boost classifier, Extra Tree Classifier, GaussianNB, and BernoulliNB. The classification models are evaluated with 13 measures, such as Accuracy, Precision, Recall, F1 Measure, Sensitivity, Specificity, R Squared, Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, TPR, TNR, FPR and FNR. The conclusion is that the Decision Tree Classifier performs better than the other algorithms.
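
Several of the listed measures are derived from the same four confusion-matrix counts, and some coincide: sensitivity, recall, and TPR are the same quantity, and specificity equals TNR. The sketch below shows the standard definitions for a binary classifier. The function name `binary_rates` and the counts are hypothetical and do not come from the paper.

```python
from typing import Dict


def binary_rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute common classification measures from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # true positive rate = recall = sensitivity
    tnr = tn / (tn + fp)        # true negative rate = specificity
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": tpr,
        "f1": 2 * precision * tpr / (precision + tpr),
        "sensitivity": tpr,
        "specificity": tnr,
        "tpr": tpr,
        "tnr": tnr,
        "fpr": fp / (fp + tn),  # false positive rate = 1 - TNR
        "fnr": fn / (fn + tp),  # false negative rate = 1 - TPR
    }


# Hypothetical counts for illustration only.
m = binary_rates(tp=40, fp=10, tn=45, fn=5)
```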