• Title/Summary/Keyword: Feature Importance Analysis

An interpretable machine learning approach for forecasting personal heat strain considering the cumulative effect of heat exposure

  • Seo, Seungwon;Choi, Yujin;Koo, Choongwan
    • Korean Journal of Construction Engineering and Management / v.24 no.6 / pp.81-90 / 2023
  • Climate change has increased the frequency and intensity of heat waves, which poses a significant threat to the health and safety of construction workers, particularly those engaged in labor-intensive work in environments vulnerable to heat stress. To address this challenge, this study proposes an interpretable machine learning approach for forecasting personal heat strain that treats the cumulative effect of heat exposure as a situational variable, which existing approaches have not taken into account. The proposed model, which incorporates cumulative working time along with environmental and personal variables, showed superior forecast performance and explanatory power. Specifically, the proposed Multi-Layer Perceptron (MLP) model achieved a Mean Absolute Error (MAE) of 0.034 (℃) and an R-squared of 99.3% (0.933). Feature importance analysis revealed that cumulative working time, as a situational variable, had the most significant impact on personal heat strain. These findings highlight the importance of systematically managing personal heat strain at construction sites by jointly considering cumulative working time as a situational variable alongside environmental and personal variables. This study contributes to the construction industry by offering a reliable and accurate heat strain forecasting model that can enhance the health and safety of construction workers.
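The two headline metrics in the abstract above can be computed as in the minimal Python sketch below. It assumes the standard definitions (MAE is the mean of |y - ŷ|; R² is 1 - SS_res / SS_tot), and the temperature readings are hypothetical values, not data from the study. An R² quoted as a percentage is simply the fraction multiplied by 100.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |observed - forecast|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical readings in degrees C, for illustration only.
observed = [36.8, 37.1, 37.4, 37.9, 38.2]
forecast = [36.9, 37.0, 37.5, 37.8, 38.2]
print(round(mae(observed, forecast), 3))        # 0.08
print(round(r_squared(observed, forecast), 3))  # 0.969
```

Because MAE is expressed in the target's own unit (℃ here) while R² is unitless, the two are usually reported together: one says how far off forecasts are, the other how much of the variance they explain.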

Comparison of Feature Selection Methods Applied on Risk Prediction for Hypertension (고혈압 위험 예측에 적용된 특징 선택 방법의 비교)

  • Khongorzul, Dashdondov;Kim, Mi-Hye
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.107-114 / 2022
  • In this paper, we improve hypertension risk prediction by applying feature selection to the Korea National Health and Nutrition Examination Survey (KNHANES) database of the Korea Centers for Disease Control and Prevention. The study identified various risk factors correlated with chronic hypertension. The method proceeds in three stages. First, the data preprocessing stage removes missing values and applies a z-transformation. Next, the feature selection (FS) stage applies a factor analysis (FA)-based feature selection method to the dataset and compares feature importance (FI) and multicollinearity (MC) analysis as FS criteria. Finally, the predictive analysis stage applies the selected features to detect and predict the risk of hypertension. We compare the accuracy, F-score, area under the ROC curve (AUC), and mean squared error (MSE) of each classification model. In the experiments, the proposed MC-FA-RF model achieved the highest accuracy (80.12%), with an MSE of 0.106, an F-score of 83.49%, and an AUC of 85.96%. These results demonstrate that the proposed MC-FA-RF method outperforms the other methods for hypertension risk prediction.
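The preprocessing step and the four evaluation metrics named above can be sketched in plain Python as follows. This does not reproduce the MC-FA-RF pipeline. It assumes z-transformation uses the population standard deviation, that MSE is computed between 0/1 labels and predicted probabilities (the abstract does not say which), and a 0.5 decision threshold. The labels and scores are hypothetical.

```python
from statistics import mean, pstdev

def z_transform(values):
    """Standardize a feature to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mse(y_true, scores):
    """Mean squared error between 0/1 labels and predicted probabilities (Brier score)."""
    return mean((t - s) ** 2 for t, s in zip(y_true, scores))

# Hypothetical labels and predicted probabilities, not KNHANES data.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores), mse(y_true, scores))
```

Accuracy and F-score depend on the chosen threshold, whereas AUC ranks the raw scores and MSE measures their calibration, which is why studies often report all of them side by side.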

Explainable Machine Learning Based a Packed Red Blood Cell Transfusion Prediction and Evaluation for Major Internal Medical Condition

  • Lee, Seongbin;Lee, Seunghee;Chang, Duhyeuk;Song, Mi-Hwa;Kim, Jong-Yeup;Lee, Suehyun
    • Journal of Information Processing Systems / v.18 no.3 / pp.302-310 / 2022
  • Efficient use of limited blood products is becoming increasingly important for both socioeconomic reasons and patient recovery. To predict the appropriateness of patient-specific transfusions for intensive care unit (ICU) patients who require real-time monitoring, we evaluated a model that dynamically predicts the likelihood of transfusion using the Medical Information Mart for Intensive Care III (MIMIC-III), a database of ICU admission records at Harvard Medical School. In this study, we developed an explainable machine learning model to predict the likelihood of red blood cell transfusion for major medical conditions in the ICU. Target disease groups that frequently received packed red blood cell transfusions were selected, and 16,222 patients were ultimately extracted. The LightGBM prediction model achieved an area under the ROC curve of 0.9070 and an F1-score of 0.8166. To explain the model's predictions, feature importance analysis and partial dependence plots were used. The results of this study can serve as baseline data for recommendations on the adequacy of blood transfusions and are expected to contribute to patient recovery and to preventing excessive consumption of blood products.
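A partial dependence curve, as mentioned above, averages a fitted model's output over the dataset while one feature is forced to each value on a grid. The sketch below shows that computation for any prediction function. The model is a hypothetical hand-written logistic score, not the study's LightGBM model, and the feature names `hemoglobin` and `active_bleeding` are illustrative assumptions rather than the study's variables; the numbers carry no clinical meaning.

```python
import math

def partial_dependence(predict, rows, feature, grid):
    """Average model output over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)          # copy so the original rows are not mutated
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Hypothetical stand-in for a fitted classifier (NOT the study's LightGBM model):
# a hand-written logistic score over two illustrative features.
def toy_transfusion_probability(row):
    z = -0.8 * (row["hemoglobin"] - 8.0) + 0.5 * row["active_bleeding"]
    return 1.0 / (1.0 + math.exp(-z))

patients = [
    {"hemoglobin": 7.5, "active_bleeding": 1},
    {"hemoglobin": 9.0, "active_bleeding": 0},
    {"hemoglobin": 11.0, "active_bleeding": 0},
]
grid = [7.0, 8.0, 9.0, 10.0]
curve = partial_dependence(toy_transfusion_probability, patients, "hemoglobin", grid)
for g, p in zip(grid, curve):
    print(f"hemoglobin={g:4.1f}  mean predicted probability={p:.3f}")
```

Plotting `curve` against `grid` gives the partial dependence plot. Feature importance says how much a feature matters; the curve adds the direction and shape of its effect.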

A Study on the Analysis of Factors for the Golden Glove Award by using Machine Learning (머신러닝을 이용한 골든글러브 수상 요인 분석에 대한 연구)

  • Uem, Daeyeob;Kim, Seongyong
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.48-56 / 2022
  • The importance of data analysis in baseball has grown since the successes of MLB's Oakland Athletics, which applied Billy Beane's Moneyball approach, and of the NC Dinos, the 2020 KBO champions. Data-driven baseball studies have been conducted not only in the United States but also in Korea, and in particular, models based on deep learning and machine learning have been proposed. However, previous deep learning and machine learning studies focused only on predicting game wins and losses, and their results make it difficult to interpret which factors have an important influence on the game. In this paper, to investigate which factors are important at each position, we develop a prediction model for the Golden Glove Award, which is given to the best player at each position. The model uses XGBoost, a boosting method that also provides feature importance scores, which can be used to interpret the factors behind the prediction results. The analysis identifies the important factors for each position.
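Tree-boosting libraries derive feature importance from the splits in the fitted trees; XGBoost's `Booster.get_score` exposes this through `importance_type` values such as `"weight"`, `"gain"`, `"cover"`, `"total_gain"` and `"total_cover"`. The abstract does not state which type the study used. The sketch below does not call XGBoost; it reimplements the aggregation behind three of those views from a hypothetical list of (feature, gain) split records, with illustrative batting-statistic names that are not the study's variables.

```python
from collections import defaultdict

def importance_from_splits(splits):
    """Aggregate (feature, gain) split records into three importance views:
    weight = number of splits on the feature, total_gain = summed gain,
    gain = average gain per split."""
    weight = defaultdict(int)
    total_gain = defaultdict(float)
    for feature, gain in splits:
        weight[feature] += 1
        total_gain[feature] += gain
    avg_gain = {f: total_gain[f] / weight[f] for f in weight}
    return dict(weight), dict(total_gain), avg_gain

def normalize(scores):
    """Rescale scores so they sum to 1, as importance plots often do."""
    total = sum(scores.values())
    return {f: s / total for f, s in scores.items()}

# Hypothetical split records (feature, loss reduction) pooled over all trees.
# Feature names are illustrative batting statistics, not the study's variables.
splits = [("HR", 12.0), ("AVG", 4.0), ("HR", 8.0), ("RBI", 6.0), ("AVG", 2.0)]
weight, total_gain, gain = importance_from_splits(splits)
print(weight)                 # {'HR': 2, 'AVG': 2, 'RBI': 1}
print(normalize(total_gain))  # {'HR': 0.625, 'AVG': 0.1875, 'RBI': 0.1875}
print(gain)                   # {'HR': 10.0, 'AVG': 3.0, 'RBI': 6.0}
```

The views can rank features differently: by split count AVG ranks above RBI, but by average gain RBI ranks above AVG. Reporting which importance type was used therefore matters when interpreting "important factors."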

Predicting Forest Fires Using Machine Learning Considering Human Factors (인적요인을 고려한 머신러닝 활용 산림화재 예측)

  • Jin-Myeong Jang;Joo-Chan Kim;Hwa-Joong Kim;Kwang-Tae Kim
    • Journal of Korea Society of Industrial Information Systems / v.28 no.5 / pp.109-126 / 2023
  • Early detection of forest fires is essential to preventing large-scale fires. Forest fire prediction serves as a vital early detection method and has motivated various related studies. However, many previous studies focused solely on climate and geographic factors, overlooking human factors, which contribute significantly to forest fires. This study develops forest fire prediction models that take into account human, weather, and geographical factors. Using forest fire data from Gangwon-do spanning 2003 to 2020, the study compared four machine learning models with a logistic regression model. The results indicate that the XGBoost model performed best (AUC = 0.925), closely followed by Random Forest (AUC = 0.920), both of which are machine learning techniques. Finally, the study analyzed the relative importance of the factors through permutation feature importance analysis to derive operational insights. While meteorological factors had a greater impact than human factors, various human factors were also found to be significant.
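Permutation feature importance, as used above, measures how much a fitted model's score drops when one feature's column is randomly shuffled, breaking its link to the target. The model-agnostic sketch below assumes accuracy as the score for simplicity (the study reports AUC) and uses a hypothetical threshold rule in place of XGBoost or Random Forest; scikit-learn offers a comparable routine as `sklearn.inspection.permutation_importance`.

```python
import random

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Return the baseline score and, per feature, the mean score drop after shuffling that column."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical fire / no-fire rule: it uses only feature 0 (e.g. a dryness index)
# and ignores feature 1, so feature 1 should receive zero importance.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0

X = [[0.9, 3], [0.2, 1], [0.7, 4], [0.1, 2], [0.8, 0], [0.3, 5]]
y = [1, 0, 1, 0, 1, 0]
baseline, imp = permutation_importance(toy_model, X, y, accuracy)
print(baseline, imp)
```

Because it only needs predictions, the same procedure works for logistic regression, Random Forest and XGBoost alike, which makes it a natural choice when comparing factor importance across model types.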

The Trend Analysis of Outdoor Lighting Design in Residential Areas (주거건축물 경관조명 디자인의 트랜드 분석)

  • Park, Ji-Ae;Choi, An-Seop
    • Proceedings of the Korean Institute of Illuminating and Electrical Installation Engineers Conference / 2006.05a / pp.16-19 / 2006
  • Owing to its increased importance, outdoor lighting has been installed even in residential areas. In particular, since 2000, outdoor lighting has increasingly been installed on residential buildings. In this respect, a quantitative analysis was conducted on cases of outdoor lighting installed in residential areas to date. Twenty-four residential areas were investigated to identify the specific features of their outdoor lighting. This design analysis is intended to lay the groundwork for future studies.

Analysis of Feature Importance of Ship's Berthing Velocity Using Classification Algorithms of Machine Learning (머신러닝 분류 알고리즘을 활용한 선박 접안속도 영향요소의 중요도 분석)

  • Lee, Hyeong-Tak;Lee, Sang-Won;Cho, Jang-Won;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.139-148 / 2020
  • The most important factor affecting the berthing energy generated when a ship berths is the berthing velocity, so an accident may occur if the berthing velocity is excessively high. Several ship features influence the determination of the berthing velocity; however, previous studies have mostly focused on the size of the vessel. Therefore, this study analyzes various features that influence berthing velocity and determines their respective importance. The analysis used berthing velocity data collected at a jetty in Korea. Using the collected data, machine learning classification algorithms, namely decision tree, random forest, logistic regression, and perceptron, were compared and analyzed. The algorithms were evaluated using indexes derived from the confusion matrix. The perceptron demonstrated the best performance, and the features ranked in importance as follows: DWT (deadweight tonnage), jetty number, and state. Hence, when berthing a ship, the berthing velocity should be determined in consideration of various features, such as the size of the ship, the position of the jetty, and the loading condition of the cargo.
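The confusion-matrix indexes mentioned above can be derived as in this sketch, which builds a multiclass confusion matrix and computes per-class precision, recall and F1, plus overall accuracy and macro-averaged F1. The velocity bands "low", "medium" and "high" and all labels are hypothetical; the abstract does not give the study's class definitions or which indexes were used.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_metrics(matrix):
    """(precision, recall, F1) for each class of a square confusion matrix."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp                        # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results

# Hypothetical berthing-velocity bands and predictions, for illustration only.
labels = ["low", "medium", "high"]
y_true = ["low", "low", "medium", "medium", "high", "high", "low", "medium"]
y_pred = ["low", "medium", "medium", "medium", "high", "medium", "low", "low"]
cm = confusion_matrix(y_true, y_pred, labels)
metrics = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(len(labels))) / len(y_true)
macro_f1 = sum(f1 for _, _, f1 in metrics) / len(metrics)
print(cm)         # [[2, 1, 0], [1, 2, 0], [0, 1, 1]]
print(accuracy)   # 0.625
```

Macro-averaging weights each class equally, so a classifier that misses a rare but dangerous "high" band is penalized more than accuracy alone would show.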

Study on predictive model and mechanism analysis for martensite transformation temperatures through explainable artificial intelligence (설명가능한 인공지능을 통한 마르텐사이트 변태 온도 예측 모델 및 거동 분석 연구)

  • Junhyub Jeon;Seung Bae Son;Jae-Gil Jung;Seok-Jae Lee
    • Journal of the Korean Society for Heat Treatment / v.37 no.3 / pp.103-113 / 2024
  • Martensite volume fraction significantly affects the mechanical properties of alloy steels. The martensite start temperature (Ms) and the temperatures at which martensite reaches 50 vol.% (M50) and 90 vol.% (M90) are important for controlling the martensite phase fraction. Several researchers have proposed empirical equations and machine learning models to predict the Ms temperature. These numerical approaches can predict Ms easily, without additional experiments or cost. However, controlling the martensite phase fraction more precisely requires reducing the prediction error of Ms models and developing prediction models for the other martensite transformation temperatures (M50, M90). In the present study, machine learning models were applied to predict the Ms, M50, and M90 temperatures. Explainable artificial intelligence (XAI) was employed to explain the models' prediction mechanisms and to quantify feature importance for the martensite transformation temperatures. Among the machine learning models compared, random forest regression (RFR) showed the best performance for predicting the Ms, M50, and M90 temperatures. Feature importance was derived, and the prediction mechanisms were discussed using XAI.

Enhancing prediction accuracy of concrete compressive strength using stacking ensemble machine learning

  • Yunpeng Zhao;Dimitrios Goulias;Setare Saremi
    • Computers and Concrete / v.32 no.3 / pp.233-246 / 2023
  • Accurate prediction of concrete compressive strength can minimize the need for extensive, time-consuming, and costly mixture optimization testing and analysis. This study attempts to enhance the prediction accuracy of compressive strength using stacking ensemble machine learning (ML) with feature engineering techniques. Seven alternative ML models of increasing complexity were implemented and compared: linear regression, SVM, decision tree, multilayer perceptron, random forest, XGBoost, and AdaBoost. To further improve prediction accuracy, an ML pipeline incorporating feature engineering was proposed, and a two-layer stacked model was developed. The k-fold cross-validation approach was employed to optimize model parameters and train the stacked model. The stacked model showed superior performance in predicting concrete compressive strength, with a coefficient of determination (R²) of 0.985. Feature (i.e., variable) importance was determined to show how useful the synthetic features are for prediction and to provide better interpretability of the data and the model. The methodology in this study promotes a more thorough assessment of alternative ML algorithms rather than a focus on any single ML model type for concrete compressive strength prediction.
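The two-layer stacking idea described above can be sketched as follows: each first-layer model produces out-of-fold predictions via k-fold cross-validation, and a second-layer model learns how to combine them, so it never sees predictions made on a model's own training data. This is a minimal sketch, not the study's pipeline: it assumes two single-feature linear regressions as base models (instead of the seven models compared), a no-intercept least-squares meta-learner, unshuffled folds, and synthetic mixture data with hypothetical features (water-cement ratio, curing age).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for reproducibility)."""
    size, rem = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rem else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def out_of_fold_predictions(X, y, feature, k):
    """Layer 1: predict each sample with a base model trained on the other k-1 folds."""
    oof = [0.0] * len(y)
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        slope, intercept = fit_line([X[i][feature] for i in train], [y[i] for i in train])
        for i in fold:
            oof[i] = slope * X[i][feature] + intercept
    return oof

def fit_meta(p1, p2, y):
    """Layer 2: least-squares weights for y ~ w1*p1 + w2*p2 (no intercept), via Cramer's rule."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Hypothetical mixtures: [water-cement ratio, curing age in days] -> strength (MPa).
# Synthetic values for illustration only, not data from the study.
X = [[0.40, 7], [0.45, 28], [0.50, 14], [0.55, 28], [0.60, 7], [0.35, 28], [0.42, 14], [0.58, 28]]
y = [59.5, 67.0, 57.0, 61.0, 47.5, 73.0, 61.8, 59.2]

p_ratio = out_of_fold_predictions(X, y, feature=0, k=4)
p_age = out_of_fold_predictions(X, y, feature=1, k=4)
w_ratio, w_age = fit_meta(p_ratio, p_age, y)

# At prediction time the base models are refit on all data and combined with the meta weights.
full_ratio = fit_line([r[0] for r in X], y)
full_age = fit_line([r[1] for r in X], y)

def stacked_predict(row):
    return (w_ratio * (full_ratio[0] * row[0] + full_ratio[1])
            + w_age * (full_age[0] * row[1] + full_age[1]))

print(round(stacked_predict([0.48, 28]), 1))
```

Training the meta-learner on out-of-fold predictions rather than in-sample fits is the key design choice: in-sample predictions look better than they are, and the meta-learner would over-trust the most overfit base model.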

Online Social Capital Analysis on the Yeungnam Local Presses : Website and Social Media (영남지역 언론사의 온라인 사회자본 분석 : 웹사이트와 소셜미디어를 중심으로)

  • Kim, Ji Young;Ha, Young Ji;Park, Han Woo
    • The Journal of the Korea Contents Association / v.13 no.4 / pp.73-85 / 2013
  • This study examines the online social capital of local presses through their websites and social media. Using correspondence analysis, the paper separately visualizes website features (Web 1.0) and social media features (Web 2.0). The study analyzes 10 representative local presses in the Yeungnam region. To collect the data, two coders coded web features from the websites, and NodeXL, an open-source software tool, was used for the social media data. The results reveal that local press websites expand their online social capital through social media accounts. In particular, in terms of social features, the local presses emphasize Twitter, while the major presses maintain a balanced use of all platforms.