• Title/Summary/Keyword: Partial Dependence Plot


Explainable Machine Learning Based a Packed Red Blood Cell Transfusion Prediction and Evaluation for Major Internal Medical Condition

  • Lee, Seongbin;Lee, Seunghee;Chang, Duhyeuk;Song, Mi-Hwa;Kim, Jong-Yeup;Lee, Suehyun
    • Journal of Information Processing Systems / v.18 no.3 / pp.302-310 / 2022
  • Efficient use of limited blood products is becoming very important in terms of socioeconomic status and patient recovery. To predict the appropriateness of patient-specific transfusions for intensive care unit (ICU) patients who require real-time monitoring, we evaluated a model that dynamically predicts the possibility of transfusion using the Medical Information Mart for Intensive Care III (MIMIC-III), an ICU admission record at Harvard Medical School. In this study, we developed an explainable machine learning model to predict the possibility of red blood cell transfusion for major medical diseases in the ICU. Target disease groups that received packed red blood cell transfusions at high frequency were selected, and 16,222 patients were finally extracted. The prediction model (LightGBM) achieved an area under the ROC curve of 0.9070 and an F1-score of 0.8166. To explain the behavior of the machine learning model, feature importance analysis and a partial dependence plot were used. The results of our study can be used as basic data for recommendations related to the adequacy of blood transfusions and are expected to ultimately contribute to the recovery of patients and the prevention of excessive consumption of blood products.
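The partial dependence plot named in this abstract can be illustrated with a minimal, library-free sketch. It is not the authors' LightGBM pipeline: `toy_model`, the rows, and the grid are hypothetical stand-ins chosen so the result is easy to check by hand. For each grid value, the chosen feature is overwritten in every row and the model's predictions are averaged.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average model prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)               # copy so the data stays untouched
            modified[feature_index] = value    # overwrite the feature of interest
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    # Hypothetical additive model, used only for illustration.
    return 2.0 * row[0] + row[1]


rows = [[1.0, 0.0], [2.0, 2.0], [3.0, 4.0]]   # hypothetical observations
grid = [0.0, 1.0, 2.0]
curve = partial_dependence(toy_model, rows, feature_index=0, grid=grid)
```

Because the toy model is additive, the curve is the feature's own effect (2 times the grid value) plus the mean contribution of the other feature. With a fitted tree ensemble the same loop applies, but the curve is generally not a straight line.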

Temperature Dependence of PCBs in Urban Area of Seoul City (서울 대기 중 PCBs의 온도 의존성)

  • 여현구;최민규;천만영;김태욱;선우영
    • Journal of Korean Society for Atmospheric Environment / v.18 no.3 / pp.193-204 / 2002
  • To investigate the relationship between atmospheric PCB concentrations and temperature, both parameters were measured at an urban site in Korea from July 1999 to January 2000. The correlation between total PCBs and temperature was significant (r = 0.752, p < 0.001), indicating that total PCB concentrations responded sensitively to temperature changes during the sampling period. The ratio of PCB homologs and Deca-CB (PCB 209) also responded similarly to temperature changes (r > 0.60, p < 0.05). This may be explained by a shift in gas/particle partitioning toward the gas phase, especially for tri- and tetra-CBs, which have high vapor pressures and generally exist in the gas phase. The Clausius-Clapeyron equation was applied to the atmospheric PCB data, relating PCB partial vapor pressure to inverse temperature; this essentially represents the temperature-controlled transition between the condensed phase and the atmospheric gas phase. The slopes of the resulting plots for the International Council for the Exploration of the Sea (ICES) congeners ranged from -2810 to -5887, with significantly steep slopes and $R^2$ (p < 0.005). It was inferred that atmospheric PCB concentrations were also affected by changes in surrounding conditions such as soil, lakes, and trees.
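A Clausius-Clapeyron plot of this kind fits a straight line to ln P against 1/T, and the slope equals -ΔH/R. The sketch below shows that fit by ordinary least squares. The temperatures, the slope of -4000 (picked from inside the reported range), and the intercept are synthetic assumptions, not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic, noise-free data (assumed values, for illustration only).
temps_k = [268.0, 275.0, 283.0, 291.0, 299.0]      # air temperature in kelvin
ASSUMED_SLOPE, ASSUMED_INTERCEPT = -4000.0, 10.0
ln_p = [ASSUMED_SLOPE / t + ASSUMED_INTERCEPT for t in temps_k]

# Regress ln(partial pressure) on inverse temperature.
slope, intercept = fit_line([1.0 / t for t in temps_k], ln_p)

R_GAS = 8.314                                      # gas constant, J/(mol K)
enthalpy_kj_mol = -slope * R_GAS / 1000.0          # slope = -dH/R
```

With real measurements the points scatter around the line, and the steepness of the slope indicates how strongly the gas-phase concentration depends on temperature.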

Effects of seed sources and shade on vigor of Brant's oak seedling

  • Taghvaei, Mansour
    • Journal of Ecology and Environment / v.33 no.4 / pp.299-306 / 2010
  • The use of local seed provenance is often recommended in forest restoration. Early vigor is a combination of the performance of seed germination and emergence after planting. The ability of young Brant's oak plants to grow and develop after emergence and its dependence on local habitat conditions was investigated in this study. The effects of seed source and shade on early growing seedlings of Brant's oak (Quercus brantii L.) were determined in field measurements. Seeds of Quercus brantii L. were collected from 4 forest areas (seed sources) in southern Zagros (Provinces of Kohkilouyeh-Bouyer Ahmad and Fars) at altitudes of 850, 1,100, 1,500, 2,100 m a.s.l., and planted in a nursery constructed in southwestern Iran. According to a split-plot design consisting of four blocks, each containing two main treatment plots (no shading, partial shading), each main plot was sub-divided into four sub-plots (for elevations of 850, 1,100, 1,500 and 2,100 m). Results showed that shade treatments had significant effects on emergence percentage and rate, shoot length, shoot dry weight (SDW), root dry weight (RDW), leaf area (LA), and chlorophyll content. Ecological factors also had an effect on seed performance. Altitude of seed source had a very significant effect on root length, LA, SDW, and RDW. The seeds collected from 850 m a.s.l. elevation showed the highest performance, especially in leaf area, root length, shoot dry weight, and root dry weight. Our results showed that the altitude of 850 m a.s.l. was the best for collecting Brant's oak seeds.

Form-finding of lifting self-forming GFRP elastic gridshells based on machine learning interpretability methods

  • Soheila, Kookalani;Sandy, Nyunn;Sheng, Xiang
    • Structural Engineering and Mechanics / v.84 no.5 / pp.605-618 / 2022
  • Glass fiber reinforced polymer (GFRP) elastic gridshells consist of long continuous GFRP tubes that form elastic deformations. In this paper, a method for the form-finding of gridshell structures is presented based on interpretable machine learning (ML) approaches. A comparative study is conducted on several ML algorithms, including support vector regression (SVR), K-nearest neighbors (KNN), decision tree (DT), random forest (RF), AdaBoost, XGBoost, category boosting (CatBoost), and light gradient boosting machine (LightGBM). A numerical example is presented using a standard double-hump gridshell, considering two characteristics of deformation as objective functions. A combination of grid search and k-fold cross-validation (CV) is used to fine-tune the parameters of the ML models. The results of the comparative study indicate that the LightGBM model achieves the highest prediction accuracy. Finally, interpretable ML approaches, including Shapley additive explanations (SHAP), partial dependence plot (PDP), and accumulated local effects (ALE), are applied to explain the predictions of the ML model, since it is essential to understand the effect of various values of input parameters on the objective functions. As a result of the interpretability approaches, an optimum gridshell structure is obtained and new opportunities are verified for form-finding investigation of GFRP elastic gridshells during lifting construction.
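The combination of grid search and k-fold cross-validation mentioned in this abstract can be sketched without any ML library. This is an assumption-laden toy, not the paper's setup: the model is a one-dimensional K-nearest-neighbors regressor, the data are synthetic, and the names `k_fold_indices`, `cv_mse`, and `grid_search` are hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Interleaved folds: fold f holds indices f, f + n_folds, f + 2 * n_folds, ..."""
    return [list(range(f, n_samples, n_folds)) for f in range(n_folds)]


def knn_predict(train_x, train_y, x, k):
    """Mean target of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def cv_mse(xs, ys, k, n_folds):
    """Mean squared error of KNN, averaged over the held-out folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), n_folds):
        held = set(held_out)
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(ys)) if i not in held]
        errors = [(knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
                  for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds


def grid_search(xs, ys, k_grid, n_folds):
    """Score every candidate k by cross-validation and return the best one."""
    scores = {k: cv_mse(xs, ys, k, n_folds) for k in k_grid}
    return min(scores, key=scores.get), scores


xs = [i / 10 for i in range(20)]        # synthetic inputs
ys = [x * x for x in xs]                # synthetic targets
k_grid = [1, 3, 5, 7]
best_k, scores = grid_search(xs, ys, k_grid, n_folds=5)
```

Each candidate value is scored only on data that was held out while fitting, so the selected hyperparameter is less likely to reflect overfitting to a single split.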

Predicting As Contamination Risk in Red River Delta using Machine Learning Algorithms

  • Ottong, Zheina J.;Puspasari, Reta L.;Yoon, Daeung;Kim, Kyoung-Woong
    • Economic and Environmental Geology / v.55 no.2 / pp.127-135 / 2022
  • Excessive As levels in groundwater are a major health problem worldwide. In the Red River Delta in Vietnam, several million residents face a high risk of chronic As poisoning. As is released into groundwater by a natural process, the microbially driven reductive dissolution of Fe(III) oxides. Red River residents, unaware of the contamination, have extracted this groundwater through private tube wells for drinking and daily use. Long-term consumption of As-contaminated groundwater could lead to various health problems. Therefore, a predictive model would be useful for exposing the contamination risk of wells in the Red River Delta. This study used four machine learning algorithms to predict the probability of As contamination at study sites in the Red River Delta, Vietnam. The GBM was the best-performing model, with accuracy, precision, sensitivity, and specificity of 98.7%, 100%, 95.2%, and 100%, respectively. It also yielded the highest AUCs, 92% and 96% for the PRC and ROC curves, respectively, with Eh and Fe as the most important variables. The partial dependence plots of As concentration on the model parameters showed that a high probability of elevated As is associated with low well depth, Eh, and SO4, along with high PO4^3- and NH4+. These conditions trigger the reductive dissolution of iron phases, thus releasing As into groundwater.
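The four classification metrics quoted in this abstract all derive from a confusion matrix. The sketch below computes them from counts that are purely hypothetical: they were picked because they happen to round to the reported percentages, and the abstract does not give the study's actual counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),        # share of predicted positives that are real
        "sensitivity": tp / (tp + fn),      # share of real positives that are found
        "specificity": tn / (tn + fp),      # share of real negatives that are found
    }


# Hypothetical counts (not from the paper): 20 true positives, 1 missed
# contaminated well, 56 true negatives, and no false alarms.
metrics = classification_metrics(tp=20, fp=0, tn=56, fn=1)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

With zero false positives, precision and specificity are both 100% regardless of the other counts, which is why the two figures coincide in a result of this shape.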

Determination of Survival of Gastric Cancer Patients With Distant Lymph Node Metastasis Using Prealbumin Level and Prothrombin Time: Contour Plots Based on Random Survival Forest Algorithm on High-Dimensionality Clinical and Laboratory Datasets

  • Zhang, Cheng;Xie, Minmin;Zhang, Yi;Zhang, Xiaopeng;Feng, Chong;Wu, Zhijun;Feng, Ying;Yang, Yahui;Xu, Hui;Ma, Tai
    • Journal of Gastric Cancer / v.22 no.2 / pp.120-134 / 2022
  • Purpose: This study aimed to identify prognostic factors for patients with distant lymph node-involved gastric cancer (GC) using a machine learning algorithm, a method that offers considerable advantages and new prospects for high-dimensional biomedical data exploration. Materials and Methods: This study employed 79 features of clinical pathology, laboratory tests, and therapeutic details from 289 GC patients whose distant lymphadenopathy presented as the first episode of recurrence or metastasis. Outcomes were measured as any-cause death events and survival months after distant lymph node metastasis. A prediction model was built based on possible outcome predictors using a random survival forest algorithm and confirmed by 5×5 nested cross-validation. The effects of single variables were interpreted using partial dependence plots. A contour plot was used to visually represent survival prediction based on 2 predictive features. Results: The median survival time of patients with GC with distant nodal metastasis was 9.2 months. The optimal model incorporated the prealbumin level and the prothrombin time (PT), and yielded a prediction error of 0.353. The inclusion of other variables resulted in poorer model performance. Patients with higher serum prealbumin levels or shorter PTs had a significantly better prognosis. The predicted one-year survival rate was stratified and illustrated as a contour plot based on the combined effect of the prealbumin level and the PT. Conclusions: Machine learning is useful for identifying the important determinants of cancer survival using high-dimensional datasets. The prealbumin level and the PT at distant lymph node metastasis are the 2 most crucial factors in predicting the subsequent survival time of advanced GC.

Who Gets Government SME R&D Subsidy? Application of Gradient Boosting Model (Gradient Boosting 모형을 이용한 중소기업 R&D 지원금 결정요인 분석)

  • Kang, Sung Won;Kang, HeeChan
    • The Journal of Society for e-Business Studies / v.25 no.4 / pp.77-109 / 2020
  • In this paper, we build a gradient boosting model to predict government SME R&D subsidies, select features of high importance, and measure the impact of each feature on the predicted subsidy using PDP and SHAP values. Unlike previous empirical research, we focus on the effect of the R&D subsidy distribution pattern on the incentives of firms participating in subsidy competition. We used firm data constructed by KISTEP, which links government R&D subsidy records with financial statements provided by NICE, and applied a gradient boosting model to predict R&D subsidies. We found that firms with higher R&D performance and larger R&D investment tend to have higher R&D subsidies, but firms with higher operating profit or total asset turnover rate tend to have lower R&D subsidies. Our results suggest that the current government R&D subsidy distribution pattern provides an incentive to improve R&D project performance, but not business performance.
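SHAP values are based on Shapley values, which split one prediction's deviation from the average prediction among the features. The sketch below computes exact Shapley values by brute force over all feature coalitions. It is not the SHAP library's tree algorithm and not the authors' model: `toy_model`, the background rows, and the explained instance are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, subset):
    """Mean prediction when features in `subset` take x's values, others vary."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley value of each feature for the instance x."""
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi += weight * gain
        phis.append(phi)
    return phis


def toy_model(row):
    # Hypothetical model with one interaction term, for illustration only.
    return 3.0 * row[0] - 2.0 * row[1] + row[0] * row[2]


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
x = [2.0, 0.5, 3.0]
phis = shapley_values(toy_model, x, background)
baseline = sum(toy_model(row) for row in background) / len(background)
```

The values sum to the difference between the prediction for `x` and the baseline. The cost grows exponentially with the number of features, which is why practical tools rely on model-specific or sampling-based approximations.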