• Title/Summary/Keyword: tree-based models

Search Result 437

Development and Validation of 18F-FDG PET/CT-Based Multivariable Clinical Prediction Models for the Identification of Malignancy-Associated Hemophagocytic Lymphohistiocytosis

  • Xu Yang;Xia Lu;Jun Liu;Ying Kan;Wei Wang;Shuxin Zhang;Lei Liu;Jixia Li;Jigang Yang
    • Korean Journal of Radiology / v.23 no.4 / pp.466-478 / 2022
  • Objective: 18F-fluorodeoxyglucose (FDG) PET/CT is often used for detecting malignancy in patients with newly diagnosed hemophagocytic lymphohistiocytosis (HLH), with acceptable sensitivity but relatively low specificity. The aim of this study was to improve the diagnostic ability of 18F-FDG PET/CT in identifying malignancy in patients with HLH by combining 18F-FDG PET/CT and clinical parameters. Materials and Methods: Ninety-seven patients (age ≥ 14 years) with secondary HLH were retrospectively reviewed and divided into the derivation (n = 71) and validation (n = 26) cohorts according to admission time. In the derivation cohort, 22 patients had malignancy-associated HLH (M-HLH) and 49 patients had non-malignancy-associated HLH (NM-HLH). Data on pretreatment 18F-FDG PET/CT and laboratory results were collected. The variables were analyzed using the Mann-Whitney U test or Pearson's chi-square test, and a nomogram for predicting M-HLH was constructed using multivariable binary logistic regression. The predictors were also ranked using decision-tree analysis. The nomogram and decision tree were validated in the validation cohort (10 patients with M-HLH and 16 patients with NM-HLH). Results: The ratio of the maximal standardized uptake value (SUVmax) of the lymph nodes to that of the mediastinum, the ratio of the SUVmax of bone lesions or bone marrow to that of the mediastinum, and age were selected for constructing the model. The nomogram showed good performance in predicting M-HLH in the validation cohort, with an area under the receiver operating characteristic curve of 0.875 (95% confidence interval, 0.686-0.971). At an appropriate cutoff value, the sensitivity and specificity for identifying M-HLH were 90% (9/10) and 68.8% (11/16), respectively. The decision tree integrating the same variables showed 70% (7/10) sensitivity and 93.8% (15/16) specificity for identifying M-HLH. In comparison, visual analysis of 18F-FDG PET/CT images demonstrated 100% (10/10) sensitivity and 12.5% (2/16) specificity. Conclusion: 18F-FDG PET/CT may be a practical technique for identifying M-HLH. The model constructed using 18F-FDG PET/CT features and age was able to detect malignancy with better accuracy than visual analysis of 18F-FDG PET/CT images.

Managing the Reverse Extrapolation Model of Radar Threats Based Upon an Incremental Machine Learning Technique (점진적 기계학습 기반의 레이더 위협체 역추정 모델 생성 및 갱신)

  • Kim, Chulpyo;Noh, Sanguk
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.4 / pp.29-39 / 2017
  • Various electronic warfare situations drive the need to develop an integrated electronic warfare simulator that can perform electronic warfare modeling and simulation on radar threats. In this paper, we analyze the components of a simulation system that reversely models radar threats emitting electromagnetic signals from the parameters of the electronic information, and propose a method to incrementally maintain the reverse extrapolation model of RF threats. In the experiments, we evaluate the effectiveness of the incremental model update and assess the integration method for the reverse extrapolation models. The individual models of RF threats are constructed using a decision tree, a naive Bayesian classifier, an artificial neural network, and clustering algorithms based on Euclidean distance and cosine similarity, respectively. Experimental results show that the accuracy of the reverse extrapolation models improves as the size of the threat sample increases. In addition, we use voting, weighted voting, and the Dempster-Shafer algorithm to integrate the results of the five different models of RF threats. Of these, the final decision of reverse extrapolation through the Dempster-Shafer algorithm shows the best accuracy.
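
For readers unfamiliar with Dempster-Shafer fusion, the following is a minimal sketch of Dempster's rule of combination for two classifiers. It is not the authors' implementation: the threat labels (`radar_A`, `radar_B`), the mass values, and the function name `combine_dempster` are all hypothetical and chosen only for illustration.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence: mass goes to the common hypothesis set.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Contradictory evidence: accumulate the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict; Dempster's rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical outputs of two models over two candidate threats.
A = frozenset({"radar_A"})
B = frozenset({"radar_B"})
EITHER = A | B  # mass assigned to "cannot tell which"

m_tree = {A: 0.6, B: 0.1, EITHER: 0.3}
m_bayes = {A: 0.5, B: 0.3, EITHER: 0.2}

fused = combine_dempster(m_tree, m_bayes)
# Final decision: the single threat with the largest fused mass.
best = max((k for k in fused if len(k) == 1), key=fused.get)
print(sorted(best), round(fused[best], 4))
```

Unlike plain voting, this rule lets each model reserve part of its belief for "undecided" and discounts contradictory evidence through the normalization term.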

Development of Prediction Model for Nitrogen Oxides Emission Using Artificial Intelligence (인공지능 기반 질소산화물 배출량 예측을 위한 연구모형 개발)

  • Jo, Ha-Nui;Park, Jisu;Yun, Yongju
    • Korean Chemical Engineering Research / v.58 no.4 / pp.588-595 / 2020
  • Prediction and control of nitrogen oxides (NOx) emission is of great interest in industry due to stricter environmental regulations. Herein, we propose an artificial intelligence (AI)-based framework for prediction of NOx emission. The framework includes pre-processing of data for training of neural networks and evaluation of the AI-based models. In this work, Long Short-Term Memory (LSTM), a type of recurrent neural network, was adopted to reflect the time series characteristics of NOx emissions. A decision tree was used to determine the time window of the LSTM prior to training of the network. The neural network was trained with operational data from a heating furnace. The optimal model was obtained by optimizing the hyper-parameters. The LSTM model provided a reliable prediction of NOx emission for both training and test data, showing an accuracy of 93% or more. The proposed AI-based framework will provide new opportunities for predicting the emission of various air pollutants with time series characteristics.

Spatial Upscaling of Aboveground Biomass Estimation using National Forest Inventory Data and Forest Type Map (국가산림자원조사 자료와 임상도를 이용한 지상부 바이오매스의 공간규모 확장)

  • Kim, Eun-Sook;Kim, Kyoung-Min;Lee, Jung-Bin;Lee, Seung-Ho;Kim, Chong-Chan
    • Journal of Korean Society of Forest Science / v.100 no.3 / pp.455-465 / 2011
  • In order to assess and mitigate climate change, the role of forest biomass as a carbon sink has to be understood spatially and quantitatively. Since existing forest statistics cannot provide spatial information about forest resources, the spatial distribution of forest biomass has to be predicted under an alternative scheme. This study focuses on developing an upscaling method that expands forest variables from plot to landscape scale to estimate spatially explicit aboveground biomass (AGB). For this, forest stand variables were extracted from National Forest Inventory (NFI) data and used to develop AGB regression models by tree species. Dominant/codominant height and crown density were used as explanatory variables of the AGB regression models. The spatial distribution of AGB could be estimated using the AGB models, the forest type map, and a stand height map developed from the forest type map and height regression models. Finally, the total amount of forest AGB in Danyang was estimated to be 6,606,324 tons. This estimate was within the standard error of the AGB statistic calculated by the sample-based estimator, which was 6,518,178 tons. This AGB upscaling method provides a means of easily estimating biomass over large areas. However, because the forest type map used as the base map was produced from categorical data, the method is limited in how far it can improve the precision of the AGB map.

Prediction Model for Unfavorable Outcome in Spontaneous Intracerebral Hemorrhage Based on Machine Learning

  • Shengli Li;Jianan Zhang;Xiaoqun Hou;Yongyi Wang;Tong Li;Zhiming Xu;Feng Chen;Yong Zhou;Weimin Wang;Mingxing Liu
    • Journal of Korean Neurosurgical Society / v.67 no.1 / pp.94-102 / 2024
  • Objective: Spontaneous intracerebral hemorrhage (ICH) remains a significant cause of mortality and morbidity throughout the world. The purpose of this retrospective study is to develop multiple models for predicting ICH outcomes using machine learning (ML). Methods: Between January 2014 and October 2021, we included ICH patients identified by computed tomography or magnetic resonance imaging and treated with surgery. At the 6-month check-up, outcomes were assessed using the modified Rankin Scale. In this study, four ML models (Support Vector Machine (SVM), Decision Tree C5.0, Artificial Neural Network, and Logistic Regression) were used to build ICH prediction models. To evaluate the reliability of the ML models, we calculated the area under the receiver operating characteristic curve (AUC), specificity, sensitivity, accuracy, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). Results: We identified 71 patients who had favorable outcomes and 156 who had unfavorable outcomes. The SVM model achieved the best comprehensive prediction efficiency. For the SVM model, the AUC, accuracy, specificity, sensitivity, PLR, NLR, and DOR were 0.91, 0.92, 0.92, 0.93, 11.63, 0.076, and 153.03, respectively. For the SVM model, the importance value of time to operating room (TOR) was significantly higher than that of the other variables. Conclusion: The analysis of clinical reliability showed that the SVM model achieved the best comprehensive prediction efficiency and that the importance value of TOR was significantly higher than that of the other variables.
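
As a reading aid, the likelihood ratios and diagnostic odds ratio quoted above follow from sensitivity and specificity by the standard definitions PLR = sensitivity / (1 − specificity), NLR = (1 − sensitivity) / specificity, and DOR = PLR / NLR. The sketch below applies them to the rounded SVM values from the abstract; the function name `likelihood_ratios` is hypothetical. Because the inputs are rounded to two decimals, the resulting DOR is only close to the reported 153.03, which was presumably computed from unrounded values.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR, DOR) from sensitivity and specificity."""
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                           # diagnostic odds ratio
    return plr, nlr, dor


# Rounded SVM values reported in the abstract.
plr, nlr, dor = likelihood_ratios(sensitivity=0.93, specificity=0.92)
print(f"PLR={plr:.2f}  NLR={nlr:.3f}  DOR={dor:.1f}")
```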

IDS Model using Improved Bayesian Network to improve the Intrusion Detection Rate (베이지안 네트워크 개선을 통한 탐지율 향상의 IDS 모델)

  • Choi, Bomin;Lee, Jungsik;Han, Myung-Mook
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.5 / pp.495-503 / 2014
  • Recently, intrusion detection systems that collect and analyze network data, such as packets or logs, have been actively studied as a response to network threats in the computer security field. In particular, a Bayesian network has the advantage of being able to infer from only part of the data, so intrusion detection systems based on Bayesian networks have been studied previously. However, these systems were limited in detection performance because they did not consider problems such as the complexity of the relations among network packets or the processing of continuous input data. Therefore, in this paper we propose two methodologies based on K-means clustering that improve the detection rate by addressing the problems of prior models. First, the detection rate can be improved by carefully setting the interval ranges of nodes based on K-means clustering. Second, it can be improved by calculating a robust CPT through weighted learning based on K-means clustering. We conducted experiments to evaluate the proposed methodologies by comparing K_WTAN_EM, which applies both of them, with prior models. In the experiments, the detection rate of the proposed model was about 7.78% higher than that of the existing NBN (Naive Bayesian Network) IDS model and about 5.24% higher than that of the TAN (Tree Augmented Bayesian Network) IDS model, which supports the effectiveness of the proposed ideas.

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata / v.5 no.2 / pp.53-67 / 2020
  • This study employs a supervised learning prediction model to detect nonconformity in advance at processed food manufacturing and processing businesses. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing and feature engineering, and model selection and evaluation. The dependent variable was set as the number of supervised inspection detections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting nonconforming companies. The data were preprocessed to reflect not only basic attributes such as revenue, operating duration, and number of employees, but also inspection track records and external climate data. After feature extraction, the machine learning algorithms were applied with company risk, item risk, environmental risk, and past violation history derived as feature variables that affect the determination of nonconformity. The F1-score of the decision tree, one of the ensemble models, was much higher than those of the other models. Based on the results of this study, official food control for food safety management is expected to be enhanced and to move toward data- and evidence-based management as well as a scientific administrative system.

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data (트래픽 데이터의 통계적 기반 특징과 앙상블 학습을 이용한 토르 네트워크 웹사이트 핑거프린팅)

  • Kim, Junho;Kim, Wongyum;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.9 no.6 / pp.187-194 / 2020
  • This paper proposes a website fingerprinting method using ensemble learning over the Tor network, which protects client anonymity and personal information. We construct a training problem for website fingerprinting from the traffic packets collected in the Tor network, and compare the performance of the website fingerprinting system using tree-based ensemble models. A training feature vector is prepared from the general information, burst, cell sequence length, and cell order extracted from the traffic sequence, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to the use of website fingerprinting, and compare the performance with that of the support vector machine model using CUMUL feature vectors. In the experimental evaluation, the proposed statistical-based training feature representation is superior to the CUMUL feature representation except in the BW case.

Study on Quantification Method Based on Monte Carlo Sampling for Multiunit Probabilistic Safety Assessment Models

  • Oh, Kyemin;Han, Sang Hoon;Park, Jin Hee;Lim, Ho-Gon;Yang, Joon Eon;Heo, Gyunyoung
    • Nuclear Engineering and Technology / v.49 no.4 / pp.710-720 / 2017
  • In Korea, many nuclear power plants operate at a single site based on geographical characteristics, but the population density near the sites is higher than that in other countries. Thus, multiunit accidents are a more important consideration than in other countries and should be addressed appropriately. Currently, there are many issues related to a multiunit probabilistic safety assessment (PSA). One of them is the quantification of a multiunit PSA model. A traditional PSA uses a Boolean manipulation of the fault tree in terms of the minimal cut set. However, such methods have some limitations when rare event approximations cannot be used effectively or a very small truncation limit should be applied to identify accident sequence combinations for a multiunit site. In particular, it is well known that seismic risk in terms of core damage frequency can be overestimated because there are many events that have a high failure probability. In this study, we propose a quantification method based on a Monte Carlo approach for a multiunit PSA model. This method can consider all possible accident sequence combinations in a multiunit site and calculate a more exact value for events that have a high failure probability. An example model for six identical units at a site was also developed and quantified to confirm the applicability of the proposed method.
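
To illustrate why the rare-event approximation can overestimate when failure probabilities are high, the sketch below compares it with a plain Monte Carlo estimate on a toy fault tree. This is not the authors' quantification method or model: the three basic events, their probabilities, the cut sets, and the function names are hypothetical, and the basic events are assumed to be independent.

```python
import random


def rare_event_approximation(cut_sets, p):
    """Sum of minimal cut set probabilities (an upper bound on the top event)."""
    total = 0.0
    for cut_set in cut_sets:
        prob = 1.0
        for event in cut_set:
            prob *= p[event]
        total += prob
    return total


def monte_carlo_top_event(cut_sets, p, n_samples, seed=0):
    """Estimate the top event probability by sampling basic event states."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Sample which basic events fail in this trial.
        failed = {event for event, prob in p.items() if rng.random() < prob}
        # The top event occurs if every event of at least one cut set failed.
        if any(cut_set <= failed for cut_set in cut_sets):
            hits += 1
    return hits / n_samples


# Hypothetical basic events with high failure probabilities (e.g. seismic).
p = {"A": 0.5, "B": 0.4, "C": 0.6}
cut_sets = [{"A", "B"}, {"A", "C"}, {"B", "C"}]

approx = rare_event_approximation(cut_sets, p)
estimate = monte_carlo_top_event(cut_sets, p, n_samples=100_000)
print(f"rare-event approximation: {approx:.3f}")
print(f"Monte Carlo estimate:     {estimate:.3f}")
```

For this toy tree the exact top event probability is 0.2 + 0.3 + 0.24 − 2 × 0.12 = 0.5 by inclusion-exclusion, while the rare-event sum is 0.74; the sampled estimate should land near the exact value.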

Method of Analyzing Important Variables using Machine Learning-based Golf Putting Direction Prediction Model (머신러닝 기반 골프 퍼팅 방향 예측 모델을 활용한 중요 변수 분석 방법론)

  • Kim, Yeon Ho;Cho, Seung Hyun;Jung, Hae Ryun;Lee, Ki Kwang
    • Korean Journal of Applied Biomechanics / v.32 no.1 / pp.1-8 / 2022
  • Objective: This study proposes a methodology for identifying the variables that most strongly affect putting direction prediction, using a machine learning-based putting direction prediction model trained with IMU sensor data. Method: Putting data were collected using an IMU sensor measuring 12 variables from 6 adult males in their 20s at K University who had no golf experience. The data were preprocessed for machine learning, and models were built using five machine learning algorithms. The model with the highest performance was selected as the proposed model, and the 12 IMU sensor variables were then applied one by one to analyze which variables affect the learning performance. Results: In a comparison of the five machine learning algorithms (K-NN, Naive Bayes, Decision Tree, Random Forest, and Light GBM), the prediction accuracy of the Light GBM-based model was higher than that of the other algorithms. Using the Light GBM algorithm, an experiment was performed to rank the importance of the variables that affect the model's direction prediction. Conclusion: Among the five machine learning algorithms, Light GBM predicted the putting direction best. When the model predicted the putting direction, the variable with the greatest influence was the left-right inclination (Roll).