• Title/Summary/Keyword: Decision Tree analysis

A Study on Management Criteria of Seepage for Fill Dams Considering Rainfall Effect (강수를 고려한 필댐 침투수량의 관리기준에 관한 연구)

  • Lee, Jongeun;Yoon, Sukmin;Im, Eun-Sang;Kang, Gichun
    • Journal of the Korean Geotechnical Society / v.36 no.5 / pp.5-16 / 2020
  • The purpose of this study is to propose management criteria, derived through decision tree analysis, for seepage, which is an important instrumentation item for fill dams. At dams in Korea, seepage can increase rapidly because rainfall flows directly into the downstream slope and abutments of the dam during rainfall events. Management criteria for fill dam seepage therefore need to take rainfall into account. In this study, decision tree analysis was performed for a fill dam in Korea with seepage as the response variable and rainfall and the water level of the dam as explanatory variables. The results show that the water level acts as an explanatory variable when daily rainfall is below 34.75 mm/day, with water level branch conditions at 37.4 m and 35.23 m. 98% of the rainfall data fall below a daily rainfall of 34.75 mm/day, and the corresponding seepage ranges from 13.25 L/min to 24.24 L/min. Of the two influence factors selected for seepage, rainfall and water level, rainfall was dominant. Finally, seepage values that account for rainfall and water level were proposed as management criteria for the fill dam.
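
The branch conditions reported in this abstract can be read as a small routing function. The sketch below is only an illustration: the abstract gives the split values but not the split directions, the nesting order of the two water level splits, or the seepage predicted at each leaf. The inequalities and the name `seepage_branch` are therefore assumptions, and no leaf values are supplied.

```python
# Split values are taken from the abstract; everything else is an assumption.
RAINFALL_SPLIT = 34.75              # daily rainfall, mm/day
WATER_LEVEL_SPLITS = (35.23, 37.4)  # water level of the dam, m


def seepage_branch(rainfall_mm_per_day: float, water_level_m: float) -> str:
    """Return a label for the branch an observation would fall into.

    Assumes strict "<" on the lower side of each split; the paper may use "<=".
    """
    if rainfall_mm_per_day >= RAINFALL_SPLIT:
        # The abstract mentions water level splits only below the rainfall split.
        return "rainfall >= 34.75"
    lower, upper = WATER_LEVEL_SPLITS
    if water_level_m < lower:
        return "rainfall < 34.75, level < 35.23"
    if water_level_m < upper:
        return "rainfall < 34.75, 35.23 <= level < 37.4"
    return "rainfall < 34.75, level >= 37.4"


if __name__ == "__main__":
    # Hypothetical observation: 10 mm/day of rain at a water level of 36 m.
    print(seepage_branch(10.0, 36.0))
```

A management criterion in the sense of the abstract would attach an expected seepage band to each branch; those bands are not given here, so the function stops at the branch label.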

Development and Validation of 18F-FDG PET/CT-Based Multivariable Clinical Prediction Models for the Identification of Malignancy-Associated Hemophagocytic Lymphohistiocytosis

  • Xu Yang;Xia Lu;Jun Liu;Ying Kan;Wei Wang;Shuxin Zhang;Lei Liu;Jixia Li;Jigang Yang
    • Korean Journal of Radiology / v.23 no.4 / pp.466-478 / 2022
  • Objective: 18F-fluorodeoxyglucose (FDG) PET/CT is often used for detecting malignancy in patients with newly diagnosed hemophagocytic lymphohistiocytosis (HLH), with acceptable sensitivity but relatively low specificity. The aim of this study was to improve the diagnostic ability of 18F-FDG PET/CT in identifying malignancy in patients with HLH by combining 18F-FDG PET/CT and clinical parameters. Materials and Methods: Ninety-seven patients (age ≥ 14 years) with secondary HLH were retrospectively reviewed and divided into the derivation (n = 71) and validation (n = 26) cohorts according to admission time. In the derivation cohort, 22 patients had malignancy-associated HLH (M-HLH) and 49 patients had non-malignancy-associated HLH (NM-HLH). Data on pretreatment 18F-FDG PET/CT and laboratory results were collected. The variables were analyzed using the Mann-Whitney U test or Pearson's chi-square test, and a nomogram for predicting M-HLH was constructed using multivariable binary logistic regression. The predictors were also ranked using decision-tree analysis. The nomogram and decision tree were validated in the validation cohort (10 patients with M-HLH and 16 patients with NM-HLH). Results: The ratio of the maximal standardized uptake value (SUVmax) of the lymph nodes to that of the mediastinum, the ratio of the SUVmax of bone lesions or bone marrow to that of the mediastinum, and age were selected for constructing the model. The nomogram showed good performance in predicting M-HLH in the validation cohort, with an area under the receiver operating characteristic curve of 0.875 (95% confidence interval, 0.686-0.971). At an appropriate cutoff value, the sensitivity and specificity for identifying M-HLH were 90% (9/10) and 68.8% (11/16), respectively. The decision tree integrating the same variables showed 70% (7/10) sensitivity and 93.8% (15/16) specificity for identifying M-HLH. 
In comparison, visual analysis of 18F-FDG PET/CT images demonstrated 100% (10/10) sensitivity and 12.5% (2/16) specificity. Conclusion: 18F-FDG PET/CT may be a practical technique for identifying M-HLH. The model constructed using 18F-FDG PET/CT features and age was able to detect malignancy with better accuracy than visual analysis of 18F-FDG PET/CT images.
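
The validation-cohort figures in this abstract follow directly from the reported counts (10 patients with M-HLH, 16 with NM-HLH). The sketch below recomputes sensitivity and specificity from those counts and also derives overall accuracy, which the abstract does not state numerically; the function name `rates` is an assumption for illustration.

```python
def rates(true_pos, positives, true_neg, negatives):
    """Return (sensitivity, specificity, accuracy) from raw counts."""
    sensitivity = true_pos / positives
    specificity = true_neg / negatives
    accuracy = (true_pos + true_neg) / (positives + negatives)
    return sensitivity, specificity, accuracy


# (correctly identified M-HLH, correctly identified NM-HLH) per method,
# as reported for the validation cohort.
REPORTED_COUNTS = {
    "nomogram": (9, 11),
    "decision tree": (7, 15),
    "visual analysis": (10, 2),
}

if __name__ == "__main__":
    for method, (tp, tn) in REPORTED_COUNTS.items():
        sens, spec, acc = rates(tp, 10, tn, 16)
        print(f"{method}: sensitivity {sens:.1%}, "
              f"specificity {spec:.1%}, accuracy {acc:.1%}")
```

Visual analysis correctly classifies 12 of the 26 validation patients, against 20 for the nomogram and 22 for the decision tree, which is consistent with the conclusion that the model-based approaches are more accurate.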

Exploring Sport Consumption Style of Generation Z that the 4th Industrial revolution paid attention to: Applying Decision Tree Analysis based on Data Mining (4차 산업혁명이 주목한 Z세대의 스포츠 소비 스타일 탐색: 데이터마이닝 기반 의사결정 나무 분석 적용)

  • Shin, Jin-Ho;Lim, Young-Sam;Kim, Ji-Sun
    • Journal of the Korean Applied Science and Technology / v.37 no.5 / pp.1208-1221 / 2020
  • The purpose of this study was to provide basic data for predicting the sports consumption market that Generation Z will lead, by applying data-mining-based decision tree analysis to explore the sports consumption style of Generation Z. A survey was conducted on a sample of Generation Z males and females aged 19 or older, and data from 429 respondents were used in the final analysis. Frequency analysis, exploratory factor analysis, retest and reliability analysis, and decision tree analysis were performed using SPSS Statistics (ver. 21.0). The main results are as follows. First, when the rational efficiency index was high and the aesthetic consumption index was low, the probability of being classified into the female group was 96.8%. In contrast, when the rational efficiency and perception of price indices were low, the probability of being classified into the male group was 100%. Second, when the brand orientation, perception of price, and rational efficiency indices were high, the probability of being classified into the capital area group was 97.3%. Conversely, the probability of being classified into the other-area group was 82.1% when the brand orientation, commemoration rites, and status symbol indices were low. Third, when the status symbol and trend-oriented indices were high and the functionality index was low, the probability of being classified into the daily life and fashion groups was 77.6%. In contrast, when the status symbol index was low and the retention of membership and enjoyment consumption indices were high, the probability of being classified into the exercise and competition groups was 81.0%.

Program Plagiarism Detection based on X-treeDiff+ (X-treeDiff+ 기반의 프로그램 복제 탐지)

  • Lee, Suk-Kyoon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.4 / pp.44-53 / 2010
  • Program plagiarism is a significant factor in reducing the quality of education in computer programming. In this paper, we propose a technique for identifying similar or identical programs in order to prevent students from recklessly copying programming assignments. Existing approaches for identifying similar programs are mainly based on fingerprints or pattern matching for text documents. In contrast to those approaches, we propose an approach based on program structure. Using parsing programs, we first transform programs into XML documents by representing the syntactic components of the programs as elements of the XML documents, then run X-tree Diff+, a change detection algorithm for XML documents, and produce an edit script describing the changes. Whether programs are similar or identical is decided by analyzing the edit scripts from the standpoint of program plagiarism. Analysis of the edit scripts allows users to understand the process of conversion between two programs, so that they can make a qualitative judgment considering the characteristics of the programming assignment and the degree of plagiarism.
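
The pipeline in this abstract (parse, convert to XML, diff, inspect the edit script) can be illustrated with standard-library pieces. This sketch does not implement X-tree Diff+: it assumes Python's `ast` module as the parser and substitutes `difflib.SequenceMatcher` over the flattened XML tag sequence as a much cruder change detector. All function names are hypothetical.

```python
import ast
import difflib
import xml.etree.ElementTree as ET


def to_xml(node):
    """Represent each syntactic component as an XML element named after its type."""
    elem = ET.Element(type(node).__name__)
    for child in ast.iter_child_nodes(node):
        elem.append(to_xml(child))
    return elem


def tag_sequence(source):
    """Parse source code and return its XML element tags in document order."""
    return [e.tag for e in to_xml(ast.parse(source)).iter()]


def structural_similarity(source_a, source_b):
    """Similarity in [0, 1] of two programs' structure, ignoring identifiers."""
    matcher = difflib.SequenceMatcher(None, tag_sequence(source_a),
                                      tag_sequence(source_b))
    return matcher.ratio()


def rough_edit_script(source_a, source_b):
    """List the non-matching opcodes, a stand-in for a real tree edit script."""
    matcher = difflib.SequenceMatcher(None, tag_sequence(source_a),
                                      tag_sequence(source_b))
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


if __name__ == "__main__":
    original = "def f(x):\n    return x + 1\n"
    renamed = "def g(y):\n    return y + 1\n"
    print(structural_similarity(original, renamed))
    print(rough_edit_script(original, renamed))
```

Because only node types are compared, renaming variables leaves the similarity unchanged, which is the property that motivates a structure-based approach over text matching. A tree-aware diff such as the one the paper uses would also account for moved and nested subtrees, which this flattened comparison cannot.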

Study on Development of Classification Model and Implementation for Diagnosis System of Sasang Constitution (사상체질 분류모형 개발 및 진단시스템의 구현에 관한 연구)

  • Beum, Soo-Gyun;Jeon, Mi-Ran;Oh, Am-Suk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.08a / pp.155-159 / 2008
  • In this thesis, in order to develop a new classification model of Sasang constitutional medical types that helps improve the accuracy of type diagnosis, various data mining classification models, namely discriminant analysis, decision tree analysis, neural network analysis, logistic regression analysis, and clustering analysis, which are the main classification methods, were applied to medical type classification questionnaires. In this manner, a model that scientifically classifies constitutional medical types in the field of Sasang Constitutional Medicine, a branch of traditional Korean medicine, has been developed. The above-mentioned analysis models were also systematically compared and analyzed. In this study, a classification of Sasang constitutional medical types was developed based on the discriminant analysis model and the decision tree analysis model, which have relatively high accuracy, whose analysis procedures are easy to understand and explain, and which are easy to implement. A diagnosis system for Sasang constitution was also implemented by applying the two analysis models.

Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test (의료진단 및 중요 검사 항목 결정 지원 시스템을 위한 랜덤 포레스트 알고리즘 적용)

  • Yun, Tae-Gyun;Yi, Gwan-Su
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.6 / pp.1058-1062 / 2008
  • In a clinical decision support system (CDSS), unlike a rule-based expert method, an appropriate data-driven machine learning method can readily provide information on individual features (clinical tests) for disease classification. However, currently developed methods focus on improving classification accuracy for diagnosis. By analyzing feature importance in classification, one may infer novel clinical test sets that strongly differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm is applied as the classifier and as the feature importance measure. The system selects the significant clinical tests that discriminate the diseases by examining the classification error during backward elimination of the features. The superior performance of the random forest algorithm in clinical classification was assessed against artificial neural network and decision tree algorithms using breast cancer, diabetes, and heart disease data from the UCI Machine Learning Repository. Tests with the same data sets show that the proposed system can successfully select the significant clinical test set for each disease.
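
The selection step described in this abstract, dropping features while watching the classification error, can be sketched as a generic backward elimination loop. The abstract does not give the stopping rule or how importance is recomputed, so the rule used here (stop when the error rises) is an assumption. `error_fn` and `importance_fn` are hypothetical callables standing in for a random forest's classification error and feature importance, and the toy numbers are invented for illustration.

```python
def backward_elimination(features, error_fn, importance_fn, tolerance=0.0):
    """Repeatedly drop the least important feature until the error gets worse.

    error_fn(features) -> classification error for a model using those features.
    importance_fn(features) -> dict mapping each feature to an importance score.
    """
    selected = list(features)
    best_error = error_fn(selected)
    while len(selected) > 1:
        scores = importance_fn(selected)
        weakest = min(selected, key=lambda name: scores[name])
        candidate = [name for name in selected if name != weakest]
        candidate_error = error_fn(candidate)
        if candidate_error > best_error + tolerance:
            break  # removing this feature hurts, so keep the current set
        selected, best_error = candidate, candidate_error
    return selected, best_error


if __name__ == "__main__":
    # Toy stand-ins: two informative tests and two noisy ones (made-up values).
    toy_importance = {"test_a": 0.5, "test_b": 0.3, "test_c": 0.05, "test_d": 0.02}

    def toy_error(features):
        noise = sum(1 for f in features if f in ("test_c", "test_d"))
        missing = sum(1 for f in ("test_a", "test_b") if f not in features)
        return 0.10 + 0.01 * noise + 0.30 * missing

    print(backward_elimination(list(toy_importance), toy_error,
                               lambda features: toy_importance))
```

In a real system both callables would retrain or re-evaluate the classifier on the reduced feature set at every step, which is where most of the cost lies.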

A Study on the Stereotype of ICT SMEs' R&D: Empirical Evidence from Korea (ICT 중소기업 R&D의 스테레오타입에 대한 연구 : 한국의 사례를 중심으로)

  • Jun, Seung-pyo;Choi, San;Jung, JaeOong
    • Journal of Korea Technology Innovation Society / v.20 no.2 / pp.334-367 / 2017
  • The ICT industry has been the main driver of Korea's economy with international competitiveness and is expected to be the growth engine that will revitalize the currently depressed economy. A broad range of different perspectives and opinions on the industry exist in Korea and overseas. Some of these are stereotypes, not all of which are based on objective evidence. Stereotypes refer to widely-held fixed opinions on a specific group and do not necessarily have negative connotations. However, they should not be viewed lightly because they can substantially affect decision-making process. In this regard, this study sought to review the stereotypes of ICT industry and identify objective and relative stereotypes. In the study, a decision-tree analysis was conducted on a survey result of 3,300 small and medium-sized enterprises (SMEs) in order to identify Korean ICT companies' characteristics that distinguish them from other technology companies. The decision-tree analysis, a data mining process based on machine learning, took a total of 291 variables into account in 10 subjects such as: corporate business in general, technology development activities as well as organization and people in technology development. Identifying the variables that distinguish ICT companies from other technology companies with the decision-tree analysis, the study then came up with a list of objective stereotypes of ICT companies. The findings from the stereotypes of Korean ICT companies are as follows. First, the companies are in need of technology policies that help R&D planning and market penetration. Second, policies must better support the companies working to sell new products or explore new business. Third, the companies need policies that support secure protection of development outcomes and proper management of IP rights. Fourth, the administrative procedures related to governmental support for ICT companies' R&D projects must be simplified. 
It is hoped that the outcome of this study will provide meaningful guidance in establishment, implementation and evaluation of technology policies for ICT SMEs, particularly to policymakers or researchers in relevant government agencies who determine R&D policies for ICT SMEs.

Length of stay in PACU among surgical patients using data mining technique (데이터 마이닝을 활용한 외과수술환자의 회복실 체류시간 분석)

  • Yoo, Je-Bog;Jang, Hee Jung
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.7 / pp.3400-3411 / 2013
  • Data mining is a new approach to extracting useful information through effective analysis of large volumes of data in numerous fields. This study used a decision tree model built with Clementine C&RT (Classification & Regression Tree, CART) as the data mining technique. We applied this technique to analyze the medical records of 1,500 people. The data were sorted by length of stay in the PACU and divided into 3 groups. The results extracted by the C5.0 decision tree method showed that the important factors related to length of stay in the PACU are type of operation, preoperative EKG abnormality, anesthetics, operative duration, and age.

A study on the prediction of korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su;Jeong, Seung Hwan;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.123-139 / 2019
  • The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, this market has a short history, as bad debt started to increase after the 2009 global financial crisis due to the recession in the real economy. NPLs have become a major investment target in recent years, as investment capital from the domestic capital market began to enter the NPL market in earnest. Although the domestic NPL market has received considerable attention due to its recent overheating, research on the NPL market has been limited because the history of capital market investment in the domestic NPL market is short. In addition, decision-making based on more scientific and systematic analysis is required because of declining profitability and price fluctuations driven by fluctuations in the real estate business. In this study, we propose a prediction model that can determine whether the benchmark yield will be achieved, using NPL market data in accordance with market demand. To build the model, we used Korean NPL data covering about 4 years, from December 2013 to December 2017. The data set comprised 2,291 items. As independent variables, only those related to the dependent variable were selected from the 11 variables that describe the characteristics of the real estate. To select the variables, one-to-one t-tests, stepwise logistic regression, and decision tree analysis were performed. Seven independent variables were selected: purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principal Balance), and HP (Holding Period). The dependent variable is a binary variable that indicates whether the benchmark rate is reached. 
This is because a model predicting a binary variable is more accurate than a model predicting a continuous variable, and the accuracy of these models is directly related to their effectiveness. In addition, for a special purpose company, the main concern is whether or not to purchase the property, so knowing whether a certain level of return will be achieved is enough to make a decision. To ascertain whether 12%, the standard rate of return used in the industry, is a meaningful reference value, we constructed and compared predictive models with the dependent variable calculated at different threshold values. As a result, the average hit ratio of the predictive model constructed using the dependent variable calculated with the 12% standard rate of return was the best, at 64.60%. To propose an optimal prediction model based on the chosen dependent variable and the 7 independent variables, we constructed prediction models using five methodologies (discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model) and compared them. To do this, 10 sets of training and testing data were extracted using 10-fold cross-validation. After building the models on these data, the hit ratio of each set was averaged and the performance was compared. The average hit ratios of the prediction models constructed using discriminant analysis, logistic regression, decision tree, artificial neural network, and genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively. The model using the artificial neural network was confirmed to be the best. This study shows that it is effective to use the 7 independent variables and the artificial neural network prediction model in the NPL market going forward. 
The proposed model predicts in advance whether new items will achieve the 12% return, which will help special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will become more liquid as transactions proceed at appropriate prices.
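
The evaluation procedure in this abstract, 10 train/test splits with the hit ratio averaged across them, can be sketched in plain Python. The abstract does not say how the folds were drawn, so the shuffled, roughly equal folds below are an assumption, and `fit` and `predict` are hypothetical callables standing in for any of the five model types.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k roughly equal, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for m, fold in enumerate(folds) if m != held_out for j in fold]
        yield train, test


def hit_ratio(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_hit_ratio(X, y, fit, predict, k=10):
    """Average the per-fold hit ratio of a model given by fit/predict callables."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        predicted = [predict(model, X[i]) for i in test]
        scores.append(hit_ratio([y[i] for i in test], predicted))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data and a majority-class baseline, purely illustrative.
    X = [[i] for i in range(100)]
    y = [1 if i % 4 else 0 for i in range(100)]
    majority_fit = lambda X_train, y_train: max(set(y_train), key=y_train.count)
    majority_predict = lambda model, x: model
    print(mean_hit_ratio(X, y, majority_fit, majority_predict))
```

Averaged in this way, the abstract reports 67.40% for the artificial neural network, the highest of the five models compared.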

A study on 3-step complex data mining in society indicator survey (사회지표조사에서의 3단계 복합 데이터마이닝의 적용 방안)

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.23 no.5 / pp.983-992 / 2012
  • Social indicator surveys can identify the state of society as a whole. When policy is created, a social indicator survey can reflect the public opinion of the region. Social indicator surveys are an important measure of social change and have been conducted in many municipalities (Seoul, Incheon, Busan, Ulsan, Gyeongsangnamdo, etc.). However, the analysis of social indicator survey results has mainly consisted of basic statistical analysis. In this study, we propose a new data mining methodology for more effective analysis: 3-step complex data mining for society indicator surveys. The 3-step complex data mining uses three data mining methods (intervening association rules, clustering, and decision trees).