• Title/Summary/Keyword: Decision Tree

Comparison of Hospital Standardized Mortality Ratio Using National Hospital Discharge Injury Data (퇴원손상심층조사 자료를 이용한 의료기관 중증도 보정 사망비 비교)

  • Park, Jong-Ho;Kim, Yoo-Mi;Kim, Sung-Soo;Kim, Won-Joong;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.4 / pp.1739-1750 / 2012
  • This study aimed to develop a method for assessing medical service outcomes from administrative data by comparing hospital standardized mortality ratios (HSMR) across hospitals. We analyzed 63,664 cases from the 2007 and 2008 Hospital Discharge Injury Data provided by the Korea Centers for Disease Control and Prevention. Using data mining techniques, we compared decision tree and logistic regression models to develop a risk-adjustment model for in-hospital mortality. The analysis shows that gender, length of stay, Elixhauser comorbidity index, hospitalization path, and primary diagnosis are the main variables influencing mortality. Comparing HSMRs computed with these standardized variables revealed substantial differences among hospitals (55.6-201.6). This indicates quality gaps in medical service among hospitals. The results should be used more widely to improve the quality of medical service.
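
As a hedged illustration of the ratio compared above: an HSMR is conventionally the observed number of in-hospital deaths divided by the number expected under a risk-adjustment model, multiplied by 100. The sketch below assumes that convention. The function name `hsmr` and the sample probabilities are hypothetical; in practice the per-patient probabilities would come from a fitted model such as the logistic regression or decision tree mentioned in the abstract.

```python
from typing import Sequence


def hsmr(observed_deaths: int, expected_probs: Sequence[float]) -> float:
    """Return 100 * observed / expected deaths for one hospital.

    expected_probs holds one model-predicted death probability per
    discharged patient (assumed input, not taken from the study).
    """
    expected_deaths = sum(expected_probs)
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed_deaths / expected_deaths


# Hypothetical hospital: 4 patients, each with a predicted risk of 0.5,
# so 2 deaths are expected; 3 deaths were observed.
example = hsmr(3, [0.5, 0.5, 0.5, 0.5])
print(example)
```

A value above 100 means more deaths than the model expected for that case mix, and a value below 100 means fewer.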

A Study on the Prediction Model of the Elderly Depression

  • SEO, Beom-Seok;SUH, Eung-Kyo;KIM, Tae-Hyeong
    • The Journal of Industrial Distribution & Business / v.11 no.7 / pp.29-40 / 2020
  • Purpose: Modern society faces many urban problems, such as aging, the hollowing out of old city centers, and polarization within cities. In this study, we apply big data and machine learning methodologies to predict depression symptoms in the elderly population at an early stage, thereby contributing to solving the problem of elderly depression. Research design, data and methodology: We used random forest as the machine learning technique and analyzed the correlation between CES-D10, which is widely used worldwide, and other variables to estimate the important variables. Two dependent variables were set up, one distinguishing normal from depression and one distinguishing moderate from severe depression, and a total of 106 independent variables were included, covering the objective characteristics of the elderly as well as surveys of subjective health, cognitive ability, daily life, employment, household background, income, consumption, assets, subjective expectations, and quality of life. Results: Satisfaction with the residential area, quality of life, and cognitive ability scores had important effects in classifying elderly depression; satisfaction with living quality and economic conditions and the number of outpatient care visits in living areas and clinics were also important variables. In the random forest performance evaluation, the model classifying whether or not an elderly person is depressed achieved an accuracy of 86.3%, a sensitivity of 79.5%, and a specificity of 93.3%, and the model classifying the degree of depression achieved an accuracy of 86.1%, a sensitivity of 93.9%, and a specificity of 74.7%. Conclusions: This study identified the important variables of the estimated predictive model using the random forest technique and focused on predictive performance itself. Although the research has limitations, such as the lack of clear criteria for classifying depression levels and the failure to reflect variables other than KLoSA data, we expect that if additional variables are secured and high-performance predictive models are estimated with various machine learning techniques, the results can inform ways to improve the quality of life of senior citizens through early detection of depression and support public policy decisions.
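
The accuracy, sensitivity, and specificity figures reported above follow from the standard confusion-matrix definitions. The sketch below shows those definitions only; the function name `classification_metrics` is hypothetical, and the counts in the example are made up for illustration, not taken from the study.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: depressed cases predicted correctly / missed.
    tn/fp: non-depressed cases predicted correctly / flagged wrongly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # share of true positives detected
        "specificity": tn / (tn + fp),    # share of true negatives detected
    }


# Illustrative counts only (assumed, not the paper's data).
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
print(metrics)
```

Reporting all three together matters here because a model can reach high accuracy while missing many depressed cases if the classes are unbalanced.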

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.
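
For orientation, the sketch below shows plain random under-sampling, the baseline the abstract compares against. It is not the proposed hybrid method: the k-RNN outlier removal and OCSVM sample selection are not reproduced here. The function name `random_undersample`, its parameters, and the toy data are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(
    samples: Sequence, labels: Sequence, majority_label, seed: int = 0
) -> Tuple[List, List]:
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    # Keep as many majority rows as there are minority rows.
    kept = rng.sample(majority, min(len(minority), len(majority)))
    index = sorted(kept + minority)  # preserve the original row order
    return [samples[i] for i in index], [labels[i] for i in index]


# Toy data: 8 non-bankrupt firms (label 0) and 2 bankrupt firms (label 1).
firms = [f"firm_{i}" for i in range(10)]
status = [0] * 8 + [1] * 2
balanced_firms, balanced_status = random_undersample(firms, status, majority_label=0)
print(balanced_firms, balanced_status)
```

Random selection of this kind can discard informative majority samples and keep outliers, which is the weakness the hybrid approach in the abstract is meant to address.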

Matching Algorithms using the Union and Division (결합과 분배를 이용한 정합 알고리즘)

  • 박종민;조범준
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.5 / pp.1102-1107 / 2004
  • A fingerprint recognition system consists of off-line and on-line processing. Off-line processing registers the features extracted from the digitized fingerprint, which is obtained from the analog fingerprint through a fingerprint acquisition device. On-line processing decides whether a user is granted access to the system by matching the features extracted from the input fingerprint against those in the database when the user attempts to use the system. In matching between on-line and off-line processing, the most important question is which features to use as the reference. Until now, the “Delta” and “Core” have been used as this reference, but they have the drawback that they do not exist in every person's fingerprint. To handle users who lack these features, matching methods that build a spanning tree or a triangulation from the relations among the extracted features are still in use. However, these methods incur time overhead, and it is not certain that they produce a correct match. This paper therefore presents a more accurate matching algorithm that achieves both a higher matching rate and a lower mismatching rate than existing matching algorithms by selecting, as the reference, the line segment connecting two minutiae on the same ridge and furrow structures.

Determinants of employee's wage using hierarchical linear model (위계적 선형모형을 이용한 대졸 신규취업자 임금 결정요인 분석)

  • Park, Sungik;Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.65-75 / 2015
  • This paper analyzes the determinants of wages for college and university graduates using both individual-level and industry-level variables. Wage determination has a multi-level structure in the sense that individual wages are influenced by individual-level (level-1) and industry-level (level-2) variables. The assumption of classical regression that individual wages are independent is therefore violated, so this paper uses the hierarchical linear model (HLM). The major results are as follows. First, the multiple correspondence analysis including level-1 and level-2 variables reveals that both affect individual wages, judging from the fact that the values of the level-1 and level-2 variables differ across individual wage groups. Second, the decision tree analysis including level-1 and level-2 variables shows that the most influential variable in wage determination is industry-level wage, followed by industry-level working hours, age, and sex, in declining order. This suggests that using the HLM is appropriate, since industry characteristics are important in determining individual wages. Third, the HLM is shown to be the best compared with other models that do not take level-1 and level-2 variables into account simultaneously.
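
For readers unfamiliar with the two-level structure described above, a generic textbook random-intercept model is sketched below. It is an assumed illustrative form, not the authors' specification: $y_{ij}$ is the wage of graduate $i$ in industry $j$, $x_{ij}$ a level-1 predictor, $w_j$ a level-2 predictor, and all symbols are introduced here for illustration.

```latex
% Level 1: individual i within industry j
y_{ij} = \beta_{0j} + \beta_{1} x_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^{2})

% Level 2: the industry-specific intercept depends on an industry-level variable
\beta_{0j} = \gamma_{00} + \gamma_{01} w_{j} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00})

% Intraclass correlation from the model without predictors:
% the share of wage variance that lies between industries
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
```

A non-zero $\rho$ means wages within the same industry are correlated, which is the violation of independence that motivates the HLM over classical regression.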

A comparative analysis of the related body compositions by riding-horse breed in Korea (국내 승용마의 체형상관에 따른 품종별 비교 분석)

  • Oh, Woon-Yong;Do, Kyoung-Tag;Cho, Byung-Wook;Park, Kyung-Do;Kim, Sung-Hoon;Lee, Hak-Kyo;Shin, Young-Soo;Cho, Young-Seuk
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.515-521 / 2011
  • Demand for producing and breeding new domestic riding horses is increasing as a means of vitalizing the horse riding industry in Korea, following the 'Horse Industry Support Act'. In this study, we sought to develop functional relations through conformation comparison and body composition analysis. Seventy-six horses of 5 breeds used as riding horses in Korea were measured on 12 body measurement items, and cluster analysis was conducted to determine the correlations among them. After the measurements were standardized, (height, croup height, pelvis length) and (hip width, width of pelvis) were found to be highly correlated. The decision tree results confirmed that breed type can be classified by body measurements (hip height, hip width, head length, croup height). This result can be used as basic data for developing horse type determination (racing, riding, riding for the disabled, working, or fattening) through body composition analysis, and for producing and breeding new domestic riding horses through analysis with a 3D stereoscopic image system.

Taxonomy of Performance Shaping Factors for Human Error Analysis of Railway Accidents (철도사고의 인적오류 분석을 위한 수행도 영향인자 분류)

  • Baek, Dong-Hyun;Koo, Lock-Jo;Lee, Kyung-Sun;Kim, Dong-San;Shin, Min-Ju;Yoon, Wan-Chul;Jung, Myung-Chul
    • Journal of Korean Society of Industrial and Systems Engineering / v.31 no.1 / pp.41-48 / 2008
  • Enhanced machine reliability has dramatically reduced the rate and number of railway accidents, but further reduction requires considering human error as well, which accounts for about 20% of the accidents. The objective of this study was therefore to suggest a new taxonomy of performance shaping factors (PSFs) that can be used to identify the causes of human errors associated with railway accidents. Four categories (human factor, task factor, environment factor, and organization factor), 14 sub-categories (physical state, psychological state, knowledge/experience/ability, information/communication, regulation/procedure, specific character of task, infrastructure, device/MMI, working environment, external environment, education, direction/management, system/atmosphere, and welfare/opportunity), and 131 specific factors were suggested by carefully reviewing 8 representative published taxonomies: Casualty Analysis Methodology for Maritime Operations (CASMET), Cognitive Reliability and Error Analysis Method (CREAM), Human Factors Analysis and Classification System (HFACS), Integrated Safety Investigation Methodology (ISIM), Korea-Human Performance Enhancement System (K-HPES), Rail Safety and Standards Board (RSSB), TapRoot®, and Technique for Retrospective and Predictive Analysis of Cognitive Errors (TRACEr). The taxonomy was then applied to the railway accident that occurred between Komo and Kyungsan stations in 2003 for verification. A cause decision chart and a why-because tree were both developed and modified to help the analyst find causal factors from the suggested taxonomy. The taxonomy was well suited, as eight causes were found that explain the driver's error in the accident. The taxonomy of PSFs suggested in this study can cover everything from latent factors to direct causes of human errors related to railway accidents with systematic categorization.

Designing of the Statistical Models for Imprinting Patterns of Quantitative Traits Loci (QTL) in Swine (돼지에 있어서 양적 형질 유전자좌(QTL) 발현 특성 분석을 위한 통계적 검정 모형 설정)

  • Yoon D. H.;Kong H. S.;Cho Y. M.;Lee J. W.;Choi I. S.;Lee H. K.;Jeon G. J.;Oh S. J.;Cheong I. C.
    • Journal of Embryo Transfer / v.19 no.3 / pp.291-299 / 2004
  • Quantitative trait loci (QTL) were characterized in an experimental cross population between the Berkshire and Yorkshire breeds. A total of 512 F$_2$ offspring from 65 matings of F$_1$ parents were phenotyped for carcass traits including average daily gain (ADG), average backfat thickness (ABF), tenth rib backfat thickness (TRF), loin eye area (LEA), and last rib backfat thickness (LRF). All animals were genotyped for 125 markers across the genome. Marker linkage maps were derived and used in QTL analysis based on line-cross least squares regression interval mapping. A decision tree to identify QTL with imprinting effects was developed based on tests against the Mendelian mode of QTL expression. To establish evidence of QTL presence, empirical significance thresholds were derived at chromosome-wise and genome-wise levels using specialized permutation strategies. Significance thresholds derived by the permutation test were validated based on simulation of a pedigree and data structure similar to the Berkshire-Yorkshire population. The genome scan revealed significant evidence for 13 imprinted QTLs affecting growth and body composition, of which nine were identified as QTL with a paternally expressed inheritance mode. Four maternally expressed QTLs, for loin eye area (LEA) and tenth rib backfat thickness (TRF), were found on chromosomes 10 and 12. These results support the usefulness of the statistical models for analyzing imprinting of QTLs related to carcass traits.

A Cause Analysis of Learning Environment Variables of Change in Science Attitudes on Elementary and Secondary School Students (초.중.고 학생들의 과학 태도 변화에 대한 학습환경의 원인 분석)

  • Kwon, Chi-Soon;Hur, Myung;Yang, Il-Ho;Kim, Young-Shin
    • Journal of The Korean Association For Science Education / v.24 no.6 / pp.1256-1271 / 2004
  • The importance of science attitudes is increasing in science education. Science attitudes may influence students' attainment and the consistency and quality of their classwork, as well as their later views of science education and scientific occupations. International comparative research and longitudinal studies on Korean students' science attitudes have shown that science attitudes decline as grade level rises. This research surveyed science attitudes and learning environment variables and then investigated the causes of the decline in science attitudes. For this purpose, participating students were selected from the 3rd to 11th grades. A total of 6,925 participants completed questionnaires on science attitudes and learning environment variables three times during a year. The results showed that science attitudes declined after June. Science attitudes changed among 4th to 8th grade students. Science attitudes decreased much more in the second semester than in the first, and high school students' science attitudes fell considerably. Experience with science had the largest effect on science attitudes, and other learning environment variables also influenced changes in science attitudes. Learning environment variables had different influences on students whose science attitudes increased and those whose attitudes declined. The categories that influenced science attitudes were gender, area, and grade in elementary school; grade and area in middle school; and area in high school.

Taper Equations and Stem Volume Table of Eucalyptus pellita and Acacia mangium Plantations in Indonesia (인도네시아 유칼립투스 및 아카시아 조림지의 수간곡선식 및 수간재적표 조제)

  • Son, Yeong Mo;Kim, Hoon;Lee, Ho Young;Kim, Cheol Min;Kim, Cheol Sang;Kim, Jae Weon;Joo, Rin Won;Lee, Kyeong Hak
    • Journal of Korean Society of Forest Science / v.98 no.6 / pp.633-638 / 2009
  • This study was conducted to develop stem taper equations and stem volume tables for Eucalyptus pellita and Acacia mangium plantations in Kalimantan, Indonesia. To derive the most adequate taper equation for the plantations, three models (Max & Burkhart, Kozak, and Lee) were applied and their fit was statistically analyzed using fitness index, bias, and standard error of bias. The results showed no significant difference among the three models, but the fitness index was slightly higher for the Kozak model. The Kozak model was therefore chosen to generate stem taper equations and stem volume tables for the Eucalyptus pellita and Acacia mangium plantations. The resulting stem volume table was compared with the local volume table used in the Kalimantan region, and no significant difference was found in stem volume estimation. The results of this study are expected to provide useful information about tree growth in overseas plantations and to support reliable decision-making for their management.
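
The abstract does not give formulas for its fit statistics. As a rough sketch, the code below assumes two definitions that are common in forest mensuration: fitness index as 1 - SSE/SST and bias as the mean of observed minus predicted values. The function names and the diameter values are hypothetical, and the standard error of bias is left out because its exact definition is not stated.

```python
from typing import Sequence


def fitness_index(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed definition: 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def mean_bias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed definition: average of (observed - predicted)."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical stem diameters (cm) at three heights and a model's predictions.
measured = [10.0, 12.0, 14.0]
modelled = [11.0, 12.0, 13.0]
print(fitness_index(measured, modelled), mean_bias(measured, modelled))
```

Under these assumed definitions, a taper model with a fitness index closer to 1 and a bias closer to 0 would be preferred, which matches the way the abstract ranks the Kozak model.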