• Title/Summary/Keyword: Decision Tree analysis

Forecasting Export & Import Container Cargoes using a Decision Tree Analysis (의사결정나무분석을 이용한 컨테이너 수출입 물동량 예측)

  • Son, Yongjung;Kim, Hyunduk
    • Journal of Korea Port Economic Association / v.28 no.4 / pp.193-207 / 2012
  • The purpose of this study is to predict export and import container volumes using decision tree analysis. Factors that can influence container cargo volume are selected as independent variables: producer price index, consumer price index, index of export volume, index of import volume, index of industrial production, and exchange rate (won/dollar). The analysis uses monthly data from January 2002 to December 2011 and the CRT (Classification and Regression Trees) algorithm. The main findings are as follows. First, when the index of export volume is larger than 152.35, monthly export volume is predicted at 858,19 TEU; when it is between 115.90 and 152.35, monthly export volume is predicted at 716,582 TEU. Second, when the index of import volume is larger than 134.60, monthly import volume is predicted at 869,227 TEU; when the index of export volume is between 116.20 and 134.60, monthly import volume is predicted at 738,724 TEU.
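Thresholds such as 152.35 on the export volume index are the kind of cut point a regression tree finds by searching for the split that minimizes squared error within the two resulting groups. The following is a minimal sketch of that search for a single predictor. The function name `best_split` and the toy index and volume values are hypothetical; they are not the paper's data or its exact CRT implementation.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the split on x
    that minimizes the total squared error of the two group means."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint cut
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


# Hypothetical toy data: a volume index and a monthly volume (thousand TEU).
index_values = [100, 110, 120, 140, 150, 160]
volumes = [500, 520, 510, 800, 820, 810]
threshold, low_prediction, high_prediction = best_split(index_values, volumes)
```

Each leaf predicts the mean of its group, which is why an abstract like the one above can report a single predicted volume per index range. A full tree repeats this search recursively inside each group and across all predictors.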

Convergence analysis of determinants affecting on geographic variations in the prevalence of arthritis in Korean women using data mining (데이터마이닝을 이용한 여성 관절염 유병률 소지역 간 변이의 융복합 요인분석)

  • Kim, Yoo-Mi;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.5 / pp.277-288 / 2015
  • This study aims to identify determinants of geographic variation in the prevalence of arthritis among Korean women using data mining. Data from the 2012 Korean Community Health Survey, covering 249 small districts, were analyzed. Socio-demographic, health behavior and status, and morbidity measures were analyzed using a conventional regression model and, as the convergence analysis method, a decision tree. The determinants of variation in arthritis prevalence across the 249 districts were the rates of workers in agriculture, forestry, and fishing; salaried workers; persons with more than a high school education; persons needing care who went untreated; persons who went untreated for economic reasons; obesity; heavy drinkers; persons reporting chewing difficulty; persons who had experienced depression; persons perceiving stress; and persons diagnosed with hypertension and angina pectoris. The decision tree model classified the districts into 10 area groups. Our findings suggest that an approach based on the characteristics of small-area groups, rather than a nationwide or individual-level approach, would be effective in reducing variation in the prevalence of arthritis.

A Decision Tree Analysis-based Exploratory Study on the Effects of Using Smart Devices on the Expansion of Social Relationship (의사결정나무 분석을 활용한 스마트 기기의 사용이 사회관계 확대에 미치는 영향에 관한 탐색적 연구)

  • Son, Woong-Bee;Jang, Jae-Min
    • Informatization Policy / v.26 no.1 / pp.62-82 / 2019
  • This study empirically analyzes how mobile devices affect users in building social relationships and whether those influences are negative or positive. The purpose of this research is to explain the results by considering all the possibilities and exploring everyday use of mobile devices. We used survey data from the "Research on Mobile Environment Awareness" conducted by the Gyeonggi Research Institute (GRI). The main questions concerned the use of mobile devices and social network services (SNS) and users' opinions on using the devices. All 31 municipalities in Gyeonggi Province were included as the spatial range, and the final valid sample was 1,004 residents. The extent of relationships with people was selected as the dependent variable and analyzed with a multinomial logistic model and a decision tree model. In the multinomial logistic analysis of the questionnaire, respondents reporting some change in the scope of their relationships showed significant positive (+) effects for conversation with family, SNS usage, residence in a rural rather than urban area, and device usage for obtaining news. The variable with the largest effect on the extent of relationships was SNS usage: as SNS usage increases, the extent of relationships also changes substantially.

A Study on Quality Control Using Data Mining in Steel Continuous Casting Process (철강 연주공정에서 데이터마이닝을 이용한 품질제어 방법에 관한 연구)

  • Kim, Jae-Kyeong;Kwon, Taeck-Sung;Choi, Il-Young;Kim, Hyea-Kyeong;Kim, Min-Yong
    • Journal of Information Technology Services / v.10 no.3 / pp.113-126 / 2011
  • The smelting and continuous casting of steel are important processes that determine the quality of steel products. In particular, most quality defects occur during solidification in the continuous casting process. Although quality control techniques such as six sigma, SQC, and TQM can be applied to the continuous casting process to improve the quality of steel products, these techniques do not provide real-time analysis to identify the causes of defects. To solve this problem, we developed a decision tree detection model that identifies abnormal transactions likely to have a coarse grain structure, and we compared the proposed model with models based on a neural network and on logistic regression. Experiments on steel data showed that the proposed model outperformed both the neural network model and the logistic regression model. We therefore expect the suggested model to help control the quality of steel products in real time in the continuous casting process.

A Study on Walking Analysis and Disease Prediction with Decision Tree (의사결정나무를 통한 걸음걸이 분석 및 질병 예측에 관한 연구)

  • Kim, Young-Jae;Yoo, Kwan-Hee;Nasridinov, Aziz
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.822-825 / 2017
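  • This study measures a person's gait with Kinect and analyzes it with a decision tree in order to predict, from the current gait, problems or diseases that are likely to occur in the subject's lower back or knees, and reports the results to the subject. In the first stage, the attributes to be used for discrimination were chosen based on related papers and hospital data. In the second stage, before applying real data measured with Kinect, test data were analyzed with a decision tree to check whether the attributes chosen in the first stage are strongly associated with identifying the subject's problems or diseases. As a result, 6 of the 7 attributes, about 85.7%, showed an association. In the third and final stage, discriminant formulas were established and real data were accumulated; analysis of measurements from 69 people showed that 5 of the 6 attributes were strongly associated with the lower back, a result of about 83%, close to the roughly 85.7% obtained in the second stage. Building on this, we plan to continue developing the system, keep collecting measurements to improve discrimination accuracy, address shortcomings in the related formulas, and study the conditions under which the accuracy of Kinect measurements improves.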

Two-Stage Logistic Regression for Cancer Classification and Prediction from Copy-Number Changes in cDNA Microarray-Based Comparative Genomic Hybridization

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.847-859 / 2011
  • cDNA microarray-based comparative genomic hybridization (CGH) data include low-intensity spots, so a statistical strategy is needed to detect subtle differences between cancer classes. In this study, genes displaying a high frequency of alteration in one of the classes were selected from among pre-selected genes that show relatively large between-gene variation compared to total variation. Using copy-number changes of the selected genes, this study suggests a statistical approach that predicts patients' classes with increased performance by pre-classifying patients with similar genetic alteration scores. A two-stage logistic regression model (TLRM) was suggested to pre-classify homogeneous patients and predict patients' classes for cancer prediction; a decision tree (DT) was combined with logistic regression on the set of informative genes. TLRM was constructed on cDNA microarray-based CGH data from the Cancer Metastasis Research Center (CMRC) at Yonsei University. It predicted the patients' clinical diagnoses with perfect matches (except for one patient) among the patients classified as high-risk and low-risk, where prediction performance is critical because of the high sensitivity and specificity required for clinical treatment. Accuracy validated by leave-one-out cross-validation (LOOCV) was 83.3%, while CART and DT, run for comparison, performed worse than TLRM.
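Leave-one-out cross-validation, the procedure behind the accuracy figure above, holds out each patient in turn, trains on the rest, and scores the held-out prediction. The sketch below illustrates only that loop. As a stand-in for TLRM it uses a 1-nearest-neighbour rule on a single hypothetical score, and the names `nearest_label`, `loocv_accuracy`, and the toy data are assumptions made for illustration.

```python
def nearest_label(train, x):
    """1-nearest-neighbour prediction on (score, label) pairs.
    This is a stand-in classifier, not the paper's TLRM."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def loocv_accuracy(data):
    """Hold out each observation once, train on the rest, and return
    the fraction of held-out observations predicted correctly."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # everything except observation i
        hits += nearest_label(train, x) == label
    return hits / len(data)


# Hypothetical alteration scores with risk labels (not real patient data).
data = [(0.9, "low"), (1.0, "low"), (1.2, "low"), (2.0, "high"),
        (2.9, "high"), (3.0, "high"), (3.2, "high")]
accuracy = loocv_accuracy(data)
```

In this toy set only the observation at 2.0 is misclassified, because its nearest neighbour once it is held out is the "low" point at 1.2.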

The Effectiveness of CRM Approach in Improving the Profitability of Korea Professional Baseball Industry Measured by Entropy of ID3 Decision Tree Algorithm

  • Oh, Se-Kyung;Gwak, Chung-Lee;Lee, Mi-Young
    • Journal of Information Technology Applications and Management / v.18 no.3 / pp.91-110 / 2011
  • The Korean professional baseball industry has grown to take the lion's share of the domestic sports industry, but it still does not break even. The purpose of this study is to examine the financial impact of adopting the Customer Relation Management (CRM) approach on the profitability of the Korean professional baseball industry. As a measuring tool we use entropy, as used in the ID3 decision tree algorithm. We specify the five most important factors affecting spectator satisfaction based on the previous literature, perform a survey analysis, and calculate entropy values. We predicted the change in revenues under CRM by checking spectators' willingness to pay more when the conditions of each factor are improved. We find that the introduction of CRM can yield significant gains through enhancing the 'game content factor' and the 'game promotion factor' among the five factors. We also find that the revenues of domestic professional baseball teams could increase to 2.4 times or 2.1 times the current level, respectively, if those two factors are managed intensively. Notably, this improvement in total revenues would allow domestic professional baseball teams to break even. This demonstrates the effectiveness of the CRM approach in improving the profitability of organizations.
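The entropy measure used by ID3 is Shannon entropy over class labels, and ID3 ranks a factor by how much splitting on it reduces that entropy (information gain). The following is a minimal sketch of both quantities. The factor values and willingness-to-pay labels are hypothetical survey responses, not the paper's data, and the function names are chosen for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: sum over classes of p * log2(1 / p)."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(rows):
    """Entropy reduction from splitting (factor_value, label) rows
    by factor value, as ID3 does when ranking candidate factors."""
    labels = [label for _, label in rows]
    groups = {}
    for value, label in rows:
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical responses: rating of one factor vs. willingness to pay more.
rows = [("good", "pay_more"), ("good", "pay_more"),
        ("poor", "no"), ("poor", "no")]
gain = information_gain(rows)
```

Here the factor separates the two answers perfectly, so the gain equals the full one bit of entropy in the labels; a factor unrelated to the answer would have a gain near zero.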

The diffusion and policy options of the diagnostic imaging technologies in Korea (의사결정나무 분석을 사용한 고가의료장비의 다빈도 사용 특성 분석)

  • Choi, Yoon Jung;Kwak, Minjung;Yoon, Min
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.179-185 / 2015
  • The cost of advanced medical technologies is commonly considered a major factor in the overall escalation of health expenditures. The use of computed tomography (CT) scanning has increased dramatically over the past decade, and CT has been rapidly adopted despite its high cost. The aim of this study is to analyze the factors behind the increasing frequency of CT use with a decision tree model. Finally, we propose an effective policy option for diagnostic imaging technology in Korea.

A study for improving data mining methods for continuous response variables (연속형 반응변수를 위한 데이터마이닝 방법 성능 향상 연구)

  • Choi, Jin-Soo;Lee, Seok-Hyung;Cho, Hyung-Jun
    • Journal of the Korean Data and Information Science Society / v.21 no.5 / pp.917-926 / 2010
  • It is known that bagging and boosting techniques improve performance in classification problems. A number of researchers have demonstrated the high performance of bagging and boosting experimentally for categorical responses, but not for continuous responses. We study whether bagging and boosting improve data mining methods for continuous responses, such as linear regression, decision trees, and neural networks. An analysis of eight real data sets shows empirically that bagging and boosting perform well.
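For a continuous response, bagging fits the same base learner to many bootstrap resamples of the data and averages the resulting predictions. The sketch below shows that mechanism with simple linear regression as the base learner. The function names, the number of models, the seed, and the noise-free toy data are assumptions for illustration and do not reproduce the paper's experiments.

```python
import random


def fit_line(sample):
    """Ordinary least-squares fit y = a + b * x on (x, y) pairs."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bagged_fit(data, n_models=25, seed=0):
    """Fit one line per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        sample = [rng.choice(data) for _ in data]
        if len({x for x, _ in sample}) < 2:
            continue  # a slope needs two distinct x values; redraw
        models.append(fit_line(sample))
    return models


def bagged_predict(models, x):
    """Average the predictions of all bootstrap models."""
    return sum(a + b * x for a, b in models) / len(models)


# Hypothetical noise-free data on the line y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
models = bagged_fit(data)
prediction = bagged_predict(models, 10.0)
```

With noise-free data every resample recovers the same line, so the averaged prediction matches it; on noisy data the individual fits differ and averaging reduces their variance, which is the effect the study evaluates.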

Evaluation on Performance for Classification of Students Leaving Their Majors Using Data Mining Technique (데이터마이닝 기법을 이용한 전공이탈자 분류를 위한 성능평가)

  • Leem, Young-Moon;Ryu, Chang-Hyun
    • Proceedings of the Safety Management and Science Conference / 2006.11a / pp.293-297 / 2006
  • Recently, most universities have been struggling with students leaving their majors, and many are trying to find countermeasures that reduce the rate at which students leave. As part of that effort, this paper uses decision tree algorithms, a data mining technique that groups or predicts by dividing a population of interest into several subgroups. The technique can characterize the types of students who leave their majors. The dataset consists of 5,115 features obtained through data selection from a total of 13,346 collected from a university in Kangwon-Do over seven years (2000.3.1 to 2006.6.30). The main objective of this study is to evaluate the performance of the CHAID, CART, and C4.5 algorithms in classifying students who leave their majors, using ROC charts, lift charts, and gains charts. The study also reports accuracy, sensitivity, and specificity computed from the classification table. According to the analysis, CART showed the best performance in classifying students who leave their majors.
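Accuracy, sensitivity, and specificity all come from the four cells of a two-class classification table. The following sketch shows the standard formulas. The function name and the counts are hypothetical and are not taken from the paper; "positive" here stands for a student who left the major.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute metrics from a 2x2 classification table.

    tp: leavers predicted as leavers    fn: leavers predicted as stayers
    fp: stayers predicted as leavers    tn: stayers predicted as stayers
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of actual leavers detected
        "specificity": tn / (tn + fp),     # share of actual stayers detected
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can look high when one class, such as students who stay, dominates the data.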
