• Title/Summary/Keyword: decision tree induction


Decision Tree based Disambiguation of Semantic Roles for Korean Adverbial Postpositions in Korean-English Machine Translation (한영 기계번역에서 결정 트리 학습에 의한 한국어 부사격 조사의 의미 중의성 해소)

  • Park, Seong-Bae;Zhang, Byoung-Tak;Kim, Yung-Taek
    • Journal of KIISE: Software and Applications / v.27 no.6 / pp.668-677 / 2000
  • In Korean, case postpositions determine the syntactic roles of phrases, and a single postposition may have more than one meaning. Adverbial postpositions in particular make translation from Korean to English difficult because they can carry various meanings. In this paper, we describe a method for resolving these semantic ambiguities of Korean adverbial postpositions using decision trees. The training examples for decision tree induction are extracted from a corpus of 0.5 million words, and the semantic roles of adverbial postpositions are classified into 25 classes. The shortage of training examples is overcome by clustering words into classes with a greedy clustering algorithm. Cross-validation results show that the presented method achieves 76.2% precision on average, a 26.0% improvement over the baseline that assigns each adverbial postposition its most frequently appearing role.

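
The most-frequent-role baseline mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the romanized postpositions, the role labels, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict

def train_most_frequent_role(examples):
    """Map each postposition to the role it carries most often in training."""
    counts = defaultdict(Counter)
    for postposition, role in examples:
        counts[postposition][role] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}

def precision(model, examples):
    """Fraction of examples whose predicted role matches the labeled role."""
    correct = sum(1 for p, role in examples if model.get(p) == role)
    return correct / len(examples)

# Hypothetical toy data: (romanized postposition, semantic role) pairs.
train = [("e", "location"), ("e", "location"), ("e", "time"),
         ("ro", "direction"), ("ro", "instrument"), ("ro", "direction")]
test = [("e", "time"), ("e", "location"),
        ("ro", "direction"), ("ro", "instrument")]

baseline = train_most_frequent_role(train)
baseline_precision = precision(baseline, test)
```

A context-sensitive classifier such as a decision tree can improve on this baseline only when contextual features separate the roles that share a postposition.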

PAC-Learning a Decision Tree with Pruning (의사결정나무의 현실적인 상황에서의 팩(PAC) 추론 방법)

  • Kim, Hyeon-Su
    • Asia pacific journal of information systems / v.3 no.1 / pp.155-189 / 1993
  • Empirical studies have shown that the performance of decision tree induction usually improves when the trees are pruned. Whether these results hold in general and to what extent pruning improves the accuracy of a concept have not been investigated theoretically. This paper provides a theoretical study of pruning. We focus on a particular type of pruning and determine a bound on the error due to pruning. This is combined with PAC (Probably Approximately Correct) Learning theory to determine a sample size sufficient to guarantee a probabilistic bound on the concept error. We also discuss additional pruning rules and give an analysis for the pruning error.


A Development of Knowledge Error Analysis Methodology for practical use of Expert Systems (전문가시스템 실용화를 위한 지식오류분석방법론 연구)

  • Kim, Hyeon-Su
    • Asia pacific journal of information systems / v.6 no.2 / pp.77-105 / 1996
  • The accuracy of knowledge is a major concern for expert system developers and users. Machine learning approaches have recently been found useful for knowledge acquisition in expert systems. However, in most cases the accuracy of concepts acquired through machine learning could not be analyzed. In this paper, we develop a comprehensive knowledge error analysis methodology for the practical use of expert systems. Decision tree induction is an important type of machine learning method for business expert systems. We begin by analyzing knowledge acquired through decision tree induction and then extend the results to an error analysis methodology for general machine learning methods. We give several examples and illustrations of these results. We also discuss their applicability to multistrategy learning approaches.


A study on the comparison of descriptive variables reduction methods in decision tree induction: A case of prediction models of pension insurance in life insurance company (생명보험사의 개인연금 보험예측 사례를 통해서 본 의사결정나무 분석의 설명변수 축소에 관한 비교 연구)

  • Lee, Yong-Goo;Hur, Joon
    • Journal of the Korean Data and Information Science Society / v.20 no.1 / pp.179-190 / 2009
  • In the financial industry, decision tree algorithms are widely used for classification analysis. One major difficulty is the large number of explanatory variables to be considered in modeling, so an effective method is needed for reducing the number of explanatory variables without seriously affecting the modeling results. In this research, we compare various variable reduction methods and identify the best one based on the modeling accuracy of the tree algorithm. We applied the methods to a pension insurance case at a life insurance company to obtain empirical results. We found that selecting variables through sensitivity analysis of a neural network is the most effective way to reduce the number of variables while maintaining accuracy.


A Hybrid Genetic Algorithm for K-Means Clustering

  • Jun, Sung-Hae;Han, Jin-Woo;Park, Minjae;Oh, Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.330-333 / 2003
  • The initial cluster size is very important to the clustering result of partitioning methods. In the K-means algorithm, the result of the cluster analysis varies with the cluster size K. The initial cluster size is usually determined from prior, subjective information, which may not be optimal, so a more objective method is needed. In our research, we propose a hybrid genetic algorithm, a tree-induction-based evolutionary algorithm, for determining the optimal cluster size. The initial population of this algorithm is determined by the number of terminal nodes of the induced tree, and the optimal cluster size is generated from this decision-tree-based initial population. Our fitness function is defined as the inverse of a dissimilarity measure, and a bagging approach is used to save computational time.


A Contextual Study of Public Transport Information Service Use Behavior in Daily Activity (일상 활동에서의 상황변수를 고려한 대중교통 정보서비스 이용 유형 연구)

  • Jo, Chang-Hyeon;Lee, Baek-Jin;Bin, Mi-Yeong
    • Journal of Korean Society of Transportation / v.28 no.4 / pp.19-30 / 2010
  • With rapid IT development and the wide spread of public information services, proper guidelines on how to provide public transport information services have become important. The current study takes a contextual approach to analyzing public transport information use in dynamic decision situations, complementing conventional cross-sectional approaches. Using CHAID decision tree induction based on decision table formalism, applied to survey data on activity travel and information use, the study found that the choices of information type and medium are strongly affected by decision contexts in addition to individuals' socio-demographic characteristics. The results have important implications for the market segmentation of public transport information services.

SOHO Bankruptcy Prediction Using Modified Bagging Predictors (Modified Bagging Predictors를 이용한 SOHO 부도 예측)

  • Kim, Seung-Hyuk;Kim, Jong-Woo
    • Journal of Intelligence and Information Systems / v.13 no.2 / pp.15-26 / 2007
  • In this study, a SOHO (Small Office Home Office) bankruptcy prediction model is proposed using Modified Bagging Predictors, a modification of traditional Bagging Predictors. There have been several studies on bankruptcy prediction for large and medium-sized companies, but few for SOHOs. In commercial banks, loan approval processes for SOHOs are usually less structured than those for large and medium-sized companies and depend largely on partial information such as credit scores. In this study, we use a real SOHO loan approval data set from a Korean bank. First, decision tree induction techniques and artificial neural networks are applied to the data set, but the results are not satisfactory. Bagging Predictors, which have not previously been applied to bankruptcy prediction, and the Modified Bagging Predictors proposed in this paper are then applied to the data set. The experimental results show that Modified Bagging Predictors perform better than decision tree induction techniques, artificial neural networks, and Bagging Predictors.

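
For readers unfamiliar with the traditional Bagging Predictors that the abstract above builds on, here is a minimal sketch of plain bagging: bootstrap samples, one weak learner per sample, and a majority vote. The abstract does not describe the modification it proposes, so none is shown. The one-feature decision stump and the toy credit-score data are assumptions made for illustration.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold on a single feature that minimizes training errors."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging(data, n_models, rng):
    """Train each stump on a bootstrap sample drawn with replacement."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models

def predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (credit_score, defaulted) pairs.
data = [(300, 1), (350, 1), (400, 1), (450, 1),
        (600, 0), (650, 0), (700, 0), (750, 0)]
models = bagging(data, n_models=25, rng=random.Random(0))
```

Calling `predict(models, 300)` returns the majority vote of the 25 stumps. Averaging over bootstrap samples mainly reduces the variance of unstable learners such as decision trees.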

Decision Tree Induction with Imbalanced Data Set: A Case of Health Insurance Bill Audit in a General Hospital (불균형 데이터 집합에서의 의사결정나무 추론: 종합 병원의 건강 보험료 청구 심사 사례)

  • Hur, Joon;Kim, Jong-Woo
    • Information Systems Review / v.9 no.1 / pp.45-65 / 2007
  • In the medical industry, the health insurance bill audit is a unique and essential process in general hospitals. The process is very important not only for a hospital's profit but also for its reputation. At large general hospitals in particular, many workers, including analysts and nurses, are engaged in the health insurance bill audit process. This paper introduces a case of health insurance bill audit at a large general hospital in Korea, in which decision tree induction techniques are used to find bill cases that are likely to be reduced. When supervised learning methods were applied, one major problem was the imbalance in the health insurance bill audit data. In other words, a bill audit data set contained many normal (passing) cases and a relatively small number of reduction cases. To resolve the problem, this study combines well-known methods for imbalanced data sets, including oversampling of rare cases, undersampling of majority cases, and adjusting the misclassification cost, in several ways to find decision trees that satisfy the conditions required in the health insurance bill audit setting.
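
The three imbalance remedies named in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's procedure: the record layout, the label values `"pass"` and `"reduce"`, and the cost of 9 per missed rare case are hypothetical.

```python
import random
from collections import Counter

def split_by_class(rows, label, rare):
    """Separate the rare-class rows from the majority-class rows."""
    rare_rows = [r for r in rows if r[label] == rare]
    major_rows = [r for r in rows if r[label] != rare]
    return rare_rows, major_rows

def oversample(rows, label, rare, rng):
    """Duplicate randomly chosen rare cases until the classes are balanced."""
    rare_rows, major_rows = split_by_class(rows, label, rare)
    extra = [rng.choice(rare_rows)
             for _ in range(len(major_rows) - len(rare_rows))]
    return major_rows + rare_rows + extra

def undersample(rows, label, rare, rng):
    """Keep a random subset of majority cases equal in size to the rare class."""
    rare_rows, major_rows = split_by_class(rows, label, rare)
    return rng.sample(major_rows, len(rare_rows)) + rare_rows

def misclassification_cost(y_true, y_pred, rare, miss_cost):
    """Total cost: a missed rare case costs miss_cost, a false alarm costs 1."""
    cost = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == rare and pred != rare:
            cost += miss_cost
        elif truth != rare and pred == rare:
            cost += 1
    return cost

# Hypothetical audit records: 18 passing bills and 2 reduced bills.
rows = [{"id": i, "result": "pass"} for i in range(18)]
rows += [{"id": 18 + i, "result": "reduce"} for i in range(2)]
rng = random.Random(0)

over = Counter(r["result"] for r in oversample(rows, "result", "reduce", rng))
under = Counter(r["result"] for r in undersample(rows, "result", "reduce", rng))

# A classifier that always predicts "pass" is 90% accurate here but costly.
y_true = [r["result"] for r in rows]
always_pass_cost = misclassification_cost(y_true, ["pass"] * len(rows), "reduce", 9)
```

Resampling changes the training distribution, whereas cost adjustment changes how errors are scored, so the two can be combined.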

Podiatric Clinical Diagnosis using Decision Tree Data Mining (결정트리 데이터마이닝을 이용한 족부 임상 진단)

  • Kim, Jin-Ho;Park, In-Sik;Kim, Bong-Ok;Yang, Yoon-Seok;Won, Yong-Gwan;Kim, Jung-Ja
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.2 / pp.28-37 / 2011
  • With growing interest in healthy living, podiatry, which covers the diagnosis, treatment, and prevention of foot and leg conditions, has attracted wide attention, yet research in Korea is not active. Moreover, because most previous data analysis studies took quantitative approaches, a reasonable level of reliability for clinical application could not be guaranteed. Clinical data mining applies various data mining methods to clinical data and provides decision support for an expert's diagnosis and treatment of patients. Because a decision tree explains and describes the analysis procedure well and its results are easy to interpret, it is simple to apply to clinical problems. This study applies decision trees to derive diagnostic rules by disease type from data on 2,620 feet of 1,310 patients (633 males, 677 females) diagnosed at the shoe clinic of the Department of Rehabilitation Medicine, Chungnam National University Hospital. We classified 15 foot diseases based on 22 foot disease factors and derived 64 diagnostic rules. We also built decision trees for five groups (infants, children, adolescents, adults, and all patients) and used them to analyze and compare the correlations between disease characteristics and factors in each group. The results can serve as qualitative, useful knowledge for clinical experts and as a tool for effective and accurate diagnosis.

Identifying prospective buyers for specific products using artificial neural network and induction rules (인공신경망과 귀납규칙기법을 이용한 제품별 예상 구매고객예측)

  • Lee Geon-Ho;Jeong Su-Mi;Jeong Byeong-Hui
    • Proceedings of the Korean Operations and Management Science Society Conference / 2004.10a / pp.395-398 / 2004
  • For effective customer relationship management (CRM), it is preferable to send product advertisement emails to prospective customers rather than spam mail to unspecified customers. This study identifies prospective customers with a high probability of buying specific products using Artificial Neural Network (ANN) and Induction Rule (IR) techniques. We propose an integrated model, IRANN, which combines ANN with IR from the decision tree program C5.0, and we compare and analyze the accuracy of ANN, IR, and IRANN.
