http://dx.doi.org/10.5391/JKIIS.2009.19.4.525

Design of Heuristic Decision Tree (HDT) Using Human Knowledge  

Yoon, Tae-Tok (Department of Computer Engineering, Sungkyunkwan University)
Lee, Jee-Hyong (Department of Computer Engineering, Sungkyunkwan University)
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.19, no.4, 2009, pp. 525-531
Abstract
Data mining is the process of extracting hidden patterns from collected data. Because collected data serve as the basic information for prediction and recommendation, a process that identifies incorrect data is needed to improve the quality of analysis results. Existing methods for identifying unexpected data mainly rely on statistics or on simple distances between data points. However, because these methods do not consider the environment and characteristics of the data, they may exclude even meaningful data from the analysis. This study proposes a method that assigns weight values to human heuristic knowledge by comparing that knowledge with the collected data, and uses these values to create a decision tree. Data discrimination by the proposed method is more credible because human knowledge is reflected in the created tree. The validity of the proposed method is verified through an experiment.
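The abstract does not say how the weight values are computed. As an illustration only, the sketch below assumes one plausible reading: a heuristic rule is weighted by the fraction of the records it covers whose label agrees with the rule. The names `Rule`, `rule_weight`, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
# Hypothetical rule format: (condition on a record, label the expert expects)
Rule = Tuple[Callable[[Record], bool], str]


def rule_weight(rule: Rule, records: List[Record], labels: List[str]) -> float:
    """Fraction of the records covered by the rule whose label matches it.

    This agreement ratio is an assumption made for illustration; it is not
    the paper's formula.
    """
    condition, expected = rule
    covered = [lab for rec, lab in zip(records, labels) if condition(rec)]
    if not covered:
        return 0.0  # the data offer no evidence for or against this rule
    return sum(lab == expected for lab in covered) / len(covered)


# Hypothetical toy data: study time in hours and a pass/fail outcome
records = [{"hours": 1.0}, {"hours": 2.0}, {"hours": 6.0},
           {"hours": 7.0}, {"hours": 8.0}]
labels = ["fail", "fail", "pass", "pass", "fail"]

# Hypothetical expert rule: "learners who study 5 hours or more pass"
rule: Rule = (lambda r: r["hours"] >= 5.0, "pass")

# Three records are covered; two of them agree with the rule
weight = rule_weight(rule, records, labels)
print(f"rule weight: {weight:.3f}")
```

Under this assumption, a rule that the data largely confirm receives a weight close to 1, and such a weight could then influence how a decision tree is built. How the paper itself feeds the weights into tree construction is not stated in the abstract.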
Keywords
Heuristic Decision Tree; Human-Knowledge Data Mining; Outlier Data Reduction