Design of Heuristic Decision Tree (HDT) Using Human Knowledge

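  • Taebok Yoon (Department of Computer Engineering, Sungkyunkwan University) ;
  • Jee-Hyong Lee (Department of Computer Engineering, Sungkyunkwan University)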
  • Received : 2009.04.06
  • Accepted : 2009.07.28
  • Published : 2009.08.25

Abstract

Data mining is the process of extracting hidden patterns from collected data. Because the collected data serve as the basis for prediction and recommendation, incorrect data (such as missing values) must be identified in order to improve the quality of the analysis. Existing methods for identifying unexpected data rely mainly on statistics or on simple distances between data points. Because these methods do not consider the environment and characteristics of the data, they can exclude even meaningful data from the analysis. This study proposes a method that converts human heuristic knowledge into weights by comparing it with the collected data, and then uses those weights to build a decision tree. Because the resulting tree reflects human knowledge, its data discrimination is more credible than that of existing analysis methods. The validity of the proposed method is verified through an experiment.
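
The abstract does not say how the weights are computed or how they enter tree construction, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes that a rule's weight is the fraction of covered records that agree with it, that records contradicting a rule are down-weighted by that fraction, and that the split attribute is chosen by weighted information gain. The toy data, the single expert rule, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (attributes, class label).
DATA = [
    ({"temp": "high", "humid": "high"}, "no"),
    ({"temp": "high", "humid": "low"}, "yes"),
    ({"temp": "low", "humid": "high"}, "no"),
    ({"temp": "low", "humid": "low"}, "yes"),
    ({"temp": "low", "humid": "high"}, "yes"),  # contradicts the expert rule
]

# Hypothetical expert rule: "if humidity is high, the class is 'no'".
RULES = [(lambda x: x["humid"] == "high", "no")]


def rule_weight(rule, data):
    """Assumed weighting: share of covered records whose label matches the rule."""
    cond, label = rule
    covered = [y for x, y in data if cond(x)]
    if not covered:
        return 0.0
    return sum(y == label for y in covered) / len(covered)


def record_weights(data, rules):
    """Down-weight each record that contradicts a rule, scaled by the rule's weight."""
    weights = []
    for x, y in data:
        w = 1.0
        for rule in rules:
            cond, label = rule
            if cond(x) and y != label:
                w *= 1.0 - rule_weight(rule, data)
        weights.append(w)
    return weights


def entropy(pairs):
    """Weighted entropy of a list of (label, weight) pairs."""
    totals = defaultdict(float)
    for y, w in pairs:
        totals[y] += w
    total = sum(totals.values())
    return -sum(t / total * math.log2(t / total) for t in totals.values() if t > 0)


def weighted_gain(data, weights, attr):
    """Information gain of splitting on attr, with records counted by weight."""
    base = entropy([(y, w) for (_, y), w in zip(data, weights)])
    groups = defaultdict(list)
    for (x, y), w in zip(data, weights):
        groups[x[attr]].append((y, w))
    total = sum(weights)
    remainder = sum(
        sum(w for _, w in g) / total * entropy(g) for g in groups.values()
    )
    return base - remainder


weights = record_weights(DATA, RULES)
best = max(["temp", "humid"], key=lambda a: weighted_gain(DATA, weights, a))
print(weights, best)
```

In this sketch the contradicting record is kept with a reduced weight rather than discarded, which is one way to avoid excluding potentially meaningful data outright.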
