Design of Heuristic Decision Tree (HDT) Using Human Knowledge

Yoon, Tae-Bok; Lee, Jee-Hyong

doi:10.5391/JKIIS.2009.19.4.525

Journal of the Korean Institute of Intelligent Systems (한국지능시스템학회논문지), Vol. 19, No. 4, pp. 525-531, 2009. pISSN 1976-9172, eISSN 2288-2324

Korean Institute of Intelligent Systems (한국지능시스템학회)

Original Korean title: 인간 지식을 이용한 경험적 의사결정트리의 설계

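Tae-Bok Yoon (Department of Computer Engineering, Sungkyunkwan University);
Jee-Hyong Lee (Department of Computer Engineering, Sungkyunkwan University)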

Received: 2009.04.06
Reviewed: 2009.07.28
Published: 2009.08.25

Abstract

Data mining is the process of finding hidden patterns in collected data. The collected data serve as the basis for prediction and recommendation, so improving the quality of the analysis requires a step that screens out incorrect data, such as records with missing values. Existing screening methods rely mainly on statistics or on simple distances between data points. Because they do not take the environment or the characteristics of the data into account, they can also exclude meaningful data from the analysis. This paper proposes a method that compares human heuristic knowledge with the collected data, converts the result of that comparison into weights, and uses the weights when building a decision tree. Because the resulting tree reflects human knowledge, it can be regarded as more reliable than existing analysis methods. The validity of the proposed method was confirmed through an experiment.
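As a rough illustration of the idea in the abstract, the sketch below weights each record by how well it agrees with human-supplied rules and then uses those weights when scoring a decision tree split. It is not the authors' algorithm: the abstract does not give the weighting formula or the tree-building procedure. The rule format, the `rule_weight` function with its `low` and `high` bounds, the toy weather data, and the choice of weighted information gain as the split criterion are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def rule_weight(sample, rules, low=0.2, high=1.0):
    """Hypothetical weighting: fraction of applicable human rules the sample agrees with,
    mapped linearly onto [low, high]."""
    applicable = [r for r in rules if r["if"](sample)]
    if not applicable:
        return high  # assumption: no applicable human knowledge means full weight
    agree = sum(1 for r in applicable if r["then"] == sample["label"])
    return low + (high - low) * agree / len(applicable)


def weighted_entropy(samples, weights):
    """Shannon entropy of the labels, with each sample counted by its weight."""
    totals = defaultdict(float)
    for s, w in zip(samples, weights):
        totals[s["label"]] += w
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in totals.values() if v > 0)


def weighted_gain(samples, weights, attribute):
    """Information gain of splitting on `attribute`, using weighted label counts."""
    base = weighted_entropy(samples, weights)
    total = sum(weights)
    groups = defaultdict(lambda: ([], []))
    for s, w in zip(samples, weights):
        group_samples, group_weights = groups[s[attribute]]
        group_samples.append(s)
        group_weights.append(w)
    remainder = sum(
        sum(ws) / total * weighted_entropy(ss, ws) for ss, ws in groups.values()
    )
    return base - remainder


# Toy data (invented for this example): the last record contradicts the human rule.
samples = [
    {"outlook": "sunny", "windy": False, "label": "play"},
    {"outlook": "sunny", "windy": True, "label": "play"},
    {"outlook": "rainy", "windy": False, "label": "stay"},
    {"outlook": "rainy", "windy": True, "label": "stay"},
    {"outlook": "rainy", "windy": False, "label": "play"},
]

# Human heuristic knowledge expressed as a single if/then rule (hypothetical format).
rules = [{"if": lambda s: s["outlook"] == "rainy", "then": "stay"}]

weights = [rule_weight(s, rules) for s in samples]
best = max(["outlook", "windy"], key=lambda a: weighted_gain(samples, weights, a))
print(weights, best)
```

In this sketch the contradicting record is kept but down-weighted rather than discarded, which matches the abstract's concern that statistical or distance-based filtering can throw away meaningful data.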

Cited by

A Study on Propriety of Pilot Aptitude Test Using Phased Analysis of Pilot Training, Vol. 26, No. 3, 2016, https://doi.org/10.5391/JKIIS.2016.26.3.218
