Tree-structured Clustering for Mixed Data

Yang Kyung-Sook; Huh Myung-Hoe

doi:10.5351/KJAS.2006.19.2.271

The Korean Journal of Applied Statistics (응용통계연구)

Volume 19, Issue 2, pp. 271-282, 2006

pISSN 1225-066X / eISSN 2383-5818

The Korean Statistical Society (한국통계학회)

Yang Kyung-Sook (Brain Korea 21 Education and Research Group for Korean Studies, Korea University);
Huh Myung-Hoe (Dept. of Statistics, Korea University)

Published: 2006.07.01

Abstract

The aim of this study is to propose a tree-structured clustering algorithm for mixed data, that is, data containing both categorical and continuous variables. So that the mixed variables share a common meaning, we suggest a scaling method for preprocessing the categorical variables, which also reduces the variable selection bias among them. In numerical examples using the SPSS credit data and the German credit data, we note several differences between tree-structured clustering and K-means clustering.
