Tree-structured Clustering for Mixed Data

  • Yang Kyung-Sook (Brain Korea 21 The Education and Research Group for Korean Studies, Korea University)
  • Huh Myung-Hoe (Dept. of Statistics, Korea University)
  • Published : 2006.07.01

Abstract

The aim of this study is to propose a tree-structured clustering algorithm for mixed data, that is, data in which categorical and continuous variables occur together. We devise a scaling method for preprocessing the categorical variables. It puts the mixed variables on a common footing and reduces the variable selection bias among categorical variables. As numerical examples, we apply the algorithm to the SPSS credit data and the German credit data, and we note several differences between tree-structured clustering and K-means clustering.
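The abstract does not spell out the splitting rule or the categorical scaling. The following is a rough illustration only of how tree-structured (divisive, one-variable-per-split) clustering of mixed data can work in general. It is not the authors' algorithm. Everything here is an assumption chosen for illustration:
- dividing each categorical variable's one-hot dummies by sqrt(1 - Σ p_k²), so that every variable contributes total variance 1, just as a standardized continuous variable does;
- the one-vs-rest categorical splits;
- the within-cluster sum-of-squares split criterion;
- all function names and the toy data.

```python
import math
from statistics import mean, pstdev


def encode(data, continuous, categorical):
    """Turn mixed records into numeric vectors (hypothetical scaling).

    Continuous variables become z-scores (variance 1). Each categorical
    variable is one-hot coded and its dummies are multiplied by
    1 / sqrt(1 - sum_k p_k**2), so the dummies' variances sum to 1 and
    every variable contributes the same total variance.
    """
    n = len(data)
    columns = []
    for c in continuous:
        vals = [r[c] for r in data]
        mu, sd = mean(vals), pstdev(vals) or 1.0
        columns.append([(v - mu) / sd for v in vals])
    for c in categorical:
        vals = [r[c] for r in data]
        levels = sorted(set(vals))
        total_var = 1.0 - sum((vals.count(lv) / n) ** 2 for lv in levels)
        scale = 1.0 / math.sqrt(total_var) if total_var > 0 else 1.0
        for lv in levels:
            columns.append([scale * (v == lv) for v in vals])
    return [list(row) for row in zip(*columns)]


def within_ss(vectors, idx):
    """Sum of squared distances of the rows in idx to their centroid."""
    if not idx:
        return 0.0
    dim = len(vectors[0])
    centroid = [sum(vectors[i][j] for i in idx) / len(idx) for j in range(dim)]
    return sum((vectors[i][j] - centroid[j]) ** 2 for i in idx for j in range(dim))


def candidate_splits(data, idx, continuous, categorical):
    """Yield (label, test) pairs: midpoint cuts and one-vs-rest level splits."""
    for c in continuous:
        values = sorted({data[i][c] for i in idx})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            yield f"{c} <= {cut:g}", lambda r, c=c, cut=cut: r[c] <= cut
    for c in categorical:
        levels = sorted({data[i][c] for i in idx})
        if len(levels) < 2:
            continue
        for level in levels:
            yield f"{c} == {level!r}", lambda r, c=c, level=level: r[c] == level


def best_split(data, vectors, idx, continuous, categorical, min_size=2):
    """Pick the split with the smallest total within-cluster sum of squares."""
    best = None
    for label, test in candidate_splits(data, idx, continuous, categorical):
        left = [i for i in idx if test(data[i])]
        right = [i for i in idx if not test(data[i])]
        if len(left) < min_size or len(right) < min_size:
            continue
        cost = within_ss(vectors, left) + within_ss(vectors, right)
        if best is None or cost < best[0]:  # ties keep the earlier candidate
            best = (cost, label, left, right)
    return best


def grow(data, vectors, idx, continuous, categorical, depth, path=()):
    """Split recursively up to `depth` levels; return (path, members) leaves."""
    split = best_split(data, vectors, idx, continuous, categorical) if depth > 0 else None
    if split is None:
        return [(path, idx)]
    _, label, left, right = split
    return (grow(data, vectors, left, continuous, categorical, depth - 1, path + (label,))
            + grow(data, vectors, right, continuous, categorical, depth - 1,
                   path + ("not " + label,)))


# Hypothetical toy data: income (continuous) and home ownership (categorical).
toy = [
    {"income": 20, "owner": "no"},
    {"income": 22, "owner": "no"},
    {"income": 25, "owner": "no"},
    {"income": 80, "owner": "yes"},
    {"income": 85, "owner": "yes"},
    {"income": 90, "owner": "yes"},
]

if __name__ == "__main__":
    vecs = encode(toy, ["income"], ["owner"])
    leaves = grow(toy, vecs, list(range(len(toy))), ["income"], ["owner"], depth=2)
    for path, members in leaves:
        print(" and ".join(path), "->", members)
    # income <= 52.5 -> [0, 1, 2]
    # not income <= 52.5 -> [3, 4, 5]
```

In this toy case the income cut and the `owner == 'no'` split produce the same partition and the same cost. The tie goes to the candidate examined first. Because each cluster is described by the chain of split rules on its path, the result can be read directly as a tree. K-means clustering, by contrast, produces clusters described only by their centroids.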
