Cluster Feature Selection using Entropy Weighting and SVD

엔트로피 가중치 및 SVD를 이용한 군집 특징 선택

  • Published: 2002.04.01

Abstract

Clustering is a method for grouping objects with similar properties into the same cluster. Singular value decomposition (SVD) is known as an efficient preprocessing method for clustering because it reduces dimensionality and eliminates noise in high-dimensional, sparse data sets such as e-commerce data. However, because a data set transformed by SVD loses the original attribute information, it is hard to evaluate the worth of the original attributes. This research proposes a cluster feature selection method, called ENTROPY-SVD, that finds the important attributes of each cluster based on entropy weighting and SVD. SVD exploits the latent structure in the association between attributes and groups of similar objects, and entropy weighting finds the highly dense attributes of each cluster. This paper also proposes a model-based collaborative filtering recommendation system that uses ENTROPY-SVD, called CFS-CF, and evaluates its efficiency and utility.

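Clustering is a method that analyzes the characteristics of objects and groups objects with similar properties into the same cluster. For data with many dimensions and many missing values, such as e-commerce data, it is effective to cluster after applying SVD to reduce the dimensionality of the input data and remove noise. However, data transformed by SVD loses the original attribute information, which makes it difficult to interpret the worth of the original attributes when analyzing the clustering results. This study therefore proposes ENTROPY-SVD, a cluster feature selection technique that uses entropy weighting and SVD to find the important attributes of each cluster after clustering has been performed. ENTROPY-SVD uses SVD to exploit the implicit latent structure between the attributes of the data and groups of similar objects, and uses entropy weighting to find the highly cohesive attributes within each group of similar objects. The study also proposes CFS-CF, a model-based collaborative filtering recommender system that applies ENTROPY-SVD, and evaluates its utility and effectiveness.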
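
The abstract does not give the weighting formula, so the following is only a minimal sketch of the entropy-weighting idea, not the ENTROPY-SVD method itself. It assumes that cluster assignments are already available (for example from clustering SVD-reduced data, a step omitted here) and that the log-entropy weight familiar from latent semantic indexing is applied to a cluster-by-attribute count table. The function names `entropy_weights` and `top_attributes` and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def entropy_weights(cluster_counts):
    """Entropy weight per attribute over a {cluster: {attr: count}} table.

    Assumed log-entropy form: the weight is 1 when an attribute occurs in a
    single cluster and 0 when it is spread evenly over all clusters.
    """
    n_clusters = len(cluster_counts)
    totals = defaultdict(float)
    for counts in cluster_counts.values():
        for attr, c in counts.items():
            totals[attr] += c

    # Maximum possible entropy is log(n_clusters); guard the one-cluster case.
    max_entropy = math.log(n_clusters) if n_clusters > 1 else 1.0
    weights = {}
    for attr, total in totals.items():
        entropy = 0.0
        for counts in cluster_counts.values():
            c = counts.get(attr, 0)
            if c > 0:
                p = c / total
                entropy -= p * math.log(p)
        weights[attr] = 1.0 - entropy / max_entropy
    return weights


def top_attributes(cluster_counts, cluster, k=2):
    """Rank one cluster's attributes by (share of cluster counts) * entropy weight."""
    weights = entropy_weights(cluster_counts)
    counts = cluster_counts[cluster]
    cluster_total = sum(counts.values())
    scores = {a: (c / cluster_total) * weights[a]
              for a, c in counts.items() if c > 0}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda a: (-scores[a], a))[:k]


# Hypothetical purchase counts per cluster (illustrative data only).
counts = {
    "c0": {"milk": 10, "bread": 8, "wine": 1},
    "c1": {"milk": 9, "wine": 12, "cheese": 7},
}
print(top_attributes(counts, "c0"))  # ['bread', 'wine']
```

In this toy table, "milk" is bought almost equally in both clusters, so its weight is close to 0 and it drops out of the ranking even though it is the most frequent attribute in cluster "c0". Attributes concentrated in one cluster, such as "bread", are ranked first.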
