Web Mining Using Fuzzy Integration of Multiple Structure Adaptive Self-Organizing Maps

다중 구조적응 자기구성지도의 퍼지결합을 이용한 웹 마이닝

  • Published : 2004.01.01

Abstract

Finding an appropriate web site is difficult because the exponentially growing web contains millions of documents. Web search can be personalized by recommending suitable sites based on a user profile, but a more effective method of estimating preference is needed, because a user's evaluations of web content reflect many aspects of his or her characteristics. Because user profiles are non-linear, they should be estimated with a classifier, and several classifiers must be combined to capture their diverse properties. The structure-adaptive self-organizing map (SASOM), an enhanced SOM model suited to pattern classification and visualization, may be well suited to web mining. The fuzzy integral is a combination method that uses subjectively defined relevance (importance) values for the classifiers. In this paper, user profiles are estimated with a fuzzy-integral-based ensemble of independently trained SASOMs, and the method is evaluated on the Syskill & Webert UCI benchmark data. Experimental results show that the proposed method outperforms both the previously used naive Bayes classifier and voting among SASOMs.
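As a rough illustration of the combination step, the sketch below fuses per-class scores from several classifiers with a Sugeno fuzzy integral over a λ-fuzzy measure. The abstract does not say which form of fuzzy integral the paper uses, so this choice is an assumption, as are the function names, the example densities (the subjectively assigned relevance of each classifier), the example scores, and the "hot"/"cold" labels.

```python
import math

def solve_lambda(densities):
    """Find lambda > -1 with prod(1 + lambda*g_i) = 1 + lambda (Sugeno measure)."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities, each strictly in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities already additive

    # Log form of the defining equation; it has the same sign as the product form.
    def phi(lam):
        return sum(math.log1p(lam * g) for g in densities) - math.log1p(lam)

    if total > 1.0:
        lo, hi = -1.0 + 1e-12, -1e-12   # root lies in (-1, 0)
    else:
        lo, hi = 1e-12, 1.0             # root lies in (0, inf)
        while phi(hi) < 0.0:
            hi *= 2.0
    for _ in range(200):                # plain bisection
        mid = (lo + hi) / 2.0
        if phi(lo) * phi(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def sugeno_integral(scores, densities, lam):
    """Sugeno integral of classifier scores for a single class."""
    pairs = sorted(zip(scores, densities), key=lambda p: p[0], reverse=True)
    g_cum, best = 0.0, 0.0
    for h, g in pairs:
        # Measure of the set of classifiers seen so far (lambda rule).
        g_cum = g + g_cum + lam * g * g_cum
        best = max(best, min(h, g_cum))
    return best

def fuse(class_scores, densities):
    """Return (winning label, per-class integrals) for one input pattern."""
    lam = solve_lambda(densities)
    integrals = {label: sugeno_integral(s, densities, lam)
                 for label, s in class_scores.items()}
    return max(integrals, key=integrals.get), integrals

# Hypothetical outputs of three classifiers for one web page.
densities = [0.3, 0.4, 0.5]
class_scores = {"hot": [0.9, 0.4, 0.6], "cold": [0.1, 0.6, 0.4]}
label, integrals = fuse(class_scores, densities)
print(label, integrals)
```

Unlike simple voting, the integral weighs each classifier's score against the measure of the group of classifiers that support the class at least that strongly, so a confident but low-relevance classifier cannot decide the outcome on its own.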
