An SVM-based Method for Classifying Tagged Web Resources Using Tag Stability of Folksonomy in Categories

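  • 고병걸 (School of Computer Science and Engineering, Seoul National University)
  • 이강표 (School of Computer Science and Engineering, Seoul National University)
  • 김형주 (School of Computer Science and Engineering, Seoul National University)
  • Published: 2009.06.15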

Abstract

Folksonomy, a collaborative classification built from tags (sets of freely chosen keywords), is one of the defining elements of Web 2.0. A folksonomy can be built at lower cost than a taxonomy, the conventional classification method, but it lacks the hierarchical, systematic structure of a taxonomy. If a classifier for web resources could be learned from the collective intelligence present in a folksonomy, a taxonomy could in turn be built at low cost. In this paper, we define a general model for the folksonomy built on Slashdot.org and show that stability exists within it, which indicates that the collective intelligence needed to build a classifier is actually present in the folksonomy. We then propose a method that builds an SVM classifier from the stability values of tags in each category, a feature that arises from this collective intelligence. Experiments confirm that the proposed method builds a taxonomy from the folksonomy with high accuracy.
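The proposed pipeline (per-category tag features fed to an SVM) can be illustrated with a minimal sketch. The abstract does not define the stability value, so the code below substitutes a hypothetical proxy: the share of a category's tag assignments taken by each tag. The toy data, the function names, the two-category setup, and the bias-free linear SVM trained by subgradient descent are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Toy training set: (tags on a resource, category). All values are invented.
TRAIN = [
    (["linux", "kernel", "opensource"], "linux"),
    (["linux", "ubuntu", "opensource"], "linux"),
    (["kernel", "linux", "driver"], "linux"),
    (["nintendo", "console", "games"], "games"),
    (["games", "mmorpg", "console"], "games"),
    (["games", "nintendo", "wii"], "games"),
]
CATEGORIES = ["linux", "games"]  # first is the +1 class, second the -1 class


def tag_weights(train):
    """ASSUMPTION: stand-in for the stability value, not the paper's definition.
    Weight of a tag in a category = its share of that category's tag uses."""
    counts = defaultdict(Counter)
    for tags, category in train:
        counts[category].update(tags)
    return {
        category: {tag: n / sum(c.values()) for tag, n in c.items()}
        for category, c in counts.items()
    }


def featurize(tags, weights):
    """One feature per category: the summed weight of the resource's tags."""
    return [sum(weights[c].get(t, 0.0) for t in tags) for c in CATEGORIES]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM (no bias term) fitted by subgradient descent on the
    L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            for j in range(len(w)):
                hinge_grad = -yi * xi[j] if margin < 1 else 0.0
                w[j] -= lr * (lam * w[j] + hinge_grad)
    return w


weights = tag_weights(TRAIN)
X = [featurize(tags, weights) for tags, _ in TRAIN]
y = [1 if category == CATEGORIES[0] else -1 for _, category in TRAIN]
w = train_linear_svm(X, y)


def classify(tags):
    """Assign a tagged resource to one of the two toy categories."""
    score = sum(wj * xj for wj, xj in zip(w, featurize(tags, weights)))
    return CATEGORIES[0] if score >= 0 else CATEGORIES[1]


print(classify(["linux", "driver"]))
print(classify(["wii", "games"]))
```

On this toy data the two calls assign the resources to the linux and games categories respectively. A realistic setup would replace the proxy weight with the paper's stability measure and use a multi-class SVM over all Slashdot categories.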
