BPNN Algorithm with SVD Technique for Korean Document Categorization

한글문서분류에 SVD를 이용한 BPNN 알고리즘

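  • 리청화 (Division of Electronics and Information Engineering, Chonbuk National University) ;
  • 변동률 (Division of Electronics and Information Engineering, Chonbuk National University) ;
  • 박순철 (Division of Electronics and Information Engineering, Chonbuk National University)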
  • Received : 2010.03.11
  • Accepted : 2010.06.10
  • Published : 2010.06.30

Abstract

This paper proposes a Korean document categorization algorithm that combines a Back Propagation Neural Network (BPNN) with Singular Value Decomposition (SVD). A BPNN builds a network through its learning process and uses that network to classify documents. The main difficulty in applying a BPNN to document categorization is the high dimensionality of the feature space of the input documents. SVD projects the original high-dimensional vectors onto low-dimensional vectors, captures the important associative relationships between terms, and constructs a semantic vector space. The categorization algorithm is tested and compared on the HKIB-20000/HKIB-40075 Korean Text Categorization Test Collections. Experimental results show that the BPNN algorithm with SVD achieves high effectiveness for Korean document categorization.

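This paper proposes a Korean document classification system that uses the Back Propagation Neural Network (BPNN) algorithm together with Singular Value Decomposition (SVD). The BPNN classifies documents using a network built through learning. The difficulty with this approach is that the feature space fed into the classifier is too large. SVD reduces high-dimensional vectors to low-dimensional ones and also builds a meaningful vector space that captures the important relationships between words. To evaluate the proposed BPNN, the data sets of the Hankook Ilbo text categorization test collections (HKIB-20000/HKIB-40075) were used. The experimental results show that the system using BPNN and SVD achieves excellent performance in Korean document classification.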
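The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term-frequency matrix, the function names (`svd_project`, `train_bpnn`, `predict`), the single hidden layer, and every hyperparameter are assumptions chosen only for illustration. The sketch reduces document vectors with a truncated SVD and then trains a small sigmoid network by backpropagation on the reduced vectors.

```python
import numpy as np

# Toy term-frequency matrix (hypothetical data): rows are documents,
# columns are terms. Documents 0-2 and 3-5 use disjoint vocabularies.
X = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [2, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 0, 2],
], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])


def svd_project(X, k):
    """Project documents onto the top-k right singular vectors of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T              # shape: (n_terms, k)
    return X @ Vk, Vk          # reduced documents, term-to-concept map


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def train_bpnn(Z, Y, hidden=8, lr=0.5, epochs=3000, seed=0):
    """Train a one-hidden-layer network with full-batch backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (Z.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = len(Z)
    for _ in range(epochs):
        # Forward pass.
        H = sigmoid(Z @ W1 + b1)
        O = sigmoid(H @ W2 + b2)
        # Backward pass: squared-error gradient through the sigmoids.
        dO = (O - Y) * O * (1.0 - O)
        dH = (dO @ W2.T) * H * (1.0 - H)
        # Gradient-descent updates, averaged over the batch.
        W2 -= lr * (H.T @ dO) / n
        b2 -= lr * dO.mean(axis=0)
        W1 -= lr * (Z.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2


def predict(Z, params):
    """Return the index of the most active output unit for each row of Z."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(Z @ W1 + b1) @ W2 + b2).argmax(axis=1)


Z, Vk = svd_project(X, k=2)      # 6 terms reduced to 2 dimensions
Y = np.eye(2)[labels]            # one-hot category targets
params = train_bpnn(Z, Y)
print(predict(Z, params))
```

An unseen document would be represented as a term-count vector `d` over the same vocabulary and projected with `d @ Vk` before being passed to `predict`, so that it lies in the same reduced space as the training documents.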
