Korean Document Classification using Characteristics of Word Information

  • Kim, Seok-Ki (Department of Computer Science and Statistics, Chonbuk National University) ;
  • Han, Kyung-Soo (Division of Mathematics and Statistical Informatics, Chonbuk National University) ;
  • Ahn, Jeong-Yong (Department of Computer Science and Statistics, Seonam University)
  • Published : 2003.05.31

Abstract

In document classification, the target of analysis is not the document itself but the words that appear in it. Word information is therefore a significant factor in document classification. In this study, we address the classification of Korean documents based on words and feature vectors. First, we present the performance of document classification using nouns and keywords. Second, we compare the results across feature vector sizes.
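
The idea of comparing feature vector sizes can be illustrated with a minimal sketch. The abstract does not state which term selection rule or classifier the study used, so everything below is an assumption made for illustration: documents are taken to be already reduced to lists of nouns, the vocabulary is chosen by document frequency, and classification uses cosine similarity to per-class term-count sums. The function names and the toy data are hypothetical.

```python
from collections import Counter
import math


def build_vocabulary(docs, size):
    """Pick the `size` terms that occur in the most documents (assumed rule)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Sort by document frequency (descending), then by term for determinism.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:size]


def to_vector(tokens, vocabulary):
    """Term-frequency feature vector with one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[t] for t in vocabulary]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_centroids(docs, labels, vocabulary):
    """Sum the vectors of each class; cosine ignores scale, so sums act as centroids."""
    sums = {}
    for tokens, label in zip(docs, labels):
        acc = sums.setdefault(label, [0] * len(vocabulary))
        for i, x in enumerate(to_vector(tokens, vocabulary)):
            acc[i] += x
    return sums


def classify(tokens, centroids, vocabulary):
    """Return the label whose class vector is most similar to the document."""
    vec = to_vector(tokens, vocabulary)
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus: each document is a list of already-extracted nouns.
train_docs = [
    ["경제", "주식", "시장"],
    ["주식", "투자", "시장"],
    ["축구", "경기", "선수"],
    ["야구", "경기", "선수"],
]
train_labels = ["economy", "economy", "sports", "sports"]

if __name__ == "__main__":
    query = ["선수", "경기", "결과"]
    # Repeat the same pipeline with different feature vector sizes.
    for size in (2, 4, 8):
        vocabulary = build_vocabulary(train_docs, size)
        centroids = train_centroids(train_docs, train_labels, vocabulary)
        print(size, classify(query, centroids, vocabulary))
```

Restricting the input tokens to nouns or to a keyword list changes only what is passed in as `docs`; the vector size is controlled solely by the `size` argument, which makes the two factors named in the abstract easy to vary independently.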
