Normalized Term Frequency Weighting Method in Automatic Text Categorization

자동 문서분류에서의 정규화 용어빈도 가중치방법

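  • 김수진 (Department of Computer Science, College of Natural Sciences, Chonnam National University) ;
  • 박혁로 (Department of Computer Science, College of Natural Sciences, Chonnam National University)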
  • Published : 2003.11.01

Abstract

This paper defines a normalized term frequency weighting method based on the Box-Cox transformation and applies it to automatic text categorization. The Box-Cox transformation is a statistical transformation that makes data approximately normally distributed. This paper applies that transformation to term frequencies and proposes a new term frequency weighting method. Because the normalized term frequency is transformed differently for each term, unlike existing term frequency weighting methods, it is more general than fixed weighting methods such as the logarithm or the square root. The validity of the normalized term frequency weighting method was supported through experiments on 8,000 newspaper articles divided into 4 groups, which yielded high categorization accuracy in all cases.
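To make the idea concrete, the following is a minimal sketch of how a Box-Cox transformation could be applied to the raw frequencies of a single term. It is not the authors' implementation: the abstract does not give the exact formulation, so the function names, the shift of each count by 1 (to keep values positive), and the grid search over the parameter `lam` using the standard Box-Cox profile log-likelihood are all assumptions made for illustration.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if abs(lam) < 1e-12:
        # lam = 0 is the logarithmic special case.
        return math.log(x)
    return (x ** lam - 1.0) / lam


def log_likelihood(xs, lam):
    """Standard Box-Cox profile log-likelihood of lam for positive samples xs."""
    n = len(xs)
    ys = [box_cox(x, lam) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    if var <= 0:
        return float("-inf")
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)


def best_lambda(xs, grid=None):
    """Pick the lam on a coarse grid that maximizes the profile log-likelihood."""
    if grid is None:
        # Hypothetical search grid: -2.0, -1.9, ..., 2.0
        grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: log_likelihood(xs, lam))


def normalized_tf(counts):
    """Transform one term's raw counts across documents.

    Assumption: counts are shifted by 1 so that zero counts stay in the
    positive domain required by Box-Cox.
    """
    shifted = [c + 1 for c in counts]
    lam = best_lambda(shifted)
    return lam, [box_cox(x, lam) for x in shifted]


if __name__ == "__main__":
    # Made-up counts of one term in six documents.
    lam, weights = normalized_tf([0, 1, 1, 2, 5, 12])
    print(lam, weights)
```

In this sketch, `lam = 0` reduces to the logarithm and `lam = 0.5` gives a rescaled square root, which illustrates why a per-term parameter generalizes fixed log or root weighting.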

Keywords