An Optimal Weighting Method in Supervised Learning of Linguistic Model for Text Classification

Mikawa, Kenta;Ishida, Takashi;Goto, Masayuki;

doi:10.7232/iems.2012.11.1.087

Industrial Engineering and Management Systems

Volume 11, Issue 1 / Pages 87-93 / 2012 / 1598-7248 (pISSN) / 2234-6473 (eISSN)

Korean Institute of Industrial Engineers (대한산업공학회)



Mikawa, Kenta (Graduate School of Creative Science and Engineering, Waseda University) ;
Ishida, Takashi (Media Network Center, Waseda University) ;
Goto, Masayuki (Faculty of Science and Engineering, Waseda University)

Received : 2011.11.15
Accepted : 2012.02.03
Published : 2012.03.01

https://doi.org/10.7232/iems.2012.11.1.087


Abstract

This paper discusses a new weighting method for text analysis from the viewpoint of supervised learning. The term frequency-inverse document frequency (tf-idf) measure is a well-known weighting method for information retrieval, and it can also be used for text analysis. However, it is an empirical weighting method for information retrieval whose effectiveness has not been clarified from a theoretical viewpoint. A more effective weighting measure may therefore be obtainable for document classification problems. In this study, we propose an optimal weighting method for document classification problems from the viewpoint of supervised learning. Because it makes use of training data, the proposed measure is better suited to the text classification problem than the tf-idf measure. The effectiveness of our proposal is demonstrated by simulation experiments on text classification problems using newspaper articles and customer reviews posted on a web site.
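
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of tf-idf weighting in its common textbook form (raw term count multiplied by log(N/df)). It is not the weighting method proposed in the paper, which is not specified on this page. The function name `tf_idf` and the toy documents are assumptions made purely for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the textbook form tf * log(N / df); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus: two news-like snippets and one review-like snippet.
docs = [
    ["stock", "market", "falls"],
    ["market", "rally", "continues"],
    ["great", "product", "great", "price"],
]
weights = tf_idf(docs)
```

Note that these weights depend only on term counts and ignore class labels, which is the limitation the abstract points to when it argues for a weighting learned from training data.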

References

Aizawa, A. (2000), The Feature Quantity: An Information Theoretic Perspective of Tfidf-like Measures, Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, 104-111.
Aizawa, A. (2003), An Information-theoretic Perspective of tf-idf Measures, Information Processing and Management, 39, 45-65. https://doi.org/10.1016/S0306-4573(02)00021-3
Bishop, C. M. (2006), Pattern Recognition and Machine Learning, Springer-Verlag.
Goto, M., Ishida, T., and Hirasawa, S. (2007), Statistical Evaluation of Measure and Distance on Document Classification Problems in Text Mining, IEEE International Conference on Computer and Information Technology, 674-679.
Goto, M., Ishida, T., Suzuki, M., and Hirasawa, S. (2008), Asymptotic Evaluation of Distance Measure on High Dimensional Vector Space in Text Mining, International Symposium on Information Theory and its Applications.
Hearst, M. A. (1999), Untangling text data mining, ACL '99 Proceedings, 3-10.
Hofmann, T. (1999), Probabilistic Latent Semantic Indexing, Proceedings of the 22nd International Conference on Research and Development in Information Retrieval, 50-57.
Manning, C. D., Raghavan, P., and Schuetze, H. (2008), Introduction to Information Retrieval, Cambridge University Press.
McCallum, A. and Nigam, K. (1998), A Comparison of Event Models for Naive Bayes Text Classification, Proceedings of AAAI-98 Workshop on Learning for Text Categorization, 41-48.
Mikawa, K., Ishida, T., and Goto, M. (2012), A Proposal of Extended Cosine Measure for Distance Metric Learning in Text Classification, Proceedings of 2011 IEEE International Conference on the Systems, Man, Cybernetics (SMC), 1741-1746.
Nagata, M. (1994), A Stochastic Japanese morphological analyzer using a forward-DP backward-A* best search algorithm, Proceedings of the 15th International Conference on Computational Linguistics, 201-207.
Salton, G. and Buckley, C. (1988), Term-Weighting Approaches in Automatic Text Retrieval, Information Processing and Management, 24(5), 513-523. https://doi.org/10.1016/0306-4573(88)90021-0

Cited by

Metric Learning vol.9, pp.1, 2015, https://doi.org/10.2200/S00626ED1V01Y201501AIM030
Identifying Emerging Trends of Financial Business Method Patents vol.9, pp.9, 2017, https://doi.org/10.3390/su9091670
Business Model Mining: Analyzing a Firm's Business Model with Text Mining of Annual Report vol.13, pp.4, 2012, https://doi.org/10.7232/iems.2014.13.4.432
