http://dx.doi.org/10.9728/dcs.2013.14.4.581

Sentiment Classification of Movie Reviews using Levenshtein Distance  

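Ahn, Kwang-Mo (Department of Computer Engineering, Chungbuk National University)
Kim, Yun-Suk (Department of Computer Engineering, Chungbuk National University)
Kim, Young-Hoon (Mobile School, Chungkang College of Cultural Industries)
Seo, Young-Hoon (Department of Computer Engineering, Chungbuk National University)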
Publication Information
Journal of Digital Contents Society / v.14, no.4, 2013, pp. 581-587
Abstract
In this paper, we propose a sentiment classification method that uses Levenshtein distance. We generate a bag-of-words (BOW) representation by applying Levenshtein distance to sentiment features and use it as the training set. The machine learning algorithms we use are support vector machines (SVMs) and Naive Bayes (NB). As the data set, we gathered 2,385 movie reviews from an online movie community (the Daum movie service). From the collected reviews, we manually selected sentiment words, obtaining 778 words. In the experiment, we train the classifiers on the BOW generated by applying Levenshtein distance to the sentiment words, and we evaluate classifier performance by 10-fold cross-validation. In the evaluation, we obtained an accuracy of 85.46% using Multinomial Naive Bayes when the Levenshtein distance was 3. The experimental results show that, with this method, classification performance is less affected by spelling errors in the documents.
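The idea of building a BOW with Levenshtein distance can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `levenshtein` and `fuzzy_bow`, the parameter `max_distance`, and the choice to map each token to its single closest lexicon word are assumptions made for illustration, and the abstract does not say whether distance is measured over syllables or over smaller units of Korean text.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_bow(tokens, lexicon, max_distance):
    """Hypothetical helper: count each token under its closest lexicon word,
    provided that word is within max_distance edits; other tokens are dropped."""
    counts = Counter()
    for token in tokens:
        best_word, best_dist = None, max_distance + 1
        for word in lexicon:
            d = levenshtein(token, word)
            if d < best_dist:
                best_word, best_dist = word, d
        if best_word is not None:
            counts[best_word] += 1
    return counts
```

For example, `fuzzy_bow(["amazng", "borring", "plot"], ["amazing", "boring"], max_distance=1)` counts the two misspelled tokens under "amazing" and "boring" and ignores "plot". The resulting count vectors could then be passed to a classifier such as Multinomial Naive Bayes. The threshold of 3 reported in the abstract applies to the authors' Korean data; on short English words a threshold that large would likely over-match.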
Keywords
Sentiment Classification; Opinion Mining; Levenshtein Distance; SVMs; Naive Bayes
Citations & Related Records
Times Cited By KSCI: 3