A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews  

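Shin, Jun-Soo (Department of Computer and Communications Engineering, Kangwon National University)
Kim, Hark-Soo (Department of Computer and Communications Engineering, Kangwon National University)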
Abstract
Many sentiment categorization systems based on machine learning rely on morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally perform poorly on customer reviews because online reviews contain many spacing and spelling errors. The poor performance of this underlying component in turn degrades the sentiment categorization system built on it. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) patterns and phoneme patterns. Both kinds of patterns are constructed automatically from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of the leading-consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system that uses an SVM (Support Vector Machine) as the machine learner. In experiments with Korean customer reviews, the system using the proposed method outperformed a system using a morphological analyzer as the feature extractor.
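The two ideas in the abstract, reducing predicates to leading-consonant and vowel pairs and applying longest matching, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tiny pattern sets, and the choice to strip spaces before matching are assumptions made for illustration, and the paper builds its patterns from a POS-tagged corpus rather than by hand. The syllable decomposition itself uses the standard Unicode arithmetic for precomposed Hangul.

```python
# Illustrative sketch only. Function names and pattern sets are hypothetical;
# the paper derives its patterns automatically from a POS-tagged corpus.

LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 leading consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3           # precomposed syllable range


def phoneme_key(text):
    """Replace each Hangul syllable with its leading consonant and vowel,
    dropping the final consonant. Other characters pass through unchanged."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            # 588 = 21 vowels * 28 final-consonant slots per leading consonant
            out.append(LEADS[index // 588] + VOWELS[(index % 588) // 28])
        else:
            out.append(ch)
    return "".join(out)


def longest_match(text, patterns):
    """Greedy left-to-right scan: at each position emit the longest pattern
    that starts there, otherwise skip one character."""
    if not patterns:
        return []
    max_len = max(len(p) for p in patterns)
    features, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in patterns:
                features.append(candidate)
                i += size
                break
        else:
            i += 1  # no pattern starts at this position
    return features


def extract_features(review, eojeol_patterns, phoneme_patterns):
    """Combine surface-form matches and phoneme-level matches.
    Assumption: spaces are removed first so spacing errors do not block a match."""
    compact = review.replace(" ", "")
    surface = longest_match(compact, eojeol_patterns)
    phonemic = longest_match(phoneme_key(compact), phoneme_patterns)
    return surface + phonemic


# Hypothetical patterns: one content-word pattern and one predicate pattern.
eojeol_patterns = {"화면"}
phoneme_patterns = {phoneme_key("좋아요")}

# "조아요" is a common misspelling of "좋아요"; only the final consonant differs,
# so both reduce to the same phoneme key and the misspelled review still matches.
features = extract_features("화면이 조아요", eojeol_patterns, phoneme_patterns)
```

Because the final consonant is discarded, the correctly spelled and misspelled predicates share one key, which is the property the abstract relies on when it notes that spelling errors seldom affect leading consonants and vowels. The resulting feature list could then be fed to any classifier, such as the SVM used in the paper.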
Keywords
Sentiment categorization of customer reviews; Feature extraction based on longest matching; Eojeol pattern; Phoneme pattern