http://dx.doi.org/10.1633/JISTaP.2013.1.1.1

Domain Adaptation for Opinion Classification: A Self-Training Approach

Yu, Ning (School of Library and Information Science, University of Kentucky)

Publication Information

Journal of Information Science Theory and Practice / v.1, no.1, 2013, pp. 10-26

Abstract

Domain transfer is a widely recognized problem for machine learning algorithms because models built upon one data domain generally do not perform well in another data domain. This is especially challenging for tasks such as opinion classification, which often must cope with insufficient quantities of labeled data. This study investigates the feasibility of self-training for the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for effectiveness in sparse data situations and feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. Findings of this study suggest that, when labeled data are limited, self-training is a promising approach for opinion classification, although its contribution varies across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.
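The general idea of domain-transfer self-training can be illustrated with a minimal sketch. It is not the study's actual setup: the Naive Bayes classifier, the confidence threshold, the toy sentences, and all function and parameter names (`train_nb`, `self_train`, `per_round`, `threshold`) are assumptions chosen for illustration. A classifier is first trained on labeled source-domain text, then repeatedly labels unlabeled target-domain text and absorbs its most confident predictions before retraining.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (add-one smoothing applied at prediction)."""
    word_counts = {c: Counter() for c in set(labels)}
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict_proba(model, doc):
    """Return {label: posterior probability} for one document."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        score = math.log(class_counts[c] / total_docs)
        for w in doc.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / denom)
        log_scores[c] = score
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}

def classify(model, doc):
    """Return the most probable label for one document."""
    probs = predict_proba(model, doc)
    return max(probs, key=probs.get)

def self_train(source_docs, source_labels, target_unlabeled,
               rounds=3, per_round=2, threshold=0.6):
    """Self-training: grow the training set with confident target-domain predictions."""
    docs, labels = list(source_docs), list(source_labels)
    pool = list(target_unlabeled)
    model = train_nb(docs, labels)
    for _ in range(rounds):
        if not pool:
            break
        scored = []
        for doc in pool:
            probs = predict_proba(model, doc)
            label = max(probs, key=probs.get)
            scored.append((probs[label], doc, label))
        scored.sort(key=lambda t: t[0], reverse=True)
        # Keep only the top predictions that clear the confidence threshold.
        picked = [t for t in scored[:per_round] if t[0] >= threshold]
        if not picked:
            break
        for _, doc, label in picked:
            docs.append(doc)
            labels.append(label)
            pool.remove(doc)
        model = train_nb(docs, labels)  # retrain on source + pseudo-labeled target data
    return model, len(docs) - len(source_docs)

# Hypothetical toy data: labeled movie-review-like source, unlabeled blog-like target.
source_docs = ["i love this wonderful movie", "i hate this awful movie",
               "the movie opens on friday", "the studio released the movie in may"]
source_labels = ["opinion", "opinion", "fact", "fact"]
target_unlabeled = ["i love my awful wonderful blog", "the blog opens in may"]

model, added = self_train(source_docs, source_labels, target_unlabeled)
print(added, classify(model, "i love my blog"))
```

In this sketch the retrained model picks up target-domain vocabulary (here, "blog" and "my") that the source data never contained, which is the intuition behind using unlabeled target data for adaptation. Lowering `threshold` admits more pseudo-labels per round at the cost of admitting more labeling errors.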

Keywords

Domain adaptation; Opinion classification; Self-training; Semi-supervised learning; Sentiment analysis; Machine learning
