• Title/Summary/Keyword: Opinion classification

Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice / v.1 no.1 / pp.10-26 / 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms because models built on one data domain generally do not perform well in another. This is especially challenging for tasks such as opinion classification, which often must cope with insufficient quantities of labeled data. This study investigates the feasibility of self-training for the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for effectiveness in sparse data situations and feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. The findings suggest that, when labeled data are limited, self-training is a promising approach for opinion classification, although its contribution varies across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.
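
The self-training idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the naive Bayes classifier, the confidence threshold, the number of rounds, and all function names (`train_nb`, `predict_nb`, `self_train`) are assumptions chosen for illustration. A model trained on labeled source-domain text labels the unlabeled target-domain text, and its confident predictions are added to the training set before retraining.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model (hypothetical helper)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return (label, posterior probability of that label)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # unseen words are ignored
                # add-one smoothing over the vocabulary
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize log scores into posterior probabilities.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def self_train(labeled_docs, labels, unlabeled_docs, threshold=0.6, max_rounds=5):
    """Add confident predictions on unlabeled data to the training set, then retrain."""
    docs, labs = list(labeled_docs), list(labels)
    pool = list(unlabeled_docs)
    for _ in range(max_rounds):
        model = train_nb(docs, labs)
        remaining = []
        added = False
        for text in pool:
            label, confidence = predict_nb(model, text)
            if confidence >= threshold:
                docs.append(text)
                labs.append(label)
                added = True
            else:
                remaining.append(text)
        pool = remaining
        if not added:
            break
    return train_nb(docs, labs), pool
```

In this sketch, the labeled documents would come from a non-target domain (for example, movie reviews) and the unlabeled pool from the target domain (for example, blog posts). The threshold trades the amount of pseudo-labeled data against the risk of reinforcing early mistakes.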

Opinion Extraction based on Syntactic Pieces

  • Aoki, Suguru;Yamamoto, Kazuhide
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.76-85 / 2007
  • This paper addresses the task of extracting opinions from given documents and classifying them as positive or negative. We propose a sentence classification method based on the notion of a syntactic piece. A syntactic piece is a minimal unit of structure, used as a processing unit in place of n-grams and whole tree structures. We compute the semantic orientation of each syntactic piece and classify opinion sentences as positive or negative. In an experiment on more than 5,000 opinion sentences from multiple domains, our approach attained 91% precision.

Efficient Retrieval of Short Opinion Documents Using Learning to Rank (기계학습을 이용한 단문 오피니언 문서의 효율적 검색 기법)

  • Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.4 / pp.117-126 / 2013
  • Recently, as social network services (SNS) such as Twitter and Facebook have become more popular, much research has been done on opinion mining. However, existing work focuses mostly on sentiment classification or feature selection, and there have been few studies of opinion document retrieval. In this paper, we propose a new retrieval method for short opinion documents. The proposed method builds on existing sentiment classification methodology and uses several document features to evaluate the quality of opinion documents. To generate the retrieval model, we adopt a learning-to-rank technique and integrate the sentiment classification model into it. Experimental results show that the proposed method can be applied successfully to opinion search.

Feature Weighting for Opinion Classification of Comments on News Articles (뉴스 댓글의 감정 분류를 위한 자질 가중치 설정)

  • Lee, Kong-Joo;Kim, Jae-Hoon;Seo, Hyung-Won;Rhyu, Keel-Soo
    • Journal of Advanced Marine Engineering and Technology / v.34 no.6 / pp.871-879 / 2010
  • In this paper, we present a system that classifies comments on a news article by user opinion, called polarity (positive or negative). The system is a document classification system for comments and is based on machine learning techniques such as support vector machines. Unlike normal documents, comments have a body that can influence the classification of their opinions into polarities. We propose a feature weighting scheme that uses these characteristics of comments, together with several resources for opinion classification. In our experiments, the weighting scheme proved useful for opinion classification of comments on Korean news articles. Korean character n-grams (bigrams or trigrams) also proved helpful for classifying comments that contain many Internet words or typos. In the future, we will apply this scheme to opinion analysis of comments on product reviews as well as news articles.
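
As a rough illustration of why character n-grams can help with typos, the sketch below extracts character bigrams and trigrams and compares two spellings by the overlap of their n-gram sets. It does not reproduce the paper's weighting scheme. The function names `char_ngrams` and `jaccard`, and the choice to drop whitespace, are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, sizes=(2, 3)):
    """Count overlapping character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    grams = Counter()
    for n in sizes:
        for i in range(len(chars) - n + 1):
            grams[chars[i:i + n]] += 1
    return grams


def jaccard(grams_a, grams_b):
    """Overlap between two n-gram sets: |A & B| / |A | B|."""
    keys_a, keys_b = set(grams_a), set(grams_b)
    if not keys_a and not keys_b:
        return 0.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)


# A misspelling still shares most of its character n-grams with the
# intended word, whereas a word-level feature would see two unrelated tokens.
similarity = jaccard(char_ngrams("wonderful"), char_ngrams("wonderfull"))
```

The same function works on Korean text, since Python strings are sequences of Unicode characters: `char_ngrams("좋아요")` yields the bigrams `좋아` and `아요` and the trigram `좋아요`.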

An Experimental Evaluation of Short Opinion Document Classification Using A Word Pattern Frequency (단어패턴 빈도를 이용한 단문 오피니언 문서 분류기법의 실험적 평가)

  • Chang, Jae-Young;Kim, Ilmin
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.5 / pp.243-253 / 2012
  • Opinion mining, which developed from document classification in the field of data mining, has become a common interest in domestic as well as international industries. The core of opinion mining is to decide precisely whether an opinion document is positive or negative. Although many approaches have been proposed, their classification accuracy has not been high enough for practical applications. The polarity of opinion documents written in Korean is not easy to determine automatically because they often include varied and ungrammatical words expressing subjective opinions. This paper proposes a new approach to classifying opinion documents that considers only the frequency of word patterns and excludes grammatical factors as much as possible. In the proposed method, we represent a document as a bag of words, apply a learning algorithm based on the frequency of word patterns, and finally decide the polarity of the document using a score function. We also present experimental results evaluating the accuracy of the proposed method.
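
The pipeline described in this abstract (bag of words, pattern frequencies, score function) can be sketched as follows. The abstract does not define the word patterns or the score function, so both are assumptions here: a "pattern" is taken to be a single word or a pair of adjacent words, and the score sums each pattern's normalized frequency difference between positive and negative training documents.

```python
from collections import Counter


def patterns(text):
    """Hypothetical word patterns: single words plus adjacent word pairs."""
    words = text.lower().split()
    return words + [" ".join(pair) for pair in zip(words, words[1:])]


def learn_frequencies(docs, labels):
    """Count how often each pattern occurs in positive and negative documents."""
    freq = {"pos": Counter(), "neg": Counter()}
    for text, label in zip(docs, labels):
        freq[label].update(patterns(text))
    return freq


def score(freq, text):
    """Assumed score function: sum of (pos - neg) / (pos + neg) over known patterns."""
    total = 0.0
    for p in patterns(text):
        pos, neg = freq["pos"][p], freq["neg"][p]
        if pos + neg:
            total += (pos - neg) / (pos + neg)
    return total


def polarity(freq, text):
    """Decide polarity from the sign of the score (ties fall to 'neg')."""
    return "pos" if score(freq, text) > 0 else "neg"
```

No grammatical analysis is involved: the sketch only counts surface patterns, in keeping with the abstract's aim of excluding grammatical factors.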

Automatic Retrieval of SNS Opinion Document Using Machine Learning Technique (기계학습을 이용한 SNS 오피니언 문서의 자동추출기법)

  • Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.5 / pp.27-35 / 2013
  • Recently, as social network services (SNS) have become more popular, much research has been done on analyzing public opinion from SNS. One of the most important tasks in this problem is to separate opinion (subjective) documents from others (e.g., objective documents) in SNS. In this paper, we propose a new method for retrieving opinion documents from Twitter. Searching for or classifying opinion documents in Twitter is difficult because of a lack of publicly available Twitter documents for training. To tackle the problem, we first build a machine-learned model for sentiment classification using external documents similar to Twitter, and then modify the model to separate the opinion documents in Twitter. Experimental results show that the proposed method can be applied successfully to opinion classification.

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the number of has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. In various types of unstructured data, the user's opinion and a variety of information is clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. 
We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy on 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for documents with strong certainty was higher than that for documents with weak certainty. Above all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
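
A minimal sketch of lexicon-based classification, the criterion this abstract uses to separate opinion mining from text mining, is given below. The lexicon entries, their scores, and the function names are all hypothetical, not taken from the paper. The `explain` helper shows why a lexicon makes results easy to explain: the words that contributed to the decision can be listed directly.

```python
# Hypothetical sentiment lexicon: word -> polarity score (assumed values).
LEXICON = {"good": 1, "great": 2, "bad": -1, "awful": -2}


def explain(text, lexicon=LEXICON):
    """Return the (word, score) pairs from the lexicon that occur in the text."""
    hits = []
    for raw in text.split():
        word = raw.strip(".,!?").lower()
        if word in lexicon:
            hits.append((word, lexicon[word]))
    return hits


def classify(text, lexicon=LEXICON):
    """Label the text by the sign of its summed lexicon scores."""
    total = sum(score for _, score in explain(text, lexicon))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For example, `classify("A great film with a bad ending.")` sums +2 and -1 and returns `"positive"`, and `explain` on the same text lists the two words behind that result. Because the lexicon is plain data, it can be reused across similar domains without retraining, which matches the learning-time advantage the abstract mentions.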

Empirical Sentiment Classification Using Psychological Emotions and Social Web Data (심리학적 감정과 소셜 웹 자료를 이용한 감성의 실증적 분류)

  • Chang, Moon-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.5 / pp.563-569 / 2012
  • With the proliferation of the social web, opinion mining and sentiment analysis have become a focus of research. Sentiment analysis requires sentiment resources to decide polarity. Existing sentiment analysis has built resources that encode the intensity of sentiment polarity and has used them to decide the polarity of an opinion. In this paper, I present sentiment categories that capture not only the polarity of an opinion but also the basis of the positive or negative opinion. I define psychological emotions as primary sentiments to obtain a reasonable classification, and I extract sentiment information from social web texts to obtain the actual distribution of sentiments on the social web. By re-classifying the primary sentiments based on the extracted sentiment information, I organize sentiment categories for the social web. Using the proposed method, I present 23 categories of sentiment.

Sentiment Classification considering Korean Features (한국어 특성을 고려한 감성 분류)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.3 / pp.449-458 / 2010
  • As the need grows to obtain information efficiently from the many documents and reviews on the Internet in many fields, automatic classification of opinions or thoughts is required. This automatic classification is called sentiment classification, and it can be divided into three steps: subjective expression classification, which extracts subjective sentences from documents; sentiment classification, which decides whether the polarity of a document is positive or negative; and strength classification, which decides whether a document has weak or strong polarity. Recent studies in opinion mining have used N-gram words, lexical phrase patterns, and syntactic phrase patterns rather than single words as classification features. Patterns in particular have been used frequently as features because they are more flexible than N-gram words and more deterministic than single words. These studies are mainly concerned with English, and studies using patterns for Korean are still at an early stage. In Korean, changes in the endings ('Eomi' in Korean) of declinable words produce slight differences in meaning between predicates, yet earlier studies of Korean opinion classification removed the endings from predicates and kept only the stems. This study reviews earlier studies and methods that use patterns for English, extracts sentiment patterns from Korean documents, and classifies the polarity of these documents. It also analyzes the influence of changes in endings on the performance of opinion classification.

User Information Collection of Weibo Network Public Opinion under Python

  • Changhua Liu;Yanlin Han
    • Journal of Information Processing Systems / v.19 no.3 / pp.310-322 / 2023
  • Although the network environment is gradually improving, the virtual nature of the network remains unchanged, which greatly affects the supervision of public opinion dissemination on Weibo. To reduce this influence, the user information involved in Weibo public opinion dissemination is studied using Python. Specifically, the 2019 "Ethiopian air crash" event was taken as the research subject, the relevant data were collected using Python, and the data from March 10, 2019 to June 20, 2019 were analyzed using a latent Dirichlet allocation (LDA) topic model and a naive Bayes classifier. The user identity graph model of Weibo public opinion for the "Ethiopian air crash" as of June 20 showed that ordinary netizens accounted for the highest proportion of public opinion users and were easily influenced by media public opinion users. This influence is not limited to ordinary netizens: media public opinion users also influence other types of public opinion users. That is to say, in the network public opinion space of the "Ethiopian air crash," media public opinion users play an important role in the dissemination of public opinion information. This research can lay a foundation for classifying and identifying types of user identity information across different public opinion life cycles. Future research can start from the supervision of public opinion and the type of user identity to improve the scientific management and control of user information dissemination in Weibo public opinion.