• Title/Summary/Keyword: Text Sentiment Analysis

Search Result 233, Processing Time 0.026 seconds

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.83-105
    • /
    • 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real-time, despite being very accurate. Recently, reasoning the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-report wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). These were considered to adjust the level of a specific SWB. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by the individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies only used only a limited set of sentimental words. Due to the small number of words, many sentences are overlooked and ignored when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB. Consistent with the literature, the results showed that the gender and age affect the SWB state when the individual is exposed to an identical queue from the online text. In addition, the length of the textual response and usage pattern of special characters were found to indicate the individual's SWB. These imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of the contextual polarity in order to enlarge the vocabulary in a cost-effective manner.

Topics and Sentiment Analysis Based on Reviews of Omni-Channel Retailing

  • KIM, Soon-Hong;YOO, Byong-Kook
    • Journal of Distribution Science
    • /
    • v.19 no.4
    • /
    • pp.25-35
    • /
    • 2021
  • Purpose: This study aims to analyze the factors affecting customer satisfaction in the customer reviews of omni-channel, posted on Internet blogs, cafes, and YouTube using text mining analysis. Research, data, and Methodology: In this study, frequency analysis is performed and the LDA (Latent Dirichlet Allocation) is used to analyze social big data to respond to reviewers' reaction to the recently opened omni-channel shopping reviews by L Shopping Company. Additionally, based on the topic analysis, we conduct a sentiment analysis on purchase reviews and analyze the characteristics of each topic on the positive or negative sentiments of omni-channel app users. Results: As a result of a topic analysis, four main topics are derived: delivery and events, economic value, recommendations and convenience, and product quality and brand awareness. The emotional analysis reveals that the reviewers have many positive evaluations for price policy and product promotion, but negative evaluations for app use, delivery, and product quality. Conclusions: Retailers can establish customized marketing strategies by identifying the customer's major interests through text mining analysis. Additionally, the analysis of sentiment by subject becomes an important indicator for developing products and services that customers want by identifying areas that satisfy customers and areas that evoke negative reactions.

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.133-148
    • /
    • 2014
  • Recently, with the advent of various information channels, the number of has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. In various types of unstructured data, the user's opinion and a variety of information is clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. We then analyzed 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy compared to the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for the documents with strong certainty was higher than that for the documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Additionally, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. However, this research contributes a performance and comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.

Unified Psycholinguistic Framework: An Unobtrusive Psychological Analysis Approach Towards Insider Threat Prevention and Detection

  • Tan, Sang-Sang;Na, Jin-Cheon;Duraisamy, Santhiya
    • Journal of Information Science Theory and Practice
    • /
    • v.7 no.1
    • /
    • pp.52-71
    • /
    • 2019
  • An insider threat is a threat that comes from people within the organization being attacked. It can be described as a function of the motivation, opportunity, and capability of the insider. Compared to managing the dimensions of opportunity and capability, assessing one's motivation in committing malicious acts poses more challenges to organizations because it usually involves a more obtrusive process of psychological examination. The existing body of research in psycholinguistics suggests that automated text analysis of electronic communications can be an alternative for predicting and detecting insider threat through unobtrusive behavior monitoring. However, a major challenge in employing this approach is that it is difficult to minimize the risk of missing any potential threat while maintaining an acceptable false alarm rate. To deal with the trade-off between the risk of missed catches and the false alarm rate, we propose a unified psycholinguistic framework that consolidates multiple text analyzers to carry out sentiment analysis, emotion analysis, and topic modeling on electronic communications for unobtrusive psychological assessment. The user scenarios presented in this paper demonstrated how the trade-off issue can be attenuated with different text analyzers working collaboratively to provide more comprehensive summaries of users' psychological states.

Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model

  • Zeng, Yuyang;Zhang, Ruirui;Yang, Liang;Song, Sujuan
    • Journal of Information Processing Systems
    • /
    • v.17 no.4
    • /
    • pp.818-833
    • /
    • 2021
  • To address the problems of low precision rate, insufficient feature extraction, and poor contextual ability in existing text sentiment analysis methods, a mixed model account of a CNN-BiLSTM-TE (convolutional neural network, bidirectional long short-term memory, and topic extraction) model was proposed. First, Chinese text data was converted into vectors through the method of transfer learning by Word2Vec. Second, local features were extracted by the CNN model. Then, contextual information was extracted by the BiLSTM neural network and the emotional tendency was obtained using softmax. Finally, topics were extracted by the term frequency-inverse document frequency and K-means. Compared with the CNN, BiLSTM, and gate recurrent unit (GRU) models, the CNN-BiLSTM-TE model's F1-score was higher than other models by 0.0147, 0.006, and 0.0052, respectively. Then compared with CNN-LSTM, LSTM-CNN, and BiLSTM-CNN models, the F1-score was higher by 0.0071, 0.0038, and 0.0049, respectively. Experimental results showed that the CNN-BiLSTM-TE model can effectively improve various indicators in application. Lastly, performed scalability verification through a takeaway dataset, which has great value in practical applications.

A Sentiment Analysis of Customer Reviews on the Connected Car using Text Mining: Focusing on the Comparison of UX Factors between Domestic-Overseas Brands (텍스트 마이닝을 활용한 커넥티드 카 고객 리뷰의 감성 분석: 국내-해외 브랜드간 UX 요인 비교를 중심으로)

  • Youjung Shin;Junho Choi;Sung Woo Kim
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.517-528
    • /
    • 2023
  • The purpose of this study is to analyze and compare UX factors of connectivity systems of domestic and overseas car brands. Using a text mining analysis, UX factors of domestic and overseas brands were compared through positive-negative sentiment index. After collecting 120,000 reviews on Hyundai Motor Group (Hyundai, Kia, Genesis) and 190,000 on Tesla, BMW, and Mercedes, pre-processing was performed. Keywords were classified into 11 UX factors in 3 dimensions of the system connection, information, and service. For domestic brands, sentiment index for 'safety' was the highest. For overseas brands, 'entertainment' was the most positive UX factor.

Sentiment analysis of nuclear energy-related articles and their comments on a portal site in Rep. of Korea in 2010-2019

  • Jeong, So Yun;Kim, Jae Wook;Kim, Young Seo;Joo, Han Young;Moon, Joo Hyun
    • Nuclear Engineering and Technology
    • /
    • v.53 no.3
    • /
    • pp.1013-1019
    • /
    • 2021
  • This paper reviewed the temporal changes in the public opinions on nuclear energy in Korea with a big data analysis of nuclear energy-related articles and their comments posted on the portal site NAVER. All articles that included at least one of "nuclear energy," "nuclear power plant (NPP)," "nuclear power phase-out," or "anti-nuclear" in their titles or main text were extracted from those posted on NAVER in January 2010-December 2019. First, we performed annual word frequency analysis to identify what words had appeared most frequently in the articles. For that period, the most frequent words were "NPP," "nuclear energy," and "energy." In addition, "safety" has remained in the upper ranks since the Fukushima NPP accident. Then, we performed sentiment analysis of the pre-processed articles. The sentiment analysis showed that positive-tone articles have been reported more frequently than negativetone over the entire analysis period. Last, we performed sentiment analysis of the comments on the articles to examine the public's intention regarding nuclear issues. The analysis showed that the number of negative comments to articles each month-irrespective of positive or negative tone-was always larger than that of positive comments over the entire analysis period.

A Study on Social Perceptions of Public Libraries Utilizing the sentiment analysis

  • Noh, Younghee;Kim, Dongseok
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.12 no.4
    • /
    • pp.41-65
    • /
    • 2022
  • This study would understand the overall perception of our society about public libraries, analyzing the texts related to public libraries, utilizing the semantic connection network & sentiment analysis. For this purpose, this study collected data from the last five years with keywords, 'Library' and 'Lifelong Learning Center' from January 1, 2016 through November 30, 2020 through the blogs and cafés of major domestic portal sites. With the collected data, text mining, centrality of keywords, network structure, structural equipotentiality, and sensitivity analyses were conducted. As a result of the analysis, First, 'reading' and 'book' were identified as representative keywords that form the social perception of public libraries. Second, it turned out that there were keywords related to the use of the library and the untact service due to the recent spread of COVID-19. Third, in seeking a plan for the development of public libraries through the keywords drawn to have positive meanings, it is necessary to create continuous services that can form a new image of the library, breaking away from the existing fixed role and image of the library and increase the convenience of use. Fourth, facilities and facilities for library services were recognized from a neutral point of view. Fifth, the spread of infectious diseases, social distancing, and temporary closure and closure of libraries are negatively related to public libraries, and awareness of librarians has been identified as negative keywords.

Text Categorization with Improved Deep Learning Methods

  • Wang, Xingfeng;Kim, Hee-Cheol
    • Journal of information and communication convergence engineering
    • /
    • v.16 no.2
    • /
    • pp.106-113
    • /
    • 2018
  • Although deep learning methods of convolutional neural networks (CNNs) and long-/short-term memory (LSTM) are widely used for text categorization, they still have certain shortcomings. CNNs require that the text retain some order, that the pooling lengths be identical, and that collateral analysis is impossible; In case of LSTM, it requires the unidirectional operation and the inputs/outputs are very complex. Against these problems, we thus improved these traditional deep learning methods in the following ways: We created collateral CNNs accepting disorder and variable-length pooling, and we removed the input/output gates when creating bidirectional LSTMs. We have used four benchmark datasets for topic and sentiment classification using the new methods that we propose. The best results were obtained by combining LTSM regional embeddings with data convolution. Our method is better than all previous methods (including deep learning methods) in terms of topic and sentiment classification.

Developing Sentimental Analysis System Based on Various Optimizer

  • Eom, Seong Hoon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.13 no.1
    • /
    • pp.100-106
    • /
    • 2021
  • Over the past few decades, natural language processing research has not made much. However, the widespread use of deep learning and neural networks attracted attention for the application of neural networks in natural language processing. Sentiment analysis is one of the challenges of natural language processing. Emotions are things that a person thinks and feels. Therefore, sentiment analysis should be able to analyze the person's attitude, opinions, and inclinations in text or actual text. In the case of emotion analysis, it is a priority to simply classify two emotions: positive and negative. In this paper we propose the deep learning based sentimental analysis system according to various optimizer that is SGD, ADAM and RMSProp. Through experimental result RMSprop optimizer shows the best performance compared to others on IMDB data set. Future work is to find more best hyper parameter for sentimental analysis system.