• Title/Summary/Keyword: sentiment word

Search Result 148, Processing Time 0.049 seconds

Sentiment analysis of nuclear energy-related articles and their comments on a portal site in Rep. of Korea in 2010-2019

  • Jeong, So Yun;Kim, Jae Wook;Kim, Young Seo;Joo, Han Young;Moon, Joo Hyun
    • Nuclear Engineering and Technology
    • /
    • v.53 no.3
    • /
    • pp.1013-1019
    • /
    • 2021
  • This paper reviewed the temporal changes in the public opinions on nuclear energy in Korea with a big data analysis of nuclear energy-related articles and their comments posted on the portal site NAVER. All articles that included at least one of "nuclear energy," "nuclear power plant (NPP)," "nuclear power phase-out," or "anti-nuclear" in their titles or main text were extracted from those posted on NAVER in January 2010-December 2019. First, we performed annual word frequency analysis to identify what words had appeared most frequently in the articles. For that period, the most frequent words were "NPP," "nuclear energy," and "energy." In addition, "safety" has remained in the upper ranks since the Fukushima NPP accident. Then, we performed sentiment analysis of the pre-processed articles. The sentiment analysis showed that positive-tone articles have been reported more frequently than negativetone over the entire analysis period. Last, we performed sentiment analysis of the comments on the articles to examine the public's intention regarding nuclear issues. The analysis showed that the number of negative comments to articles each month-irrespective of positive or negative tone-was always larger than that of positive comments over the entire analysis period.

Burmese Sentiment Analysis Based on Transfer Learning

  • Mao, Cunli;Man, Zhibo;Yu, Zhengtao;Wu, Xia;Liang, Haoyuan
    • Journal of Information Processing Systems
    • /
    • v.18 no.4
    • /
    • pp.535-548
    • /
    • 2022
  • Using a rich resource language to classify sentiments in a language with few resources is a popular subject of research in natural language processing. Burmese is a low-resource language. In light of the scarcity of labeled training data for sentiment classification in Burmese, in this study, we propose a method of transfer learning for sentiment analysis of a language that uses the feature transfer technique on sentiments in English. This method generates a cross-language word-embedding representation of Burmese vocabulary to map Burmese text to the semantic space of English text. A model to classify sentiments in English is then pre-trained using a convolutional neural network and an attention mechanism, where the network shares the model for sentiment analysis of English. The parameters of the network layer are used to learn the cross-language features of the sentiments, which are then transferred to the model to classify sentiments in Burmese. Finally, the model was tuned using the labeled Burmese data. The results of the experiments show that the proposed method can significantly improve the classification of sentiments in Burmese compared to a model trained using only a Burmese corpus.

A Crowdsourcing-based Emotional Words Tagging Game for Building a Polarity Lexicon in Korean (한국어 극성 사전 구축을 위한 크라우드소싱 기반 감성 단어 극성 태깅 게임)

  • Kim, Jun-Gi;Kang, Shin-Jin;Bae, Byung-Chull
    • Journal of Korea Game Society
    • /
    • v.17 no.2
    • /
    • pp.135-144
    • /
    • 2017
  • Sentiment analysis refers to a way of analyzing the writer's subjective opinions or feelings through text. For effective sentiment analysis, it is essential to build emotional word polarity lexicon. This paper introduces a crowdsourcing-based game that we have developed for efficiently building a polarity lexicon in Korean. First, we collected a corpus from the relating Internet communities using a crawler, and we classified them into words using the Twitter POS analyzer. These POS-tagged words are provided as a form of mobile platform based tagging game in which the players voluntarily tagged the polarities of the words, and then the result was collected into the database. So far we have tagged the polarities of about 1200 words. We expect that our research can contribute to the Korean sentiment analysis research especially in the game domain by collecting more emotional word data in the future.

Sentiment Analysis using Robust Parallel Tri-LSTM Sentence Embedding in Out-of-Vocabulary Word (Out-of-Vocabulary 단어에 강건한 병렬 Tri-LSTM 문장 임베딩을 이용한 감정분석)

  • Lee, Hyun Young;Kang, Seung Shik
    • Smart Media Journal
    • /
    • v.10 no.1
    • /
    • pp.16-24
    • /
    • 2021
  • The exiting word embedding methodology such as word2vec represents words, which only occur in the raw training corpus, as a fixed-length vector into a continuous vector space, so when mapping the words incorporated in the raw training corpus into a fixed-length vector in morphologically rich language, out-of-vocabulary (OOV) problem often happens. Even for sentence embedding, when representing the meaning of a sentence as a fixed-length vector by synthesizing word vectors constituting a sentence, OOV words make it challenging to meaningfully represent a sentence into a fixed-length vector. In particular, since the agglutinative language, the Korean has a morphological characteristic to integrate lexical morpheme and grammatical morpheme, handling OOV words is an important factor in improving performance. In this paper, we propose parallel Tri-LSTM sentence embedding that is robust to the OOV problem by extending utilizing the morphological information of words into sentence-level. As a result of the sentiment analysis task with corpus in Korean, we empirically found that the character unit is better than the morpheme unit as an embedding unit for Korean sentence embedding. We achieved 86.17% accuracy on the sentiment analysis task with the parallel bidirectional Tri-LSTM sentence encoder.

Sentiment Analysis System Using Stanford Sentiment Treebank (스탠포드 감성 트리 말뭉치를 이용한 감성 분류 시스템)

  • Lee, Songwook
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.39 no.3
    • /
    • pp.274-279
    • /
    • 2015
  • The main goal of this research is to build a sentiment analysis system which automatically determines user opinions of the Stanford Sentiment Treebank in terms of three sentiments such as positive, negative, and neutral. Firstly, sentiment sentences are POS tagged and parsed to dependency structures. All nodes of the Treebank and their polarities are automatically extracted from the Treebank. We train two Support Vector Machines models. One is for a node level classification and the other is for a sentence level. We have tried various type of features such as word lexicons, POS tags, Sentiment lexicons, head-modifier relations, and sibling relations. Though we acquired 74.2% in accuracy on the test set for 3 class node level classification and 67.0% for 3 class sentence level classification, our experimental results for 2 class classification are comparable to those of the state of art system using the same corpus.

Construction of Consumer Confidence index based on Sentiment analysis using News articles (뉴스기사를 이용한 소비자의 경기심리지수 생성)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.1-27
    • /
    • 2017
  • It is known that the economic sentiment index and macroeconomic indicators are closely related because economic agent's judgment and forecast of the business conditions affect economic fluctuations. For this reason, consumer sentiment or confidence provides steady fodder for business and is treated as an important piece of economic information. In Korea, private consumption accounts and consumer sentiment index highly relevant for both, which is a very important economic indicator for evaluating and forecasting the domestic economic situation. However, despite offering relevant insights into private consumption and GDP, the traditional approach to measuring the consumer confidence based on the survey has several limits. One possible weakness is that it takes considerable time to research, collect, and aggregate the data. If certain urgent issues arise, timely information will not be announced until the end of each month. In addition, the survey only contains information derived from questionnaire items, which means it can be difficult to catch up to the direct effects of newly arising issues. The survey also faces potential declines in response rates and erroneous responses. Therefore, it is necessary to find a way to complement it. For this purpose, we construct and assess an index designed to measure consumer economic sentiment index using sentiment analysis. Unlike the survey-based measures, our index relies on textual analysis to extract sentiment from economic and financial news articles. In particular, text data such as news articles and SNS are timely and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. There exist two main approaches to the automatic extraction of sentiment from a text, we apply the lexicon-based approach, using sentiment lexicon dictionaries of words annotated with the semantic orientations. In creating the sentiment lexicon dictionaries, we enter the semantic orientation of individual words manually, though we do not attempt a full linguistic analysis (one that involves analysis of word senses or argument structure); this is the limitation of our research and further work in that direction remains possible. In this study, we generate a time series index of economic sentiment in the news. The construction of the index consists of three broad steps: (1) Collecting a large corpus of economic news articles on the web, (2) Applying lexicon-based methods for sentiment analysis of each article to score the article in terms of sentiment orientation (positive, negative and neutral), and (3) Constructing an economic sentiment index of consumers by aggregating monthly time series for each sentiment word. In line with existing scholarly assessments of the relationship between the consumer confidence index and macroeconomic indicators, any new index should be assessed for its usefulness. We examine the new index's usefulness by comparing other economic indicators to the CSI. To check the usefulness of the newly index based on sentiment analysis, trend and cross - correlation analysis are carried out to analyze the relations and lagged structure. Finally, we analyze the forecasting power using the one step ahead of out of sample prediction. As a result, the news sentiment index correlates strongly with related contemporaneous key indicators in almost all experiments. We also find that news sentiment shocks predict future economic activity in most cases. In almost all experiments, the news sentiment index strongly correlates with related contemporaneous key indicators. Furthermore, in most cases, news sentiment shocks predict future economic activity; in head-to-head comparisons, the news sentiment measures outperform survey-based sentiment index as CSI. Policy makers want to understand consumer or public opinions about existing or proposed policies. Such opinions enable relevant government decision-makers to respond quickly to monitor various web media, SNS, or news articles. Textual data, such as news articles and social networks (Twitter, Facebook and blogs) are generated at high-speeds and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. Although research using unstructured data in economic analysis is in its early stages, but the utilization of data is expected to greatly increase once its usefulness is confirmed.

A Sentence Sentiment Classification reflecting Formal and Informal Vocabulary Information (형식적 및 비형식적 어휘 정보를 반영한 문장 감정 분류)

  • Cho, Sang-Hyun;Kang, Hang-Bong
    • The KIPS Transactions:PartB
    • /
    • v.18B no.5
    • /
    • pp.325-332
    • /
    • 2011
  • Social Network Services(SNS) such as Twitter, Facebook and Myspace have gained popularity worldwide. Especially, sentiment analysis of SNS users' sentence is very important since it is very useful in the opinion mining. In this paper, we propose a new sentiment classification method of sentences which contains formal and informal vocabulary such as emoticons, and newly coined words. Previous methods used only formal vocabulary to classify sentiments of sentences. However, these methods are not quite effective because internet users use sentences that contain informal vocabulary. In addition, we construct suggest to construct domain sentiment vocabulary because the same word may represent different sentiments in different domains. Feature vectors are extracted from the sentiment vocabulary information and classified by Support Vector Machine(SVM). Our proposed method shows good performance in classification accuracy.

Sensitivity of abacus and Chasdaq in the Chinese stock market through analysis of Weibo sentiment related to Corona-19 (코로나-19관련 웨이보 정서 분석을 통한 중국 주식시장의 주판 및 차스닥의 민감도 예측 기법)

  • Li, Jiaqi;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.1
    • /
    • pp.1-7
    • /
    • 2021
  • Investor mood from social media is gaining increasing attention for leading a price movement in stock market. Based on the behavioral finance theory, this study argues that sentiment extracted from social media using big data technique can predict a real-time (short-run) price momentum in Chinese stock market. Collecting Sina Weibo posts that related to COVID-19 using keyword method, a daily influential weighted sentiment factors is extracted from the sizable raw data of over 2 millions of posts. We examine one supervised and 4 unsupervised sentiment analysis model, and use the best performed word-frequency and BiLSTM mdoel. The test result shows a similar movement between stock price change and sentiment factor. It indicates that public mood extracted from social media can in some extent represent the investors' sentiment and make a difference in stock market fluctuation when people are concentrating on a special events that can cause effect on the stock market.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

Comparative Study of Tokenizer Based on Learning for Sentiment Analysis (고객 감성 분석을 위한 학습 기반 토크나이저 비교 연구)

  • Kim, Wonjoon
    • Journal of Korean Society for Quality Management
    • /
    • v.48 no.3
    • /
    • pp.421-431
    • /
    • 2020
  • Purpose: The purpose of this study is to compare and analyze the tokenizer in natural language processing for customer satisfaction in sentiment analysis. Methods: In this study, a supervised learning-based tokenizer Mecab-Ko and an unsupervised learning-based tokenizer SentencePiece were used for comparison. Three algorithms: Naïve Bayes, k-Nearest Neighbor, and Decision Tree were selected to compare the performance of each tokenizer. For performance comparison, three metrics: accuracy, precision, and recall were used in the study. Results: The results of this study are as follows; Through performance evaluation and verification, it was confirmed that SentencePiece shows better classification performance than Mecab-Ko. In order to confirm the robustness of the derived results, independent t-tests were conducted on the evaluation results for the two types of the tokenizer. As a result of the study, it was confirmed that the classification performance of the SentencePiece tokenizer was high in the k-Nearest Neighbor and Decision Tree algorithms. In addition, the Decision Tree showed slightly higher accuracy among the three classification algorithms. Conclusion: The SentencePiece tokenizer can be used to classify and interpret customer sentiment based on online reviews in Korean more accurately. In addition, it seems that it is possible to give a specific meaning to a short word or a jargon, which is often used by users when evaluating products but is not defined in advance.