• Title/Summary/Keyword: lexicon-based sentiment analysis


A Method of Constructing Large-Scale Train Set Based on Sentiment Lexicon for Improving the Accuracy of Deep Learning Model (딥러닝 모델의 정확도 향상을 위한 감성사전 기반 대용량 학습데이터 구축 방안)

  • Choi, Min-Seong;Park, Sang-Min;On, Byung-Won
    • Annual Conference on Human and Language Technology / 2018.10a / pp.106-111 / 2018
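  • Sentiment analysis is a natural language processing technique that analyzes the sentiment expressed in text. Various machine learning techniques have been studied for the sentiment analysis of Korean text, and with recent advances in deep learning, sentiment analysis based on deep learning has also become an active area. When deep learning is used for sentiment analysis, a sufficient amount of training data is required to obtain good performance. However, obtaining training data suitable for sentiment analysis is not easy. To address this problem, this paper proposes a method of constructing large-scale training data by using an existing sentiment lexicon.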
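The abstract does not detail the construction procedure. As a minimal sketch, assuming a lexicon that maps words to polarity scores and a hypothetical `margin` threshold, unlabeled sentences could be auto-labeled by summing the lexicon scores of their tokens and keeping only confidently scored sentences as training examples. The lexicon entries, the threshold, and the whitespace tokenization are illustrative assumptions (Korean text would normally need a morphological analyzer), not the paper's method.

```python
from typing import Dict, List, Tuple

def auto_label(
    sentences: List[str],
    lexicon: Dict[str, float],
    margin: float = 1.0,
) -> List[Tuple[str, int]]:
    """Label sentences 1 (positive) or 0 (negative) by summing lexicon scores.

    Tokens come from a plain whitespace split (a simplification). Sentences
    whose total score lies strictly between -margin and +margin are treated
    as ambiguous and dropped rather than given a noisy label.
    """
    labeled: List[Tuple[str, int]] = []
    for sentence in sentences:
        # Tokens missing from the lexicon contribute 0 to the score.
        score = sum(lexicon.get(tok, 0.0) for tok in sentence.split())
        if score >= margin:
            labeled.append((sentence, 1))
        elif score <= -margin:
            labeled.append((sentence, 0))
    return labeled

# Hypothetical toy lexicon (English words for readability).
toy_lexicon = {"great": 2.0, "good": 1.0, "boring": -1.5, "awful": -2.0}
corpus = ["a great and good film", "an awful boring plot", "it was a film"]
print(auto_label(corpus, toy_lexicon))
# [('a great and good film', 1), ('an awful boring plot', 0)]
```

Raising `margin` trades the size of the generated training set for label precision, which is the main tension when lexicon output is used to train a deep learning model.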


Multimodal Sentiment Analysis Using Review Data and Product Information (리뷰 데이터와 제품 정보를 이용한 멀티모달 감성분석)

  • Hwang, Hohyun;Lee, Kyeongchan;Yu, Jinyi;Lee, Younghoon
    • The Journal of Society for e-Business Studies / v.27 no.1 / pp.15-28 / 2022
  • Due to the recent expansion of online markets such as clothing, utilizing customer reviews has become a major marketing tool. User reviews have been used as a means of analyzing customer sentiment. Sentiment analysis methods can be broadly classified into machine-learning-based and lexicon-based approaches. Machine-learning-based methods train a classification model on reviews and their labels. As sentiment analysis research has progressed, multimodal models trained on the images and video data in reviews have also been studied. The characteristics of words in reviews differ depending on product and customer categories. In this paper, sentiment is analyzed by considering review data together with product and user metadata. Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), self-attention-based multi-head attention models, and Bidirectional Encoder Representations from Transformers (BERT) are used in this study. The same Multi-Layer Perceptron (MLP) model is used for the product information in every case. This paper suggests a multimodal sentiment analysis model that simultaneously considers user reviews and product meta-information.
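The abstract names the review encoders and the MLP for product information but not how the two are combined. The sketch below assumes simple late fusion: a review vector (a stand-in for GRU, LSTM, attention, or BERT output) is concatenated with an MLP embedding of product metadata and passed to a logistic output unit. A single ReLU layer stands in for the MLP, and all weights are hypothetical toy values rather than learned parameters from the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def linear(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def encode_metadata(meta: Sequence[float], w1: Matrix, b1: Sequence[float]) -> List[float]:
    """Single ReLU layer standing in for the MLP that embeds product metadata."""
    return [max(0.0, h) for h in linear(meta, w1, b1)]

def positive_prob(review_vec: Sequence[float], meta: Sequence[float],
                  w1: Matrix, b1: Sequence[float],
                  w_out: Sequence[float], b_out: float) -> float:
    """Late fusion: concatenate the review vector with the metadata embedding
    and apply one logistic output unit, returning P(positive)."""
    fused = list(review_vec) + encode_metadata(meta, w1, b1)
    logit = sum(w * x for w, x in zip(w_out, fused)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))

# Toy example: review_vec would normally come from a text encoder,
# and every weight would be learned during training.
review_vec = [0.5, -0.2]
meta = [1.0, 0.0]              # e.g. a one-hot product category
w1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w_out = [1.0, 1.0, 0.5, 0.5]
print(positive_prob(review_vec, meta, w1, b1, w_out, 0.0))
```

Concatenation keeps the text encoder interchangeable, which matches the abstract's setup of comparing several review encoders while reusing one product-information MLP.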

WellnessWordNet: A Word Net for Unconstrained Subjective Well-Being Monitoring Based on Unstructured Data and Contextual Polarity (웰니스워드넷: 비정형데이터와 상황적 긍부정성에 기반하여 주관적 웰빙 상태를 무구속적으로 모니터링하기 위한 워드넷 개발)

  • Song, Yeongeun;Nam, Suhyun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.1-21 / 2016
  • IT-based subjective well-being (SWB) services, a main part of wellness IT, should measure the SWB state of individuals in an unrestrained, cost-effective manner. The dictionaries for sentiment analysis available in the market may be useful for this purpose, but obtaining proper sentiment values using only words from the sentiment lexicon is impossible; therefore, a new dictionary including wellness vocabulary is needed. The existing sentiment dictionaries link only a single sentiment value to a single sentiment word, although sentiment values may vary depending on personal traits. In this study, we develop an extended version of the SenticNet sentiment dictionary dubbed WellnessWordNet. SenticNet is considered the best and most expressive among the already existing sentiment dictionaries. Using the information provided by SenticNet, we created a database including the wellness states (estimated values) of stress, depression, and anger to develop the WellnessWordNet system. The accuracy of the system was validated through actual tests with live subjects. This study is unique and unprecedented in that i) an extended sentiment dictionary, WellnessWordNet, is developed; ii) values for wellness state language are offered; and iii) different sentiment values, namely contextual polarity, for people of the same gender or age group are suggested.
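The abstract's key contrast is between conventional lexicons, which give one value per word, and contextual polarity, where the value depends on the reader's group. A minimal sketch of that lookup, assuming a hypothetical (gender, age band) key and toy values, could prefer a group-specific entry and fall back to the single context-free value. This is not WellnessWordNet's actual schema or its stress, depression, and anger estimation.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical context key: (gender, age band).
Group = Tuple[str, str]

def polarity(word: str, group: Group,
             base: Dict[str, float],
             by_group: Dict[Tuple[str, Group], float]) -> Optional[float]:
    """Return the group-specific value for `word` if one exists; otherwise
    fall back to the single context-free value, or None if the word is unknown."""
    return by_group.get((word, group), base.get(word))

def text_score(tokens: List[str], group: Group,
               base: Dict[str, float],
               by_group: Dict[Tuple[str, Group], float]) -> Optional[float]:
    """Average the polarity of the tokens the lexicon covers (None if no hits)."""
    values = [polarity(t, group, base, by_group) for t in tokens]
    hits = [v for v in values if v is not None]
    return mean(hits) if hits else None

# Hypothetical toy entries.
base = {"overtime": -0.4, "weekend": 0.6}
by_group = {("overtime", ("F", "20s")): -0.8}
print(polarity("overtime", ("F", "20s"), base, by_group))  # -0.8
print(polarity("overtime", ("M", "40s"), base, by_group))  # -0.4
```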

Rating Individual Food Items of Restaurant Menu based on Online Customer Reviews using Text Mining Technique (신뢰성있는 온라인 고객 리뷰 텍스트 마이닝 기반 식당 개별 음식 아이템 평가)

  • Syed, Muzamil Hussain;Chung, Sun-Tae
    • Proceedings of the Korea Information Processing Society Conference / 2020.05a / pp.389-392 / 2020
  • The growth of social media, blogs, and restaurant listing directories has led to an increasing number of customer reviews on the internet about restaurants, the quality of their food items, and their services. These user reviews offer a massive amount of valuable information that can be used for various decision-making purposes. Currently, most food recommendation sites provide recommendation scores for restaurants rather than for the individual food items a restaurant serves, and these scores may be biased because they are calculated only from the reviews posted on each site. People usually want reliable recommendations about food items, not restaurants. In this paper, we present a reliable rating method for Korean food items. We first extract food items by applying a named entity recognition (NER) technique to restaurant reviews collected from many Korean restaurant recommendation websites, blogs, and other web data. Then, we apply lexicon-based sentiment analysis to the collected user reviews and predict people's opinions as sentiment polarity scores (+1 for positive, -1 for negative, 0 for neutral). Finally, by averaging all calculated polarity scores for a food item, we obtain a rating for each individual menu item of the restaurant. The proposed food item rating is more reliable because it does not depend on the reviews of only one site.
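The final rating step described above is a per-item average of polarity scores. A minimal sketch, assuming the NER and lexicon-based stages have already produced hypothetical (food item, polarity) pairs pooled from several sites:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def rate_food_items(mentions: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Average the polarity scores (+1, 0, -1) collected for each food item."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for item, polarity in mentions:
        if polarity not in (-1, 0, 1):
            raise ValueError(f"unexpected polarity {polarity!r}")
        totals[item] += polarity
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}

# Hypothetical mentions pooled from several review sites.
mentions = [
    ("bibimbap", 1), ("bibimbap", 1), ("bibimbap", -1),
    ("naengmyeon", 0), ("naengmyeon", -1),
]
print(rate_food_items(mentions))
```

Each rating falls in [-1, 1]; pooling mentions from several sources before averaging is what the abstract credits for reducing single-site bias.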

An Analysis of Relationship between Social Sentiments and Cryptocurrency Price: An Econometric Analysis with Big Data (소셜 감성과 암호화폐 가격 간의 관계 분석: 빅데이터를 활용한 계량경제적 분석)

  • Sangyi Ryu;Jiyeon Hyun;Sang-Yong Tom Lee
    • Information Systems Review / v.21 no.1 / pp.91-111 / 2019
  • Around the end of 2017, an investment fever for cryptocurrencies, especially Bitcoin, started all over the world, and South Korea was at the center of this phenomenon. Since it was difficult to find profitable investment opportunities, people began to see the cryptocurrency market as an alternative investment. However, the cryptocurrency fever in South Korea was based mostly on psychological factors, such as expectations of short-term profits and the social atmosphere, rather than on the intrinsic value of the assets. Therefore, this study aimed to analyze the influence of people's social sentiment on cryptocurrency price movements. Data were collected for 181 days, from Nov 1st, 2017 to Apr 30th, 2018, focusing on Bitcoin-related posts on Twitter along with the price of Bitcoin on Bithumb/UPbit. After the collected data were refined into neutral, positive, and negative words through sentiment analysis, these neutral, positive, and negative words were entered into a regression model to find the impact of social sentiment on the Bitcoin price. Through regression analyses and Granger causality tests, we found that positive sentiment had a positive relationship with the Bitcoin price, while negative words had a negative relationship with it. The causality tests also show two-way causality between social sentiment and Bitcoin price movements. Therefore, we conclude that Bitcoin investors' behaviors are affected by changes in social sentiment.
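As a simplified sketch of the pipeline's first steps, assuming hypothetical positive and negative word lists, the code below counts positive, negative, and neutral tokens per day and fits a one-regressor least-squares slope (for example, daily price against net sentiment, positive minus negative). This is a deliberate simplification: it is neither the study's multivariate regression nor a Granger causality test, for which a statistics package such as statsmodels' `grangercausalitytests` would be the more likely tool.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple

def daily_counts(posts: List[Tuple[str, List[str]]],
                 positive: Set[str], negative: Set[str]) -> Dict[str, Counter]:
    """Count positive, negative, and neutral tokens per day.

    `posts` holds (date, tokens) pairs; any token in neither word list is
    counted as neutral."""
    by_day: Dict[str, Counter] = defaultdict(Counter)
    for day, tokens in posts:
        for tok in tokens:
            if tok in positive:
                by_day[day]["pos"] += 1
            elif tok in negative:
                by_day[day]["neg"] += 1
            else:
                by_day[day]["neu"] += 1
    return dict(by_day)

def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x in a simple regression with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical tokenized posts and word lists.
posts = [
    ("2017-11-01", ["moon", "crash", "hodl"]),
    ("2017-11-01", ["moon"]),
    ("2017-11-02", ["crash"]),
]
counts = daily_counts(posts, positive={"moon"}, negative={"crash"})
print(counts["2017-11-01"])
```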

Extracting and Clustering of Story Events from a Story Corpus

  • Yu, Hye-Yeon;Cheong, Yun-Gyung;Bae, Byung-Chull
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3498-3512 / 2021
  • This article describes how the events that make up text stories can be represented and extracted. We also report the results of a simple experiment on extracting and clustering events in terms of emotions, under the assumption that different emotional events can be associated with the classified clusters. Each emotion cluster is based on Plutchik's model of eight basic emotions, and the attributes of NLTK-VADER are used as the classification criterion. While comparisons with human raters show lower accuracy for certain emotion types, emotion types such as joy and sadness show relatively high accuracy. The evaluation results with the NRC Word-Emotion Association Lexicon (aka EmoLex) show high accuracy values (more than 90% for anger, disgust, fear, and surprise), though precision and recall values are relatively low.
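EmoLex associates words with Plutchik's eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust). As a sketch of lexicon-based emotion labeling of an event, assuming a tiny hypothetical lexicon rather than EmoLex's real entries, each word's associated emotions can be tallied and the most frequent one taken as the event's emotion. This does not reproduce the paper's VADER-attribute clustering.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set

PLUTCHIK = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

def emotion_profile(tokens: Iterable[str], lexicon: Dict[str, Set[str]]) -> Counter:
    """Tally how many tokens are associated with each of the eight emotions."""
    profile = Counter({e: 0 for e in PLUTCHIK})
    for tok in tokens:
        for emotion in lexicon.get(tok.lower(), ()):
            profile[emotion] += 1
    return profile

def dominant_emotion(tokens: Iterable[str], lexicon: Dict[str, Set[str]]) -> Optional[str]:
    """Most frequent emotion, or None if no token is in the lexicon.
    Ties resolve to the emotion listed first in PLUTCHIK."""
    profile = emotion_profile(tokens, lexicon)
    best = max(PLUTCHIK, key=lambda e: profile[e])
    return best if profile[best] > 0 else None

# Hypothetical toy entries, not taken from EmoLex.
lex = {"wolf": {"fear"}, "cried": {"sadness"}, "funeral": {"sadness"},
       "feast": {"joy", "anticipation"}}
print(dominant_emotion("She cried at the funeral".split(), lex))
```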

Clothing-Recommendation system based on emotion and weather information (감정과 날씨 정보에 따른 의상 추천 시스템)

  • Ugli, Sadriddinov Ilkhomjon Rovshan;Park, Doo-Soon
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.528-531 / 2021
  • Nowadays recommendation systems are ubiquitous, and many of our decisions are made by means of them. Recommendation systems appear in all areas of daily life, so research in this field remains very active, and many papers have been published on clothing recommendation as well. In this paper, we propose a clothing-recommendation system based on user emotion and weather information. We used social media to analyze users' six basic emotions according to Paul Ekman's theory and matched them to the colour of clothing. In addition, we obtained weather information through the visualcrossing.com API to predict the kind of clothing. For sentiment analysis, we used the Emotion Lexicon, which was created using Mechanical Turk. Emotions were matched to colours by applying Hayashi's Quantification Method III.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly, and data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor-quality from high-quality content through the text data of products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning them to predefined categories such as positive and negative. It has been studied in various directions with respect to accuracy, from simple rule-based approaches to dictionary-based approaches that use predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because online reviews are openly available to others, they are not only easy to collect but also affect business. In marketing, real-world information from customers is gathered on websites rather than through surveys. Whether a website's posts are positive or negative, the customer response is reflected in sales, so businesses try to identify this information. However, not all reviews on a website are good, and they can be difficult to identify. Earlier studies in this area used review data from the Amazon.com shopping mall, whereas recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, accuracy remains limited because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study classifies sentiment polarity into positive and negative categories and aims to increase the prediction accuracy of polarity analysis using the pretrained IMDB review data set.
First, popular machine learning algorithms for text classification, such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boosting, are adopted as comparison models. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). A CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of the data. An RNN handles word order well because it takes the temporal information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For the comparison, CNN and LSTM were chosen as simple deep learning models, and CNN, LSTM, and the integrated models were analyzed in addition to the classical machine learning algorithms. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to determine how well the models perform for sentiment analysis and how they work. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. A CNN can extract features for classification automatically by applying convolution layers with massively parallel processing, whereas an LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be opened and controlled at the desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem.
Furthermore, when an LSTM is applied to the output of the CNN's pooling layer, the model has an end-to-end structure, so spatial and temporal features can be modeled simultaneously. The combined CNN-LSTM model achieved an accuracy of 90.33%. It is slower than CNN but faster than LSTM, and it was more accurate than the other models. In addition, each word embedding layer can be improved when the kernel is trained step by step. CNN-LSTM can compensate for the weaknesses of each model, and it has the advantage of improving layer-by-layer learning through the end-to-end structure of LSTM. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
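The abstract describes the input, forget, and output gates only by a faucet analogy. The standard LSTM cell equations below make that picture concrete; they are the common textbook formulation, not necessarily the exact variant used in the paper. In a CNN-LSTM, $x_t$ would be the $t$-th feature vector produced by the convolution and pooling layers, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory cell update} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```

Because $c_t$ is updated additively through $f_t$, information can persist across many time steps, which is why the LSTM addresses the long-term dependency problem mentioned above.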

Feature Weighting for Opinion Classification of Comments on News Articles (뉴스 댓글의 감정 분류를 위한 자질 가중치 설정)

  • Lee, Kong-Joo;Kim, Jae-Hoon;Seo, Hyung-Won;Rhyu, Keel-Soo
    • Journal of Advanced Marine Engineering and Technology / v.34 no.6 / pp.871-879 / 2010
  • In this paper, we present a system that classifies comments on a news article according to the user opinion they express, called a polarity (positive or negative). The system is a kind of document classification system for comments and is based on machine learning techniques such as support vector machines. Unlike normal documents, comments have distinctive characteristics in their bodies that can influence how their opinions are classified into polarities. We propose a feature weighting scheme that uses such characteristics of comments together with several resources for opinion classification. Our experiments show that the weighting scheme is useful for opinion classification of comments on Korean news articles. Korean character n-grams (bigrams or trigrams) also proved helpful for classifying comments that contain many Internet slang words or typos. In the future, we will apply this scheme to opinion analysis of comments on product reviews as well as on news articles.
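Character n-grams help with slang and typos because a misspelled word still shares most of its character substrings with the correct form. A minimal sketch of Korean character bigram and trigram features, assuming NFC-normalized text (so each precomposed Hangul syllable is one code point) and assuming whitespace is stripped first so inconsistent spacing does not change the features; the paper's own weighting scheme is not reproduced here.

```python
import unicodedata
from collections import Counter
from typing import List

def char_ngrams(text: str, n: int) -> List[str]:
    """Overlapping character n-grams of `text` after NFC normalization.

    Whitespace is removed first (an assumption) so that inconsistent
    spacing, common in online comments, does not change the n-grams."""
    chars = "".join(unicodedata.normalize("NFC", text).split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]

def ngram_features(text: str) -> Counter:
    """Bag of character bigrams and trigrams with raw counts."""
    return Counter(char_ngrams(text, 2) + char_ngrams(text, 3))

print(char_ngrams("정말 좋아요", 2))  # ['정말', '말좋', '좋아', '아요']
```

The resulting counts could be vectorized and fed to an SVM, with a weighting scheme applied on top.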

Crafting a Quality Performance Evaluation Model Leveraging Unstructured Data (비정형데이터를 활용한 건축현장 품질성과 평가 모델 개발)

  • Lee, Kiseok;Song, Taegeun;Yoo, Wi Sung
    • Journal of the Korea Institute of Building Construction / v.24 no.1 / pp.157-168 / 2024
  • The frequent occurrence of structural failures at building construction sites in Korea has underscored the critical role of rigorous oversight in the inspection and management of construction projects. As mandated by prevailing regulations and standards, onsite supervision by designated supervisors encompasses thorough documentation of construction quality, material standards, and the history of any reconstructions, among other factors. These reports, predominantly consisting of unstructured data, constitute approximately 80% of the data amassed at construction sites and serve as a comprehensive repository of quality-related information. This research introduces the SL-QPA model, which employs text mining techniques to preprocess supervision reports and establish a sentiment dictionary, thereby enabling the quantification of quality performance. The study's findings, demonstrating a statistically significant Pearson correlation between the quality performance scores derived from the SL-QPA model and various legally defined indicators, were substantiated through a one-way analysis of variance of the correlation coefficients. The SL-QPA model, as developed in this study, offers a supplementary approach to evaluating the quality performance of building construction projects. It holds the promise of enhancing quality inspection and management practices by harnessing the wealth of unstructured data generated throughout the lifecycle of construction projects.
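The abstract does not specify how SL-QPA turns dictionary matches into a score. As a sketch under stated assumptions, a report's score could be the mean dictionary value of its matched terms, and the model's project-level scores could then be compared with a legally defined indicator through the Pearson correlation. The dictionary terms and values are hypothetical; on Python 3.10+, `statistics.correlation` computes the same Pearson coefficient.

```python
import math
from typing import Dict, Sequence

def report_score(tokens: Sequence[str], dictionary: Dict[str, float]) -> float:
    """Mean dictionary value over the tokens the dictionary covers (0.0 if none)."""
    hits = [dictionary[t] for t in tokens if t in dictionary]
    return sum(hits) / len(hits) if hits else 0.0

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical quality dictionary for supervision-report terms.
quality_dict = {"crack": -1.0, "rework": -0.5, "compliant": 1.0}
print(report_score(["crack", "found", "rework", "done"], quality_dict))
```

In the study, the significance of such correlations was further checked with a one-way analysis of variance of the coefficients; that step is not shown here.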