• Title/Summary/Keyword: Sentiment classification


A Comparative Study on Sentiment Analysis Based on Psychological Model (감정 분석에서의 심리 모델 적용 비교 연구)

  • Kim, Haejun;Do, Junho;Sun, Juoh;Jeong, Seohee;Lee, Hyunah
    • Annual Conference on Human and Language Technology / 2020.10a / pp.450-452 / 2020
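  • As technology has advanced, social network services have become part of users' everyday lives and have caused an explosive increase in the amount of usable data such as images, videos, and text. Text data that contains the writer's emotions can be used in many fields, including market research and stock price prediction, and this has raised the need for multi-emotion analysis rather than binary positive/negative classification. In this paper, we compare a single model with a combined model that applies emotion models grounded in psychological theory to deep-learning-based emotion classification. For training, we used a combination of data provided by AI Hub and song lyrics data; in most cases, the combined model achieved better results.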


Performance Comparison of Word Embeddings for Sentiment Classification (감성 분류를 위한 워드 임베딩 성능 비교)

  • Yoon, Hye-Jin;Koo, Jahwan;Kim, Ung-Mo
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.760-763 / 2021
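  • Word embedding, which represents words as vectors, is one way of quantifying words while reflecting their linguistic properties so that text can be fed to natural language processing models, and it has become an essential component of language models that let computers understand and analyze human language. Various word embedding techniques such as Word2vec have been proposed, and sentiment classification is an important task in natural language processing, yet comparative studies of sentiment classification performance across embedding techniques are still lacking. In this paper, we use the Emotion-stimulus dataset to train 7-class and 2-class sentiment classifiers with five embedding techniques and three classification models. For classification we use widely used machine learning models such as Logistic Regression, Decision Tree, and Random Forest, and compare the results by training accuracy and test accuracy. Experimental results show that pretrained Word2vec generally achieved the best accuracy for both 7-class and 2-class sentiment classification.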

Sentiment Classification Model Development Based On EDA-Applied BERT (EDA 기법을 적용한 BERT 기반의 감성 분류 모델 생성)

  • Lee, Jin-Sang;Lim, Heui-Seok
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.79-80 / 2022
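  • In this paper, we build a BERT-based sentiment classification language model by applying EDA, a data augmentation technique, and propose a method for improving its performance. EDA (Easy Data Augmentation) can augment training data in data-limited settings through four sub-techniques: SR (Synonym Replacement), RI (Random Insertion), RS (Random Swap), and RD (Random Deletion). Using the augmented data as training data for transfer learning with Google's BERT as the base model yields a sentiment classification model. Compared with the transfer-learning model trained before augmentation, the sentiment classification model produced by transfer learning after data augmentation can be expected to achieve higher accuracy.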
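The four EDA operations named above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the synonym table `SYNONYMS` is a hypothetical toy stand-in (the original EDA work draws synonyms from WordNet), whitespace tokenization of English text is assumed, and the helper names and the `alpha` setting (fraction of words changed) are illustrative.

```python
import random

# Hypothetical toy synonym table; the original EDA work uses WordNet synonyms.
SYNONYMS = {
    "good": ["great", "nice"],
    "movie": ["film"],
    "boring": ["dull", "tedious"],
}


def synonym_replacement(words, n, rng):
    """SR: replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words, n, rng):
    """RI: n times, insert a synonym of a random word at a random position."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(words, n, rng):
    """RS: n times, swap the words at two randomly chosen positions."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """RD: drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    out = [w for w in words if rng.random() > p]
    return out if out else [rng.choice(words)]


def eda(sentence, alpha=0.1, seed=0):
    """Return one augmented sentence per EDA operation (hypothetical helper)."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of words changed by SR/RI/RS
    return {
        "SR": " ".join(synonym_replacement(words, n, rng)),
        "RI": " ".join(random_insertion(words, n, rng)),
        "RS": " ".join(random_swap(words, n, rng)),
        "RD": " ".join(random_deletion(words, alpha, rng)),
    }


print(eda("the movie was good but a bit boring", alpha=0.25))
```

Each augmented sentence would keep the label of its source sentence when added to the fine-tuning data; SR, RI, and RS change a fixed number of words, while RD removes each word independently with probability `alpha`.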


Sentiment Analysis for Public Opinion in the Social Network Service (SNS 기반 여론 감성 분석)

  • HA, Sang Hyun;ROH, Tae Hyup
    • The Journal of the Convergence on Culture Technology / v.6 no.1 / pp.111-120 / 2020
  • As an application of big data and artificial intelligence techniques, this study proposes a sentiment-based opinion poll methodology built on unstructured text, unlike conventional opinion poll methodology. As an alternative to sentiment classification models based on existing statistical analysis, real-time Twitter data related to parliamentary elections were collected, and the polarity and intensity of public opinion were analyzed empirically using attribute-based sentiment analysis. To classify the polarity of words used on individual SNS accounts, independent variables that strongly affect the polarity variable were extracted, and the polarity of new Twitter data was estimated with trained Lasso and Ridge regression models. A social network analysis of users' friend relationships on SNS suggested a way to identify peer-group sentiment. Based on what voters expressed on social media, political sentiment analysis was used to predict party approval ratings and to measure the accuracy of the polarity prediction model, confirming that the sentiment analysis methodology is applicable in the political domain.

Safeguarding Korean Export Trade through Social Media-Driven Risk Identification and Characterization

  • Sithipolvanichgul, Juthamon;Abrahams, Alan S.;Goldberg, David M.;Zaman, Nohel;Baghersad, Milad;Nasri, Leila;Ractham, Peter
    • Journal of Korea Trade / v.24 no.8 / pp.39-62 / 2020
  • Purpose - Korean exports account for a vast proportion of Korean GDP, and large volumes of Korean products are sold in the United States. Identifying and characterizing actual and potential product hazards related to Korean products is critical to safeguard Korean export trade, as severe quality issues can impair Korea's reputation and reduce global consumer confidence in Korean products. In this study, we develop country-of-origin-based product risk analysis methods for social media with a specific focus on Korean-labeled products, for the purpose of safeguarding Korean export trade. Design/methodology - We employed two social media datasets containing consumer-generated product reviews. Sentiment analysis is a popular text mining technique used to quantify the type and amount of emotion that is expressed in the text. It is a useful tool for gathering customer opinions regarding products. Findings - We document and discuss the specific potential risks found in Korean-labeled products and explain their implications for safeguarding Korean export trade. Finally, we analyze the false positive matches that arise from the established dictionaries that were used for risk discovery and utilize these classification errors to suggest opportunities for the future refinement of the associated automated text analytic methods. Originality/value - Various studies have used online feedback from social media to analyze product defects. However, none of them links their findings to trade promotion and the protection of a specific country's exports. Therefore, it is important to fill this research gap, which could help to safeguard export trade in Korea.

Sentiment Analysis of Foot-and-Mouth Disease Using Tweet Text-Mining Technique (트윗 텍스트 마이닝 기법을 이용한 구제역의 감성분석)

  • Chae, Heechan;Lee, Jonguk;Choi, Yoona;Park, Daihee;Chung, Yongwha
    • KIPS Transactions on Software and Data Engineering / v.7 no.11 / pp.419-426 / 2018
  • Foot-and-mouth disease (FMD) causes enormous damage to the domestic livestock industry and related industries every year. Although various academic studies related to FMD are ongoing, engineering studies on its social effects are very limited. In this study, we propose a systematic methodology for analyzing ordinary citizens' emotional responses to FMD using text mining techniques. The proposed system first collects FMD-related tweets from Twitter and classifies their polarity using a deep learning technique. Second, keywords are extracted from the tweets using LDA, a typical topic modeling technique, and a keyword network is constructed from the extracted keywords. Finally, the various social effects of FMD on ordinary citizens are analyzed through the keyword network. As a case study, we analyzed the emotions of ordinary citizens in Korea about FMD from July 2010 to December 2011.
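The keyword-network step can be sketched as a weighted co-occurrence graph. This is an assumption: the abstract does not say how edges are defined, so here two keywords are linked whenever they appear in the same tweet, and the edge weight counts such tweets. The per-tweet keyword lists are assumed to be already extracted (for example, from the top words of the LDA topics), and `keyword_network` and `degree` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations


def keyword_network(docs_keywords, min_weight=1):
    """Build an undirected, weighted keyword co-occurrence graph.

    docs_keywords: one collection of keywords per tweet (assumed pre-extracted).
    Returns {(kw_a, kw_b): weight} with kw_a < kw_b, keeping edges whose
    weight (number of tweets containing both keywords) is >= min_weight.
    """
    edges = Counter()
    for keywords in docs_keywords:
        # Sorting the de-duplicated keywords gives each pair one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return {edge: w for edge, w in edges.items() if w >= min_weight}


def degree(edges):
    """Weighted degree of each keyword node (sum of incident edge weights)."""
    deg = Counter()
    for (a, b), w in edges.items():
        deg[a] += w
        deg[b] += w
    return dict(deg)


tweets = [["fmd", "culling", "farm"], ["fmd", "vaccine"], ["fmd", "culling"]]
graph = keyword_network(tweets)
print(graph)
print(degree(graph))
```

High-degree keywords are natural starting points when reading such a network; a graph library could replace the plain dictionaries for centrality or community analysis.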

Construction of Onion Sentiment Dictionary using Cluster Analysis (군집분석을 이용한 양파 감성사전 구축)

  • Oh, Seungwon;Kim, Min Soo
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2917-2932 / 2018
  • Much research has gone into developing production prediction models to resolve the supply imbalance of onions, a vegetable closely tied to Korean cuisine. However, because onions can be stored, it is very difficult to resolve the supply imbalance by predicting production alone. This paper therefore aims to build a sentiment dictionary for predicting onion prices from internet articles, which are easy to access in daily life and contain information on onion production and various price factors. Onion-related articles from 2012 to 2016 were used, and four kinds of TF-IDF were compared through document classification against onion wholesale prices. Positive and negative price words were classified using k-means, DBSCAN (density-based spatial clustering of applications with noise), and GMM (Gaussian mixture model) clustering, and GMM clustering produced three meaningful dictionaries. To assess the validity of the constructed dictionaries, articles classified by price rise or drop were fed to a logistic regression model, which achieved 85.7% accuracy.
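The abstract compares four TF-IDF variants without naming them. As an illustration only, the sketch below computes one common variant, raw term count multiplied by log(N / df), over whitespace-tokenized documents; other variants (for example, log-scaled or length-normalized term frequency) would change only the weight expression. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """One common TF-IDF variant: raw term count x log(N / df).

    docs: list of strings, tokenized on whitespace.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    tokenized = [doc.split() for doc in docs]
    df = Counter()  # number of documents containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


for w in tf_idf(["onion price rise", "onion price drop drop", "garlic harvest"]):
    print(w)
```

Terms that appear in every document get weight 0, so words like "onion" in an onion-only corpus contribute nothing, while price-movement words that appear in few articles are weighted up.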

Stock Market Prediction Using Sentiment on YouTube Channels (유튜브 주식채널의 감성을 활용한 코스피 수익률 등락 예측)

  • Cho, Su-Ji;Yang, Cheol-Won;Lee, Ki-Kwang
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.2 / pp.102-108 / 2023
  • Recently in Korea, YouTube stock channels have increased rapidly owing to high public interest in the stock market during the COVID-19 period. Accordingly, the role of new media channels such as YouTube in generating and disseminating market information is attracting attention. Nevertheless, prior studies on the market forecasting power of YouTube stock channels remain scarce. In this study, the market forecasting power of information from YouTube stock channels was examined and compared with that of traditional news media. To measure the information from each YouTube stock channel and news outlet, positive and negative opinions were extracted. The analysis found that, among YouTube stock channels, opinions in channels operated by media outlets were leading indicators of KOSPI market returns. A logistic regression model achieved a prediction accuracy of 74%. In contrast, Sampro TV, a popular YouTube stock channel, and the traditional news media simply reported the day's market situation or tended to lag behind the market. This study differs from previous studies in that it verifies the market predictive power of information provided by YouTube stock channels, which have recently grown in Korea. Future work could extend the analysis to individual stocks.

A User Sentiment Classification Using Instagram image and text Analysis (인스타그램 이미지와 텍스트 분석을 통한 사용자 감정 분류)

  • Hong, Taekeun;Kim, Jeongin;Shin, Juhyun
    • Smart Media Journal / v.5 no.1 / pp.61-68 / 2016
  • With the growing number of SNS users and the spread of smart devices such as smartphones and tablet PCs, techniques for classifying user emotions from social network information are being actively researched. User emotion classification means identifying a user's emotion from the text and images posted on his or her SNS. This paper proposes a method that classifies user emotions by sampling a representative adjective from the text and a representative figure from the images using the Canny algorithm, and computing their values with a trigonometric function. The representative adjective is selected from the most frequent adjectives in the text, and its positive-negative value is measured with SentiWordNet. The representative figure extracted from an image is one of a triangle, quadrangle, or circle, and its pleasure-displeasure value is measured from the figure's type and inclination. Finally, the pleasure-displeasure and positive-negative values are redefined as an x-y graph on Plutchik's wheel of emotions. We expect that classifying user emotions on Plutchik's wheel of emotions, redefined in terms of representative adjectives and figures, can be applied to user-customized services.

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.139-156 / 2021
  • The main reason for using a deep learning model in image classification is that it can extract the features of each region from the overall image information and thus account for the relationships between regions. However, a CNN model may be unsuitable for emotional image data that lack distinctive regional features. To address the difficulty of classifying emotion images, researchers propose new CNN-based architectures suited to emotion images every year. Studies on the relationship between color and human emotion have also shown that different colors induce different emotions, and some deep learning studies have applied color information to image sentiment classification. Using the image's color information in addition to the image itself improves emotion classification accuracy compared with training the classification model on the image alone. This study proposes two methods that increase accuracy by adjusting the model's output after it classifies an image's emotion. Both methods modify the output based on statistics of the picture's colors: the most frequent two-color combinations are found across all training data, the most frequent two-color combination is found for each test image, and the output is corrected according to the distribution of that combination. The correction weights the model's output using expressions based on logarithmic and exponential functions. Emotion6, labeled with six emotions, and Artphoto, labeled with eight categories, were used as image data. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 were used as CNN architectures, and performance was compared before and after applying the two-stage learning to each CNN model.
Inspired by color psychology, which studies the relationship between colors and emotions, we investigated how to improve the accuracy of an image sentiment classification model by modifying its output based on color. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black, each of which has an associated meaning. scikit-learn's clustering was used to identify the seven dominant colors in each image. The RGB coordinates of these colors were then compared with the RGB coordinates of the 16 reference colors, and each was converted to the closest reference color. If combinations of three or more colors were used, too many combinations would occur and the distribution would become scattered, so each combination would have little influence on the output. To avoid this, two-color combinations were found and used to weight the model's output. Before training, the most frequent color combinations were found for all training images, and the distribution of color combinations for each class was stored in a Python dictionary for use during testing. During testing, the most frequent two-color combination was found for each test image, its distribution in the training data was looked up, and the output was corrected accordingly. Several equations were devised to weight the model's output based on the extracted colors. The dataset was randomly split 80:20, and the 20% portion was held out as the test set. The remaining 80% was split into five folds for 5-fold cross-validation, so the model was trained five times, each time with a different validation fold. Finally, performance was checked on the held-out test set.
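The color-matching and combination-counting steps described above can be sketched as follows. Assumptions: the RGB anchors for the 16 colors are taken from the CSS named-color values (the paper's own coordinates are not given); the seven dominant colors per image are assumed to come from an upstream clustering step such as scikit-learn's `KMeans(n_clusters=7)` fitted on the pixels, passed in as (RGB center, pixel count) pairs; and the log/exponential correction formulas are not reproduced because the abstract does not state them. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed RGB anchors: CSS named-color values for the 16 colors listed above.
PALETTE = {
    "red": (255, 0, 0), "orange": (255, 165, 0), "yellow": (255, 255, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "indigo": (75, 0, 130),
    "purple": (128, 0, 128), "turquoise": (64, 224, 208), "pink": (255, 192, 203),
    "magenta": (255, 0, 255), "brown": (165, 42, 42), "gray": (128, 128, 128),
    "silver": (192, 192, 192), "gold": (255, 215, 0), "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def nearest_color(rgb):
    """Map an RGB triple to the closest palette color by squared Euclidean distance."""
    return min(PALETTE, key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])))


def top_two_combo(dominant):
    """Return the image's two most frequent palette colors as a sorted tuple.

    dominant: list of (rgb_center, pixel_count) pairs, e.g. from a 7-cluster k-means.
    Clusters that map to the same palette color have their pixel counts merged.
    """
    totals = Counter()
    for rgb, count in dominant:
        totals[nearest_color(rgb)] += count
    return tuple(sorted(name for name, _ in totals.most_common(2)))


def combo_distribution(train_images):
    """Per-class counts of two-color combinations: {label: Counter(combo -> count)}."""
    dist = defaultdict(Counter)
    for dominant, label in train_images:
        dist[label][top_two_combo(dominant)] += 1
    return dist


img = [((250, 5, 5), 200), ((240, 10, 0), 200), ((5, 5, 250), 300), ((240, 240, 240), 100)]
print(top_two_combo(img))
```

At test time, the per-class counts for a test image's combination would feed whatever weighting formula is chosen; keeping the distribution as a dictionary keyed by class mirrors the storage format the abstract describes.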
Adam was used as the optimizer, and the learning rate was set to 0.01. Training ran for up to 20 epochs and was stopped if the validation loss did not decrease for five epochs; early stopping was configured to restore the model with the best validation loss. Classification accuracy was higher when color-based information was used together with the CNN architecture than when the CNN architecture was used alone.
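The stopping rule above (at most 20 epochs, a patience of five epochs on validation loss, and restoring the best model) can be sketched as a replay over per-epoch validation losses. In a real training loop each loss would come from evaluating the model after an epoch, and the best weights would be checkpointed; here the losses are given as a list so the logic can be checked in isolation. The function name is hypothetical.

```python
def train_with_early_stopping(val_losses, max_epochs=20, patience=5):
    """Replay per-epoch validation losses and decide where training stops.

    Returns (best_epoch, stopped_epoch), both 1-based. The weights saved at
    best_epoch are the ones that would be restored after training ends.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            # New best: this is where a checkpoint of the weights would be saved.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, epoch


print(train_with_early_stopping([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.73, 0.74, 0.6]))
```

In this example the best loss occurs at epoch 3 and training stops at epoch 8, so the later improvement at epoch 9 is never seen; that trade-off is the usual cost of a fixed patience window.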