• Title/Summary/Keyword: Sentiment classification

Search Result 173, Processing Time 0.03 seconds

Propensity Analysis of Political Attitude of Twitter Users by Extracting Sentiment from Timeline (타임라인의 감정추출을 통한 트위터 사용자의 정치적 성향 분석)

  • Kim, Sukjoong;Hwang, Byung-Yeon
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.1
    • /
    • pp.43-51
    • /
    • 2014
  • Social Network Service has the sufficient potential can be widely and effectively used for various fields of society because of convenient accessibility and definite user opinion. Above all Twitter has characteristics of simple and open network formation between users and remarkable real-time diffusion. However, real analysis is accompanied by many difficulties because of semantic analysis in 140-characters, the limitation of Korea natural language processing and the technical problem of Twitter is own restriction. This thesis paid its attention to human's political attitudes showing permanence and assumed that if applying it to the analytic design, it would contribute to the increase of precision and showed it through the experiment. As a result of experiment with Tweet corpus gathered during the election of national assemblymen on 11st April 2012, it could be known to be considerably similar compared to actual election result. The precision of 75.4% and recall of 34.8% was shown in case of individual Tweet analysis. On the other hand, the performance improvement of approximately 8% and 5% was shown in by-timeline political attitude analysis of user.

Study on the social issue sentiment classification using text mining (텍스트마이닝을 이용한 사회 이슈 찬반 분류에 관한 연구)

  • Kang, Sun-A;Kim, Yoo Sin;Choi, Sang Hyun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.5
    • /
    • pp.1167-1173
    • /
    • 2015
  • The development of information and communication technology like SNS, blogs, and bulletin boards, was provided a variety of places where you can express your thoughts and comments and allowing Big Data to grow, many people reveal the opinion of the social issues in SNS such as Twitter. In this study, we would like to pre-built sentimental dictionary about social issues and conduct a sentimental analysis with structured dictionary, to gather opinions on social issues that are created on twitter. The data that I used is "bikini", "nakkomsu" including tweet. As the result of analysis, precision is 61% and F1- score is 74%. This study expect to suggest the standard of dictionary construction allowing you to classify positive/negative opinion on specific social issues.

Grading System of Movie Review through the Use of An Appraisal Dictionary and Computation of Semantic Segments (감정어휘 평가사전과 의미마디 연산을 이용한 영화평 등급화 시스템)

  • Ko, Min-Su;Shin, Hyo-Pil
    • Korean Journal of Cognitive Science
    • /
    • v.21 no.4
    • /
    • pp.669-696
    • /
    • 2010
  • Assuming that the whole meaning of a document is a composition of the meanings of each part, this paper proposes to study the automatic grading of movie reviews which contain sentimental expressions. This will be accomplished by calculating the values of semantic segments and performing data classification for each review. The ARSSA(The Automatic Rating System for Sentiment analysis using an Appraisal dictionary) system is an effort to model decision making processes in a manner similar to that of the human mind. This aims to resolve the discontinuity between the numerical ranking and textual rationalization present in the binary structure of the current review rating system: {rate: review}. This model can be realized by performing analysis on the abstract menas extracted from each review. The performance of this system was experimentally calculated by performing a 10-fold Cross-Validation test of 1000 reviews obtained from the Naver Movie site. The system achieved an 85% F1 Score when compared to predefined values using a predefined appraisal dictionary.

  • PDF

A Study on the Dataset of the Korean Multi-class Emotion Analysis in Radio Listeners' Messages (라디오 청취자 문자 사연을 활용한 한국어 다중 감정 분석용 데이터셋연구)

  • Jaeah, Lee;Gooman, Park
    • Journal of Broadcast Engineering
    • /
    • v.27 no.6
    • /
    • pp.940-943
    • /
    • 2022
  • This study aims to analyze the Korean dataset by performing Korean sentence Emotion Analysis in the radio listeners' text messages collected personally. Currently, in Korea, research on the Emotion Analysis of Korean sentences is variously continuing. However, it is difficult to expect high accuracy of Emotion Analysis due to the linguistic characteristics of Korean. In addition, a lot of research has been done on Binary Sentiment Analysis that allows positive/negative classification only, but Multi-class Emotion Analysis that is classified into three or more emotions requires more research. In this regard, it is necessary to consider and analyze the Korean dataset to increase the accuracy of Multi-class Emotion Analysis for Korean. In this paper, we analyzed why Korean Emotion Analysis is difficult in the process of conducting Emotion Analysis through surveys and experiments, proposed a method for creating a dataset that can improve accuracy and can be used as a basis for Emotion Analysis of Korean sentences.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

A Study on SNS Reviews Analysis based on Deep Learning for User Tendency (개인 성향 추출을 위한 딥러닝 기반 SNS 리뷰 분석 방법에 관한 연구)

  • Park, Woo-Jin;Lee, Ju-Oh;Lee, Hyung-Geol;Kim, Ah-Yeon;Heo, Seung-Yeon;Ahn, Yong-Hak
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.11
    • /
    • pp.9-17
    • /
    • 2020
  • In this paper, we proposed an SNS review analysis method based on deep learning for user tendency. The existing SNS review analysis method has a problem that does not reflect a variety of opinions on various interests because most are processed based on the highest weight. To solve this problem, the proposed method is to extract the user's personal tendency from the SNS review for food. It performs classification using the YOLOv3 model, and after performing a sentiment analysis through the BiLSTM model, it extracts various personal tendencies through a set algorithm. Experiments showed that the performance of Top-1 accuracy 88.61% and Top-5 90.13% for the YOLOv3 model, and 90.99% accuracy for the BiLSTM model. Also, it was shown that diversity of the individual tendencies in the SNS review classification through the heat map. In the future, it is expected to extract personal tendencies from various fields and be used for customized service or marketing.

Sound Visualization based on Emotional Analysis of Musical Parameters (음악 구성요소의 감정 구조 분석에 기반 한 시각화 연구)

  • Kim, Hey-Ran;Song, Eun-Sung
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.6
    • /
    • pp.104-112
    • /
    • 2021
  • In this study, emotional analysis was conducted based on the basic attribute data of music and the emotional model in psychology, and the result was applied to the visualization rules in the formative arts. In the existing studies using musical parameter, there were many cases with more practical purposes to classify, search, and recommend music for people. In this study, the focus was on enabling sound data to be used as a material for creating artworks and used for aesthetic expression. In order to study the music visualization as an art form, a method that can include human emotions should be designed, which is the characteristics of the arts itself. Therefore, a well-structured basic classification of musical attributes and a classification system on emotions were provided. Also, through the shape, color, and animation of the visual elements, the visualization of the musical elements was performed by reflecting the subdivided input parameters based on emotions. This study can be used as basic data for artists who explore a field of music visualization, and the analysis method and work results for matching emotion-based music components and visualizations will be the basis for automated visualization by artificial intelligence in the future.

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.1-19
    • /
    • 2018
  • Since stock movements forecasting is an important issue both academically and practically, studies related to stock price prediction have been actively conducted. The stock price forecasting research is classified into structured data and unstructured data, and it is divided into technical analysis, fundamental analysis and media effect analysis in detail. In the big data era, research on stock price prediction combining big data is actively underway. Based on a large number of data, stock prediction research mainly focuses on machine learning techniques. Especially, research methods that combine the effects of media are attracting attention recently, among which researches that analyze online news and utilize online news to forecast stock prices are becoming main. Previous studies predicting stock prices through online news are mostly sentiment analysis of news, making different corpus for each company, and making a dictionary that predicts stock prices by recording responses according to the past stock price. Therefore, existing studies have examined the impact of online news on individual companies. For example, stock movements of Samsung Electronics are predicted with only online news of Samsung Electronics. In addition, a method of considering influences among highly relevant companies has also been studied recently. For example, stock movements of Samsung Electronics are predicted with news of Samsung Electronics and a highly related company like LG Electronics.These previous studies examine the effects of news of industrial sector with homogeneity on the individual company. In the previous studies, homogeneous industries are classified according to the Global Industrial Classification Standard. In other words, the existing studies were analyzed under the assumption that industries divided into Global Industrial Classification Standard have homogeneity. However, existing studies have limitations in that they do not take into account influential companies with high relevance or reflect the existence of heterogeneity within the same Global Industrial Classification Standard sectors. As a result of our examining the various sectors, it can be seen that there are sectors that show the industrial sectors are not a homogeneous group. To overcome these limitations of existing studies that do not reflect heterogeneity, our study suggests a methodology that reflects the heterogeneous effects of the industrial sector that affect the stock price by applying k-means clustering. Multiple Kernel Learning is mainly used to integrate data with various characteristics. Multiple Kernel Learning has several kernels, each of which receives and predicts different data. To incorporate effects of target firm and its relevant firms simultaneously, we used Multiple Kernel Learning. Each kernel was assigned to predict stock prices with variables of financial news of the industrial group divided by the target firm, K-means cluster analysis. In order to prove that the suggested methodology is appropriate, experiments were conducted through three years of online news and stock prices. The results of this study are as follows. (1) We confirmed that the information of the industrial sectors related to target company also contains meaningful information to predict stock movements of target company and confirmed that machine learning algorithm has better predictive power when considering the news of the relevant companies and target company's news together. (2) It is important to predict stock movements with varying number of clusters according to the level of homogeneity in the industrial sector. In other words, when stock prices are homogeneous in industrial sectors, it is important to use relational effect at the level of industry group without analyzing clusters or to use it in small number of clusters. When the stock price is heterogeneous in industry group, it is important to cluster them into groups. This study has a contribution that we testified firms classified as Global Industrial Classification Standard have heterogeneity and suggested it is necessary to define the relevance through machine learning and statistical analysis methodology rather than simply defining it in the Global Industrial Classification Standard. It has also contribution that we proved the efficiency of the prediction model reflecting heterogeneity.

KB-BERT: Training and Application of Korean Pre-trained Language Model in Financial Domain (KB-BERT: 금융 특화 한국어 사전학습 언어모델과 그 응용)

  • Kim, Donggyu;Lee, Dongwook;Park, Jangwon;Oh, Sungwoo;Kwon, Sungjun;Lee, Inyong;Choi, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.191-206
    • /
    • 2022
  • Recently, it is a de-facto approach to utilize a pre-trained language model(PLM) to achieve the state-of-the-art performance for various natural language tasks(called downstream tasks) such as sentiment analysis and question answering. However, similar to any other machine learning method, PLM tends to depend on the data distribution seen during the training phase and shows worse performance on the unseen (Out-of-Distribution) domain. Due to the aforementioned reason, there have been many efforts to develop domain-specified PLM for various fields such as medical and legal industries. In this paper, we discuss the training of a finance domain-specified PLM for the Korean language and its applications. Our finance domain-specified PLM, KB-BERT, is trained on a carefully curated financial corpus that includes domain-specific documents such as financial reports. We provide extensive performance evaluation results on three natural language tasks, topic classification, sentiment analysis, and question answering. Compared to the state-of-the-art Korean PLM models such as KoELECTRA and KLUE-RoBERTa, KB-BERT shows comparable performance on general datasets based on common corpora like Wikipedia and news articles. Moreover, KB-BERT outperforms compared models on finance domain datasets that require finance-specific knowledge to solve given problems.

The Effect of Domain Specificity on the Performance of Domain-Specific Pre-Trained Language Models (도메인 특수성이 도메인 특화 사전학습 언어모델의 성능에 미치는 영향)

  • Han, Minah;Kim, Younha;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.251-273
    • /
    • 2022
  • Recently, research on applying text analysis to deep learning has steadily continued. In particular, researches have been actively conducted to understand the meaning of words and perform tasks such as summarization and sentiment classification through a pre-trained language model that learns large datasets. However, existing pre-trained language models show limitations in that they do not understand specific domains well. Therefore, in recent years, the flow of research has shifted toward creating a language model specialized for a particular domain. Domain-specific pre-trained language models allow the model to understand the knowledge of a particular domain better and reveal performance improvements on various tasks in the field. However, domain-specific further pre-training is expensive to acquire corpus data of the target domain. Furthermore, many cases have reported that performance improvement after further pre-training is insignificant in some domains. As such, it is difficult to decide to develop a domain-specific pre-trained language model, while it is not clear whether the performance will be improved dramatically. In this paper, we present a way to proactively check the expected performance improvement by further pre-training in a domain before actually performing further pre-training. Specifically, after selecting three domains, we measured the increase in classification accuracy through further pre-training in each domain. We also developed and presented new indicators to estimate the specificity of the domain based on the normalized frequency of the keywords used in each domain. Finally, we conducted classification using a pre-trained language model and a domain-specific pre-trained language model of three domains. As a result, we confirmed that the higher the domain specificity index, the higher the performance improvement through further pre-training.