• Title/Summary/Keyword: sentiment score


Hybrid Approach to Sentiment Analysis based on Syntactic Analysis and Machine Learning (구문분석과 기계학습 기반 하이브리드 텍스트 논조 자동분석)

  • Hong, Mun-Pyo;Shin, Mi-Young;Park, Shin-Hye;Lee, Hyung-Min
    • Language and Information / v.14 no.2 / pp.159-181 / 2010
  • This paper presents a hybrid approach to the sentiment analysis of online texts. The sentiment of a text refers to the feelings that its author has towards a certain topic. Many existing approaches employ either a pattern-based approach or a machine learning based approach. The former shows relatively high precision in classifying sentiments but suffers from the data sparseness problem, i.e. a lack of patterns. The latter shows relatively lower precision but 100% recall. The approach presented in this work combines the pattern-based approach with the machine learning based approach to keep the merits of both, so that relatively high precision and high recall can be maintained. Our experiment shows that the hybrid approach improves the F-measure by more than 50% compared with the pattern-based approach and by around 1% compared with the machine learning based approach. The improvement over the machine learning based approach may not seem very encouraging, but the current approach is promising because it can classify not only the sentiment (polarity) of sentences but also additional information such as the target of the sentiment.

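The abstract describes the hybrid only at a high level. The following is a minimal sketch of one way to combine the two approaches, not the authors' system: a hand-written pattern list (hypothetical) is tried first for high precision, and a small multinomial Naive Bayes classifier (a stand-in; the abstract does not name the learner) labels every sentence that no pattern covers.

```python
import math
import re
from collections import Counter

# Hypothetical high-precision patterns; each match maps straight to a label.
PATTERNS = [
    (re.compile(r"\bnot (good|great|worth)\b"), "negative"),
    (re.compile(r"\b(highly recommend|love it)\b"), "positive"),
]


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (the fallback model)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n_label / total)  # log prior
            for w in text.lower().split():
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def hybrid_classify(text, model):
    """Return (label, source): try the patterns first, else ask the model."""
    lowered = text.lower()
    for pattern, label in PATTERNS:
        if pattern.search(lowered):
            return label, "pattern"
    return model.predict(text), "model"


model = NaiveBayes().fit(
    ["great screen and fast", "terrible battery", "fast and great", "slow and terrible"],
    ["positive", "negative", "positive", "negative"],
)
print(hybrid_classify("It is not worth the price", model))  # ('negative', 'pattern')
print(hybrid_classify("great and fast phone", model))       # ('positive', 'model')
```

Because the fallback model always returns a label, every input is classified (the 100% recall the abstract attributes to the learning-based side), while pattern hits take the higher-precision path.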

A Domain Adaptive Sentiment Dictionary Construction Method for Domain Sentiment Analysis (도메인 별 감성분석을 위한 도메인 맞춤형 감성사전 구축 기법)

  • Kim, Dahae;Cho, Taemin;Lee, Jee-Hyong
    • Proceedings of the Korean Society of Computer Information Conference / 2015.01a / pp.15-18 / 2015
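  • With the spread of SNS, the public actively expresses its feelings and opinions about various domains such as products, services, and social issues. Accordingly, research that applies sentiment analysis to SNS data to predict phenomena such as product demand, TV ratings, and stock prices is actively under way. Sentiment analysis is based on a sentiment dictionary that specifies the part of speech, polarity, and sentiment score of each word. However, because the importance of the same word varies from domain to domain, a sentiment dictionary that reflects the characteristics of each domain is needed. This study therefore proposes a method for constructing a domain-adaptive sentiment dictionary so that more accurate sentiment analysis can be performed for each domain according to its characteristics. For each domain, the words that serve as important criteria in positive/negative evaluation are selected as domain sentiment words to build a list, and a domain sentiment score is newly defined according to the importance of each sentiment word. In the experiments, the sentiment dictionary suited to the evaluated domain outperformed both the dictionaries of other domains and a general-purpose sentiment dictionary, confirming the usefulness of the domain-adaptive sentiment dictionary construction method.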

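The abstract does not give the formula for the domain sentiment score. As an illustration only, the sketch below assumes the score is the smoothed log-odds of a word appearing in positive versus negative reviews of a single domain, and keeps only words whose absolute score passes a threshold (`alpha` and `min_abs_score` are hypothetical parameters). Reviews are shown in English for readability.

```python
import math
from collections import Counter


def build_domain_lexicon(reviews, min_abs_score=1.0, alpha=1.0):
    """Build {word: domain sentiment score} from (text, is_positive) pairs.

    Assumed scoring rule: smoothed log-odds of a word appearing in positive
    vs. negative reviews of one domain. Words with |score| below the
    threshold are treated as unimportant for this domain and dropped.
    """
    pos_df, neg_df = Counter(), Counter()
    for text, is_positive in reviews:
        words = set(text.lower().split())  # count each word once per review
        (pos_df if is_positive else neg_df).update(words)
    n_pos = sum(1 for _, is_positive in reviews if is_positive)
    n_neg = len(reviews) - n_pos
    lexicon = {}
    for word in pos_df.keys() | neg_df.keys():
        p_pos = (pos_df[word] + alpha) / (n_pos + 2 * alpha)
        p_neg = (neg_df[word] + alpha) / (n_neg + 2 * alpha)
        score = math.log(p_pos / p_neg)
        if abs(score) >= min_abs_score:
            lexicon[word] = round(score, 3)
    return lexicon


phone_reviews = [
    ("battery lasts long", True),
    ("long battery life", True),
    ("screen cracked fast", False),
    ("battery died fast", False),
]
print(build_domain_lexicon(phone_reviews))
```

In this toy phone domain, `fast` comes out negative (from "died fast") although a general-purpose dictionary would likely treat it as positive, which is the kind of domain dependence the abstract describes.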

A Study on Lexicon Integrated Convolutional Neural Networks for Sentiment Analysis (감성 분석을 위한 어휘 통합 합성곱 신경망에 관한 연구)

  • Yoon, Joo-Sung;Kim, Hyeon-Cheol
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.916-919 / 2017
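  • With recent advances in deep learning, a variety of techniques have been applied to sentiment analysis. Convolutional Neural Networks (CNN), which have shown high performance in image and speech recognition, are now being actively studied in natural language processing as well and are known to be effective for sentiment analysis. In traditional machine learning, lexicon-based techniques were actively researched, but such attempts gradually declined after word embeddings appeared. Lexicons nevertheless still provide useful information for sentiment analysis. In this study, using the Twitter dataset provided by SemEval 2017 Task 4 and various lexicon corpora, we investigated how much model performance improves when lexicons are combined with a CNN. We also analyzed the effects of word embeddings and lexicons. Models were evaluated with the macro-averaged F1 score over three classes: positive, negative, and neutral.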
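For reference, the evaluation metric named above can be computed as follows: per-class F1 for positive, negative, and neutral, then an unweighted mean so that each class counts equally regardless of how often it occurs. This is a plain-Python sketch; treating an undefined precision or recall as 0 is an assumed convention.

```python
def macro_f1(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.778 (= 7/9)
```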

Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model

  • Zeng, Yuyang;Zhang, Ruirui;Yang, Liang;Song, Sujuan
    • Journal of Information Processing Systems / v.17 no.4 / pp.818-833 / 2021
  • To address the low precision, insufficient feature extraction, and poor contextual modeling of existing text sentiment analysis methods, a hybrid CNN-BiLSTM-TE (convolutional neural network, bidirectional long short-term memory, and topic extraction) model was proposed. First, Chinese text data was converted into vectors through transfer learning with Word2Vec. Second, local features were extracted by the CNN model. Then, contextual information was extracted by the BiLSTM network, and the emotional tendency was obtained using softmax. Finally, topics were extracted using term frequency-inverse document frequency (TF-IDF) and K-means. Compared with the CNN, BiLSTM, and gated recurrent unit (GRU) models, the CNN-BiLSTM-TE model's F1-score was higher by 0.0147, 0.006, and 0.0052, respectively. Compared with the CNN-LSTM, LSTM-CNN, and BiLSTM-CNN models, its F1-score was higher by 0.0071, 0.0038, and 0.0049, respectively. Experimental results showed that the CNN-BiLSTM-TE model can effectively improve various indicators in application. Lastly, scalability was verified on a takeaway dataset, indicating great value for practical applications.
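The topic-extraction step (TF-IDF followed by K-means) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' code: documents are pre-tokenized English toy strings (the paper works on Chinese text), idf is taken as log(N / df), distances are squared Euclidean, and the initial centroids are simply the first k documents rather than a random or k-means++ start.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """One sparse {term: tf * idf} dict per tokenized document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def mean_vector(vectors):
    total = defaultdict(float)
    for v in vectors:
        for k, val in v.items():
            total[k] += val
    return {k: val / len(vectors) for k, val in total.items()}


def kmeans(vectors, k, iterations=10):
    """Plain K-means; the initial centroids are simply the first k vectors."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean_vector(members)
    return assignment, centroids


def top_terms(centroid, n=3):
    """Highest-weighted terms of a cluster centroid, read as its topic."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: -kv[1])[:n]]


docs = [
    "delivery fast rider polite".split(),
    "soup cold taste bland".split(),
    "rider fast delivery".split(),
    "taste bland soup".split(),
]
assignment, centroids = kmeans(tfidf_vectors(docs), k=2)
print(assignment)  # [0, 1, 0, 1]
print([top_terms(c) for c in centroids])
```

Reading the highest-weighted terms of each centroid gives a rough topic label for each cluster (here, delivery service versus food quality).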

A Comparative Study between Stock Price Prediction Models Using Sentiment Analysis and Machine Learning Based on SNS and News Articles (SNS와 뉴스기사의 감성분석과 기계학습을 이용한 주가예측 모형 비교 연구)

  • Kim, Dongyoung;Park, Jeawon;Choi, Jaehyun
    • Journal of Information Technology Services / v.13 no.3 / pp.221-233 / 2014
  • Because interest in the stock market has grown with economic development, many studies have attempted to predict stock price fluctuations. Recently, many of these studies have adopted scientific and technological methods among the various forecasting approaches, and the data they use have become more diverse. In this paper, we therefore propose stock price prediction models that apply sentiment analysis and machine learning to news articles and SNS data to improve the accuracy of stock price prediction. The proposed models are generated through a four-step process: data collection, sentiment dictionary construction, sentiment analysis, and machine learning. News article data were collected from economy-related newspapers, and SNS data from Twitter. The sentiment dictionary was built from the collected news articles and then used for sentiment analysis. In the machine learning phase, we generated prediction models by applying various classification techniques to the data produced by sentiment analysis. We then conducted 10-fold cross-validation to measure the models' performance. The experimental results showed accuracy above 80% for several methods and F1 scores close to 0.8, which can be regarded as a significant improvement over previous research using opinion mining or data mining techniques.
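A 10-fold cross-validation loop of the kind mentioned above can be sketched as follows. The abstract does not specify the features or classifiers, so the threshold classifier on a single daily sentiment score and the data are hypothetical. Folds are contiguous and unshuffled here for simplicity.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds; sizes differ by at most 1."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Mean test accuracy over k folds (requires len(X) >= k)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical toy classifier: threshold halfway between the mean daily
# sentiment score of "up" days and that of "down" days.
def fit_threshold(xs, ys):
    up = [x for x, lab in zip(xs, ys) if lab == "up"]
    down = [x for x, lab in zip(xs, ys) if lab == "down"]
    return (sum(up) / len(up) + sum(down) / len(down)) / 2


def predict_threshold(threshold, x):
    return "up" if x > threshold else "down"


scores = [0.8, 0.6, 0.7, 0.9, 0.5, 0.3, -0.4, -0.6, -0.2, -0.8, -0.5, -0.3]
labels = ["up"] * 6 + ["down"] * 6
print(cross_val_accuracy(fit_threshold, predict_threshold, scores, labels))  # 1.0
```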

Analysis of IT Service Quality Elements Using Text Sentiment Analysis (텍스트 감정분석을 이용한 IT 서비스 품질요소 분석)

  • Kim, Hong Sam;Kim, Chong Su
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.4 / pp.33-40 / 2020
  • In order to satisfy customers, it is important to identify the quality elements that affect customer satisfaction. The Kano model has been widely used to identify multi-dimensional quality attributes for this purpose. However, the model suffers from various shortcomings and limitations, especially those related to survey practice, such as the amount of data, respondents' attitudes, and cost. In this research, a model based on text sentiment analysis is proposed, which aims to replace the survey-based data gathering process of the Kano model with sentiment analysis. In this model, quality elements are extracted from a set of opinion texts using morphological analysis. The polarity of each opinion is evaluated using text sentiment analysis, and the polarity-tagged text items are transformed into equivalent Kano survey questions. Replies to the transformed survey questions are generated based on the total score of the original data. The question-reply set is then analyzed using both the original Kano evaluation method and the satisfaction index method. The proposed model has been tested on a large amount of data from public IT service project evaluations. The results show that it can replace the existing practice and offers advantages in the quality and cost of data gathering. The authors hope that the proposed model may serve as a new quality analysis model for a wide range of areas.
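The abstract does not describe how polarity items become survey replies, but the downstream scoring it names is standard. The sketch below uses the usual Kano evaluation table to turn a (functional, dysfunctional) answer pair into a category, takes the most frequent category as the element's classification, and computes the commonly used Better/Worse satisfaction coefficients. The reply counts are hypothetical.

```python
from collections import Counter

ANSWERS = ["like", "must-be", "neutral", "live-with", "dislike"]

# Standard Kano evaluation table: row = answer to the functional question,
# column = answer to the dysfunctional question.
# A = attractive, O = one-dimensional, M = must-be, I = indifferent,
# R = reverse, Q = questionable.
KANO_TABLE = [
    "QAAAO",
    "RIIIM",
    "RIIIM",
    "RIIIM",
    "RRRRQ",
]


def kano_category(functional, dysfunctional):
    return KANO_TABLE[ANSWERS.index(functional)][ANSWERS.index(dysfunctional)]


def satisfaction_index(categories):
    """Better/Worse coefficients; R and Q answers are left out of the total."""
    c = Counter(categories)
    total = c["A"] + c["O"] + c["M"] + c["I"]
    better = (c["A"] + c["O"]) / total
    worse = -(c["O"] + c["M"]) / total
    return better, worse


# Hypothetical generated replies for one quality element.
replies = (
    [("like", "dislike")] * 4
    + [("like", "neutral")] * 3
    + [("neutral", "dislike")] * 2
    + [("neutral", "neutral")]
)
categories = [kano_category(f, d) for f, d in replies]
print(Counter(categories).most_common(1))  # [('O', 4)]
print(satisfaction_index(categories))      # (0.7, -0.6)
```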

Method for Spatial Sentiment Lexicon Construction using Korean Place Reviews (한국어 장소 리뷰를 이용한 공간 감성어 사전 구축 방법)

  • Lee, Young Min;Kwon, Pil;Yu, Ki Yun;Kim, Ji Young
    • Journal of Korean Society for Geospatial Information Science / v.25 no.2 / pp.3-12 / 2017
  • Leaving positive or negative comments about visited places on location-based services has become common in daily life. Sentiment analysis of place reviews written by actual visitors can provide valuable information to potential consumers as well as business owners. Sentiment analysis of a place requires a spatial sentiment lexicon to serve as a criterion, yet no lexicon of spatial sentiment words has been constructed. This study therefore suggests a method to construct a spatial sentiment lexicon by analyzing place review data written by Korean internet users. Among several location categories, theme parks were chosen for this study. Natural language processing and statistical techniques were used for this purpose. The spatial sentiment words included in the lexicon carry information about sentiment polarity and a probability score. The spatial sentiment lexicon constructed in this study consists of three tables (SSLex_SS, SSLex_single, SSLex_combi) containing 219 spatial sentiment words. Using the lexicon, sentiment analysis was conducted on texts about theme parks posted on Twitter. The sentiment classification accuracy was 0.714, which verified the validity of the lexicon.
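The abstract does not spell out how the three tables are combined at classification time. As a minimal sketch under that caveat, the code below treats each lexicon entry as a (polarity, probability score) pair, sums polarity times probability over the matched words, and measures accuracy against gold labels. The entries and example texts are hypothetical and in English for readability.

```python
# Hypothetical lexicon entries: word -> (polarity, probability score).
LEXICON = {
    "fun": (1, 0.9),
    "exciting": (1, 0.8),
    "crowded": (-1, 0.85),
    "expensive": (-1, 0.7),
}


def classify(text, lexicon=LEXICON):
    """Sum polarity * probability over matched words; the sign gives the label."""
    score = 0.0
    for word in text.lower().split():
        if word in lexicon:
            polarity, probability = lexicon[word]
            score += polarity * probability
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(texts, gold):
    return sum(classify(t) == g for t, g in zip(texts, gold)) / len(texts)


texts = ["rides were fun and exciting", "too crowded and expensive", "fun but crowded", "long lines"]
gold = ["positive", "negative", "negative", "neutral"]
print(accuracy(texts, gold))  # 0.75
```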

Cyberbullying Detection by Sentiment Analysis of Tweets' Contents Written in Arabic in Saudi Arabia Society

  • Almutairi, Amjad Rasmi;Al-Hagery, Muhammad Abdullah
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.112-119 / 2021
  • Social media has become a global means of communication in people's lives. Most people use Twitter for communication, and its inappropriate use has negative effects on people's lives. One of the most common misuses of Twitter is cyberbullying. Most cyberbullying is written in dialectal Arabic, for which resources are scarce. The ultimate goal of this study is therefore to detect and classify cyberbullying on Twitter in the Arabic context in Saudi Arabia. To detect and classify tweets, Pointwise Mutual Information (PMI) is used to generate a lexicon, and a Support Vector Machine (SVM) algorithm is applied. Both methods are evaluated in terms of the F1-score: the F1-score is 50% with PMI and 82% with SVM applied to the resampled data. The results show that the SVM algorithm performs better.
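PMI-based lexicon generation can be sketched as follows: each word gets a score equal to its PMI with the bullying class minus its PMI with the non-bullying class, computed from document counts, and a tweet is flagged when its summed word scores are positive. The toy tweets are hypothetical English stand-ins for dialectal Arabic, and the smoothing constant `alpha` and the zero decision threshold are assumptions rather than details from the paper.

```python
import math
from collections import Counter


def pmi_lexicon(tweets, labels, alpha=0.5):
    """Map each word to PMI(w, bullying) - PMI(w, non-bullying).

    With document counts, P(w) cancels in the difference, leaving
    log2((n(w, b) / N_b) / (n(w, nb) / N_nb)); alpha is add-alpha smoothing
    for zero counts (an assumed choice).
    """
    n_b = sum(labels)
    n_nb = len(labels) - n_b
    in_b, in_nb = Counter(), Counter()
    for tweet, bullying in zip(tweets, labels):
        (in_b if bullying else in_nb).update(set(tweet.lower().split()))
    return {
        w: math.log2(((in_b[w] + alpha) / n_b) / ((in_nb[w] + alpha) / n_nb))
        for w in in_b.keys() | in_nb.keys()
    }


def is_bullying(tweet, lexicon):
    """Flag a tweet as bullying if its summed word scores are positive."""
    return sum(lexicon.get(w, 0.0) for w in tweet.lower().split()) > 0


tweets = ["you are stupid", "stupid loser go away", "have a nice day", "nice work today"]
labels = [True, True, False, False]
lexicon = pmi_lexicon(tweets, labels)
print(is_bullying("stupid idea", lexicon), is_bullying("nice idea", lexicon))  # True False
```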

Extraction of Satisfaction Factors and Evaluation of Tourist Attractions based on Travel Site Review Comments (여행 사이트 리뷰를 활용한 관광지 만족도 요인 추출 및 평가)

  • Cho, Suhyoun;Kim, Boseop;Park, Minsik;Lee, Gichang;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.43 no.1 / pp.62-71 / 2017
  • To attract foreign tourists, it is important to understand which factors of domestic tourist attractions visitors consider critical and how they evaluate those factors after their visit. However, most research on the tourism business has collected information through surveys of a small number of tourists, which leads to inaccurate and biased conclusions. In this paper, we suggest a data-driven methodology to identify tourists' satisfaction factors and estimate sentiment scores for them. To do so, we collected review comments from a popular website. Latent Dirichlet allocation (LDA) is employed to extract key factors, and elastic net is used to estimate sentiment scores. An aggregated evaluation score is then generated by combining the factors with the per-topic sentiment scores. The proposed method can be used to recommend themed travel schedules and to discover new tourist spots.
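The abstract does not say how the elastic net is set up. One plausible reading, assumed here, is that review ratings are regressed on per-review LDA topic proportions so that each coefficient acts as a topic's sentiment score. The sketch shows the objective such a fit minimizes, in the common parameterization where `lam` sets the overall penalty strength and `alpha` mixes the L1 and L2 terms (intercept omitted), followed by one simple aggregation: a proportion-weighted sum of per-topic scores. Topic names and numbers are hypothetical.

```python
def elastic_net_objective(X, y, beta, lam, alpha):
    """(1/2n) * squared error + lam * (alpha * L1 + (1 - alpha) / 2 * L2^2)."""
    n = len(y)
    residuals = [yi - sum(xij * bj for xij, bj in zip(xi, beta)) for xi, yi in zip(X, y)]
    loss = sum(r * r for r in residuals) / (2 * n)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return loss + lam * (alpha * l1 + (1 - alpha) / 2 * l2)


def aggregated_score(topic_proportions, topic_sentiment):
    """Weight each topic's sentiment score by how much the reviews discuss it."""
    return sum(p * topic_sentiment[t] for t, p in topic_proportions.items())


# Hypothetical LDA topic proportions for one attraction and per-topic scores.
proportions = {"scenery": 0.5, "food": 0.3, "transport": 0.2}
sentiment = {"scenery": 0.8, "food": -0.2, "transport": 0.1}
print(round(aggregated_score(proportions, sentiment), 2))  # 0.36
```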

Assessment of Public Awareness on Invasive Alien Species of Freshwater Ecosystem Using Conservation Culturomics (보전문화체학 접근방식을 통한 생태계교란 생물인 담수 외래종의 대중인식 평가)

  • Park, Woong-Bae;Do, Yuno
    • Journal of Wetlands Research / v.23 no.4 / pp.364-371 / 2021
  • Public awareness of alien species can vary by generation, period, or specific events associated with these species. Understanding public awareness is important for the management of alien species because differences in awareness can affect the establishment and implementation of management plans. We analyzed digital texts on social media platforms, news articles, and internet search volumes, as used in conservation culturomics, to understand public interest in and sentiment toward alien freshwater species. The number of tweets, the number of news articles, and the relative search volume for 11 freshwater alien species were extracted to determine public interest. We also examined the trend over time, seasonal variability, and repetition period of these data. In addition, we calculated sentiment scores and analyzed public sentiment in the collected data using sentiment analysis based on text mining techniques. The American bullfrog, nutria, bluegill, and largemouth bass drew relatively more public interest than the other species. For some species, the number of Twitter posts, media coverage, and internet searches showed patterns that repeated over specific periods. The text mining results showed that most people held negative sentiments toward alien freshwater species. In particular, negative sentiment increased in the years after these species were designated as ecologically disturbing species.
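The abstract does not name the sentiment-scoring method. A common lexicon-based choice, assumed here, scores each text as (positive hits minus negative hits) divided by total hits and then averages the scores per year, which makes a trend such as the post-designation rise in negative sentiment easy to track. The word lists and posts are hypothetical English placeholders for the Korean texts analysed in the paper.

```python
from collections import defaultdict

# Hypothetical word lists standing in for a real sentiment lexicon.
POSITIVE = {"cute", "delicious", "interesting"}
NEGATIVE = {"invasive", "harmful", "disturbing", "destroy"}


def sentiment_score(text):
    """(pos - neg) / (pos + neg) in [-1, 1]; 0.0 when no lexicon word matches."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def yearly_mean_scores(posts):
    """Average the per-post scores for each year, in year order."""
    by_year = defaultdict(list)
    for year, text in posts:
        by_year[year].append(sentiment_score(text))
    return {year: sum(s) / len(s) for year, s in sorted(by_year.items())}


posts = [
    (2012, "bullfrog is cute"),
    (2012, "bluegill is delicious but invasive"),
    (2020, "nutria destroy wetlands"),
    (2020, "harmful invasive bass"),
]
print(yearly_mean_scores(posts))  # {2012: 0.5, 2020: -1.0}
```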