• Title/Summary/Keyword: 트윗 분석 (tweet analysis)

Search Result 128, Processing Time 0.027 seconds

Study on the social issue sentiment classification using text mining (텍스트마이닝을 이용한 사회 이슈 찬반 분류에 관한 연구)

  • Kang, Sun-A;Kim, Yoo Sin;Choi, Sang Hyun
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1167-1173 / 2015
  • The development of information and communication technologies such as SNS, blogs, and bulletin boards has provided a variety of places where people can express their thoughts and comments, and as Big Data grows, many people voice their opinions on social issues on SNS such as Twitter. In this study, we build a sentiment dictionary for social issues in advance and conduct a sentiment analysis with the structured dictionary in order to gather opinions on social issues expressed on Twitter. The data used are tweets containing "bikini" and "nakkomsu". In the analysis, precision is 61% and the F1-score is 74%. This study is expected to suggest a standard for dictionary construction that allows positive/negative opinions on specific social issues to be classified.
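A dictionary-based pro/con classification of the kind this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' method: the `LEXICON` entries, the whitespace tokenizer, and the rule that ties default to "negative" are all assumptions made here, and the evaluation helper only shows how precision and F1 are conventionally computed.

```python
# Hypothetical mini-lexicon; the paper's actual dictionary is not reproduced here.
LEXICON = {"support": 1, "agree": 1, "great": 1,
           "oppose": -1, "shameful": -1, "wrong": -1}

def classify(tweet: str) -> str:
    """Sum the lexicon polarity of each token (assumption: ties count as negative)."""
    score = sum(LEXICON.get(tok, 0) for tok in tweet.lower().split())
    return "positive" if score > 0 else "negative"

def precision_recall_f1(gold, pred, target="positive"):
    """Standard precision, recall and F1 for one target class."""
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    fp = sum(g != target and p == target for g, p in zip(gold, pred))
    fn = sum(g == target and p != target for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Real Korean tweets would need a morphological analyzer rather than `split()`, which is used here only to keep the sketch self-contained.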

A Case Study of the Issue detected Analysis on Social Media Big Data (소셜 빅 데이터를 이용한 이슈 감지 사례분석)

  • Song, Eun-Jee;Kang, Min-Shik
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.682-683 / 2014
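  • Recently, IT companies have been competing to develop social big data analysis tools that collect and accumulate the opinions consumers routinely post online and analyze the content around chosen keywords, making it possible to identify what public opinion is forming on a specific topic and the paths along which it spreads. This paper examines the results of applying issue detection and prediction techniques for social big data analysis to a real case. Social media data patterns are compared and analyzed, and, for negative-issue detection, the content and authors that influence the spread of negative opinion are defined as independent variables, with the mean time and speed of issue reach as dependent variables. The influence on the formation of negative opinion is detected on the basis of the number of tweets and retweets. The analysis shows that retweet messages account for a large share of all tweets, that the retweet share rises as buzz about an issue increases, and that during large-scale spread the retweet volume increases sharply, spreading the issue widely in a short time.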

Opinion Retrieval in Twitter Considering Syntactic Relations of Sentiment Phrase (의견 어구의 구문 관계를 고려한 트위터 의견 검색)

  • Kim, Yoonsung;Yang, Min-Chul;Lee, Seung-Wook;Rim, Hae-Chang
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.492-497 / 2014
  • In this paper, we propose a method of retrieving opinionated tweets on Twitter, one of the most popular social network services, where various users share diverse opinions. Typical opinion retrieval systems treat the presence of sentiment phrases (subjectivity) as an important factor even when the subjective phrases are not related to the given query or speaker. To alleviate this problem, we use the syntactic structure of a sentence to identify 1) the subjectivity-query relationship, 2) the subjectivity-speaker relationship, and 3) the syntactic role of the subjectivity. In addition, our learning-to-rank approach is trained to retrieve opinionated tweets based on query relevance, textual features, user information, and Twitter-specific features. Experimental results on real-world data show that the proposed method outperforms several baseline methods in terms of precision and nDCG.
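For readers unfamiliar with the nDCG metric mentioned in this abstract, a minimal sketch of one common formulation follows. The abstract does not say which variant the authors used, so the linear gain and the log2 discount here are assumptions.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    k = len(relevances) if k is None else k
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Here `relevances` is the list of graded relevance labels of the retrieved tweets, in ranked order; a ranking that already places the most relevant items first scores 1.0.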

Analyzing Spatial Correlation between Location-Based Social Media Data and Real Estates Price Index through Rasterization (격자기반 분석을 통한 위치기반 소셜 미디어 데이터와 부동산 가격지수 간의 공간적 상관성 분석 연구)

  • Park, Woo Jin;Eo, Seung Won;Yu, Ki Yun
    • Journal of Korean Society for Geospatial Information Science / v.23 no.1 / pp.23-29 / 2015
  • In this study, the spatial relevance between regional housing price data and the spatial distribution of location-based social media data is explored. Spatial analysis with rasterization was applied because the two datasets have different forms. Geo-tagged Twitter data were collected for one month, and regional housing price indices for sales and leases were used. The spatial range of both datasets covers Seoul and parts of the metropolitan area. A 2,000 m grid was constructed to account for the different spatial units of the two datasets, and both were combined into the grid cells. Hotspot analysis was performed on the combined dataset to compare spatial distributions, and bivariate spatial correlation coefficients between the two datasets were measured for quantitative analysis. The results show that the Seocho-gu area is detected as a common hotspot of tweets and the housing sales price index, whereas no spatial relevance is detected between tweets and the housing lease price index.
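The rasterization step can be illustrated by binning points into 2,000 m cells. The sketch below assumes coordinates are already projected to metres. It also substitutes a plain Pearson correlation over per-cell values for the bivariate spatial correlation coefficient named in the abstract, so it is a simplified stand-in rather than the study's statistic. All function names are hypothetical.

```python
from collections import Counter

CELL_SIZE_M = 2000  # grid size stated in the abstract

def cell_of(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates (metres) to integer (column, row) indices."""
    return (int(x_m // cell_size), int(y_m // cell_size))

def count_per_cell(points):
    """Count how many (x, y) points fall into each grid cell."""
    return Counter(cell_of(x, y) for x, y in points)

def pearson(xs, ys):
    """Plain Pearson correlation; ignores spatial weights (simplification)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0
```

Once tweet counts and a price index share the same cell keys, the two per-cell value lists can be passed to `pearson` for a first, non-spatial look at their association.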

Initial Small Data Reveal Rumor Traits via Recurrent Neural Networks (초기 소량 데이터와 RNN을 활용한 루머 전파 추적 기법)

  • Kwon, Sejeong;Cha, Meeyoung
    • Journal of KIISE / v.44 no.7 / pp.680-685 / 2017
  • The emergence of online media and their data has enabled data-driven methods to solve challenging and complex tasks such as rumor classification. Recently, deep learning based models have been shown to be among the fastest and most accurate algorithms for such problems. These new models, however, rely on either complete data or several days' worth of data, limiting their applicability in real time. In this study, we go beyond this limit and test the possibility of very early rumor detection via recurrent neural networks (RNNs). Our model takes social media streams as time series input, along with basic meta-information about the rumormongers, including the follower count, and the psycholinguistic traits of the rumor content itself. Based on an analysis of millions of social media posts on 498 real rumors and 494 non-rumor events, our RNN-based model detected rumors from only the first 30 posts (i.e., within a few hours of rumor circulation) with a remarkable F1 score of 0.74. This finding widens the scope of possibilities for building a fast and efficient rumor detection system.

Semi-supervised learning for sentiment analysis in mass social media (대용량 소셜 미디어 감성분석을 위한 반감독 학습 기법)

  • Hong, Sola;Chung, Yeounoh;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.5 / pp.482-488 / 2014
  • This paper aims to analyze users' emotions automatically by analyzing Twitter, a representative social network service (SNS). Building sentiment analysis models with machine learning requires sentiment labels that represent positive/negative emotions, but obtaining sentiment labels for tweets is very expensive. We therefore propose a sentiment analysis model that uses self-training to exploit "data without sentiment labels" as well as "data with sentiment labels". In self-training, the labels of "data without sentiment labels" are determined using "data with sentiment labels", and the model is then updated with both the "data with sentiment labels" and the newly labeled data. This technique improves sentiment analysis performance gradually. However, misclassifications of unlabeled data at an early stage affect model updates throughout the whole learning process, because the labels of unlabeled data never change once they are determined. The labels of "data without sentiment labels" therefore need to be determined carefully. To obtain high performance with self-training, we propose three policies for updating the "data with sentiment labels" and conduct a comparative analysis. The first policy selects, from the newly labeled data, items whose confidence is higher than a given threshold. The second policy chooses the same number of positive and negative items from the newly labeled data in order to avoid the imbalanced class learning problem. The third policy limits the newly labeled data to a given maximum number, so that the model is updated gradually rather than with a large amount of data at once. Experiments are conducted on the Stanford data set, which is classified into positive and negative. As a result, the learned model performs better than models learned from "data with sentiment labels" only and than self-training with a regular model update policy.
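The three update policies can be sketched as a single selection step applied to each round of newly labeled data. The abstract compares the policies and does not say how or whether they are combined, so chaining all three in one function is an illustrative choice made here. The function name, the tuple layout, and the default values are likewise assumptions.

```python
def select_for_update(candidates, threshold=0.9, max_new=100):
    """Pick newly labeled items to add to the labeled pool.

    candidates: list of (text, predicted_label, confidence) tuples, where
    predicted_label is "positive" or "negative" (hypothetical layout).
    """
    # Policy 1: keep only predictions whose confidence exceeds the threshold.
    confident = [c for c in candidates if c[2] > threshold]
    # Prefer the most confident predictions when trimming.
    confident.sort(key=lambda c: c[2], reverse=True)
    pos = [c for c in confident if c[1] == "positive"]
    neg = [c for c in confident if c[1] == "negative"]
    # Policy 2: take the same number from each class.
    # Policy 3: never add more than max_new items in one round.
    per_class = min(len(pos), len(neg), max_new // 2)
    return pos[:per_class] + neg[:per_class]
```

In a full self-training loop, the returned items would be appended to the labeled set and the classifier retrained before the next round.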

Improving accuracy of SNS-based Disaster Notification System using Morphological Analysis and Artificial Neural Network (형태소분석과 인공신경망을 활용한 SNS 기반 재난알림시스템의 정확도 향상)

  • Lee, Dong-Ho;Kang, Suk-Min;Kim, Soo-Hyun;Jo, Sung-Jae;Park, Chan-Hyuk
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.881-884 / 2017
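  • As smart devices have become widespread, data about all kinds of incidents and accidents is updated on SNS in real time. Exploiting this characteristic, fast accident detection becomes possible if each individual user acts as an accident detection sensor. However, existing studies show limited accuracy because they judge accidents simply by keyword frequency and because Twitter text contains many ungrammatical elements. In this study, to improve accident detection accuracy, we implemented a model that vectorizes morphologically analyzed tweets and trains a multilayer perceptron neural network on them. The highest accuracy, 82.58%, was obtained when 40 words consisting of common nouns were used.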
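The vectorization step described here can be illustrated with a fixed-vocabulary encoding. The sketch assumes the tweet has already been reduced to a list of nouns by a morphological analyzer. The abstract does not say whether the encoding is binary presence or a count, so binary presence is an assumption, and the three-word vocabulary in the usage note stands in for the 40 common nouns.

```python
def vectorize(nouns, vocabulary):
    """Encode a tweet as a 0/1 vector over a fixed vocabulary.

    nouns: nouns extracted from one tweet (assumed output of a morphological
    analyzer); vocabulary: ordered list of feature words.
    """
    present = set(nouns)
    return [1 if word in present else 0 for word in vocabulary]
```

For example, `vectorize(["fire", "smoke", "fire"], ["fire", "flood", "smoke"])` gives `[1, 0, 1]`; vectors of this form would then be the input to the multilayer perceptron.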

The Study on the Relationship between Disaster Signs and Sentimental of the Social Bigdata (소셜 빅데이터의 감성과 재난전조의 연관성에 관한 연구)

  • Bae, ByungGul;Lee, BoRam;Choi, SeonHwa
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.898-899 / 2014
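  • Detecting in advance disasters that arise from many hard-to-predict factors is very difficult. In particular, for complex disasters, unlike natural disasters that can be at least partly predicted, no measurable structured data exists, so in practice there is no data for predicting them. This paper uses social big data generated directly by people on social media to detect disaster precursors and investigates whether the sentiment of disaster-related messages is associated with disaster precursors. To that end, disaster-related tweets written by real people were collected and subjected to sentiment analysis, and the change in sentiment before and after disaster occurrence was analyzed.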

The Management of Medical Information Quality Utilizing Big Data (빅 데이터를 활용한 의료정보 질 관리)

  • Cho, Young-bok;Woo, Sung-Hee;Lee, Sang-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.728-731 / 2014
  • Today, the quality of medical service has become a major concern because of the continuing development of IT and the extension of life expectancy. This paper uses tweet big data generated in individuals' daily lives as a tool for medical information quality management. The results of the big data analysis provide improved medical information grounded in evidence-based medicine. They also make follow-up observation of chronic diseases possible and can reduce additional complications in patients. Effective treatment and prevention of disease therefore become possible.

Citizen Sentiment Analysis of the Social Disaster by Using Opinion Mining (오피니언 마이닝 기법을 이용한 사회적 재난의 시민 감성도 분석)

  • Seo, Min Song;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.37-46 / 2017
  • Recently, disasters caused by social factors have been occurring frequently in Korea. Predicting what crisis may happen is difficult, which raises citizens' concern. In this study, we developed a program that acquires tweet data using Tweepy, a Python-based library, on social disasters such as 'Nonspecific motive crimes' and 'Oxy' products. After natural language processing, these data were used to evaluate the psychological trauma and anxiety of citizens through text clustering analysis and opinion mining analysis in R Studio. In the 'Oxy' case, the Sewol ferry accident and the continued sale of Oxy products showed the highest similarity; in the 'Nonspecific motive crimes' case, the highest similarity was found for the government's coping measures against unexpected incidents such as the screen door incident, the Sewol ferry accident, and a 'Nonspecific motive crime' attributed to misogyny in Busan. In addition, the average citizen sentiment score for Nonspecific motive crimes was 11.61%p more negative than that for the Oxy case. The findings are therefore expected to be used to predict the mental health of citizens and to prevent future accidents.