• Title/Summary/Keyword: tweet collection (트윗 수집)

Design of a Reputation System for Twitter (트위터 이용한 인물 평판 분석 시스템)

  • Lee, Gyoung-Ho; Lee, Kong Joo
    • Annual Conference on Human and Language Technology / 2012.10a / pp.62-66 / 2012
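  • This paper describes the design of a system that collects and analyzes the evaluations of people that Twitter users express in their posts (tweets) and comprehensively analyzes each person's reputation. We analyze data collection through Twitter's Open API and the characteristics of the collected data, and discuss data analysis using a sentiment dictionary and how the analyzed results are stored. We intend to verify the validity of the system by applying it to the candidates in the 18th presidential election, held in 2012.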
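
The sentiment-dictionary analysis mentioned in this abstract can be illustrated with a minimal sketch. The lexicon entries, the whitespace tokenization, and the function names `score_tweet` and `reputation_by_person` are all assumptions made for illustration; the abstract does not describe the system's actual dictionary or storage format.

```python
# Hypothetical sentiment dictionary: word -> polarity score (not from the paper).
SENTIMENT_LEXICON = {
    "trust": 1, "honest": 1, "great": 1,
    "corrupt": -1, "liar": -1, "bad": -1,
}


def score_tweet(text, lexicon=SENTIMENT_LEXICON):
    """Sum the polarity scores of lexicon words in a whitespace-tokenized tweet."""
    tokens = text.lower().split()
    return sum(lexicon.get(tok.strip(".,!?"), 0) for tok in tokens)


def reputation_by_person(tweets, people, lexicon=SENTIMENT_LEXICON):
    """Count positive, negative, and neutral tweets for each mentioned person."""
    summary = {p: {"positive": 0, "negative": 0, "neutral": 0} for p in people}
    for text in tweets:
        score = score_tweet(text, lexicon)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        lowered = text.lower()
        for person in people:
            # A tweet counts toward every person whose name it contains.
            if person.lower() in lowered:
                summary[person][label] += 1
    return summary
```

A real Korean-language system would likely replace the whitespace split with a morphological analyzer, since sentiment words rarely appear as bare whitespace-separated tokens in Korean.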

A Study on the Spatial Patterns of Tweet Data for Urban Areas by Time - A Case of Busan City - (도시 지역 트윗 데이터의 시간대별 공간분포 특성 - 부산광역시를 사례로 -)

  • Ku, Cha Yong
    • Journal of Cadastre & Land InformatiX / v.46 no.2 / pp.269-281 / 2016
  • The processing of spatial big data, such as social media data, has received growing attention in the field of spatial information in recent years. As an example of spatial big data analysis, this study analyzed the spatial and temporal distribution of tweet data based on location and time information and identified how the spatial pattern varies by time period. Tweet data from Busan were collected, processed, and analyzed to characterize their temporal and spatial patterns, and the results were then compared with land type characteristics. The study found that the spatial pattern of tweeting in the city was associated with time periods such as daytime and nighttime on both weekdays and weekends. For spatially concentrated areas, the spatial distribution pattern of each time period was compared with the characteristics of the land. The results show that tweet data have different spatial distributions depending on the time, which potentially reflects, to some extent, daily activity patterns and the land type characteristics of the urban area. The study demonstrates the possible incorporation of social media data, e.g. tweet data, into the field of spatial information. Using a variety of social media data is expected to bring further benefits in areas such as land planning and urban planning.

A Content Analysis on the Domestic Public Libraries' Use of Twitter (국내 공공도서관의 트위터 이용에 관한 내용분석)

  • Shim, Jiyoung
    • Journal of the Korean Society for Information Management / v.34 no.1 / pp.241-262 / 2017
  • This study aims to identify and analyze how domestic public libraries use Twitter. To identify detailed patterns of Twitter use in library and information services, a content analysis was conducted on 3,038 tweets from the accounts of the top 14 public libraries in terms of Twitter use. An inductive approach was adopted to develop a coding scheme, and open coding was conducted on all of the tweets. In addition, correspondence analysis was applied to the results of the content analysis to identify how library accounts correspond to specific types. As a result, 3 main categories and 9 sub-categories of public libraries' Twitter use were developed, and 37 detailed patterns of use were identified. The identified patterns can serve as guidelines for libraries interested in using Twitter.

The Study on the Activation of Public Library Services Utilizing Twitter (트위터를 활용한 공공도서관 서비스 활성화 방안 연구)

  • Oh, Eui-Kyung
    • Journal of Information Management / v.43 no.2 / pp.133-150 / 2012
  • This study examined ways to activate public library services using Twitter. A total of 1,373 tweets from the Twitter accounts of the top five American public libraries were collected, analyzed by content type, and examined for their applicability to public library services. Based on the results, the study suggests that public library services can be activated by auto-tweeting information from the library home page, retweeting timely information, creating hashtags, using diverse social media, actively retweeting and replying, and using Twitter programs such as twit-bot. Finally, the study proposes that Twitter services be evaluated, for example through satisfaction surveys.

Sentiment Analysis of Foot-and-mouth Disease using Tweet Keyword Network (트윗 키워드 네트워크를 이용한 구제역의 감성분석)

  • Chae, Heechan; Lee, Jonguk; Choi, Yoona; Park, Daihee; Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.267-270 / 2018
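  • Foot-and-mouth disease (FMD) causes enormous damage to the domestic livestock industry and related industries every year. Although various academic studies on FMD are under way, engineering analyses of the social ripple effects of FMD outbreaks remain very limited. This study proposes a systematic methodology that uses text mining to analyze the general public's emotional responses to FMD. First, the proposed system collects FMD-related data from tweets posted on Twitter and performs polarity detection based on a sentiment dictionary. Second, it uses LDA, a representative topic modeling technique, to extract keywords from the tweets and builds a co-occurrence keyword network for each polarity from the extracted keywords. Third, it analyzes the social ripple effects of FMD in each period through the keyword networks. As a case study, we analyzed changes in public sentiment regarding the FMD outbreaks that occurred in Korea from July 2010 to December 2011.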
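
The second step of this pipeline, building a co-occurrence keyword network for each polarity, can be sketched as follows. The sketch assumes that polarity labels and the keyword set have already been produced by earlier steps (a sentiment dictionary and LDA in the paper); neither is implemented here, and the function names and the edge-weighting scheme (number of tweets in which two keywords appear together) are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(tweets, keywords):
    """Count how often each pair of keywords appears in the same tweet.

    tweets: list of token lists; keywords: set of previously extracted keywords.
    Returns a Counter mapping (keyword_a, keyword_b) -> co-occurrence count,
    with each pair stored in alphabetical order.
    """
    edges = Counter()
    for tokens in tweets:
        present = sorted(set(tokens) & keywords)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def networks_by_polarity(labeled_tweets, keywords):
    """Build one co-occurrence network per polarity from (polarity, tokens) pairs."""
    grouped = {}
    for polarity, tokens in labeled_tweets:
        grouped.setdefault(polarity, []).append(tokens)
    return {pol: cooccurrence_network(tws, keywords) for pol, tws in grouped.items()}
```

Splitting the tweets by collection period before calling `networks_by_polarity` would give one set of networks per period, which is one way to compare periods as the third step describes.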

Significance Analysis of Yellow Dust Related Disease Using Tweet Data (트윗 데이터를 이용한 황사 관련 질병 유의성 분석)

  • Jung, Yong-Han; Seo, Min-Song; Yoo, Hwan-Hee
    • Journal of Cadastre & Land InformatiX / v.47 no.1 / pp.267-276 / 2017
  • Yellow dust has caused damage in various fields such as agriculture, industry, and citizens' health, so measures against it are urgently needed. This study collected data on yellow dust over 11 days around Feb. 23, 2015, when yellow dust was at its most severe since 2009, performed an issue-word analysis, and reorganized the health-related tweet data. The significance of yellow dust related diseases was then tested through association rule analysis with diseases. In a significance test for patients with rhinitis, asthma, and conjunctivitis, based on patient condition data acquired from the Health Insurance Review & Assessment Service, conjunctivitis was significant in 13 of 16 cities at the 5% significance level, while asthma and rhinitis were significant in 3 and 6 areas, respectively. These results indicate that information about citizens' health can be obtained from SNS data such as tweet data, and that such data can provide useful information for establishing citizens' health care measures.

Location Inference of Twitter Users using Timeline Data (타임라인데이터를 이용한 트위터 사용자의 거주 지역 유추방법)

  • Kang, Ae Tti; Kang, Young Ok
    • Spatial Information Research / v.23 no.2 / pp.69-81 / 2015
  • If the residential area of SNS users can be inferred by analyzing SNS big data, it can offer an alternative for spatial big data research, which suffers from location sparsity and ecological error. In this study, we developed a way to infer the residential areas of Twitter users from their daily life activity patterns, which can be found in their timeline data. We identified these patterns from each user's movement pattern and from the regional cognition words that the user writes in tweets. The models based on movement and on text are named the daily movement pattern model and the daily activity field model, respectively. We then selected the variables to be used in each model. We defined the dependent variable as 0 if the area from which a user mainly tweets is the user's home location (HL), and as 1 otherwise. According to a discriminant analysis, the hit ratios of the two models were 67.5% and 57.5%, respectively. We tested both models on the timeline data of stress-related tweets. As a result, we inferred the residential areas of 5,301 out of 48,235 users and obtained 9,606 stress-related tweets with a residential area. This is about a 44-fold increase over the number of geo-tagged tweets. We think that the methodology used in this study can serve not only to secure more location data in SNS big data research, but also to link SNS big data with regional statistics in order to analyze regional phenomena.

A Generation and Matching Method of Normal-Transient Dictionary for Realtime Topic Detection (실시간 이슈 탐지를 위한 일반-급상승 단어사전 생성 및 매칭 기법)

  • Choi, Bongjun; Lee, Hanjoo; Yong, Wooseok; Lee, Wonsuk
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.5 / pp.7-18 / 2017
  • Recently, the number of SNS users has increased rapidly with the growth of the smart device industry, and the amount of generated data is increasing exponentially. On Twitter, user-generated text is an important research subject because it reflects events, accidents, product reputations, and brand images. Twitter has become a channel through which users receive and exchange information, and one of its important characteristics is that it is real-time. Events such as earthquakes, floods, and suicides need to be analyzed rapidly so that they can be responded to immediately. Analyzing an event requires collecting the tweets related to it, but it is difficult to find all related tweets using normal keywords alone. To solve this problem, this paper proposes a generation and matching method for a normal-transient dictionary for real-time topic detection. Normal dictionaries consist of general keywords related to events (event: suicide - death-loop, death, die, hang oneself, etc.), whereas transient dictionaries consist of transient keywords related to events (event: suicide - names and information of celebrities, information on social issues). Experimental results show that the matching method using the two dictionaries finds more tweets related to an event than a simple keyword search.
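
The idea of matching tweets against both dictionaries can be sketched as below. This is only an illustration of why a second, transient dictionary widens recall: the abstract does not describe how the dictionaries are generated or how matching is actually performed, so the whole-word regex matching, the function names, and the sample terms are assumptions.

```python
import re


def build_matcher(terms):
    """Compile one case-insensitive regex matching any term as a whole word or phrase."""
    # Longer terms first so that phrases win over their own sub-words.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_event_tweets(tweets, normal_terms, transient_terms=frozenset()):
    """Return tweets that match a term from either the normal or the transient dictionary."""
    terms = set(normal_terms) | set(transient_terms)
    if not terms:
        return []
    matcher = build_matcher(terms)
    return [t for t in tweets if matcher.search(t)]
```

With a normal dictionary alone, a tweet that mentions only a celebrity's name would be missed; adding that name to the transient dictionary lets the same function pick it up. Word-boundary matching is a simplification that suits English text and would need a morphological analyzer for Korean.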

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon; Baek, Haedeuk; Choi, Jinho
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.163-176 / 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs, such as Twitter, have gained in popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' efforts and investment for content generation by recommending shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has been given little attention so far. Currently, the most common method to measure TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users interact with each other while watching television or movies, or visiting a new place. In measuring TV ratings, some features are significant during certain hours of the day, or days of the week, whereas these same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing this time-sensitive relevance is required to estimate TV ratings. Therefore, modeling the time-related characteristics of features should be key when measuring TV ratings through microblogs. We show that capturing the time-dependency of features in measuring TV ratings is vitally necessary for improving their accuracy. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis.
The number of tweets reaches its maximum level on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect the satisfaction with or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that there is a time-dependency in the correlation of features between the periods before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectation for the program or disappointment over not being able to watch it. The features highly correlated before the broadcast are different from those after the broadcast. This result shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have the highest correlation before the broadcasting time, whereas 68 words reach the highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite containing a negative meaning. Understanding the time-dependency of features can be helpful in improving the accuracy of TV ratings measurement. This research provides a basis for estimating the response to or satisfaction with the broadcast programs using the time dependency of words in Twitter chatter.
More research is needed to refine the methodology for predicting or measuring TV ratings.
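
The comparison of a word's correlation with ratings before and after the broadcast can be illustrated with a small sketch. It assumes per-episode counts of a single word in two time windows and uses the Pearson coefficient; the abstract does not state which correlation measure or window definitions the authors used, and the function names and the choice to compare absolute values are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_by_window(word_counts, ratings):
    """Correlate one word's per-episode counts with ratings for each time window.

    word_counts: dict mapping a window name (e.g. "before", "after") to a list
    of per-episode counts; ratings: per-episode TV ratings in the same order.
    Returns (window with the largest absolute correlation, all correlations).
    """
    corr = {window: pearson(counts, ratings) for window, counts in word_counts.items()}
    best = max(corr, key=lambda window: abs(corr[window]))
    return best, corr
```

Running this for every candidate word and tallying the best window per word would give counts comparable in kind to the before/after split reported in the abstract.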

Spatio-temporal Visualization of Social Anxiety Using SNS Data (SNS 데이터를 이용한 사회 불안의 시공간 기반 시각화)

  • Kim, Jae-Min; Lee, Joo-Hong; Choi, Yong-Suk
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.849-852 / 2017
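  • This paper introduces a technique for visualizing the spatio-temporal distribution of social anxiety using data collected from SNS. After collecting data containing spatio-temporal information from Twitter using twitter4j, an Open API, we prepare training data labeled with whether or not the author of each tweet is anxious. Using this training data and KOMORAN, an Open API Korean morphological analyzer, we build a dictionary and develop an anxiety classifier. The data with spatio-temporal information collected from Twitter are classified by the classifier and displayed on a map, thereby visualizing social anxiety. Social scientists can use this to study anxiety systematically and thereby solve the various social problems that arise from anxiety.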