• Title/Summary/Keyword: Tweet Data

Tweet Entity Linking Method based on User Similarity for Entity Disambiguation (개체 중의성 해소를 위한 사용자 유사도 기반의 트윗 개체 링킹 기법)

  • Kim, SeoHyun;Seo, YoungDuk;Baik, Doo-Kwon
    • Journal of KIISE / v.43 no.9 / pp.1043-1051 / 2016
  • Web-based entity linking cannot be applied directly to tweets because tweets are much shorter than web documents. Tweet entity linking therefore draws on information about users or groups. However, a data sparseness problem arises for users with too little tweet history, and using information from unrelated groups can reduce linking accuracy. To address data sparseness, we consider three features: the meaning of the single tweet, the user's own tweet set, and the tweet sets of other users. We further improve the performance and accuracy of tweet entity linking by assigning greater weight to information from highly similar users. A comparative experiment on real Twitter data verifies that the proposed method achieves higher performance and accuracy than existing methods, and that mitigating data sparseness and using information from highly similar users are both associated with improved linking accuracy.
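As a rough illustration of combining the three evidence sources with similarity weighting, here is a minimal sketch. It is not the authors' method: the cosine similarity over word-count profiles, the mixing weights, and every name in the code are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(tweet_scores, own_links, user_profile, others,
                    weights=(0.5, 0.3, 0.2)):
    """Rank candidate entities by a weighted mix of three evidence sources.

    tweet_scores: candidate -> score from the single tweet's context
    own_links:    Counter of entities linked in the user's earlier tweets
    others:       list of (profile, links) Counter pairs for other users
    weights:      hypothetical mixing weights, not taken from the paper
    """
    w_tweet, w_own, w_others = weights
    own_total = sum(own_links.values()) or 1
    sims = [(cosine(user_profile, profile), links) for profile, links in others]
    sim_total = sum(s for s, _ in sims) or 1.0
    scores = {}
    for cand, t_score in tweet_scores.items():
        own = own_links[cand] / own_total
        # Other users contribute in proportion to their similarity to this user.
        social = sum(s * links[cand] / (sum(links.values()) or 1)
                     for s, links in sims) / sim_total
        scores[cand] = w_tweet * t_score + w_own * own + w_others * social
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative data: a user with no linking history (the sparse case).
tweet_scores = {"Apple Inc.": 0.5, "apple (fruit)": 0.5}
own_links = Counter()
user_profile = Counter({"iphone": 2, "ios": 1})
others = [
    (Counter({"iphone": 3, "mac": 1}), Counter({"Apple Inc.": 4})),
    (Counter({"recipe": 2, "pie": 1}), Counter({"apple (fruit)": 5})),
]
ranking = rank_candidates(tweet_scores, own_links, user_profile, others)
```

In this toy case the tweet alone cannot separate the two candidates and the user has no history, so the ranking is decided by the one other user whose profile overlaps with the target user's.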

An Efficient Method for Design and Implementation of Tweet Analysis System (효율적인 트윗 분석 시스템 설계 및 구현 방법)

  • Choi, Minseok
    • Journal of Digital Convergence / v.13 no.2 / pp.43-50 / 2015
  • As social network services (SNS) have grown in popularity, the data they produce has increased rapidly. SNS data reflects personal tendencies and interests and propagates quickly, so there is strong demand for analyzing it and applying the results in various fields. New technologies and services for processing and analyzing big data in real time have been introduced, but they are hard to adopt quickly and at low cost. This paper proposes an efficient method for building a tweet analysis system without introducing new technologies or service platforms for handling big data. The method was verified by building a prototype monitoring system that collects and analyzes tweets using a MySQL database and PHP scripts.
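The store-then-query pattern behind such a system can be sketched with an ordinary relational database. The example below uses Python with the standard-library `sqlite3` module purely as a stand-in for the MySQL and PHP stack named in the abstract; the table layout, function names, and sample rows are assumptions, not the paper's design.

```python
import sqlite3

# In-memory database standing in for the MySQL instance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, created_at TEXT, text TEXT)"
)


def store(rows):
    """Insert collected tweets; rows with an already-seen id are skipped."""
    conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?)", rows)
    conn.commit()


def hourly_counts(keyword):
    """Count tweets containing the keyword, grouped by hour."""
    cur = conn.execute(
        "SELECT substr(created_at, 1, 13) AS hour, COUNT(*) FROM tweets "
        "WHERE text LIKE ? GROUP BY hour ORDER BY hour",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


store([
    (1, "2015-02-01 09:15:00", "big data is everywhere"),
    (2, "2015-02-01 09:40:00", "Big Data tools compared"),
    (3, "2015-02-01 10:05:00", "lunch time"),
    (2, "2015-02-01 09:40:00", "Big Data tools compared"),  # duplicate id
])
counts = hourly_counts("big data")
```

The point of the design is that collection only appends rows and analysis is a plain SQL aggregate, so no dedicated big data platform is involved.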

A Study on the Spatial Patterns of Tweet Data for Urban Areas by Time - A Case of Busan City - (도시 지역 트윗 데이터의 시간대별 공간분포 특성 - 부산광역시를 사례로 -)

  • Ku, Cha Yong
    • Journal of Cadastre & Land InformatiX / v.46 no.2 / pp.269-281 / 2016
  • The processing of spatial big data, such as social media data, has received growing attention in the field of spatial information in recent years. As an example of spatial big data analysis, this study analyzed the spatial and temporal distribution of tweet data based on location and time information, and identified the characteristics of its spatial pattern by time period. Tweet data from Busan were collected, processed, and analyzed, and the results were compared with land type characteristics. The study found that the spatial pattern of tweeting in the city was associated with time periods such as daytime and nighttime on both weekdays and weekends. For spatially concentrated areas, the spatial distribution patterns of individual time periods were compared with the characteristics of the land. The results showed that tweet data exhibit different spatial distributions depending on the time, which potentially reflects, to some extent, daily activity patterns and the land type characteristics of the urban area. This study demonstrates the possibility of incorporating social media data, e.g., tweet data, into the field of spatial information. A variety of social media data is expected to be increasingly useful in areas such as land planning and urban planning.
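One simple way to look at spatial patterns by time period is to count tweets per grid cell within each period. The sketch below assumes a 06:00 to 18:00 definition of daytime, a 0.01 degree grid, and made-up sample points; none of these come from the study.

```python
from collections import Counter
from datetime import datetime
from math import floor


def period(ts: datetime) -> str:
    """Label a timestamp as weekday/weekend and daytime/nighttime."""
    day = "weekend" if ts.weekday() >= 5 else "weekday"
    part = "daytime" if 6 <= ts.hour < 18 else "nighttime"  # assumed cutoffs
    return f"{day}-{part}"


def grid_cell(lat: float, lon: float, size: float = 0.01):
    """Map a coordinate to the index of its square grid cell."""
    return (floor(lat / size), floor(lon / size))


# Hypothetical geo-tagged tweets: (timestamp, latitude, longitude).
tweets = [
    (datetime(2016, 3, 7, 13, 0), 35.1796, 129.0756),   # Monday, daytime
    (datetime(2016, 3, 7, 15, 30), 35.1796, 129.0756),  # Monday, daytime
    (datetime(2016, 3, 5, 23, 0), 35.1587, 129.1604),   # Saturday, nighttime
]
counts = Counter((period(ts), grid_cell(lat, lon)) for ts, lat, lon in tweets)
```

Cells with high counts in one period but not another are the kind of contrast that could then be compared against land type.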

Location Inference of Twitter Users using Timeline Data (타임라인데이터를 이용한 트위터 사용자의 거주 지역 유추방법)

  • Kang, Ae Tti;Kang, Young Ok
    • Spatial Information Research / v.23 no.2 / pp.69-81 / 2015
  • If the residential areas of SNS users can be inferred by analyzing SNS big data, this offers an alternative for spatial big data research, which suffers from location sparsity and ecological error. In this study, we developed a way to infer the residential areas of Twitter users from the daily life activity patterns found in their timeline data. We derived these patterns from users' movement patterns and from the region-recognition words that users write in their tweets. The models based on movement and on text are named the daily movement pattern model and the daily activity field model, respectively. We then selected the variables to be used in each model. We defined the dependent variable as 0 if the area from which a user mainly tweets is the user's home location (HL), and as 1 otherwise. According to discriminant analysis, the hit ratios of the two models were 67.5% and 57.5%, respectively. We tested both models on the timeline data of stress-related tweets. As a result, we inferred the residential areas of 5,301 out of 48,235 users and obtained 9,606 stress-related tweets with a residential area. This is about a 44-fold increase over the number of geo-tagged tweets. We believe this methodology can be used not only to secure more location data in studies of SNS big data, but also to link SNS big data with regional statistics in order to analyze regional phenomena.
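The reported figures can be put in proportion with a short calculation. The user coverage follows directly from the numbers in the abstract; the geo-tagged baseline is only implied by the "about 44 times" figure and is not stated in the abstract, so treat it as an estimate.

```python
users_total = 48_235     # users in the stress-related tweet data
users_located = 5_301    # users whose residential area was inferred
tweets_located = 9_606   # stress-related tweets with a residential area
gain_factor = 44         # "about 44 times", as reported

# Share of users for whom a residential area could be inferred.
coverage = users_located / users_total

# Geo-tagged tweet count implied by the reported gain (an estimate only).
implied_geotagged = tweets_located / gain_factor

print(f"user coverage: {coverage:.1%}")
print(f"implied geo-tagged tweets: about {round(implied_geotagged)}")
```

So roughly one user in nine receives an inferred residential area, while the located tweet count rises from an implied few hundred to 9,606.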

Discovery of Urban Area and Spatial Distribution of City Population using Geo-located Tweet Data (위치기반 트윗 데이터를 이용한 도심권 추정과 인구의 공간분포 분석)

  • Kim, Tae Kyu;Lee, Jin Kyu;Cho, Jae Hee
    • Journal of Information Technology Services / v.18 no.1 / pp.131-140 / 2019
  • This study compares and analyzes the spatial distribution of people in two cities using location information in Twitter data. The target cities were Paris, a traditional tourist city, and Dubai, a tourist city that has recently attracted attention. The data were collected over 123 days in 2016 and 125 days in 2018. We compared the spatial distributions of the two cities by period and by residence status. We identified hot places using a spatial statistical model called dart-shaped space division and estimated the urban area by reflecting the distribution of the tweet population. We also visualized the results as a CDF (cumulative distribution function) curve so that the distances between tweet locations and the city center can be compared across cities.
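The CDF comparison can be sketched as follows: compute each tweet's great-circle distance to the city center, then build the empirical CDF of those distances. The center coordinate and sample points below are illustrative assumptions, and the haversine formula is a common choice rather than one the abstract specifies.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def empirical_cdf(distances):
    """Return (distance, cumulative share of tweets) pairs in ascending order."""
    ordered = sorted(distances)
    n = len(ordered)
    return [(d, (i + 1) / n) for i, d in enumerate(ordered)]


center = (48.8566, 2.3522)  # assumed city-center point for Paris
sample_points = [           # illustrative tweet locations, not real data
    (48.8584, 2.2945),
    (48.8606, 2.3376),
    (48.8049, 2.1204),
]
distances = [haversine_km(*center, lat, lon) for lat, lon in sample_points]
cdf = empirical_cdf(distances)
```

Plotting one such curve per city on shared axes shows how tightly each city's tweets cluster around its center.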

An Analysis of Relationship Between Word Frequency in Social Network Service Data and Crime Occurences (소셜 네트워크 서비스의 단어 빈도와 범죄 발생과의 관계 분석)

  • Kim, Yong-Woo;Kang, Hang-Bong
    • KIPS Transactions on Computer and Communication Systems / v.5 no.9 / pp.229-236 / 2016
  • In the past, crime prediction methods relied on previous records to predict crime occurrences. However, these models had difficulty keeping up with immense volumes of new data. To improve crime prediction, some approaches have used social network service (SNS) data, but the relationship between SNS data and crime records has not been studied thoroughly. In this paper, we therefore analyze the relationship between SNS data and crime occurrences from the perspective of crime prediction. Using Latent Dirichlet Allocation (LDA), we extract tweets that include words related to crime occurrences and analyze the changes in tweet frequency against crime records. We then count the tweets that include crime-related words and examine how the counts vary with crime occurrences. Our experimental results demonstrate that the number of crime-related tweets differs when criminal activity occurs. Moreover, the results suggest that SNS data analysis can help crime prediction models, as there are certain patterns in tweet occurrences before and after a crime.
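The before-and-after comparison can be illustrated by counting tweets that contain crime-related words in a window on either side of a recorded crime date. In this sketch a fixed keyword set stands in for the LDA-derived vocabulary, and the window size and sample data are assumptions.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical keyword list standing in for words selected via LDA.
CRIME_WORDS = {"robbery", "theft", "assault"}


def daily_crime_tweets(tweets):
    """Count, per day, the tweets containing at least one crime word."""
    counts = Counter()
    for day, text in tweets:
        if CRIME_WORDS & set(text.lower().split()):
            counts[day] += 1
    return counts


def before_after(counts, crime_day, window=2):
    """Sum crime-related tweet counts in the days before and after a crime."""
    before = sum(counts[crime_day - timedelta(days=k)]
                 for k in range(1, window + 1))
    after = sum(counts[crime_day + timedelta(days=k)]
                for k in range(1, window + 1))
    return before, after


tweets = [
    (date(2015, 5, 9), "quiet evening downtown"),
    (date(2015, 5, 9), "heard about a theft nearby"),
    (date(2015, 5, 11), "robbery on main street last night"),
    (date(2015, 5, 11), "police everywhere after the robbery"),
    (date(2015, 5, 12), "assault case in the news"),
]
counts = daily_crime_tweets(tweets)
before, after = before_after(counts, date(2015, 5, 10))
```

Repeating this over many recorded crime dates gives the kind of frequency comparison the abstract describes.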

Dynamic Seed Selection for Twitter Data Collection (트위터 데이터 수집을 위한 동적 시드 선택)

  • Lee, Hyoenchoel;Byun, Changhyun;Kim, Yanggon;Lee, Sang Ho
    • Journal of KIISE:Databases / v.41 no.4 / pp.217-225 / 2014
  • Analysis of social media such as Twitter can yield interesting perspectives on understanding human behavior, detecting hot issues, identifying influential people, and discovering groups and communities. However, it is difficult to gather data relevant to specific topics because social media data is large, noisy, and dynamic. This paper proposes a new algorithm that dynamically selects seed nodes to efficiently collect tweets relevant to given topics. The algorithm uses user attributes to evaluate user influence and dynamically selects seed nodes during the collection process. We evaluate the proposed algorithm with real tweet data and obtain satisfactory performance results.
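The idea of re-choosing seeds during collection can be sketched with a priority queue: each time a seed is visited, its neighbors are scored and the highest-scoring unvisited user becomes the next seed. The influence formula (followers times topical relevance), the graph, and all names here are hypothetical; the abstract does not give the actual scoring.

```python
import heapq


def collect(graph, users, start, budget):
    """Visit up to `budget` users, always expanding the most influential one.

    graph: user -> list of connected users
    users: user -> {"followers": int, "relevance": float}  (assumed attributes)
    """
    def influence(user):
        info = users[user]
        return info["followers"] * info["relevance"]

    heap = [(-influence(start), start)]  # max-heap via negated scores
    seen = {start}
    order = []
    while heap and len(order) < budget:
        _, seed = heapq.heappop(heap)
        order.append(seed)
        for neighbor in graph.get(seed, []):
            if neighbor not in seen:
                seen.add(neighbor)
                heapq.heappush(heap, (-influence(neighbor), neighbor))
    return order


graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
users = {
    "a": {"followers": 100, "relevance": 0.5},
    "b": {"followers": 1000, "relevance": 0.1},
    "c": {"followers": 200, "relevance": 0.9},
    "d": {"followers": 50, "relevance": 0.2},
    "e": {"followers": 500, "relevance": 0.8},
}
order = collect(graph, users, "a", budget=4)
```

Because candidates are rescored as the frontier grows, a highly relevant user discovered late (here `e`) is visited before a less relevant one found earlier (`b`).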

Spatial Distribution Patterns of Twitter Data with Topic Modeling (토픽 모델링을 이용한 트위터 데이터의 공간 분포 패턴 분석)

  • Woo, Hyun Jee;Kim, Young Hoon
    • Journal of the Korean association of regional geographers / v.23 no.2 / pp.376-387 / 2017
  • This paper analyzes the geographical characteristics of Twitter data and presents its analytical potential for social network analysis in geography. First, it proposes a topic modeling-based methodology for identifying the geographical characteristics of tweets, comprising an analysis flow for Twitter data sets, tweet data collection and conversion, textual pre-processing and structural analysis, topic discovery, and interpretation of tweet topics. Tweets with GPS coordinates (geotweets) were extracted from the sampled Twitter data sets because they record the place where each tweet was created. The paper identifies a correlation between certain topics and local places in Jeju, which is closely associated with place names and local sites on Jeju Island. We assume that tweeters intend to record the places where they tweet and, in some cases, to share and retweet them with other tweeters. A surface density map shows tweet hotspots concentrated around specific places such as Jeju airport, sightseeing sites, and local places on Jeju Island. The hotspots show patterns similar to those of the floating population of Jeju, especially people in their thirties. In addition, a topic modeling algorithm is applied for geographical topic discovery and for comparing the spatial patterns of tweets. Finally, this empirical analysis shows that Twitter data, as social network data, carry geographical significance, and that a topic modeling approach is useful for analyzing the textual features that reflect geographical characteristics in large tweet data sets.

Significance Analysis of Yellow Dust Related Disease Using Tweet Data (트윗 데이터를 이용한 황사 관련 질병 유의성 분석)

  • Jung, Yong-Han;Seo, Min-Song;Yoo, Hwan-Hee
    • Journal of Cadastre & Land InformatiX / v.47 no.1 / pp.267-276 / 2017
  • Yellow dust has caused damage in various fields such as agriculture, industry, and public health, so countermeasures are urgently needed. This study collected tweet data on yellow dust over 11 days with Feb. 23, 2015 as the reference date, when yellow dust was at its most severe since 2009, performed issue word analysis, and recomposed the health-related tweet data. Association rule analysis was used to identify diseases related to yellow dust, and significance tests were then run on patient data for rhinitis, asthma, and conjunctivitis obtained from the Health Insurance Review & Assessment Service. Conjunctivitis was significant in 13 of 16 cities at the 5% significance level, while asthma and rhinitis were significant in 3 and 6 areas, respectively. These results indicate that information about citizens' health can be obtained from SNS data such as tweet data, and that such data can provide useful information for establishing public health measures.
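Association rule analysis scores a rule such as "yellow dust implies conjunctivitis" with support, confidence, and lift. The sketch below computes these three standard measures over toy keyword sets; the transactions are invented for illustration and do not reproduce the study's data or thresholds.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of keyword sets, one per tweet
    """
    n = len(transactions)
    has_a = sum(1 for t in transactions if antecedent <= t)
    has_c = sum(1 for t in transactions if consequent <= t)
    has_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = has_both / n
    confidence = has_both / has_a if has_a else 0.0
    lift = confidence / (has_c / n) if has_c else 0.0
    return support, confidence, lift


# Toy keyword sets extracted from four hypothetical tweets.
transactions = [
    {"yellow dust", "conjunctivitis"},
    {"yellow dust", "conjunctivitis", "mask"},
    {"yellow dust", "asthma"},
    {"mask"},
]
support, confidence, lift = rule_metrics(
    transactions, {"yellow dust"}, {"conjunctivitis"}
)
```

A lift above 1 means the disease term appears with the yellow dust term more often than it would if the two were independent.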

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.143-159 / 2015
  • Predicting IT trends has long been an important subject in information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and to allocate budgets in preparation for rapidly changing technological trends. Towards the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the coming year, and such predictions shape the basic assumptions of IT and industry leaders and organizations about technology and the future of IT, but the accuracy of these reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained popularity, they are used in a variety of ways, from posting about daily life to keeping up with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblogging service; its core function, the 'tweet', lets users report their current thoughts and actions, comment on news, and engage in discussions. For our analysis of IT trends, we chose tweet data because Twitter not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leadership on technology. Previous studies found that tweet data provides useful information and detects societal trends effectively, and that Twitter can track issues faster than other media such as newspapers. Therefore, this study investigates how frequently the IT trends predicted for the following year by public organizations are mentioned on social network services such as Twitter.
IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The study analyzes Twitter data generated in Seoul, Korea, and compares it with the predictions of the two organizations to identify differences. Twitter data analysis requires various natural language processing techniques, including stop word removal and noun extraction, to process unrefined, unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends by processing large streaming Twitter datasets in real time. The system offers a framework for crawling, normalizing, analyzing, indexing, and searching tweet data. As a result, we crawled the entire Twitter sphere in the Seoul area and obtained 21,589 tweets in 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor'; these topics are non-generalized compound words, so they may be mentioned on Twitter using other words. To determine whether IT trend tweets from Korea are related to the following year's real-world IT trends, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-procurement system, which handles the entire procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies for the IT trend topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in Nara Market project announcements in 2012 and 2013.
The main contributions of our research are as follows: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline for IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year, and ii) researchers can use Twitter to obtain useful ideas for detecting and predicting dynamic trends in technological and social issues.
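The correlation step can be illustrated with a Pearson coefficient between per-topic tweet frequencies and per-topic procurement announcement frequencies. The abstract does not say which correlation measure was used, so Pearson is an assumption here, and the counts below are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical per-topic counts, one position per predicted IT trend topic.
tweet_freq = [120, 80, 45, 30, 10]
announcement_freq = [60, 42, 20, 18, 4]
r = pearson(tweet_freq, announcement_freq)
```

A value of `r` near 1 would indicate that topics mentioned often on Twitter also appear often in procurement announcements; judging significance additionally requires a test that accounts for the number of topics.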