Search | Korea Science

Social Media Mining Toolkit (SMMT)

Tekumalla, Ramya;Banda, Juan M.
- Genomics & Informatics / v.18 no.2 / pp.16.1-16.5 / 2020
There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code or data for replicating their studies. With minimal exceptions, the few that do place the burden on the researcher to figure out how to fetch the data, how to best format it, and how to create automatic and manual annotations on the acquired data. To address this pressing issue, we introduce the Social Media Mining Toolkit (SMMT), a suite of tools that aims to encapsulate the cumbersome details of acquiring, preprocessing, annotating, and standardizing social media data. The purpose of our toolkit is to let researchers focus on answering research questions rather than on the technical aspects of using social media data. By using a standard toolkit, researchers will be able to acquire, use, and release data in a consistent way that is transparent to everybody using the toolkit, thereby simplifying research reproducibility and accessibility in the social media domain.
https://doi.org/10.5808/GI.2020.18.2.e16

Issue summarization scheme based on real-time SNS trend analysis (실시간 SNS 트렌드 분석에 기반한 이슈 요약 기법)

Kim, Daeyong;Kim, Daehoon;Hwang, Eenjun
- Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.1096-1097 / 2013
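With the rapid spread of social network services such as Twitter, a large number of SNS messages are being generated in real time. Reading every post on these services is practically impossible, and the real-time search-term rankings offered by portal sites are not enough to grasp the details intuitively. If such posts could be analyzed in real time to find the latest trends and to classify and summarize the related content, useful up-to-date information could be generated and provided to users. In this paper, we propose a scheme that classifies related tweets by topic, based on trend keywords obtained by analyzing tweets, and then provides a summary of the details for each topic. The proposed scheme extracts recently popular trends and their associated keywords from tweets generated in real time. It then finds core keywords in the tweets that contain those keywords, uses them to classify the tweets by topic, and defines each topic as an 'issue'. Finally, it analyzes the tweets belonging to a given issue and generates, for each issue, a keyword list and a storyline summarized in short sentences. We implement a prototype system based on the proposed scheme and evaluate its performance in terms of the usefulness of the issue detection technique through various experiments.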
https://doi.org/10.3745/PKIPS.y2013m11a.1096

A Method for Evaluating News Value based on Supply and Demand of Information Using Text Analysis (텍스트 분석을 활용한 정보의 수요 공급 기반 뉴스 가치 평가 방안)

Lee, Donghoon;Choi, Hochang;Kim, Namgyu
- Journal of Intelligence and Information Systems / v.22 no.4 / pp.45-67 / 2016
Given the recent development of smart devices, users are producing, sharing, and acquiring a variety of information via the Internet and social network services (SNSs). Because users tend to use multiple media simultaneously according to their goals and preferences, domestic SNS users use around 2.09 media concurrently on average. Since the information provided by such media is usually textually represented, recent studies have been actively conducting textual analysis in order to understand users more deeply. Earlier studies using textual analysis focused on analyzing a document's contents without substantive consideration of the diverse characteristics of the source medium. However, current studies argue that analytical and interpretive approaches should be applied differently according to the characteristics of a document's source. Documents can be classified into the following types: informative documents for delivering information, expressive documents for expressing emotions and aesthetics, operational documents for inducing the recipient's behavior, and audiovisual media documents for supplementing the above three functions through images and music. Further, documents can be classified according to their contents, which comprise facts, concepts, procedures, principles, rules, stories, opinions, and descriptions. Documents have unique characteristics according to the source media by which they are distributed. In terms of newspapers, only highly trained people tend to write articles for public dissemination. In contrast, with SNSs, various types of users can freely write any message and such messages are distributed in an unpredictable way. Again, in the case of newspapers, each article exists independently and does not tend to have any relation to other articles. However, messages (original tweets) on Twitter, for example, are highly organized and regularly duplicated and repeated through replies and retweets. 
There have been many studies focusing on the different characteristics between newspapers and SNSs. However, it is difficult to find a study that focuses on the difference between the two media from the perspective of supply and demand. We can regard the articles of newspapers as a kind of information supply, whereas messages on various SNSs represent a demand for information. By investigating traditional newspapers and SNSs from the perspective of supply and demand of information, we can explore and explain the information dilemma more clearly. For example, there may be superfluous issues that are heavily reported in newspaper articles despite the fact that users seldom have much interest in these issues. Such overproduced information is not only a waste of media resources but also makes it difficult to find valuable, in-demand information. Further, some issues that are covered by only a few newspapers may be of high interest to SNS users. To alleviate the deleterious effects of information asymmetries, it is necessary to analyze the supply and demand of each information source and, accordingly, provide information flexibly. Such an approach would allow the value of information to be explored and approximated on the basis of the supply-demand balance. Conceptually, this is very similar to the price of goods or services being determined by the supply-demand relationship. Adopting this concept, media companies could focus on the production of highly in-demand issues that are in short supply. In this study, we selected Internet news sites and Twitter as representative media for investigating information supply and demand, respectively. We present the notion of News Value Index (NVI), which evaluates the value of news information in terms of the magnitude of Twitter messages associated with it. In addition, we visualize the change of information value over time using the NVI. We conducted an analysis using 387,014 news articles and 31,674,795 Twitter messages. 
The analysis results revealed interesting patterns: most issues show an NVI lower than the average across all issues, whereas a few issues show an NVI that stays steadily above the average.
https://doi.org/10.13088/jiis.2016.22.4.045

User Oriented clustering of news articles using Tweets Heterogeneous Information Network (트위트 이형 정보 망을 이용한 뉴스 기사의 사용자 지향적 클러스터링)

Shoaib, Muhammad;Song, Wang-Cheol
- Journal of Internet Computing and Services / v.14 no.6 / pp.85-94 / 2013
With the emergence of the World Wide Web, and Web 2.0 in particular, the rapidly growing number of news articles has made it difficult for users to select articles that meet their requirements. To overcome this problem, different clustering mechanisms have been proposed to broadly categorize news articles. However, these techniques are entirely machine-oriented and lack users' participation in deciding cluster membership. To overcome this issue of zero participation, in this paper we propose a framework for clustering news articles that combines the judgments users post on Twitter with the news articles themselves. We employ Twitter hashtags for this purpose. Furthermore, we compute the credibility of users based on how frequently their tweets are retweeted, in order to enhance the accuracy of the clustering membership function. To test the performance of the proposed methodology, we performed experiments on tweets posted during the 2013 general election in Pakistan. Our results proved our claim that a better outcome can be achieved using users' output than with ordinary clustering algorithms.
https://doi.org/10.7472/jksii.2013.14.6.85

CoAID⁺ : COVID-19 News Cascade Dataset for Social Context Based Fake News Detection (CoAID⁺ : 소셜 컨텍스트 기반 가짜뉴스 탐지를 위한 COVID-19 뉴스 파급 데이터)

Han, Soeun;Kang, Yoonsuk;Ko, Yunyong;Ahn, Jeewon;Kim, Yushim;Oh, Seongsoo;Park, Heejin;Kim, Sang-Wook
- KIPS Transactions on Software and Data Engineering / v.11 no.4 / pp.149-156 / 2022
In the current COVID-19 pandemic, fake news and misinformation related to COVID-19 have been causing serious confusion in our society. To accurately detect such fake news, social context-based methods have been widely studied in the literature. They detect fake news based on the social context that indicates how a news article is propagated over social media (e.g., Twitter). Most existing COVID-19 related datasets gathered for fake news detection, however, contain only the news content information, but not its social context information. In this case, the social context-based detection methods cannot be applied, which could be a big obstacle in fake news detection research. To address this issue, in this work, we collect from Twitter the social context information based on CoAID, which is a COVID-19 news content dataset built for fake news detection, thereby building CoAID⁺, which includes both the news content information and its social context information. The CoAID⁺ dataset can be utilized in a variety of methods for social context-based fake news detection and would thus help revitalize the fake news detection research area. Finally, through a comprehensive analysis of the CoAID⁺ dataset from various perspectives, we present some interesting features capable of differentiating real and fake news.
https://doi.org/10.3745/KTSDE.2022.11.4.149

A study on the issue analysis of National Archives of Korea based on SNS(tweet) analysis between 2014~2015 (2014년~2015년 국가기록원 관련 트윗 이슈분석)

Seo, Ji-Won;Park, Jun-Hyeong;Oh, Hyo-Jung;Youn, Eunha
- The Korean Journal of Archival Studies / no.50 / pp.139-175 / 2016
This study is a content analysis of the National Archives of Korea as reflected in tweets produced between 2014 and 2015. The study collected all tweets from 2014 and 2015 that used the keyword 'National Archives of Korea'. The contents of the tweets, including their categories and the issues mentioned, were then analyzed. The results of the analysis were as follows. First, the analysis showed that the collected archives of the National Archives had increased in volume over the two years, with a similar type and pattern in their content. Second, the tweets produced by the public reflect current political and social issues more than archival services.
https://doi.org/10.20923/kjas.2016.50.139

Sentiment Analysis for COVID-19 Vaccine Popularity

Muhammad Saeed;Naeem Ahmed;Abid Mehmood;Muhammad Aftab;Rashid Amin;Shahid Kamal
- KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.5 / pp.1377-1393 / 2023
Social media is used for various purposes, including entertainment, communication, information search, and voicing thoughts and concerns about a service, product, or issue. Social media data can be mined for information and insights. The World Health Organization has listed COVID-19 as a global epidemic since 2020. People from every walk of life, as well as the entire health system, have been severely impacted by this pandemic. Even now, almost three years after the pandemic declaration, the fear caused by the COVID-19 virus, which has led to higher levels of depression, stress, and anxiety, has not been fully overcome. This has also triggered numerous discussions on social media platforms covering various aspects of the pandemic. One of these aspects concerns the vaccines developed by different countries, their features, and the advantages and disadvantages associated with each vaccine. Social media users often share their thoughts about vaccinations and vaccines. This data can be used to determine the popularity levels of vaccines, which can give producers some insight for future decision making about their product. In this article, we used Twitter data for vaccine popularity detection. We gathered data by scraping tweets about various vaccines from different countries. After that, various machine learning and deep learning models, i.e., naive Bayes, decision tree, support vector machine, k-nearest neighbor, and deep neural network, were used for sentiment analysis to determine the popularity of each vaccine. The experimental results show that the proposed deep neural network model outperforms the other models, achieving 97.87% accuracy.
https://doi.org/10.3837/tiis.2023.05.004

A Study on Detection Methodology for Influential Areas in Social Network using Spatial Statistical Analysis Methods (공간통계분석기법을 이용한 소셜 네트워크 유력지역 탐색기법 연구)

Lee, Young Min;Park, Woo Jin;Yu, Ki Yun
- Journal of Korean Society for Geospatial Information Science / v.22 no.4 / pp.21-30 / 2014
Lately, new influentials have secured a large number of volunteers on social networks owing to the vitalization of various social media. There has been considerable research on these influential people in social networks, but this research has limitations regarding the location information of Location Based Social Network Services (LBSNS). Therefore, the purpose of this study is to propose a spatial detection methodology and application plan, using spatial statistical analysis methods, for influentials who comment on diverse social and cultural issues in LBSNS. Twitter was used to collect the data for analysis, and 168,040 Twitter messages were collected in Seoul over a month-long period. In addition, 'politics,' 'economy,' and 'IT' were set as categories, and hot issue keywords were assigned to each category. This made it possible to derive an exposure index for finding influentials with respect to the hot issue keywords, and the exposure index for each administrative unit of Seoul was calculated through a spatial join operation. Moreover, an influential index that considers the spatial dependence of the exposure index was derived to extract the influential areas in the top 5% of the influential index and to analyze their spatial distribution characteristics and spatial correlation. The experimental results demonstrated that the spatial correlation coefficient was relatively high, at more than 0.3, within the same category, and that the correlation coefficient between the politics and economy categories was also more than 0.3. On the other hand, the correlation coefficient between the politics and IT categories was very low at 0.18, and that between the economy and IT categories was also very weak at 0.15. This study is significant for the materialization of influentials from a spatial information perspective and can be usefully applied in the field of gCRM in the future.
https://doi.org/10.7319/kogsis.2014.22.4.021

Comparison of responses to issues in SNS and Traditional Media using Text Mining -Focusing on the Termination of Korea-Japan General Security of Military Information Agreement(GSOMIA)- (텍스트 마이닝을 이용한 SNS와 언론의 이슈에 대한 반응 비교 -"한일군사정보보호협정(GSOMIA) 종료"를 중심으로-)

Lee, Su Ryeon;Choi, Eun Jung
- Journal of Digital Convergence / v.18 no.2 / pp.277-284 / 2020
Text mining is a representative method of big data analysis that extracts meaningful information from large amounts of unstructured text data. Social media such as Twitter generate hundreds of thousands of data items per second and act as one-person media that instantly and directly express public opinions and ideas. Traditional media deliver information, criticize society, and form public opinion. We compare the responses of SNS with those of the media on the termination of the Korea-Japan GSOMIA (General Security of Military Information Agreement), one of the domestic issues in the second half of 2019. Data collected from 201,728 tweets and 20,698 newspaper articles were analyzed using sentiment analysis, association keyword analysis, and cluster analysis. The results show that SNS tends to respond positively to this issue, whereas the media tends to react negatively. In the association keyword analysis, SNS shows positive views on domestic issues, with keywords such as "destruction, decision, we," while the media shows negative views on external issues, with keywords such as "disappointment, regret, concern". SNS is faster and more powerful than the media at studying or creating social trends and opinions, rather than at delivering information. This can complement the role of the media that reflects public perception.
https://doi.org/10.14400/JDC.2020.18.2.277

A Comparative Study of Information Delivery Method in Networks According to Off-line Communication (오프라인 커뮤니케이션 유무에 따른 네트워크 별 정보전달 방법 비교 분석)

Park, Won-Kuk;Choi, Chan;Moon, Hyun-Sil;Choi, Il-Young;Kim, Jae-Kyeong
- Journal of Intelligence and Information Systems / v.17 no.4 / pp.131-142 / 2011
A Social Network Service is defined as a web-based service that allows an individual to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share connections, and traverse their list of connections. Facebook and Twitter are representative Social Network Service sites and have attracted worldwide attention. Many people use Social Network Services to form and maintain social relationships, and the number of users has recently increased dramatically. Accordingly, many organizations have become interested in Social Network Services as a means of marketing, media, and communication with their customers, because these services can offer a variety of benefits to organizations such as companies and associations. In other words, organizations can use Social Network Services to respond rapidly to various user behaviors because the services make communication between users easier and faster. In addition, the marketing cost of a Social Network Service is lower than that of existing tools such as broadcasts, newspapers, and direct mail. Social Network Services are also growing in the marketplace, so organizations such as companies and associations can acquire potential customers for the future. However, organizations communicate with users uniformly through Social Network Services without considering the characteristics of the networks, even though networks have different effects on information delivery. For example, members' cohesion in offline communication is higher than in online communication because the members of an offline network are very close; that is, the offline communication network has strong ties. Accordingly, information delivery is fast in the offline communication network.
In this study, we compose two Twitter networks with different communication characteristics. The first network is constructed with data based on offline communication, such as friends, family, and seniors and juniors in school. The second network is constructed with randomly selected data from users who want to associate with friends online. Each network consists of 250 people divided into three groups. The first group is the ego, the person at the center of the network. The second group is the ego's followers. The last group is composed of the followers of the ego's followers. We compare the networks through social network analysis and follower reaction analysis. We investigate density and centrality to analyze the characteristics of each network, and we analyze follower reactions such as replies and retweets to find differences in information delivery between the networks. Our experimental results indicate that the density and centrality of the offline communication-based network are higher than those of the online-based network. Also, the number of replies is larger than the number of retweets in the offline communication-based network, whereas the number of retweets is larger than the number of replies in the online-based network. Our experiments show that the effect of information delivery in the offline communication-based network differs from that in the online communication-based network. Therefore, anyone who wants to use a social network as an effective marketing tool should configure the appropriate network type with the characteristics of the network in mind.
https://doi.org/10.13088/jiis.2011.17.4.131
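
The density and centrality measures named in the abstract above can be illustrated with a small sketch. The abstract does not say which centrality variant or which software the authors used, so the choice of in-degree centrality, the function names, and the four-node toy follower graph below are all assumptions made for illustration only.

```python
from collections import defaultdict


def density(n_nodes, edges):
    """Directed density: distinct edges divided by the n*(n-1) possible edges."""
    if n_nodes < 2:
        return 0.0
    return len(set(edges)) / (n_nodes * (n_nodes - 1))


def in_degree_centrality(nodes, edges):
    """Share of the other nodes that follow each node (assumed centrality variant)."""
    followers = defaultdict(int)
    for _follower, followed in set(edges):
        followers[followed] += 1
    denom = max(len(nodes) - 1, 1)  # avoid division by zero for a single node
    return {node: followers[node] / denom for node in nodes}


# Hypothetical toy network: an ego, two followers, and one follower of a follower.
# Each edge (u, v) means "u follows v".
nodes = ["ego", "a", "b", "c"]
edges = [("a", "ego"), ("b", "ego"), ("c", "a")]

network_density = density(len(nodes), edges)
centrality = in_degree_centrality(nodes, edges)
print(network_density, centrality)
```

Computing the same two numbers for an offline-based and an online-based follower graph would give the kind of comparison the abstract describes, although the study's actual procedure may differ.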
