• Title/Summary/Keyword: social data analysis (소셜 데이터 분석)


Prediction of Onion Purchase Using Structured and Unstructured Big Data (정형 및 비정형 빅데이터를 이용한 양파 소비 예측)

  • Rah, HyungChul;Oh, Eunhwa;Yoo, Do-il;Cho, Wan-Sup;Nasridinov, Aziz;Park, Sungho;Cho, Youngbeen;Yoo, Kwan-Hee
    • The Journal of the Korea Contents Association / v.18 no.11 / pp.30-37 / 2018
  • Because onion prices were expected to plunge in 2018 due to overproduction, there was a need to analyze how social media and broadcast programs had affected onion purchases during similar past events, and to identify in advance the factors that could promote onion consumption. We therefore collected onion-related social media data and broadcast data, together with agri-food consumer panel data, and investigated whether the amount of money spent on onions in 2014, the most recent year in which onion prices plunged, was correlated with the frequencies of onion-related keywords in social media and broadcast programs. We found that a) broadcast news programs mentioning the word "onion" were correlated with onion purchases with a lead of 3 to 6 weeks; b) broadcast entertainment programs mentioning the words "onion" and "health" were correlated with onion purchases with a lead of 11 weeks; and c) blog posts mentioning the words "onion" and "efficacy" were correlated with onion purchases with a lead of 5 weeks. Our study provides a case of how the effects of social media and broadcast programs on consumer purchase behavior can be analyzed through big data collection and analysis in the field of agriculture. We propose that these findings be applied to promote onion consumption.
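
The lead times above come from correlating keyword frequencies with later purchases. A minimal sketch of such a lagged correlation follows, assuming weekly keyword counts and weekly purchase amounts as two equal-length lists. The abstract does not state which correlation measure was used, so Pearson's r is an assumption, and the function names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(keyword_counts, purchases, max_lag):
    """Map each lag L to r(keyword count in week t, purchases in week t + L).

    A positive lag means the keyword series leads the purchase series by L weeks.
    """
    n = min(len(keyword_counts), len(purchases))
    result = {}
    for lag in range(max_lag + 1):
        x = keyword_counts[: n - lag]
        y = purchases[lag:n]
        if len(x) >= 3:  # need a few overlapping weeks for a meaningful r
            r = pearson(x, y)
            if r is not None:
                result[lag] = r
    return result

# Hypothetical toy data: purchases echo keyword counts four weeks later.
mentions = [1, 5, 2, 8, 3, 9, 4, 7, 6, 10, 2, 5]
spend = [0, 0, 0, 0] + mentions[:-4]
corr = lagged_correlations(mentions, spend, max_lag=6)
best_lag = max(corr, key=corr.get)
```

The lag with the highest r is only a candidate lead time; because many lags are tested at once, a real analysis would also check significance.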

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People are now creating a tremendous amount of data on social network services (SNS). In particular, the integration of SNS into mobile devices has generated massive amounts of data, greatly influencing society. This phenomenon is unprecedented, and we now live in the Age of Big Data. SNS data meets the defining conditions of big data: a large amount of data (volume), high data input and output speeds (velocity), and a variety of data types (variety). Because SNS big data covers society as a whole, the trends of issues discovered in it can become an important new source of value creation. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need to analyze SNS big data. TITS extracts issues from Twitter texts and visualizes them on the web. The system provides four functions: (1) it provides the topic keyword sets corresponding to the daily topic ranking; (2) it visualizes the daily time series of a topic over a month; (3) it shows the importance of a topic through a treemap based on a scoring system and frequency; and (4) it visualizes the daily time series of a keyword found by keyword search. The present study analyzes the big data generated by SNS in real time. SNS big data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process the many unrefined forms of unstructured data. Such analysis also requires the latest big data technologies, such as the Hadoop distributed system or NoSQL databases (an alternative to relational databases), to rapidly process large amounts of real-time data. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale up from a single machine to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike relational databases, MongoDB has no fixed schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the big data community because it helps analysts examine data easily and clearly. TITS therefore uses the d3.js library as its visualization tool. This library is designed to create Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a collection of preconfigured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is built with these libraries and can detect issues on Twitter in an easy and intuitive manner. We demonstrate the superiority of our issue detection technique by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS); on this basis, we confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. Experiments were conducted with nearly 150 million tweets posted in Korea during March 2013.
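
TITS is built around topic modeling, but the abstract does not name the specific model. As an assumption, the sketch below uses latent Dirichlet allocation (LDA), a widely used topic model, fitted by collapsed Gibbs sampling on pre-tokenized documents (for example, tweets after stop-word removal and noun extraction). The function names, hyperparameters (`alpha`, `beta`), and toy documents are hypothetical, and a system handling about 150 million tweets would need a distributed implementation rather than this pure-Python loop.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists.

    Returns (vocab, n_kw, n_dk): n_kw[k][v] counts word v in topic k,
    n_dk[d][k] counts tokens of document d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    z = []  # current topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)  # random initial topic
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return vocab, n_kw, n_dk

def top_words(vocab, n_kw, n=3):
    """The n most frequent words of each topic, i.e. a topic keyword set."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in n_kw]

toy_docs = [["onion", "price", "farm", "onion"], ["idol", "concert", "album", "idol"],
            ["onion", "farm", "price"], ["concert", "album", "idol"]]
vocab, n_kw, n_dk = lda_gibbs(toy_docs, num_topics=2, iters=50, seed=1)
topics = top_words(vocab, n_kw, n=3)
```

Each row of `n_kw` holds one topic's word counts, so `top_words` yields keyword sets of the kind TITS ranks; fitting the model per day is one possible (assumed) way to obtain daily topics.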

The Analysis of the Recent News on Domestic Drought Situation by National Drought Information-Analysis System (국가가뭄정보분석시스템을 활용한 최근 가뭄관련 언론현황 분석 및 고찰)

  • Lee, Ho Sun;Chun, Gun Il;Park, Jae Young
    • Proceedings of the Korea Water Resources Association Conference / 2017.05a / pp.340-340 / 2017
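  • Droughts caused by climate change have recently been occurring frequently around the world, and Korea also suffered considerable hardship from a prolonged drought in 2014-2015. Because such droughts progress relatively slowly and their impacts appear in complex ways, substantial damage results unless appropriate measures are taken in advance. Recently, there have been attempts to apply big data concepts that go beyond the conventional collection and analysis of water resources information and link it with other social systems. The K-water National Drought Information-Analysis Center developed a methodology that uses news as a supplementary means of early drought recognition and impact assessment, implemented it in a system, and analyzed its usefulness. To allow comprehensive searches of drought occurrence, impacts, and responses, news information was classified, following the progression of a drought, into five categories (drought signs and forecasting, drought occurrence, drought impacts, drought response, and drought preparedness and resolution) and 69 related detailed keywords, which were incorporated into the system. Using big data functions, the system automatically collects internet news matching these keywords, excludes duplicate or irrelevant articles, and classifies the remaining articles spatially by the region where the drought occurred so that they can be displayed on a GIS map. An analysis of a total of 448 news articles collected for 2016 with the system showed that, together with the system's drought assessment results reflecting the 2016 water supply system, the locations, timing, and damage reported in the news represented the 2016 water supply and demand situation well. Going forward, the center plans to collect diverse drought-related information from social media and SNS, in addition to news, using big data collection methods; to develop ways of using this information as reference material for drought factors and impact assessment; and to continue verifying it through application in the system.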
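
The collection step described above (keyword-based classification into the five categories, removal of duplicate or irrelevant articles, and grouping by region for the map) can be sketched as follows. This is a hypothetical illustration: the system's 69 keywords are not listed in the source, so the English keywords below are placeholders, and duplicates are detected by exact title match for simplicity.

```python
from collections import defaultdict

# Hypothetical placeholder keywords; the system's 69 actual keywords are not given.
CATEGORY_KEYWORDS = {
    "signs_and_forecasting": ["rainfall shortage", "drought forecast"],
    "occurrence": ["drought hits", "reservoir level drops"],
    "impact": ["crop damage", "water restriction"],
    "response": ["emergency water supply", "water truck"],
    "preparedness_and_resolution": ["drought relief", "reservoir refilled"],
}

def classify(text):
    """Return every category whose keywords appear in the text (may be empty)."""
    lowered = text.lower()
    return [cat for cat, words in CATEGORY_KEYWORDS.items()
            if any(w in lowered for w in words)]

def build_map_layer(articles):
    """Deduplicate, filter, and count articles per (region, category).

    articles: iterable of dicts with 'title', 'body', and 'region' keys.
    """
    seen = set()
    layer = defaultdict(lambda: defaultdict(int))
    for art in articles:
        key = art["title"].strip().lower()
        if key in seen:
            continue  # duplicate article (same title already counted)
        categories = classify(art["title"] + " " + art["body"])
        if not categories:
            continue  # irrelevant article: no drought keyword matched
        seen.add(key)
        for cat in categories:
            layer[art["region"]][cat] += 1
    return {region: dict(counts) for region, counts in layer.items()}
```

The per-region counts are what a map layer would color or size; an article may fall into several categories at once, as a real drought report often covers both occurrence and impact.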


Study on the Application Methods of Big Data at a Corporation -Cases of A and Y corporation Big Data System Projects- (기업의 빅데이터 적용방안 연구 -A사, Y사 빅데이터 시스템 적용 사례-)

  • Lee, Jae Sung;Hong, Sung Chan
    • Journal of Internet Computing and Services / v.15 no.1 / pp.103-112 / 2014
  • In recent years, the rapid spread of smart devices and the growth of internet and social media use have led to the constant production of huge amounts of valuable data, including personal information, buying patterns, and location information. IT and production infrastructure has also begun to produce its own data with the growth of M2M (Machine-to-Machine) communication and the IoT (Internet of Things). This study examines the effects of applying structured and unstructured big data in various business circumstances and, through case studies of structured and unstructured big data, aims to identify how corporations can create value. The results demonstrate that corporations seeking an optimized big data utilization plan could maximize the value they create by using structured and unstructured big data generated both inside and outside the corporation.

Big data mining for natural disaster analysis (자연재해 분석을 위한 빅데이터 마이닝 기술)

  • Kim, Young-Min;Hwang, Mi-Nyeong;Kim, Taehong;Jeong, Chang-Hoo;Jeong, Do-Heon
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1105-1115 / 2015
  • Big data analysis for disasters has recently begun, focusing especially on text data such as social media. Disaster management consists of four stages (prevention, preparation, response, and recovery), and social data usually supports the final two. By contrast, big data analysis of meteorological data can contribute to prevention and preparation. This motivated us to review big data technologies that deal with non-text data in the natural disaster area. To this end, we first explain the main keywords, big data, data mining, and machine learning, in Sec. 2. We then introduce state-of-the-art machine learning techniques in meteorology-related fields in Sec. 3, showing how traditional machine learning techniques have been adapted to climatic data by taking domain specificity into account. The application of these techniques to natural disaster response is then introduced (Sec. 4), and we conclude with several future research directions.

A Study on the Potential and Limitation of Pre-producing Dramas through Social Analysis -Focusing on JTBC Drama <Man to Man>- (소셜 분석을 통한 사전제작 드라마의 가능성과 한계에 관한 연구 -jtbc <맨투맨>을 중심으로-)

  • Kim, Kyung-Ae;Ku, Jin-Hee
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.2 / pp.164-172 / 2018
  • This paper examines, through big data analysis, the relationship between pre-production and storytelling and, focusing on JTBC's Man to Man series, looks at how the drama's storytelling should be structured. To gauge viewers' thoughts on pre-produced dramas, we conducted topic-focused text mining on 67 blog posts about pre-produced dramas written between December 15, 2016 and December 15, 2017. We also conducted sentiment analysis on Man to Man, which is not only a pre-produced drama but also has storytelling issues. Blog text extraction and text mining were performed with OutWit Hub and R, and the tools provided by Social Metrics were used for sentiment analysis of the larger data set. The sentiment analysis revealed that viewers of Man to Man did not accept the romance between Kim Sul-woo and Cha Do-ha because the female characters lacked realism. We therefore conclude that increasing the realism of the characters is crucial for increasing audience empathy. Such studies will remain necessary, because they form the basis for digitally driven storytelling research and provide valuable material for prediction and guidance in the cultural content industry.
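
A lexicon-based polarity score is one simple way to run the kind of sentiment analysis described above. The sketch is hypothetical: the word lists are placeholders (the study used OutWit Hub, R, and Social Metrics tools, whose methods the abstract does not describe), and real Korean blog text would first need morphological analysis rather than the regex tokenization used here.

```python
import re

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"love", "great", "fun", "moving", "charming"}
NEGATIVE = {"unrealistic", "boring", "awkward", "disappointing", "forced"}

def polarity(text):
    """Lexicon score in [-1, 1]: (positive hits - negative hits) / total hits."""
    tokens = re.findall(r"\w+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 0.0  # no sentiment words found
    return (pos - neg) / (pos + neg)

def mean_polarity(posts, target):
    """Average polarity of the posts that mention `target` (e.g., a storyline)."""
    scores = [polarity(p) for p in posts if target.lower() in p.lower()]
    return sum(scores) / len(scores) if scores else 0.0
```

Filtering posts by a target term before averaging mirrors the study's focus on one storyline (the central romance) rather than the drama as a whole.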

The Framework of Research Network and Performance Evaluation on Personal Information Security: Social Network Analysis Perspective (개인정보보호 분야의 연구자 네트워크와 성과 평가 프레임워크: 소셜 네트워크 분석을 중심으로)

  • Kim, Minsu;Choi, Jaewon;Kim, Hyun Jin
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.177-193 / 2014
  • Over the past decade, electronic commerce has spread rapidly and the number of interconnected networks has risen, escalating security threats and privacy concerns. Electronic commerce has a built-in trade-off between the necessity of providing at least some personal information to complete an online transaction and the risk of negative consequences from providing such information. More recently, frequent disclosures of private information have raised concerns about privacy and its impacts, motivating researchers in various fields to explore information privacy issues. Accordingly, the need has grown for information privacy policies, for technologies for collecting and storing data, and for information privacy research in fields such as medicine, computer science, business, and statistics. Various information security incidents have made finding experts in the information security field an important issue. Objective measures for finding such experts are required, because the current process is rather subjective. Based on social network analysis, this paper focuses on a framework for evaluating the process of finding experts in the information security field. We initially collected about 2,000 papers published between 2005 and 2013 from the National Discovery for Science Leaders (NDSL) database. After outliers and irrelevant papers were dropped, 784 papers remained to test the suggested hypotheses. Co-authorship network data on co-author relationships, publishers, affiliations, and so on were analyzed using social network measures, including centrality and structural holes. The results of our model estimation are as follows. With the exception of Hypothesis 3, which deals with the relationship between eigenvector centrality and performance, all of our hypotheses were supported.
In line with our hypothesis, degree centrality (H1) had a positive influence on researchers' publishing performance (p<0.001), indicating that the more researchers cooperated, the higher their publishing performance. Closeness centrality (H2) was also positively associated with publishing performance (p<0.001), suggesting that the more efficiently researchers acquired information, the higher their publishing performance. This paper identified differences in publishing performance among researchers. The analysis can be used to identify core experts and evaluate their performance in the information privacy research field. The co-authorship network for information privacy can aid in understanding the deep relationships among researchers. In addition, by extracting the characteristics of publishers and affiliations, this paper provides an understanding of social network measures and their potential for finding experts in the information privacy field. Social concerns about ensuring the objectivity of experts have grown, because experts in the information privacy field frequently participate in political consultation, business education support, and evaluation. In terms of practical implications, this research suggests an objective framework for identifying experts in the information privacy field and is useful for those in charge of managing research human resources. This study has some limitations, which provide opportunities for future research. Because the sample size is small, our results on differences in information diffusion by media and proximity are difficult to generalize. Further studies could therefore use a larger sample and more diverse media, and explore in more detail how information diffusion differs by media type and information proximity.
Moreover, previous network research has commonly observed a causal relationship between independent and dependent variables (Kadushin, 2012). In this study, degree centrality as an independent variable might have a causal relationship with performance as a dependent variable. However, in network analysis research, network indices can only be computed after the network relationships have formed, so the direction of causality is hard to establish. An annual analysis could help mitigate this limitation.
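
H1 and H2 rely on degree and closeness centrality in the co-authorship network. The pure-Python sketch below shows one common way to compute both from a list of papers' author lists. The function names and toy data are hypothetical, and the normalizations are assumptions since the abstract does not state them: degree centrality is divided by n - 1, and closeness is computed within each node's connected component as (reachable nodes) / (sum of shortest-path distances).

```python
from collections import deque
from itertools import combinations

def coauthor_graph(papers):
    """Undirected co-authorship graph {author: set of co-authors} from author lists."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Number of distinct co-authors, normalized by n - 1."""
    n = len(graph)
    if n < 2:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """(reachable nodes) / (sum of BFS distances), within each node's component."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result
```

For a chain of papers A-B, B-C, C-D, author B has two co-authors (degree 2/3) and distances 1, 1, 2 to the others (closeness 3/4), while A at the end of the chain scores lower on both, matching the intuition behind H1 and H2.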

Managing Duplicate Memberships of Websites : An Approach of Social Network Analysis (웹사이트 중복회원 관리 : 소셜 네트워크 분석 접근)

  • Kang, Eun-Young;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.153-169 / 2011
  • Today, the Internet is considered essential for establishing corporate marketing strategy. Companies promote their products and services through various online marketing activities, such as offering gifts and points to customers in exchange for participating in events, which rely on customers' membership data. Because companies can analyze these membership data to enhance their marketing efforts, appropriate website membership management can play an important role in increasing the effectiveness of online marketing campaigns. Despite the growing interest in proper membership management, however, it has been difficult to identify inappropriate members who weaken online marketing effectiveness. In online environments, customers tend to reveal less about themselves than in offline markets. Customers with malicious intent can create duplicate IDs by illegally using others' names or faking login information when joining. Because duplicate members are likely to intercept gifts and points that should go to the customers who deserve them, marketing efforts can become ineffective. Considering that the number of website members and the related marketing costs are increasing significantly, companies need efficient ways to screen out and exclude troublemakers who hold duplicate memberships. With this motivation, this study proposes an approach to managing duplicate memberships based on social network analysis and verifies its effectiveness using membership data gathered from real websites. A social network is a social structure made up of actors, called nodes, which are tied by one or more specific types of interdependency. Social networks represent the relationships between nodes and show the direction and strength of those relationships.
Various analytical techniques have been proposed based on social relationships, such as centrality analysis, structural hole analysis, and structural equivalence analysis. Component analysis, one of the social network analysis techniques, deals with sub-networks that form meaningful information in group connections. We propose a method for managing duplicate memberships using component analysis. The procedure is as follows. The first step is to identify the membership attributes that will be used to analyze relationship patterns among memberships, such as ID, telephone number, address, posting time, and IP address. The second step is to compose a social matrix for each identified attribute and aggregate their values into a combined social matrix, which represents how strongly each pair of nodes is connected. When a pair of nodes is strongly connected, those nodes are likely to be duplicate memberships. The combined social matrix is then transformed into a binary matrix with cell values of '0' or '1', using a relationship criterion that determines whether a pair of memberships is a duplicate. The third step is to conduct a component analysis on the combined social matrix to identify component nodes and isolated nodes. The fourth step is to determine the number of real members and calculate the reliability of the website membership from the component analysis results. The proposed procedure was applied to three real websites operated by a pharmaceutical company. The empirical results showed that the proposed method was superior to the traditional database approach based on simple address comparison. In conclusion, this study is expected to shed light on how social network analysis can make online marketing performance more reliable by efficiently and effectively identifying duplicate website memberships.
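
The four-step procedure above can be sketched as follows. Everything numeric here is an assumption, since the abstract gives no values: the attribute set (phone, address, IP), the per-attribute weights, the 0.5 relationship criterion, and the definition of reliability as the estimated number of real members divided by the number of registered IDs are all hypothetical. Components are found with a union-find structure, which yields the same groups as a graph traversal over the binary matrix.

```python
from itertools import combinations

# Step 1 (hypothetical): attributes used for matching, with assumed weights.
WEIGHTS = {"phone": 0.4, "address": 0.3, "ip": 0.3}
THRESHOLD = 0.5  # assumed relationship criterion

def combined_matrix(members):
    """Step 2: sum weighted per-attribute match matrices into one combined matrix."""
    n = len(members)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        score = sum(w for attr, w in WEIGHTS.items() if members[i][attr] == members[j][attr])
        m[i][j] = m[j][i] = score
    return m

def binarize(m, threshold=THRESHOLD):
    """Turn the combined matrix into 0/1 cells using the relationship criterion."""
    return [[1 if v >= threshold else 0 for v in row] for row in m]

def components(b):
    """Step 3: connected components (isolated nodes become singleton groups) via union-find."""
    parent = list(range(len(b)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(b)), 2):
        if b[i][j]:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(b)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def membership_reliability(members):
    """Step 4: estimated real members and their share of registered IDs (assumed definition)."""
    groups = components(binarize(combined_matrix(members)))
    return len(groups), len(groups) / len(members)
```

Treating each component (or isolated node) as one real person means a matching phone number plus address links two IDs, while a shared IP alone (for example, a public network) does not, under these assumed weights.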

Analysis of News Big Data for Deriving Social Issues in Korea (한국의 사회적 이슈 도출을 위한 뉴스 빅데이터 분석 연구)

  • Lee, Hong Joo
    • The Journal of Society for e-Business Studies / v.24 no.3 / pp.163-182 / 2019
  • As modern society becomes more complicated over time, analyzing the frequency and correlation of news keywords is important for discussing responses and solutions to social issues. This paper analyzed the relationship between the flow of social keywords and major issues through an analysis of 10 years (2009~2018) of news big data. Political issues, education and social culture, gender conflicts, and social problems were identified as major issues. To study how issues changed over time, the period was divided into five-year intervals and the change in each issue was analyzed, from which the changes in social issues and the corresponding countermeasures were studied. As a result, keywords closely related to people's lives (economy, police) were found to be very important in Korean society regardless of the period. In addition, the growth rate of keywords such as 'safety' has declined in recent years relative to their frequency. This suggests that awareness of safety in Korean society needs to be improved.
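
The frequency and growth-rate comparison described above can be illustrated with a small sketch that counts keyword occurrences per year, computes year-over-year growth rates, and sums counts into five-year periods. The function names and toy articles are hypothetical, and simple regex tokenization stands in for whatever Korean morphological analysis the study used.

```python
import re
from collections import Counter, defaultdict

def yearly_counts(articles, keywords):
    """Return {keyword: {year: count}} from an iterable of (year, text) pairs."""
    counts = defaultdict(Counter)
    for year, text in articles:
        tokens = Counter(re.findall(r"\w+", text.lower()))
        for kw in keywords:
            counts[kw][year] += tokens[kw]
    return {kw: dict(sorted(c.items())) for kw, c in counts.items()}

def growth_rates(series):
    """Year-over-year growth (f_t - f_prev) / f_prev; years after a zero count are skipped."""
    years = sorted(series)
    return {cur: (series[cur] - series[prev]) / series[prev]
            for prev, cur in zip(years, years[1:]) if series[prev] > 0}

def period_totals(series, start, length=5):
    """Sum yearly counts into consecutive periods of `length` years beginning at `start`."""
    totals = Counter()
    for year, count in series.items():
        totals[start + (year - start) // length * length] += count
    return dict(sorted(totals.items()))
```

A keyword can stay frequent while its growth rate falls, which is the pattern the abstract reports for 'safety'; comparing `period_totals` with `growth_rates` separates those two signals.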

Social media big data analysis of Z-generation fashion (Z세대 패션에 대한 소셜미디어의 빅데이터 분석)

  • Sung, Kwang-Sook
    • Journal of the Korea Fashion and Costume Design Association / v.22 no.3 / pp.49-61 / 2020
  • This study analyzed social media accounts and performed a big data analysis of Z-generation fashion using the Textom text mining program and the UCINET network analysis program. The results are as follows. First, a keyword analysis of 67,646 social media posts on Z-generation fashion from the last 5 years extracted 220,211 keywords, from which 67 major keywords with a co-occurrence frequency above 250 were selected. Among the top keywords, each appearing more than 1,000 times, 'Z generation' (29,595 times) was overwhelmingly the most frequent, followed by 'millennials' (18,536 times), 'fashion' (17,836 times), 'generation' (13,055 times), 'brand' (8,325 times), and 'trend' (7,310 times). Second, an analysis of network degree centrality among the key Z-generation keywords showed that the number of nodes connected to 'Z generation' (29,595) was overwhelmingly large, followed by 'millennials' (18,536), 'fashion' (17,836), 'generation' (13,055), 'brand' (8,325), and 'trend' (7,310). These keywords are considered important factors in exploring how social media reacts to the Z generation. Third, a CONCOR analysis rearranged and clustered the major keywords for Gen Z fashion by structural equivalence, and four clusters were derived through semantic network visualization: Group 1 (54 keywords), 'Diverse Characteristics of Z-Generation Fashion Consumers'; Group 2 (7 keywords), 'Z-Generation Teenagers' Fashion Power'; Group 3 (8 keywords), 'Z-Generation's Interest in Celebrity Fashion'; and Group 4 (1 keyword), 'Gucci', the most popular luxury fashion brand of the Z generation.
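
The keyword selection and degree-centrality steps can be sketched as a co-occurrence count over pre-tokenized posts, followed by a weighted degree computed only over edges above a frequency threshold (the abstract's cutoff was 250 co-occurrences). This is an illustrative assumption, not the Textom/UCINET pipeline: the abstract does not define exactly how its degree-centrality figures were computed, and the function names and toy posts are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(posts, vocab):
    """Count how many posts contain each (alphabetically ordered) pair of keywords."""
    pairs = Counter()
    for post in posts:
        present = sorted({w for w in post if w in vocab})
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs

def degree_from_edges(pairs, min_count):
    """Weighted degree of each keyword, keeping only edges with count > min_count."""
    degree = Counter()
    for (a, b), count in pairs.items():
        if count > min_count:
            degree[a] += count
            degree[b] += count
    return degree
```

Raising `min_count` prunes rare pairs, so only strongly co-occurring keywords keep any degree, which is the role the 250-co-occurrence cutoff plays in selecting major keywords.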