• Title/Summary/Keyword: utility detection

Search Result 222, Processing Time 0.017 seconds

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Service (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS Data is defined as a condition of Big Data where the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety) are satisfied. If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as a new important source for the creation of new values because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) Provide the topic keyword set that corresponds to daily ranking; (2) Visualize the daily time series graph of a topic for the duration of a month; (3) Provide the importance of a topic through a treemap based on the score system and frequency; (4) Visualize the daily time-series graph of keywords by searching the keyword; The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process rapidly a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational database. We built TITS based on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. In addition, MongoDB is an open source platform, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational database, there are no schema or tables with MongoDB, and its most important goal is that of data accessibility and data processing performance. In the Age of Big Data, the visualization of Big Data is more attractive to the Big Data community because it helps analysts to examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for the purpose of creating Data Driven Documents that bind document object model (DOM) and any data; the interaction between data is easy and useful for managing real-time data stream with smooth animation. In addition, TITS uses a bootstrap made of pre-configured plug-in style sheets and JavaScript libraries to build a web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system, and make the system available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.

Research Trend Analysis Using Bibliographic Information and Citations of Cloud Computing Articles: Application of Social Network Analysis (클라우드 컴퓨팅 관련 논문의 서지정보 및 인용정보를 활용한 연구 동향 분석: 사회 네트워크 분석의 활용)

  • Kim, Dongsung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.195-211
    • /
    • 2014
  • Cloud computing services provide IT resources as services on demand. This is considered a key concept, which will lead a shift from an ownership-based paradigm to a new pay-for-use paradigm, which can reduce the fixed cost for IT resources, and improve flexibility and scalability. As IT services, cloud services have evolved from early similar computing concepts such as network computing, utility computing, server-based computing, and grid computing. So research into cloud computing is highly related to and combined with various relevant computing research areas. To seek promising research issues and topics in cloud computing, it is necessary to understand the research trends in cloud computing more comprehensively. In this study, we collect bibliographic information and citation information for cloud computing related research papers published in major international journals from 1994 to 2012, and analyzes macroscopic trends and network changes to citation relationships among papers and the co-occurrence relationships of key words by utilizing social network analysis measures. Through the analysis, we can identify the relationships and connections among research topics in cloud computing related areas, and highlight new potential research topics. In addition, we visualize dynamic changes of research topics relating to cloud computing using a proposed cloud computing "research trend map." A research trend map visualizes positions of research topics in two-dimensional space. Frequencies of key words (X-axis) and the rates of increase in the degree centrality of key words (Y-axis) are used as the two dimensions of the research trend map. Based on the values of the two dimensions, the two dimensional space of a research map is divided into four areas: maturation, growth, promising, and decline. An area with high keyword frequency, but low rates of increase of degree centrality is defined as a mature technology area; the area where both keyword frequency and the increase rate of degree centrality are high is defined as a growth technology area; the area where the keyword frequency is low, but the rate of increase in the degree centrality is high is defined as a promising technology area; and the area where both keyword frequency and the rate of degree centrality are low is defined as a declining technology area. Based on this method, cloud computing research trend maps make it possible to easily grasp the main research trends in cloud computing, and to explain the evolution of research topics. According to the results of an analysis of citation relationships, research papers on security, distributed processing, and optical networking for cloud computing are on the top based on the page-rank measure. From the analysis of key words in research papers, cloud computing and grid computing showed high centrality in 2009, and key words dealing with main elemental technologies such as data outsourcing, error detection methods, and infrastructure construction showed high centrality in 2010~2011. In 2012, security, virtualization, and resource management showed high centrality. Moreover, it was found that the interest in the technical issues of cloud computing increases gradually. From annual cloud computing research trend maps, it was verified that security is located in the promising area, virtualization has moved from the promising area to the growth area, and grid computing and distributed system has moved to the declining area. The study results indicate that distributed systems and grid computing received a lot of attention as similar computing paradigms in the early stage of cloud computing research. The early stage of cloud computing was a period focused on understanding and investigating cloud computing as an emergent technology, linking to relevant established computing concepts. After the early stage, security and virtualization technologies became main issues in cloud computing, which is reflected in the movement of security and virtualization technologies from the promising area to the growth area in the cloud computing research trend maps. Moreover, this study revealed that current research in cloud computing has rapidly transferred from a focus on technical issues to for a focus on application issues, such as SLAs (Service Level Agreements).