• Title/Summary/Keyword: Large Volume Data Stream

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People now create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, greatly influencing society. This phenomenon is unmatched in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the three defining conditions: the amount of data (volume), the speed of data input and output (velocity), and the variety of data types (variety). Trends in issues discovered from SNS Big Data can serve as an important new source of value because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) show the importance of a topic through a treemap based on the score system and frequency; and (4) visualize the daily time-series graph of a keyword through keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. In addition, such analysis requires the latest big data technology to rapidly process large amounts of real-time data, such as the Hadoop distributed system or NoSQL databases, which are an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale from a single node to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB does not use a fixed schema or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind the Document Object Model (DOM) to arbitrary data; it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a collection of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The experiments used nearly 150 million tweets posted in Korea during March 2013.
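
As a rough illustration of the stop-word removal, daily keyword ranking, and daily keyword time series described in this abstract, the following Python sketch counts keyword frequencies per day. It is not the authors' implementation: it substitutes plain frequency counting for topic modeling and omits noun extraction, Hadoop, and MongoDB. The stop-word list, the sample tweets, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "on", "in", "of", "and", "to"}


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase, drop stop words."""
    tokens = (word.strip(".,!?#@:;\"'").lower() for word in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def daily_keyword_counts(tweets):
    """Map each day to a Counter of keyword frequencies.

    `tweets` is an iterable of (day, text) pairs.
    """
    counts = defaultdict(Counter)
    for day, text in tweets:
        counts[day].update(tokenize(text))
    return counts


def top_keywords(counts, day, n=3):
    """Return the n most frequent keywords for one day (a simple daily ranking)."""
    return [word for word, _ in counts[day].most_common(n)]


def keyword_series(counts, keyword):
    """Return the daily frequency of one keyword, ordered by day."""
    return [(day, counts[day][keyword]) for day in sorted(counts)]


# Hypothetical sample data, not taken from the study.
tweets = [
    ("2013-03-01", "The election is on the news"),
    ("2013-03-01", "Election results!"),
    ("2013-03-02", "Rain in Seoul"),
]
counts = daily_keyword_counts(tweets)
print(top_keywords(counts, "2013-03-01", n=1))
print(keyword_series(counts, "election"))
```

The per-day counters correspond loosely to functions (1) and (4) of the system: a ranked keyword set per day and a daily series for a searched keyword.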

Analysis on Factors Influencing Welfare Spending of Local Authority : Implementing the Detailed Data Extracted from the Social Security Information System (지방자치단체 자체 복지사업 지출 영향요인 분석 : 사회보장정보시스템을 통한 접근)

  • Kim, Kyoung-June;Ham, Young-Jin;Lee, Ki-Dong
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.141-156 / 2013
  • Research on local government welfare services in Korea has tended to focus on isolated issues such as disability, childcare, and aging (Kang, 2004; Jung et al., 2009). Recently, however, local officials have realized that they need more comprehensive welfare services for all residents, not just for these target groups. Still, studies taking the target-group approach have remained the main research stream for various reasons (Jung et al., 2009; Lee, 2009; Jang, 2011). The Social Security Information System is an information system that comprehensively manages 292 welfare benefits provided by 17 ministries and 40 thousand welfare services provided by 230 local authorities in Korea. The purpose of the system is to improve the efficiency of the social welfare delivery process. Studies of local government expenditure have been on the rise over the last few decades since the reinstatement of local autonomy, but these studies have been limited by data collection. A local government's welfare effort (spending) has been measured primarily by the expenditure or budget per person set aside for welfare. This practice of using a monetary value per person as a "proxy value" for welfare effort (spending) is based on the assumption that expenditure is directly linked to welfare effort (Lee et al., 2007). This expenditure/budget approach commonly uses the total welfare amount or a percentage figure as the dependent variable (Wildavsky, 1985; Lee et al., 2007; Kang, 2000). However, the current practice of using the actual amount spent or a percentage figure as the dependent variable has limitations: because budget or expenditure is greatly influenced by the total budget of a local government, relying on such a monetary value may inflate or deflate the true "welfare effort" (Jang, 2012). In addition, government budgets usually contain a large amount of administrative cost, e.g., salaries for local officials, which is largely unrelated to actual welfare expenditure (Jang, 2011).
This paper used local government welfare service data from the detailed data sets linked to the Social Security Information System. The purpose of this paper is to analyze the factors that affected the social welfare spending of 230 local authorities in 2012. The paper applied a multiple regression model to the pooled financial data from the system. Based on the regression analysis, the factors affecting self-funded welfare spending were identified. In our research model, we use the ratio of the welfare budget to the total budget (%) of a local government as the measure of its welfare effort (spending). In doing so, we exclude central government subsidies or support used for local welfare services, because central government welfare support does not truly reflect the welfare effort (spending) of a local government. The dependent variable is the volume of welfare spending, and the independent variables fall into three categories: socio-demographic factors, the local economy, and the financial capacity of the local government. This paper categorized local authorities into three groups: districts, cities, and suburban areas. The model used a dummy variable as the control variable (local political factor). This paper demonstrated that the volume of welfare spending is commonly influenced by the ratio of welfare budget to total local budget, the infant population, the self-reliance ratio, and the level of unemployment. Interestingly, the influential factors differ by the size of the local government. In the analysis of the determinants of local governments' self-funded welfare spending, we found significant effects of local government finance characteristics (degree of financial independence, financial independence rate, and rate of social welfare budget), the regional economy (opening-to-application ratio), and population characteristics (rate of infants).
The results imply that local authorities should adopt differentiated welfare strategies according to their conditions and circumstances. This paper is meaningful in that it identifies the significant factors influencing the welfare spending of local governments in Korea.

Application of OECD Agricultural Water Use Indicator in Korea (우리나라에 적합한 OECD 농업용수 사용지표의 설정)

  • Hur, Seung-Oh;Jung, Kang-Ho;Ha, Sang-Keun;Song, Kwan-Cheol;Eom, Ki-Cheol
    • Korean Journal of Soil Science and Fertilizer / v.39 no.5 / pp.321-327 / 2006
  • In Korea, as in many other OECD countries, there is growing competition for water resources among industrial, domestic, and agricultural consumers and the environment. The demand for water is also affecting aquatic ecosystems, particularly where withdrawals exceed the minimum environmental needs of rivers, lakes, and wetland habitats. In this context, the OECD developed three indicators related to agricultural water use: the first is a water use intensity indicator, expressed as the quantity or share of agricultural water use in total national water utilization; the second is a water stress indicator, expressed as the proportion of rivers (in length) subject to diversion or regulation for irrigation without reserving a minimum limiting reference flow; and the third is a water use efficiency indicator, divided into technical and economic efficiency. These indicators have different meanings with respect to water resource conservation and sustainable water use, so it is important that each indicator reflect its intrinsic meaning. The problem is that the overall water flow in the agro-ecosystem and the recycling of water are not considered in the assessment of agricultural water use needed to calculate these indicators. Namely, regional or meteorological characteristics and site-specific farming practices are not considered in the calculation. In this paper, we calculated the water use indicators suggested by the OECD and modified some of them to fit the Korean situation, because the water use pattern and water cycling in Korea, where paddy rice farming dominates in a monsoon region, are quite different from those of semi-arid regions.
In calculating water use intensity, we excluded the amount of water returned through the ground from total agricultural water use, because a large amount of the water supplied to farms was discharged into streams or groundwater. The resulting water use intensity was 22.9% in 2001. As for the water stress indicator, Korea has neither defined nor monitored reference levels of minimum flow rate for rivers subject to diversion of water for irrigation, so we calculated it differently from the OECD method. The water stress indicator was calculated using data on the degree of water storage in agricultural reservoirs, because 87% of irrigation water was taken from agricultural reservoirs. Technical water use efficiency was calculated as the inverse of the ratio of irrigation water to the standard water requirement of paddy rice. The efficiency in 2001 was better than in 1990 and 1998. As for the economic efficiency of water use, many things must be taken into consideration to make a useful indicator that reflects the socio-economic value of the agricultural products resulting from water use. In conclusion, site-specific, regional, or meteorological characteristics such as those in Korea were not considered in the calculation of water use indicators by the methods suggested by the OECD (Volume 3, 2001). New indicators therefore need to be developed so that they are more widely applicable around the world.
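
The two indicators that this abstract defines arithmetically can be sketched in Python as follows. This is only one reading of the text, not the authors' calculation: it assumes that the returned water is subtracted from agricultural use while total national use is left unchanged, which the abstract does not specify. The function names and input figures are hypothetical and are not data from the paper.

```python
def water_use_intensity(agri_use, returned_water, total_national_use):
    """Share (%) of net agricultural water use in total national water use.

    Assumption: only the numerator is reduced by the returned water;
    the abstract does not say whether the total is adjusted as well.
    """
    net_agri_use = agri_use - returned_water
    return 100.0 * net_agri_use / total_national_use


def technical_efficiency(irrigation_water, standard_requirement):
    """Inverse of the ratio of irrigation water to the standard requirement."""
    return standard_requirement / irrigation_water


# Hypothetical volumes in arbitrary but consistent units.
print(water_use_intensity(agri_use=50.0, returned_water=10.0, total_national_use=200.0))
print(technical_efficiency(irrigation_water=125.0, standard_requirement=100.0))
```

Under this reading, a technical efficiency below 1 means more irrigation water was supplied than the standard requirement of paddy rice, and values closer to 1 indicate more efficient use.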