[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13088/jiis.2016.22.4.109

Design of Client-Server Model For Effective Processing and Utilization of Bigdata

Park, Dae Seo (Dept. of Computer and Communications Engineering, Kangwon National University)
Kim, Hwa Jong (Dept. of Computer and Communications Engineering, Kangwon National University)

Publication Information

Journal of Intelligence and Information Systems / v.22, no.4, 2016, pp. 109-122

Abstract

Recently, big data analysis has become a field of interest not only to companies and professionals but also to individuals and non-experts. Accordingly, it is used for marketing and for solving social problems by analyzing data that is publicly available or collected directly. In Korea, many companies and individuals are attempting big data analysis, but they face difficulties from the very first stage because big data disclosure is limited and data are hard to collect. Institutional improvements to promote big data use and a variety of big data disclosure services are under way in Korea and abroad; in Korea, the main examples are services that open public data, such as those under the Government 3.0 initiative (data.go.kr). Beyond these government efforts, services for sharing data held by corporations or individuals are also in operation, but little data has been shared through them, so useful data is hard to find. Moreover, because users must download and inspect an entire dataset just to learn its attributes and basic information, heavy network traffic (the big traffic problem) can occur. A new system for processing and utilizing big data is therefore needed.
First, a big data pre-analysis technique is needed to solve the big data sharing problem. Pre-analysis, a concept proposed in this paper, means analyzing data in advance and providing users with the results. When a data user searches for big data, pre-analysis improves its usability by supplying information about the properties and characteristics of the data. In addition, sharing the summary data or sample data generated by pre-analysis, rather than the original data, avoids the security problems that disclosing the original data can cause, and thereby makes big data sharing between data providers and data users possible.
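
The pre-analysis step described above can be illustrated with a minimal sketch in plain Python. It assumes a dataset small enough to fit in memory as a list of row dictionaries (the paper performs this step with Spark), and it assumes a Data Descriptor consists of a per-column numeric summary (min, max, mean) plus a small seeded random sample. The names `DataDescriptor` and `pre_analyze`, and the choice of statistics, are hypothetical illustrations, not the authors' design.

```python
from __future__ import annotations

import random
import statistics
from dataclasses import dataclass, field


@dataclass
class DataDescriptor:
    """Hypothetical Data Descriptor: metadata shared instead of the raw rows."""
    name: str
    row_count: int
    columns: list[str]
    summary: dict[str, dict[str, float]] = field(default_factory=dict)
    sample: list[dict] = field(default_factory=list)


def pre_analyze(name: str, rows: list[dict], sample_size: int = 5,
                seed: int = 0) -> DataDescriptor:
    """Pre-analyze a dataset: numeric summary per column plus a random sample."""
    columns = list(rows[0]) if rows else []
    summary: dict[str, dict[str, float]] = {}
    for col in columns:
        # Only numeric (non-bool) values are summarized; other types are skipped.
        values = [r[col] for r in rows
                  if isinstance(r.get(col), (int, float))
                  and not isinstance(r.get(col), bool)]
        if values:
            summary[col] = {
                "min": min(values),
                "max": max(values),
                "mean": statistics.fmean(values),
            }
    # A seeded RNG keeps the published sample reproducible.
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return DataDescriptor(name, len(rows), columns, summary, sample)


if __name__ == "__main__":
    toy_rows = [{"region": "a", "visitors": v} for v in (120, 80, 100, 140)]
    descriptor = pre_analyze("toy_visits", toy_rows, sample_size=2)
    print(descriptor.row_count, descriptor.summary)
```

Only the descriptor, never the full row list, would be published in this sketch, which is what lets a provider expose the shape of its data without disclosing the original records.
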
Second, the system must use Spark-based distributed big data processing to quickly generate preprocessing results suited to the disclosure level of the raw data and the current network status, and provide those results to users. Third, to solve the big traffic problem, the system monitors network traffic in real time: when preprocessing the data a user requests, it reduces the data to a size the current network can carry before transmitting it, so that big traffic does not occur. Through pre-analysis, the proposed approach therefore offers data in several sizes according to the level of disclosure. This method is expected to generate less traffic than the conventional approach, used by many systems, of sharing only raw data.
In this paper, we describe how to solve the problems that arise when big data is released and used, and how to facilitate its sharing and analysis. The proposed client-server model uses Spark to analyze and process user requests quickly. It consists of a Server Agent and a Client Agent, deployed on the server side and the client side, respectively. The Server Agent is required on the data provider's side; it pre-analyzes big data to generate a Data Descriptor containing information about the Sample Data, Summary Data, and Raw Data. It also performs fast, efficient big data preprocessing through distributed processing and continuously monitors network traffic. The Client Agent is placed on the data user's side. Through the Data Descriptors produced by pre-analysis, it lets the user search big data quickly and request the desired data from the server, downloading it according to its level of disclosure. In this way, the model separates the Server Agent, which the data provider uses to publish data, from the Client Agent, which the data user uses to consume it.
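
The third requirement, sizing each transfer to what the network can currently carry, can be sketched as a small planning function. This is a hypothetical illustration, not the paper's algorithm: it assumes three representations ordered `summary` < `sample` < `raw`, assumes the disclosure level caps the most detailed representation a user may receive, assumes the measured throughput (`bytes_per_s`) comes from a real-time traffic monitor, and assumes raw rows are of roughly uniform size so that a row fraction maps linearly to bytes. `TransferPlan` and `plan_transfer` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Representations from least to most detailed (assumed ordering).
LEVELS = ("summary", "sample", "raw")


@dataclass
class TransferPlan:
    level: str           # which representation to send
    fraction: float      # share of that representation's rows to send
    bytes_to_send: int


def plan_transfer(sizes: dict[str, int], allowed_level: str,
                  bytes_per_s: float, budget_s: float) -> TransferPlan | None:
    """Choose the most detailed permitted representation that fits the
    transfer budget, down-sampling raw data when it is too large.

    Returns None when no permitted representation fits the budget.
    """
    budget_bytes = bytes_per_s * budget_s
    permitted = LEVELS[: LEVELS.index(allowed_level) + 1]
    for level in reversed(permitted):  # most detailed first
        size = sizes[level]
        if size <= budget_bytes:
            return TransferPlan(level, 1.0, size)
        if level == "raw" and budget_bytes > 0:
            # Send only a fraction of the raw rows, sized to the budget
            # (in Spark, e.g. DataFrame.sample(fraction=...), which is approximate).
            fraction = budget_bytes / size
            return TransferPlan("raw", fraction, int(size * fraction))
    return None


if __name__ == "__main__":
    toy_sizes = {"summary": 2_000, "sample": 500_000, "raw": 50_000_000}
    print(plan_transfer(toy_sizes, "raw", bytes_per_s=1_000_000, budget_s=10))
```

Down-sampling raw data, rather than falling back to the fixed sample, is one design choice among several; a provider could equally prefer the precomputed sample whenever raw data does not fit.
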
In particular, we focus on the problems of big data sharing, distributed big data processing, and big traffic; we construct the detailed modules of the client-server model and present a design method for each module. In a system built on the proposed model, a user who acquires data can analyze it in any desired direction or preprocess it into new data. When that user pre-analyzes the newly processed data through the Server Agent, the data user takes on the role of a data provider. Conversely, a data provider can obtain useful statistical information from the Data Descriptor of the data it discloses and, as a data user, perform new analyses on the sample data. In this way, raw data is processed and the processed big data is used in turn, naturally forming a sharing environment. Because the roles of data provider and data user are not fixed, the model provides an ideal sharing service in which everyone can be both a provider and a user. The client-server model thus solves the big data sharing problem, provides a free sharing environment in which big data can be disclosed securely, and offers an ideal sharing service in which big data is easy to find.
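
The provider/user cycle described above can be sketched with a toy in-memory catalog. The assumptions here are that published Data Descriptors live in one shared catalog, that search matches a keyword against dataset and column names only, and that a party's role is simply whatever it has done so far (published, requested, or both). `SharingCatalog` and its methods are hypothetical names, and the datasets in the usage block are toy values, not data from the paper.

```python
from __future__ import annotations


class SharingCatalog:
    """Hypothetical shared catalog of Data Descriptors.

    Any party may publish (acting as a data provider) and request data
    (acting as a data user); roles are recorded, not fixed in advance.
    """

    def __init__(self) -> None:
        self._descriptors: dict[str, dict] = {}    # dataset -> descriptor
        self._published: dict[str, set[str]] = {}  # party -> datasets published
        self._requested: dict[str, set[str]] = {}  # party -> datasets requested

    def publish(self, party: str, name: str, columns: list[str],
                summary: dict[str, float]) -> None:
        """Register a pre-analysis result; the raw data stays with the party."""
        self._descriptors[name] = {"owner": party, "columns": list(columns),
                                   "summary": dict(summary)}
        self._published.setdefault(party, set()).add(name)

    def search(self, keyword: str) -> list[str]:
        """Case-insensitive match against dataset names and column names."""
        kw = keyword.lower()
        return sorted(
            name for name, d in self._descriptors.items()
            if kw in name.lower() or any(kw in c.lower() for c in d["columns"])
        )

    def request(self, party: str, name: str) -> dict:
        """Record a request and return the descriptor (not the raw data)."""
        self._requested.setdefault(party, set()).add(name)
        return self._descriptors[name]

    def roles(self, party: str) -> set[str]:
        """Roles a party has played so far: 'provider', 'user', or both."""
        roles = set()
        if self._published.get(party):
            roles.add("provider")
        if self._requested.get(party):
            roles.add("user")
        return roles


if __name__ == "__main__":
    catalog = SharingCatalog()
    catalog.publish("provider_a", "toy_traffic", ["road_id", "speed"],
                    {"speed_mean": 42.0})
    hits = catalog.search("speed")
    catalog.request("analyst_b", hits[0])
    # analyst_b derives new data and publishes it, becoming a provider too.
    catalog.publish("analyst_b", "toy_congestion", ["road_id", "index"],
                    {"index_mean": 0.7})
    print(catalog.roles("analyst_b"))
```

Recording roles as a history of actions, rather than assigning each party a fixed type, mirrors the point that a provider and a user are not distinguished in the proposed model.
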

Keywords

Big Data; Client; Server; Spark; Pre-Analysis;

Citations & Related Records

Times Cited By KSCI: 5
