Design of Client-Server Model For Effective Processing and Utilization of Bigdata

  • Park, Dae Seo (Dept. of Computer and Communications Engineering, Kangwon National University)
  • Kim, Hwa Jong (Dept. of Computer and Communications Engineering, Kangwon National University)
  • Received : 2016.11.22
  • Accepted : 2016.12.21
  • Published : 2016.12.31

Abstract

Big data analysis has recently drawn interest from individuals and non-experts as well as companies and professionals. Open data and directly collected data are now analyzed for marketing and for solving social problems. In Korea, many companies and individuals are attempting big data analysis, but they struggle from the earliest stage because little big data is disclosed and collecting it is difficult. Institutional reforms and data disclosure services aimed at promoting big data are under way in Korea and abroad, chiefly services that open public data, such as Korea's Government 3.0 portal (data.go.kr). Beyond these government efforts, services that share data held by corporations or individuals also exist, but useful data is hard to find because so little is shared. Moreover, a user must download and examine an entire dataset just to learn its attributes and basic information, which can cause big data traffic problems. A new system for processing and utilizing big data is therefore needed. First, big data pre-analysis technology is needed to solve the sharing problem. Pre-analysis, a concept proposed in this paper, means analyzing the data in advance and providing the results to users. It improves the usability of big data by giving users who search for data the information they need to grasp its properties and characteristics. In addition, sharing the summary data or sample data generated by pre-analysis, rather than the original, avoids the security problems that disclosing the original data can cause, which makes sharing between data providers and data users feasible.
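The pre-analysis idea can be illustrated with a minimal sketch. It assumes a dataset small enough to hold in memory as a list of dictionaries, whereas the paper targets distributed processing with Spark. The `DataDescriptor` fields and the `pre_analyze` function are hypothetical names chosen for illustration, not the authors' design.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class DataDescriptor:
    """Hypothetical descriptor: what a data user can inspect before downloading."""
    row_count: int
    columns: list
    summary: dict  # per-column statistics, numeric columns only
    sample: list   # a few rows drawn from the data


def pre_analyze(rows, sample_size=3, seed=0):
    """Build a descriptor from a list of dict rows (a stand-in for big data)."""
    columns = list(rows[0].keys()) if rows else []
    summary = {}
    for col in columns:
        # Keep only numeric values; text columns get no statistics.
        values = [r.get(col) for r in rows if isinstance(r.get(col), (int, float))]
        if values:
            summary[col] = {
                "min": min(values),
                "max": max(values),
                "mean": statistics.mean(values),
            }
    # A seeded generator keeps the sample reproducible across runs.
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return DataDescriptor(len(rows), columns, summary, sample)
```

A data user could read `summary` and `sample` to judge whether a dataset is worth requesting, without the raw data ever leaving the provider.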
Second, appropriate preprocessing results must be generated quickly, according to the disclosure level of the raw data or the network status, and delivered to users through distributed big data processing with Spark. Third, to solve the big traffic problem, the system monitors network traffic in real time. When it preprocesses data requested by a user, it reduces the data to a size the current network can carry before transmitting it, so that no big traffic occurs. In this paper, pre-analysis yields data of various sizes according to the disclosure level. This method is expected to generate less traffic than the conventional approach, used by many systems, of sharing only raw data. We describe how a client-server model can resolve the problems that arise when big data is released and used, and how it can facilitate sharing and analysis. The client-server model uses Spark for fast analysis and for processing user requests. It consists of a Server Agent and a Client Agent, deployed on the server side and the client side respectively. The Server Agent is required on the data provider's side; it performs pre-analysis of big data to generate a Data Descriptor containing information on the Sample Data, Summary Data, and Raw Data. It also performs fast and efficient preprocessing through distributed big data processing and continuously monitors network traffic. The Client Agent is placed on the data user's side. It lets the user quickly search big data through the Data Descriptor produced by pre-analysis, then request the desired data from the server and download it according to the disclosure level. The model is described in two parts: the server-side process that runs when a data provider publishes data, and the client-side process by which a data user uses that data.
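The traffic-aware delivery described above can be sketched as a selection rule: among the artifacts the disclosure level permits, send the largest one that fits the currently measured bandwidth. Everything in this sketch is an assumption made for illustration, including the `Artifact` type, the numeric disclosure levels, and the ten-second transfer window. The paper does not specify these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical unit the server can send, e.g. descriptor, sample, summary, raw."""
    name: str
    level: int       # minimum disclosure level needed to receive it (assumed scale)
    size_bytes: int


def transfer_budget(available_bps, max_seconds):
    """Bytes that fit in the allowed transfer time at the measured bandwidth."""
    return int(available_bps / 8 * max_seconds)


def choose_artifact(artifacts, allowed_level, available_bps, max_seconds=10.0):
    """Pick the largest permitted artifact that fits the budget, or None."""
    budget = transfer_budget(available_bps, max_seconds)
    candidates = [
        a for a in artifacts
        if a.level <= allowed_level and a.size_bytes <= budget
    ]
    if not candidates:
        return None
    # Larger artifacts are assumed to carry more detail, so prefer them.
    return max(candidates, key=lambda a: a.size_bytes)
```

With a measured bandwidth of 8 Mbit/s and a ten-second window, the budget is 10 MB, so a 1 MB sample would be chosen over a 50 MB summary even when the user is entitled to both.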
In particular, we focus on big data sharing, distributed big data processing, and the big traffic problem; we define the detailed modules of the client-server model and present a design method for each module. In a system built on the proposed model, a user who acquires data analyzes it in the desired direction or preprocesses it into new data. When the newly processed data is analyzed through the Server Agent, the data user takes on the role of data provider. A data provider can likewise obtain useful statistical information from the Data Descriptor of the data it discloses, and can become a data user by performing new analysis on the sample data. In this way raw data is processed and the processed big data is used by others, which forms a natural sharing environment. The roles of data provider and data user are not fixed, so anyone can be both a provider and a user. The client-server model thus addresses the problem of sharing big data and provides a free sharing environment in which big data can be disclosed securely and found easily.

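Big data analysis has recently become a field of great interest not only to companies and experts but also to individuals and non-experts. Accordingly, open data and directly collected data are being analyzed and applied to marketing, social problem solving, and other purposes. In Korea as well, various companies and individuals are attempting big data analysis, but they face difficulties from the initial stage of analysis because big data disclosure is limited and collection is difficult. This paper reviews existing studies and cases on the factors that hinder big data sharing, such as personal information and big traffic. It then describes how a client-server model, a system-based rather than policy-based solution to the restrictions on big data sharing, can resolve the problems that arise when big data is disclosed and used, and can help promote sharing and analysis. The client-server model uses SPARK for fast analysis and for processing user requests. It is divided into a Server Agent and a Client Agent, and is described in terms of the server-side process that runs when a data provider discloses data and the client-side process by which a data user uses the data. In particular, focusing on big data sharing, distributed big data processing, and the big traffic problem, we define the detailed modules of the client-server model and present a design method for each module. Through the client-server model, the big data sharing problem can be solved and a free sharing environment can be established, providing an ideal sharing service in which big data is disclosed securely and found easily.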
