Development of Big-data Management Platform Considering Docker Based Real Time Data Connecting and Processing Environments

  • Received : 2021.06.15
  • Reviewed : 2021.07.26
  • Published : 2021.08.31

Abstract

Handling continuous, unstructured data requires real-time access and management that stays flexible under dynamically changing conditions. A platform for data collection, storage, and processing can be built on a single local server or across multiple servers. The former, centralized approach is easy to control but is prone to overload because all processing runs on one unit. The latter, distributed approach processes data in parallel, so it responds quickly and its capacity is easy to scale, but its design is complex. This paper adopts the distributed approach and provides data collection and processing on a single platform in order to derive meaningful insights from the various data held by an enterprise or agency. Results are presented intuitively on dashboards, and Spark is used to improve distributed processing performance. All services are deployed and managed with Docker. All data used in this study were collected through Kafka. For a 4.4-gigabyte file, processing in Spark cluster mode took 2 minutes 15 seconds, about 3 minutes 19 seconds faster than in local mode.
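
The timing result can be restated as an implied local-mode time and a speedup ratio. The short sketch below does that arithmetic using only the two figures quoted in the abstract. The local-mode time and the ratio are inferences from those figures, not values reported here, and because the saving is stated as "about" 3 minutes 19 seconds, both should be read as approximate.

```python
from datetime import timedelta

# Figures quoted in the abstract for the 4.4-gigabyte file.
cluster_time = timedelta(minutes=2, seconds=15)   # Spark cluster mode
time_saved = timedelta(minutes=3, seconds=19)     # stated as "about", so approximate

# Implied local-mode time: an inference, not a number reported in the abstract.
local_time = cluster_time + time_saved

# Dividing one timedelta by another yields a plain float ratio.
speedup = local_time / cluster_time

print(f"implied local-mode time: {local_time}")
print(f"implied speedup: {speedup:.2f}x")
```

Under these assumptions, local mode would have taken roughly five and a half minutes, so cluster mode is a little under two and a half times as fast for this file size.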

Keywords

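Funding Information

This research was supported in 2021 by the Korea Institute for Advancement of Technology with funding from the Korean government (Ministry of Trade, Industry and Energy) (R0006229, Next-Generation Life and Health Industry Ecosystem Development Project).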
