Shared Distributed Big-Data Processing Platform Model: a Study

A Study on Sharing Models for Large-Scale Distributed Processing Platforms

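  • 정환진 (Sungkyunkwan University, Department of Electrical and Computer Engineering) ;
  • 강태호 (Sungkyunkwan University, Department of Electrical and Computer Engineering) ;
  • 김규석 (Korea Institute of Science and Technology Information, Information Systems Operation Office) ;
  • 신영호 (Korea Institute of Science and Technology Information, Information Systems Operation Office) ;
  • 정진규 (Sungkyunkwan University, Department of Semiconductor Systems Engineering)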
  • Received : 2016.08.16
  • Accepted : 2016.09.09
  • Published : 2016.11.15

Abstract

As the need for big data processing grows, building a shared big data processing platform is important for minimizing time and monetary costs. In shared big data processing, multi-tenancy is a major requirement: each user should be given a single, isolated, personal big data platform, while the underlying hardware is shared among users to increase hardware utilization. In this paper, we explore two well-known shared big data processing platform models. One uses a native Hadoop cluster; the other builds a virtual Hadoop cluster for each user. For each model, we verified whether it sufficiently supports multi-tenancy. We also present a method to complement the multi-tenancy features that the native Hadoop cluster model does not support. Lastly, we built prototype platforms and compared the performance of both models.

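Demand for big data analysis has recently been growing in a variety of fields. Distributed processing systems are used for effective big data analysis, but building such a system incurs considerable monetary and time costs. A way to reduce this cost is therefore needed, and offering big data analysis as a platform service can save users the cost of building their own systems. Multi-tenancy refers to an environment in which multiple users share a single service, and it has the advantage of improving system resource utilization compared with a single-tenant environment. In this paper, we present two models of a large-scale distributed processing platform and describe approaches for supporting multi-tenancy. In the first model, multiple users share a single Hadoop platform, relying on Hadoop's multi-tenancy support; the other model uses a virtualized cloud computing environment to provide each user with an individual virtual Hadoop cluster. We built prototypes of both models, compared their performance, and verified the multi-tenancy of the Hadoop platform.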

Acknowledgement

Supported by: National Research Foundation of Korea, Korea Institute of Science and Technology Information
