Task failure resilience technique for improving the performance of MapReduce in Hadoop

  • Kavitha, C (Department of Information and Communication Engineering, Anna University)
  • Anita, X (Department of Computer Science and Engineering, Jerusalem College of Engineering)
  • Received : 2018.05.15
  • Accepted : 2020.04.13
  • Published : 2020.11.16

Abstract

MapReduce is a framework for processing huge datasets in parallel and distributed computing environments. However, a single machine failure while MapReduce tasks are running can increase completion time by 50%. MapReduce handles a task failure by restarting the failed task and reprocessing all of its input data from scratch, regardless of how much data had already been processed. Avoiding this recomputation requires that the key-value pairs already computed persist in a storage system, so that they remain available when the task restarts. This paper proposes the task failure resilience (TFR) technique, which allows a failed task to continue from the point at which it was interrupted rather than redoing all of its work. Amazon ElastiCache for Redis is used as a non-volatile cache for the key-value pairs. TFR was implemented in the Hadoop software framework, and its performance was measured by running different Hadoop benchmarking suites. The experimental results showed significant performance improvements over the default Hadoop implementation.
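The resume-from-interruption idea can be illustrated with a minimal sketch. It is not the authors' Hadoop implementation: the `KeyValueStore` class is a hypothetical in-memory stand-in for an external cache such as Redis, the names `run_map_task` and `word_count_map` are invented for illustration, and checkpointing after every record is an assumed granularity.

```python
class KeyValueStore:
    """Hypothetical in-memory stand-in for an external key-value cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def word_count_map(record):
    """Illustrative map function: emit (word, 1) for each word in a record."""
    return [(word, 1) for word in record.split()]


def run_map_task(task_id, records, store, fail_at=None):
    """Run a map task, checkpointing progress after each record.

    The checkpoint is a single (next_offset, output) value, so progress and
    output are always saved together. A restarted attempt resumes from the
    saved offset instead of record 0. Returns the number of records this
    attempt processed. `fail_at` simulates a crash before that record index.
    """
    offset, output = store.get(task_id, (0, []))
    processed = 0
    for i in range(offset, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at record {i}")
        output = output + word_count_map(records[i])
        store.set(task_id, (i + 1, output))  # persist progress and output
        processed += 1
    return processed


records = ["a b", "b c", "c d", "d e"]
store = KeyValueStore()

try:
    run_map_task("m0", records, store, fail_at=3)  # first attempt crashes
except RuntimeError:
    pass

resumed = run_map_task("m0", records, store)  # second attempt resumes
print(resumed, store.get("m0"))
```

In this sketch the second attempt processes only the one record that the first attempt had not reached; the key-value pairs from the first three records are read back from the store rather than recomputed.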
