http://dx.doi.org/10.22937/IJCSNS.2022.22.5.18

A Survey on the Performance Comparison of Map Reduce Technologies and the Architectural Improvement of Spark

Raghavendra, GS (Computer Science and Engineering, RVR & JC College of Engineering)
Manasa, Bezwada (Computer Science and Engineering, RVR & JC College of Engineering)
Vasavi, M. (Computer Science and Engineering, RVR & JC College of Engineering)

Publication Information

International Journal of Computer Science & Network Security / v.22, no.5, 2022, pp. 121-126

Abstract

Hadoop and Apache Spark are open source projects of the Apache Software Foundation, and both are leading big data analytics tools. Hadoop has led the big data industry for five years. Spark's processing speed can be significantly higher, up to 100 times faster. However, the volume of data each can handle differs: Hadoop MapReduce can process data sets far larger than those Spark can handle. This article compares the performance of Spark and MapReduce and discusses the advantages and disadvantages of both technologies.

Keywords

Hadoop; Spark; MapReduce
