An Analytical Approach to Evaluation of SSD Effects under MapReduce Workloads

  • Ahn, Sungyong (DS Software R&D Center, Samsung Electronics Co., Ltd.)
  • Park, Sangkyu (DS Software R&D Center, Samsung Electronics Co., Ltd.)
  • Received : 2015.04.17
  • Accepted : 2015.06.08
  • Published : 2015.10.30

Abstract

As the cost per byte of SSDs falls dramatically, introducing SSDs into Hadoop has become an attractive option for high-performance data processing. This paper evaluates the cost-per-performance of an SSD-based Hadoop cluster (SSD-Hadoop) and an HDD-based Hadoop cluster (HDD-Hadoop). To this end, we propose a MapReduce performance model based on a queuing network that simulates the execution time of a MapReduce job as the cluster size varies. To make the model accurate, the execution time distribution of MapReduce jobs is carefully profiled. The resulting model predicts the execution time of MapReduce jobs with less than 7% difference in most cases. Simulations that vary the number of cluster nodes also show that SSD-Hadoop is 20% more cost-efficient than HDD-Hadoop, because SSD-Hadoop needs fewer nodes to achieve comparable performance.
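The cost argument in the abstract (fewer nodes for comparable performance can mean a cheaper cluster) can be illustrated with a rough sketch. The code below is not the paper's queuing network model; it assumes a much simpler wave-based approximation in which tasks run in rounds of `nodes * slots_per_node`. All function names and all numeric values (task counts, task times, node prices) are hypothetical placeholders chosen for illustration and are not taken from the paper.

```python
import math


def job_time(num_tasks, task_seconds, nodes, slots_per_node):
    """Wave-based estimate of job execution time (a simplifying assumption).

    Tasks are assumed to run in waves of nodes * slots_per_node tasks,
    each wave taking task_seconds.
    """
    waves = math.ceil(num_tasks / (nodes * slots_per_node))
    return waves * task_seconds


def min_nodes_for_target(num_tasks, task_seconds, slots_per_node,
                         target_seconds, max_nodes=10_000):
    """Smallest cluster size whose estimated job time meets the target."""
    for nodes in range(1, max_nodes + 1):
        if job_time(num_tasks, task_seconds, nodes, slots_per_node) <= target_seconds:
            return nodes
    return None  # target not reachable within max_nodes


def cluster_cost(nodes, cost_per_node):
    """Total hardware cost, assuming identical nodes."""
    return nodes * cost_per_node


# Hypothetical placeholder inputs, NOT measurements from the paper.
NUM_TASKS = 400
SLOTS_PER_NODE = 4
TARGET_SECONDS = 600
HDD_TASK_SECONDS, HDD_NODE_COST = 60, 1000
SSD_TASK_SECONDS, SSD_NODE_COST = 30, 1500

hdd_nodes = min_nodes_for_target(NUM_TASKS, HDD_TASK_SECONDS,
                                 SLOTS_PER_NODE, TARGET_SECONDS)
ssd_nodes = min_nodes_for_target(NUM_TASKS, SSD_TASK_SECONDS,
                                 SLOTS_PER_NODE, TARGET_SECONDS)

print("HDD cluster:", hdd_nodes, "nodes, cost", cluster_cost(hdd_nodes, HDD_NODE_COST))
print("SSD cluster:", ssd_nodes, "nodes, cost", cluster_cost(ssd_nodes, SSD_NODE_COST))
```

With these placeholder inputs, the more expensive SSD node still yields the cheaper cluster because fewer nodes are needed to hit the same target time. Whether that holds in practice depends on the measured task times and prices, which is what the paper's profiled queuing model is meant to capture.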
