A Novel Method of Improving Cache Hit-rate in Hadoop MapReduce using SSD Cache

Kim, Jong-Chan; An, Jae-Hoon; Kim, Young-Hwan; Jeon, Ki-Man

doi:10.9708/jksci.2015.20.8.001

Journal of the Korea Society of Computer and Information, Vol. 20, No. 8, pp. 1-6, 2015, pISSN 1598-849X, eISSN 2383-9945

Korean Society of Computer Information


Kim, Jong-Chan (Intelligent IDC Project Office, Korea Electronics Technology Institute) ;
An, Jae-Hoon (Intelligent IDC Project Office, Korea Electronics Technology Institute) ;
Kim, Young-Hwan (Intelligent IDC Project Office, Korea Electronics Technology Institute) ;
Jeon, Ki-Man (Intelligent IDC Project Office, Korea Electronics Technology Institute)

Received: 2015.06.17
Reviewed: 2015.07.31
Published: 2015.08.31


Abstract

Hadoop MapReduce tasks that read from the Hadoop Distributed File System (HDFS) may run on any node, because processing is distributed and parallel and because blocks are replicated for data reliability. As a result, cache locality is difficult to guarantee when a solid-state drive (SSD) is used as a cache in Hadoop, and the cache hit rate decreases. In this paper, we propose a method that improves the cache hit rate by preloading the input data of a MapReduce job onto the SSD cache. To do so, we estimate the blocks that will be used on each node from the capacity scheduler and the block metadata. By loading those blocks onto the SSD cache before the map tasks run, we were able to improve the performance of the SSD cache.
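The planning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `plan_preload`, the two dictionaries, and the fallback rule for tasks predicted on a node without a replica are all assumptions made here for illustration. The sketch only shows how a per-node preload list might be derived once a scheduler estimate and the block metadata are available.

```python
from collections import defaultdict


def plan_preload(block_replicas, predicted_nodes):
    """Return {node: [block_id, ...]}: blocks to warm on each node's SSD cache.

    block_replicas  -- hypothetical view of block metadata:
                       {block_id: [nodes that store a replica]}
    predicted_nodes -- hypothetical scheduler estimate:
                       {block_id: node expected to run the map task}
    """
    plan = defaultdict(list)
    for block_id, node in predicted_nodes.items():
        replicas = block_replicas.get(block_id, [])
        if node in replicas:
            # The predicted node holds a replica, so it can warm its own cache.
            plan[node].append(block_id)
        elif replicas:
            # Assumption: otherwise warm the first replica holder, which would
            # serve the remote read.
            plan[replicas[0]].append(block_id)
        # Blocks with no known replica are skipped.
    return dict(plan)


# Illustrative data only; node and block names are made up.
block_replicas = {
    "blk_1": ["n1", "n2"],
    "blk_2": ["n2", "n3"],
    "blk_3": ["n1", "n3"],
}
predicted_nodes = {"blk_1": "n1", "blk_2": "n2", "blk_3": "n2"}
plan = plan_preload(block_replicas, predicted_nodes)
```

Each node's list would then be read into the SSD cache before the map tasks start; how that read is triggered depends on the cache layer in use and is outside this sketch.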


