http://dx.doi.org/10.5392/JKCA.2014.14.11.028

A Distributed Cache Management Scheme for Efficient Accesses of Small Files in HDFS  

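Oh, Hyunkyo (School of Information and Communication Engineering, Chungbuk National University)
Kim, Kiyeon (School of Information and Communication Engineering, Chungbuk National University)
Hwang, Jae-Min (School of Information and Communication Engineering, Chungbuk National University)
Park, Junho (1st Technology Research Division, Agency for Defense Development)
Lim, Jongtae (School of Information and Communication Engineering, Chungbuk National University)
Bok, Kyoungsoo (School of Information and Communication Engineering, Chungbuk National University)
Yoo, Jaesoo (School of Information and Communication Engineering, Chungbuk National University)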
Abstract
In this paper, we propose a distributed cache management scheme for efficiently accessing small files in the Hadoop Distributed File System (HDFS). The proposed scheme reduces the amount of metadata that the name node must manage, because many small files are merged and stored in a single chunk. It also reduces file access costs by keeping information about requested files in a client cache and in data node caches. The client cache keeps the small files that a user requests, together with their metadata. Each data node cache keeps the small files that users request frequently. A performance evaluation shows that the proposed scheme significantly reduces processing time compared with the existing scheme.
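The two-level lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`Chunk`, `LRUCache`, `DataNode`, `Client`) are hypothetical, the chunk is an in-memory byte buffer rather than an HDFS block, and LRU eviction is an assumed stand-in for the paper's policy of keeping frequently requested files.

```python
from collections import OrderedDict


class Chunk:
    """Hypothetical merged chunk: small files stored back to back.

    Only the chunk itself would need a name node entry; the per-file
    (offset, length) index lives with the chunk in this sketch.
    """

    def __init__(self):
        self.data = bytearray()
        self.index = {}  # file name -> (offset, length)

    def append(self, name, payload):
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)

    def read(self, name):
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


class LRUCache:
    """Assumed eviction policy; the abstract does not specify one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class DataNode:
    def __init__(self, chunk, cache_capacity=2):
        self.chunk = chunk
        self.cache = LRUCache(cache_capacity)
        self.chunk_reads = 0  # reads that had to go to the merged chunk

    def read(self, name):
        cached = self.cache.get(name)
        if cached is not None:
            return cached
        self.chunk_reads += 1
        payload = self.chunk.read(name)
        self.cache.put(name, payload)
        return payload


class Client:
    def __init__(self, data_node, cache_capacity=2):
        self.data_node = data_node
        self.cache = LRUCache(cache_capacity)
        self.remote_requests = 0  # requests that left the client

    def read(self, name):
        cached = self.cache.get(name)
        if cached is not None:
            return cached
        self.remote_requests += 1
        payload = self.data_node.read(name)
        self.cache.put(name, payload)
        return payload
```

In this sketch, a repeated read of the same file is served from the client cache and never reaches the data node, and a file missing from the client cache but present in the data node cache avoids a read of the merged chunk.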
Keywords
Hadoop Distributed File System; Small File; Distributed Cache; Cache Metadata