Data Access Frequency based Data Replication Method using Erasure Codes in Cloud Storage System

  • Received: 2013.11.08
  • Published: 2014.02.25

Abstract

Cloud storage systems use a distributed file system to store and manage data. Traditional distributed file systems keep three replicas of each piece of data so that it can be recovered after a disk failure. However, replication consumes storage space in proportion to the number of copies and generates extra I/O operations during the replication process. In this paper, we propose a data replication method using erasure codes to improve storage space efficiency and I/O performance in SSD-based cloud storage systems. In particular, the proposed method reduces the number of replicas according to data access frequency while using erasure codes to maintain the same data recovery performance. Experimental results show that, compared with HDFS, the proposed method improves storage efficiency by up to about 40%, read throughput by about 11%, and write throughput by about 10%.
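
The storage trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the `Policy` class, the `choose_policy` function, the hot/cold threshold of 100 accesses per day, and the (k = 4, m = 2) code parameters are all hypothetical values chosen for illustration. The sketch only assumes an MDS erasure code such as Reed-Solomon, where m parity fragments allow any m lost fragments of a stripe to be rebuilt, so m = 2 tolerates as many losses as triplication does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical storage policy: full copies plus an optional (k, m) erasure code."""
    replicas: int  # number of full copies kept
    k: int = 0     # data fragments per stripe (0 means no erasure coding)
    m: int = 0     # parity fragments per stripe

    def storage_factor(self) -> float:
        """Bytes stored per byte of user data."""
        parity_overhead = self.m / self.k if self.k else 0.0
        return self.replicas + parity_overhead


def choose_policy(accesses_per_day: float, hot_threshold: float = 100.0) -> Policy:
    """Pick a policy from access frequency (threshold is an assumed value)."""
    if accesses_per_day >= hot_threshold:
        # Frequently accessed data: keep HDFS-style triplication.
        return Policy(replicas=3)
    # Rarely accessed data: one copy protected by assumed (4, 2) parity.
    return Policy(replicas=1, k=4, m=2)


def total_storage(files: list[tuple[int, float]]) -> float:
    """Total bytes stored for (size_bytes, accesses_per_day) pairs."""
    return sum(size * choose_policy(freq).storage_factor() for size, freq in files)


if __name__ == "__main__":
    files = [(100, 500.0), (100, 1.0)]  # one hot file, one cold file
    baseline = sum(size * 3 for size, _ in files)  # triplicate everything
    proposed = total_storage(files)
    print(f"baseline={baseline} proposed={proposed} "
          f"saved={1 - proposed / baseline:.0%}")
```

The savings in this toy example depend entirely on the assumed hot/cold mix and code parameters, so they should not be read as a reproduction of the reported figures.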
