
Survey on Data Deduplication in Cloud Storage Environments

  • Kim, Won-Bin (Dept. of Software Convergence, Soonchunhyang University)
  • Lee, Im-Yeong (Dept. of Software Convergence, Soonchunhyang University)
  • Received : 2019.02.01
  • Accepted : 2019.10.06
  • Published : 2021.06.30

Abstract

Data deduplication improves storage efficiency when large amounts of data are stored and managed. It reduces storage requirements by detecting when data that already exists in storage is being uploaded again and omitting the redundant upload. When deduplication is applied in cloud storage environments, data confidentiality and integrity must also be ensured, which calls for security measures such as encryption. However, deduplication depends on the server recognizing identical source data, and conventional encryption turns the same data into different ciphertexts for different users; as a result, common encryption techniques generally cannot be used together with data deduplication. Various studies have addressed this problem. This paper describes the basic environment for data deduplication technology, and it analyzes and compares multiple proposed techniques that address the resulting security threats.
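The conflict between encryption and deduplication described above can be reproduced in a few lines. The sketch below is a minimal, hypothetical illustration, not an implementation of any surveyed scheme: `DedupStore` keeps one copy per SHA-256 fingerprint, `random_key_encrypt` models conventional encryption in which each user picks an independent key, and `convergent_encrypt` models convergent encryption, a well-known approach in which the key is derived from the file's own hash so identical files encrypt identically. All names are assumptions, and the SHA-256 counter-mode keystream is a stdlib-only toy stand-in for a real cipher such as AES; it is not secure for real use.

```python
import hashlib
import os


def fingerprint(data: bytes) -> str:
    """SHA-256 digest used as the deduplication tag for a file."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Hypothetical server that keeps a single copy per distinct fingerprint."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}       # fingerprint -> stored bytes
        self.owners: dict[str, set[str]] = {}   # fingerprint -> users holding it

    def upload(self, user: str, data: bytes) -> bool:
        """Record ownership; store the bytes only if no identical copy exists.

        Returns True when a new copy was written, False when the upload
        was recognized as a duplicate and omitted.
        """
        tag = fingerprint(data)
        self.owners.setdefault(tag, set()).add(user)
        if tag in self.blobs:
            return False
        self.blobs[tag] = data
        return True


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter. Stand-in for AES-CTR; NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice with the same key decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def random_key_encrypt(data: bytes) -> bytes:
    """Conventional encryption model: every client draws an independent random key."""
    return xor_encrypt(os.urandom(32), data)


def convergent_encrypt(data: bytes) -> tuple[bytes, bytes]:
    """Convergent encryption model: the key is the hash of the plaintext itself."""
    key = hashlib.sha256(data).digest()
    return key, xor_encrypt(key, data)


if __name__ == "__main__":
    report = b"quarterly report " * 64

    # Plaintext uploads: the second copy is recognized and skipped.
    plain = DedupStore()
    print(plain.upload("alice", report), plain.upload("bob", report))

    # Independently keyed ciphertexts differ, so nothing is deduplicated.
    randomized = DedupStore()
    print(randomized.upload("alice", random_key_encrypt(report)),
          randomized.upload("bob", random_key_encrypt(report)))

    # Convergent ciphertexts are identical, so deduplication works again.
    convergent = DedupStore()
    key_a, ct_a = convergent_encrypt(report)
    key_b, ct_b = convergent_encrypt(report)
    print(convergent.upload("alice", ct_a), convergent.upload("bob", ct_b))
    print(xor_encrypt(key_a, ct_a) == report)
```

Run as a script, this prints `True False`, then `True True` (with overwhelming probability, since the two random keys differ), then `True False`, then `True`. The trade-off is that convergent encryption is deterministic: anyone who can guess a predictable file can derive its key and confirm that the file is stored, which is one of the security threats that secure deduplication schemes try to mitigate.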

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2019R1A2C1085718), by the Republic of Korea's MSIT (Ministry of Science and ICT) under the High-Potential Individuals Global Training Program (No. 2021-0-01516) supervised by the IITP (Institute of Information and Communications Technology Planning & Evaluation), and by the BK21 FOUR (Fostering Outstanding Universities for Research) (No. 5199990914048).