Efficient and Privacy-Preserving Near-Duplicate Detection in Cloud Computing

  • Received : 2017.05.22
  • Reviewed : 2017.08.08
  • Published : 2017.10.15

Abstract

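As content-centric services offered by many content service providers migrate to the cloud and near-duplicate content online grows rapidly, the quality of cloud-based data search services is deteriorating, for example through unnecessary and excessive search results. In addition, under data protection laws and similar regulations, each service provider encrypts its content with a different secret key, which makes data search difficult. Consequently, a service that guarantees search privacy while also guaranteeing the accuracy of near-duplicate data search is technically hard to implement. In this study, we propose a near-duplicate detection scheme for the cloud that raises search service quality by removing unnecessary search results without decrypting data, and that also improves efficiency. The proposed scheme guarantees search privacy and content confidentiality. It also reduces computation and communication costs on the user side and provides a fast search evaluation function, thereby ensuring the reliability of near-duplicate detection results. Experiments with real-world data show that the proposed scheme improves performance by about 70.6% compared with prior work.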

As content providers offload more content-centric services to the cloud, data retrieval over the cloud typically returns many redundant items because near-duplicate content is prevalent on the Internet. Simply fetching all data from the cloud severely degrades efficiency in terms of resource utilization and bandwidth. Moreover, data may be encrypted by multiple content providers under different keys to preserve privacy. Thus, locating near-duplicate data in a privacy-preserving way depends heavily on the ability to deduplicate redundant search results and return the best matches without decrypting data. To this end, we propose an efficient near-duplicate detection scheme for encrypted data in the cloud. Our scheme has the following benefits. First, a single query is enough to locate near-duplicate data even if they are encrypted under different keys of multiple content providers. Second, storage, computation, and communication costs are reduced compared to existing schemes while the same level of search accuracy is achieved. Third, scalability is significantly improved by a novel and efficient two-round detection that locates near-duplicate candidates over large quantities of data in the cloud. An experimental analysis with real-world data demonstrates the applicability of the proposed scheme to a practical cloud system. Finally, the proposed scheme is on average 70.6% faster than an existing scheme.
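The abstract does not spell out how the two-round detection works, so the following is only a generic illustration of the idea of a cheap candidate round followed by an exact verification round. It is a minimal sketch on plaintext 64-bit fingerprints using bit-sampling locality-sensitive hashing. It omits the encryption and multi-key aspects entirely and is not the authors' construction. The class name `BitSamplingIndex`, the fingerprint length, and all parameter values are hypothetical choices made for this example.

```python
import random
from collections import defaultdict

BITS = 64  # assumed fingerprint length (hypothetical, e.g. a perceptual hash)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class BitSamplingIndex:
    """Toy plaintext index: LSH candidate round plus Hamming verification round."""

    def __init__(self, num_tables: int = 8, bits_per_key: int = 12, seed: int = 0):
        rng = random.Random(seed)
        # Each table hashes a fingerprint by reading a fixed random subset of its bits.
        self.positions = [rng.sample(range(BITS), bits_per_key)
                          for _ in range(num_tables)]
        self.tables = [defaultdict(set) for _ in range(num_tables)]
        self.fingerprints = {}

    @staticmethod
    def _key(fp: int, positions) -> tuple:
        return tuple((fp >> p) & 1 for p in positions)

    def add(self, item_id: str, fp: int) -> None:
        self.fingerprints[item_id] = fp
        for positions, table in zip(self.positions, self.tables):
            table[self._key(fp, positions)].add(item_id)

    def query(self, fp: int, max_distance: int = 6) -> list:
        # Round 1: collect every item that shares a bucket with the query.
        candidates = set()
        for positions, table in zip(self.positions, self.tables):
            candidates |= table.get(self._key(fp, positions), set())
        # Round 2: keep only candidates within the Hamming distance threshold.
        return sorted(i for i in candidates
                      if hamming(self.fingerprints[i], fp) <= max_distance)
```

In this sketch the first round touches only a few hash buckets instead of the whole collection, and the second round removes false positives among the candidates. A fingerprint that differs from a stored one in a few bits is likely, though not guaranteed, to share at least one bucket with it.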

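Project Information

Research project number : Development of next-generation authentication technology

Lead research institutions : Institute for Information & Communications Technology Promotion (IITP), National Research Foundation of Korea (NRF)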
