Similarity measurement based on Min-Hash for Preserving Privacy

  • Cha, Hyun-Jong (Dept. of Multimedia Science, Chungwoon Univ.) ;
  • Yang, Ho-Kyung (Division of Information Technology Education, Sunmoon Univ.) ;
  • Song, You-Jin (Dept. of Information Management, Dongguk Univ.)
  • Received : 2022.05.25
  • Accepted : 2022.06.07
  • Published : 2022.06.30

Abstract

Because information is valuable, encryption algorithms are widely used. Encrypted raw data is secure, but problems arise when the decryption key is exposed. In particular, large-scale Internet sites such as Facebook and Amazon suffer serious damage when user data is exposed. Recently, research into a new fourth-generation encryption technology that can protect user-related data without using the decryption key has been attracting attention. Data clustering techniques that use encryption are also attracting attention. In this paper, we aim to reduce key exposure by using homomorphic encryption and to preserve privacy through similarity measurement. Similarity measurement methods studied in the past compare the data as a whole, so they become time-consuming and expensive as the size and scope of the data increase. Min-Hash addresses this problem by efficiently estimating the similarity between two sets from their signatures, and it is widely used for anti-plagiarism, graph and image analysis, and genetic analysis. Therefore, this paper preserves privacy using homomorphic encryption and presents a model for efficient similarity measurement using Min-Hash.
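The abstract says Min-Hash estimates the similarity between two sets efficiently from their signatures. The sketch below illustrates that general idea under stated assumptions: similarity is taken to be Jaccard similarity, seeded BLAKE2b hashes stand in for the random permutations of textbook Min-Hash, and the sets are unencrypted. It does not reproduce the paper's homomorphic-encryption model, and the function names (seeded_hash, minhash_signature, estimate_jaccard, exact_jaccard) are hypothetical.

```python
import hashlib


def seeded_hash(seed: int, item: str) -> int:
    """Return a 64-bit hash of `item`; each seed acts as one hash function.

    Assumption: BLAKE2b over "seed:item" stands in for the random
    permutations used in textbook Min-Hash.
    """
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: set[str], num_hashes: int = 256) -> list[int]:
    """Signature = the minimum hash value of the set under each hash function."""
    if not items:
        raise ValueError("Min-Hash is undefined for an empty set")
    return [min(seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of positions where two signatures agree approximates
    |A intersect B| / |A union B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same hash functions")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the Min-Hash estimate."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    a = set("the quick brown fox jumps over the lazy dog".split())
    b = set("the quick brown fox leaps over the lazy cat".split())
    sig_a, sig_b = minhash_signature(a), minhash_signature(b)
    print("exact    :", exact_jaccard(a, b))  # 6 shared words / 10 distinct words
    print("estimated:", round(estimate_jaccard(sig_a, sig_b), 3))
```

Each signature has a fixed length (num_hashes) regardless of set size, so comparing two signatures costs the same for small and large sets. For a true similarity J and k hash functions, the match fraction has standard error sqrt(J(1 - J)/k), so a larger k trades extra hashing work for a tighter estimate.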
