http://dx.doi.org/10.13089/JKIISC.2018.28.6.1401

Privacy-Preserving k-means Clustering of Encrypted Data  

Jeong, Yunsong (Graduate School of Information Security, Korea University)
Kim, Joon Sik (Graduate School of Information Security, Korea University)
Lee, Dong Hoon (Graduate School of Information Security, Korea University)
Abstract
The k-means clustering algorithm partitions input data into k groups, where k is chosen in advance. It is widely used in areas such as market segmentation and medical research. In this paper, we propose a privacy-preserving clustering algorithm suited to outsourced encrypted data that exposes no information about the input data itself. In our model all data remains encrypted, which is a significant advantage over existing privacy-preserving clustering algorithms that rely on multi-party computation over plaintext data stored on several servers. Our approach compares homomorphically encrypted ciphertexts to measure the distance between input data points. We prove that the scheme protects the input data throughout the computation, and we analyze its communication and computation complexity in detail.
Keywords
Privacy-preserving clustering; Fully homomorphic encryption; k-means clustering
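
For orientation, the following is a minimal plaintext sketch of standard k-means (Lloyd's iteration), not the encrypted protocol proposed in the paper. It isolates the one operation the abstract highlights, the comparison of two distances, in a hypothetical helper `is_less`. In an encrypted setting that helper would presumably operate on ciphertexts; here it is an ordinary comparison. All function names and the sample data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    # Squared Euclidean distance; the square root is not needed for comparisons.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def is_less(x: float, y: float) -> bool:
    # Hypothetical stand-in for the comparison step. In an encrypted variant
    # this would be evaluated on ciphertexts; here it is plaintext.
    return x < y


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    # Label each point with the index of its nearest centroid.
    labels = []
    for p in points:
        best, best_d = 0, squared_distance(p, centroids[0])
        for j in range(1, len(centroids)):
            d = squared_distance(p, centroids[j])
            if is_less(d, best_d):
                best, best_d = j, d
        labels.append(best)
    return labels


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[List[float]]:
    # Move each centroid to the mean of its members; keep it if it has none.
    new_centroids = []
    for j, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centroids.append(list(c))
            continue
        new_centroids.append(
            [sum(m[i] for m in members) / len(members) for i in range(len(c))]
        )
    return new_centroids


def k_means(points: Sequence[Point], centroids: Sequence[Point],
            max_iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    # Alternate update and assignment until the labels stop changing.
    current = [list(c) for c in centroids]
    labels = assign(points, current)
    for _ in range(max_iters):
        current = update(points, labels, current)
        new_labels = assign(points, current)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, current


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(k_means(sample, [(0, 0), (10, 10)]))
```

Keeping the comparison behind a single function makes it visible how often the assignment step needs it: k - 1 times per point in every iteration, which is one reason the cost of comparing ciphertexts matters for a scheme of this kind.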