Differentially Private k-Means Clustering based on Dynamic Space Partitioning using a Quad-Tree

Goo, Hanjun; Jung, Woohwan; Oh, Seongwoong; Kwon, Suyong; Shim, Kyuseok

doi:10.5626/JOK.2018.45.3.288

Journal of KIISE (정보과학회 논문지)

Volume 45 Issue 3 / Pages 288-293 / 2018 / 2383-630X (pISSN) / 2383-6296 (eISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

쿼드 트리를 이용한 동적 공간 분할 기반 차분 프라이버시 k-평균 클러스터링 알고리즘

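Hanjun Goo (Department of Electrical and Computer Engineering, Seoul National University)
Woohwan Jung (Department of Electrical and Computer Engineering, Seoul National University)
Seongwoong Oh (Department of Electrical and Computer Engineering, Seoul National University)
Suyong Kwon (Department of Electrical and Computer Engineering, Seoul National University)
Kyuseok Shim (Department of Electrical and Computer Engineering, Seoul National University)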

Received : 2017.08.04
Accepted : 2017.12.08
Published : 2018.03.15

Abstract

Several recent studies have investigated privacy-preserving techniques for data publishing. Among them, differential privacy protects personal information regardless of an attacker's background knowledge by adding random noise to the original data. The existing differentially private k-means clustering algorithm first converts the data into a differentially private histogram and then runs k-means clustering on that histogram. Because it builds an equi-width histogram without considering the data distribution, noise must be added to a large number of buckets. To address this, we propose an algorithm that builds the histogram with a quad-tree, which captures the data distribution with fewer buckets, and then finds the k means. Our experiments show that the proposed algorithm outperforms the existing algorithm.
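The general idea of the abstract (noisy counts on quad-tree buckets, followed by k-means on the buckets) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the even split of the privacy budget `epsilon` across tree levels, the split `threshold`, and the use of weighted Lloyd iterations on bucket centers are all assumptions made for illustration. Under these assumptions, each record changes at most one count per tree level, so Laplace noise with scale `1 / eps_level` per count should keep the whole tree within the total budget, and the clustering step only post-processes the noisy buckets.

```python
import random
from dataclasses import dataclass


@dataclass
class Leaf:
    cx: float           # x-coordinate of the bucket center
    cy: float           # y-coordinate of the bucket center
    noisy_count: float  # Laplace-perturbed count, clamped at 0


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_quadtree(points, box, epsilon, max_depth, threshold, rng):
    """Return leaf buckets of a quad-tree that is grown from noisy counts only."""
    # Assumption: the budget is split evenly over the max_depth + 1 levels.
    eps_level = epsilon / (max_depth + 1)
    leaves = []

    def recurse(pts, x0, y0, x1, y1, depth):
        noisy = len(pts) + laplace(1.0 / eps_level, rng)
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        if depth == max_depth or noisy <= threshold:
            leaves.append(Leaf(mx, my, max(noisy, 0.0)))
            return
        # Split at the midpoint, so cell boundaries never depend on the data.
        quads = {(a, b): [] for a in (False, True) for b in (False, True)}
        for x, y in pts:
            quads[(x >= mx, y >= my)].append((x, y))
        recurse(quads[(False, False)], x0, y0, mx, my, depth + 1)
        recurse(quads[(True, False)], mx, y0, x1, my, depth + 1)
        recurse(quads[(False, True)], x0, my, mx, y1, depth + 1)
        recurse(quads[(True, True)], mx, my, x1, y1, depth + 1)

    recurse(list(points), *box, 0)
    return leaves


def weighted_kmeans(leaves, k, iters, rng):
    """Lloyd iterations on bucket centers, weighted by their noisy counts."""
    if k > len(leaves):
        raise ValueError("need at least k buckets")
    centers = [(l.cx, l.cy) for l in rng.sample(leaves, k)]
    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]  # sum_x, sum_y, weight
        for l in leaves:
            j = min(range(k), key=lambda i: (l.cx - centers[i][0]) ** 2
                    + (l.cy - centers[i][1]) ** 2)
            sums[j][0] += l.noisy_count * l.cx
            sums[j][1] += l.noisy_count * l.cy
            sums[j][2] += l.noisy_count
        # Keep the old center when a cluster received no weight.
        centers = [(sx / w, sy / w) if w > 0 else c
                   for (sx, sy, w), c in zip(sums, centers)]
    return centers


# Synthetic demo data (hypothetical): two Gaussian blobs in the unit square.
rng = random.Random(0)
clamp = lambda v: min(max(v, 0.0), 1.0)
data = [(clamp(rng.gauss(c, 0.05)), clamp(rng.gauss(c, 0.05)))
        for c in (0.25, 0.75) for _ in range(500)]
leaves = private_quadtree(data, (0.0, 0.0, 1.0, 1.0), epsilon=2.0,
                          max_depth=4, threshold=20.0, rng=rng)
centers = weighted_kmeans(leaves, k=2, iters=20, rng=rng)
print(len(leaves), centers)
```

Because sparse regions stop splitting early, the tree tends to end up with far fewer buckets than an equi-width grid of the same finest resolution, which is the effect the abstract attributes to the quad-tree.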

Acknowledgement

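Grant: Development of De-identification Technology Based on Differential Privacy

Supported by: National Research Foundation of Korea (NRF), Institute for Information & Communications Technology Promotion (IITP)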
