http://dx.doi.org/10.3837/tiis.2019.10.023

Spatial Statistic Data Release Based on Differential Privacy  

Cai, Sujin (College of Computer and Information, HoHai University)
Lyu, Xin (College of Computer and Information, HoHai University)
Ban, Duohan (College of Computer and Information, HoHai University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.10, 2019, pp. 5244-5259
Abstract
With the continuous development of LBS (Location Based Service) applications, privacy protection has become an urgent problem. Differential privacy rests on rigorous mathematical theory and provides strong privacy guarantees even when the attacker is assumed to have worst-case background knowledge; it has been applied to research directions such as data query, release, and mining. The main difficulty is ensuring data availability while protecting privacy. Spatial multidimensional data are usually released by partitioning the domain into disjoint subsets and then generating a hierarchical index. Traditional data-dependent partition methods must allocate part of the privacy budget to the partitioning process and split that budget among all the steps, which is inefficient. To address these issues, a novel two-step partition algorithm is proposed. First, we partition the original dataset into fixed grids, inject noise, and synthesize a dataset according to the noisy counts. Second, we perform an IH-Tree (Improved H-Tree) partition on the synthetic dataset and use the resulting partition keys to split the original dataset. The algorithm saves the privacy budget that would otherwise be allocated to the partitioning process and obtains a more accurate release. The algorithm was tested on three real-world datasets, and its accuracy was compared with that of state-of-the-art algorithms. The experimental results show that the relative errors of range queries are considerably reduced, especially on the large-scale dataset.
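The two-step idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the IH-Tree, so a plain equal-count (quantile) split stands in for it, and every function name, the unit-square domain, the grid size `m`, the fan-out `k`, and the budget split `eps_grid` / `eps_release` are assumptions made only for illustration. The sketch adds Laplace noise to fixed-grid counts, synthesizes points from the noisy counts, derives partition keys from the synthetic points only, and then uses those keys to count the original points before a final noisy release.

```python
import random
from bisect import bisect_right


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_grid_counts(points, m, eps, rng):
    # Step 1a: count points in a fixed m x m grid over the unit square,
    # then add Laplace(1/eps) noise to each cell (count sensitivity 1).
    counts = [[0] * m for _ in range(m)]
    for x, y in points:
        counts[min(int(x * m), m - 1)][min(int(y * m), m - 1)] += 1
    return [[c + laplace(1.0 / eps, rng) for c in row] for row in counts]


def synthesize(noisy, rng):
    # Step 1b: place round(max(noisy count, 0)) uniform points in each cell.
    m = len(noisy)
    synth = []
    for i in range(m):
        for j in range(m):
            for _ in range(max(0, round(noisy[i][j]))):
                synth.append(((i + rng.random()) / m, (j + rng.random()) / m))
    return synth


def quantile_cuts(values, k):
    # k-1 cut points that split the values into k roughly equal-count parts.
    vals = sorted(values)
    if not vals:
        return [q / k for q in range(1, k)]  # fall back to even cuts on [0, 1]
    return [vals[(len(vals) * q) // k] for q in range(1, k)]


def split_keys(synth, k):
    # Step 2a (stand-in for the IH-Tree): cut x into k strips, then cut y
    # inside each strip, using only the synthetic points.
    x_cuts = quantile_cuts([x for x, _ in synth], k)
    strips = [[] for _ in range(k)]
    for x, y in synth:
        strips[bisect_right(x_cuts, x)].append(y)
    return x_cuts, [quantile_cuts(ys, k) for ys in strips]


def release(points, x_cuts, y_cuts, eps, rng):
    # Step 2b: split the ORIGINAL points with the keys and release noisy counts.
    k = len(y_cuts)
    counts = [[0] * k for _ in range(k)]
    for x, y in points:
        i = bisect_right(x_cuts, x)
        counts[i][bisect_right(y_cuts[i], y)] += 1
    return [[c + laplace(1.0 / eps, rng) for c in row] for row in counts]


# Hypothetical demo data: 5000 skewed points in the unit square.
rng = random.Random(0)
points = [(rng.betavariate(2, 5), rng.betavariate(5, 2)) for _ in range(5000)]
eps_grid, eps_release = 0.5, 0.5  # assumed budget split, not from the paper
noisy = noisy_grid_counts(points, m=8, eps=eps_grid, rng=rng)
synth = synthesize(noisy, rng)
x_cuts, y_cuts = split_keys(synth, k=4)
released = release(points, x_cuts, y_cuts, eps=eps_release, rng=rng)
```

In this sketch the partition keys are computed from the synthetic points alone, so deriving them is post-processing of the already-noisy grid and touches the original data no further. The original points are read only twice: once for the grid counts and once for the final counts.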
Keywords
LBS; Differential Privacy; Data Partitioning; Data Release; Range Query;