DOI QR코드

DOI QR Code

A Study on Performing Join Queries over K-anonymous Tables

  • Received : 2017.04.28
  • Accepted : 2017.06.18
  • Published : 2017.07.31

Abstract

Recently, there has been an increasing need for the sharing of microdata containing information regarding an individual entity. As microdata usually contains sensitive information on an individual, releasing it directly for public use may violate existing privacy requirements. Thus, to avoid the privacy problems that occur through the release of microdata for public use, extensive studies have been conducted in the area of privacy-preserving data publishing (PPDP). The k-anonymity algorithm, which is the most popular method, guarantees that, for each record, there are at least k-1 other records included in the released data that have the same values for a set of quasi-identifier attributes. Given an original table, the corresponding k-anonymous table is obtained by generalizing each record in the table into an indistinguishable group, called the equivalent class, by replacing the specific values of the quasi-identifier attributes with more general values. However, query processing over the anonymized data is a very challenging task, due to generalized attribute values. In particular, the problem becomes more challenging with an equi-join query (which is the most common type of query in data analysis tasks) over k-anonymous tables, since with the generalized attribute values, it is hard to determine whether two records can be joinable. Thus, to address this challenge, in this paper, we develop a novel scheme that is able to effectively perform an equi-join between k-anonymous tables. The experiment results show that, through the proposed method, significant gains in accuracy over using a naive scheme can be achieved.

Keywords

References

  1. A. Narayanan and V. Shmatikov, "Robust De-anonymization of Large Sparse Datasets", In Proceedings of the 2008 IEEE Symposium on Security and Privacy Page, 2008.
  2. J. Kim, K.Jung, H. Lee, S. Kim, J.W. Kim and Y.D. Chung, "Models for Privacy-preserving Data Publishing : A Survey", Journal of KIISE, Vol. 44, No. 2, pp. 195-207, 2017. https://doi.org/10.5626/JOK.2017.44.2.195
  3. B.C.M. Fung, K. Wang, R. Chen, and P.S. Yu, "Privacy-preserving data publishing: A survey of recent developments", ACM Computing Surveys, 42(4), June 2010.
  4. N. Mohammed, B.C.M. Fung, P.C.K. Hung, and C.K. Lee, "Centralized and distributed anonymization for high-dimensional healthcare data", ACM Transactions on Knowledge Discovery from Data, 4(4), October 2010.
  5. L. Sweeney, "k-anonymity: A model for protecting privacy", International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557-570, 2002. https://doi.org/10.1142/S0218488502001648
  6. K. LeFevre, D.J. DeWitt and R. Ramakrishnan, "Incognito: Efficient full domain k-anonymity", In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2005.
  7. A. Machanavajjhala, D. Kifer, J. Gehrke and M. Venkitasubramaniam, "l-diversity: Privacy beyond k-anonymity", ACM Transactions on Knowledge Discovery from Data, 1(1), 2007.
  8. N. Li, T. Li and S. Venkatasubramanian, "t-closeness: Privacy beyond k-anonymity and l-diversity", In Proceedings of the International Conference on Data Engineering, 2007.
  9. S. Kim, H. Lee, Y.D. Chung, "Privacy-preserving data cub for electronic medical records: An experimental evaluation", International Journal of medical Informatics, 2017
  10. J. Byun, A. Kamra, E. Bertino, N. Li, "Efficient k-Anonymization Using Clustering Technique", DASFAA 2007: Advances in Databases: Concepts, Systems and Applications pp 188-200, 2007
  11. Health Insurance Review and Assessment Service in Korea. http://opendata.hira.or.kr (2012).