Limiting Attribute Disclosure in Randomization Based Microdata Release

  • Guo, Ling (University of North Carolina at Charlotte)
  • Ying, Xiaowei (University of North Carolina at Charlotte)
  • Wu, Xintao (University of North Carolina at Charlotte)
  • Received: 2011.02.01
  • Accepted: 2011.03.20
  • Published: 2011.09.30

Abstract

Privacy-preserving microdata publication has received wide attention. In this paper, we investigate the randomization approach and focus on attribute disclosure under linking attacks. We give efficient solutions for determining the optimal distortion parameters, so that utility preservation is maximized while the privacy requirements are still satisfied. We compare our randomization approach with l-diversity and anatomy in terms of utility preservation, under the same privacy requirements, in three respects: reconstructed distributions, accuracy of answering queries, and preservation of correlations. Our empirical results show that randomization incurs significantly smaller utility loss.
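
To make the idea of a distortion parameter concrete, the following is a minimal sketch of uniform randomization of one categorical sensitive attribute, followed by reconstruction of its distribution from the randomized data. It is not the paper's procedure for choosing optimal parameters: the retention probability `p`, the function names, and the toy disease domain are all assumptions introduced here for illustration.

```python
import random
from collections import Counter


def randomize(values, domain, p, rng):
    """Keep each value with probability p; otherwise replace it with a
    different value drawn uniformly from the rest of the domain.
    (p is a hypothetical distortion parameter for this sketch.)"""
    out = []
    for v in values:
        if rng.random() < p:
            out.append(v)
        else:
            out.append(rng.choice([d for d in domain if d != v]))
    return out


def reconstruct(randomized, domain, p):
    """Estimate the original distribution from the randomized values.

    The distortion matrix has p on the diagonal and b = (1 - p) / (k - 1)
    elsewhere, so the observed frequency of a value with true share pi is
    (p - b) * pi + b. Solving for pi gives the estimate below.
    """
    k = len(domain)
    b = (1.0 - p) / (k - 1)
    a = p - b  # must be non-zero, i.e. p != 1/k
    n = len(randomized)
    counts = Counter(randomized)
    return {d: (counts[d] / n - b) / a for d in domain}


# Toy data: an assumed three-value sensitive attribute.
rng = random.Random(0)
domain = ["flu", "cancer", "hiv"]
original = ["flu"] * 6000 + ["cancer"] * 3000 + ["hiv"] * 1000
released = randomize(original, domain, p=0.7, rng=rng)
estimate = reconstruct(released, domain, p=0.7)
```

A smaller `p` distorts more records, which strengthens privacy but makes the reconstructed distribution noisier. That tension is the utility and privacy trade-off the abstract refers to.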
