Limiting Attribute Disclosure in Randomization Based Microdata Release

Guo, Ling; Ying, Xiaowei; Wu, Xintao

doi:10.5626/JCSE.2011.5.3.169

Journal of Computing Science and Engineering

Volume 5, Issue 3 / Pages 169-182 / 2011 / 1976-4677 (pISSN) / 2093-8020 (eISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

Guo, Ling (University of North Carolina at Charlotte) ;
Ying, Xiaowei (University of North Carolina at Charlotte) ;
Wu, Xintao (University of North Carolina at Charlotte)

Received : 2011.02.01
Accepted : 2011.03.20
Published : 2011.09.30

https://doi.org/10.5626/JCSE.2011.5.3.169

Abstract

Privacy-preserving microdata publication has received wide attention. In this paper, we investigate the randomization approach and focus on attribute disclosure under linking attacks. We give efficient solutions for determining the optimal distortion parameters, that is, the parameters that maximize utility preservation while still satisfying the privacy requirements. We compare our randomization approach with l-diversity and anatomy in terms of utility preservation under the same privacy requirements, from three aspects: reconstructed distributions, accuracy of answering queries, and preservation of correlations. Our empirical results show that randomization incurs significantly smaller utility loss.
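
As a rough illustration of the randomization approach the abstract refers to, the following Python sketch perturbs one categorical attribute with a uniform randomized-response scheme and then reconstructs its distribution from the perturbed values. The retention probability p, the function names, and the toy data are assumptions made for this example. It is a generic textbook scheme, not the paper's procedure for choosing optimal distortion parameters.

```python
import random
from collections import Counter


def randomize(values, domain, p, rng):
    """Keep each value with probability p; otherwise replace it
    with a value drawn uniformly from the whole domain."""
    return [v if rng.random() < p else rng.choice(domain) for v in values]


def reconstruct(randomized, domain, p):
    """Estimate the original distribution from the randomized values.

    Under the scheme above, the observed frequency of a value d is
        observed(d) = p * original(d) + (1 - p) / k,
    where k is the domain size, so we solve for original(d).
    Estimates can fall slightly outside [0, 1] because of sampling noise.
    """
    n = len(randomized)
    k = len(domain)
    counts = Counter(randomized)
    return {d: (counts[d] / n - (1 - p) / k) / p for d in domain}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    domain = ["a", "b", "c"]  # hypothetical sensitive-attribute domain
    original = ["a"] * 6000 + ["b"] * 3000 + ["c"] * 1000
    released = randomize(original, domain, p=0.7, rng=rng)
    print(reconstruct(released, domain, p=0.7))
```

A smaller p distorts the released values more, which lowers disclosure risk but makes the reconstructed distribution noisier; this is the privacy and utility trade-off that the choice of distortion parameters has to balance.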
