Privacy-Constrained Relational Data Perturbation: An Empirical Evaluation

Deokyeon Jang; Minsoo Kim; Yon Dohn Chung

doi:10.3745/JIPS.04.0316

Journal of Information Processing Systems

Volume 20, Issue 4 / Pages 524-534 / 2024 / 1976-913X (pISSN) / 2092-805X (eISSN)

Korea Information Processing Society (한국정보처리학회)

Deokyeon Jang (Dept. of Computer Science & Engineering, Korea University)
Minsoo Kim (Dept. of Computer Science & Engineering, Korea University)
Yon Dohn Chung (Dept. of Computer Science & Engineering, Korea University)

Received: 2023.09.26
Accepted: 2023.11.12
Published: 2024.08.31

Abstract

The release of relational data containing sensitive personal information poses a significant risk of privacy breaches. To preserve privacy when publishing such data, techniques that protect the sensitive information must be applied. Data perturbation is widely used for privacy-preserving data release because of its simplicity and efficiency. However, data perturbation has limitations that hinder its practical application, and alternative solutions are needed to overcome them. In this study, we propose a novel approach to preserving privacy in the release of relational data containing sensitive personal information. The approach consists of an intuitive, syntactic privacy criterion for data perturbation and two perturbation methods for relational data release. Through experiments with synthetic and real data, we evaluate the performance of our methods.

Acknowledgement

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (No. IITP-2023-2020-0-01819, IITP-2021-0-00634) and the National Research Foundation of Korea (No. NRF-2020R1A2C2013286, NRF-2021R1A6A1A13044830).
