http://dx.doi.org/10.13089/JKIISC.2020.30.5.945

A Study on Synthetic Data Generation Based Safe Differentially Private GAN  

Kang, Junyoung (Kongju National University)
Jeong, Sooyong (Kongju National University)
Hong, Dowon (Kongju National University)
Seo, Changho (Kongju National University)
Abstract
Publishing data is essential for many applications to deliver high-quality services. However, if the original data is published as is, sensitive information (political leanings, diseases, etc.) may be revealed. Many studies have therefore proposed generating and publishing synthetic data instead of the original data in order to preserve privacy. Even so, simply generating and publishing synthetic data still leaves a risk of privacy leakage through various attacks (linkage attacks, inference attacks, etc.). To prevent the leakage of such sensitive information, this paper proposes a privacy-preserving synthetic data generation algorithm that applies differential privacy, a state-of-the-art privacy protection technique, to the GAN, which is drawing attention as a generative model for synthetic data. We use a CGAN as the generative model so that labeled data can be learned efficiently, and, with the utility of the data in mind, we apply Rényi differential privacy, a relaxation of differential privacy. Finally, we validate the utility of the generated data by evaluating it with various classifiers and comparing the results.
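The abstract does not spell out the training procedure. As an illustration only, the sketch below assumes one common recipe: DP-SGD-style updates (per-example gradient clipping plus Gaussian noise) for the discriminator, the only network that sees real records, with the privacy loss tracked in Rényi differential privacy and then converted to an (ε, δ) guarantee. It also shows CGAN conditioning by appending a one-hot label to the generator's noise input. All names and parameters (`cgan_input`, `clip_gradient`, `privatize_gradients`, `gaussian_rdp`, `rdp_to_dp`, `noise_multiplier`, and so on) are hypothetical, and gradients are plain Python lists rather than a real framework's tensors.

```python
import math
import random
from typing import List, Sequence


def cgan_input(noise: Sequence[float], label: int, num_classes: int) -> List[float]:
    """CGAN conditioning: append a one-hot encoding of the class label to the noise vector."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    return list(noise) + one_hot


def clip_gradient(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Rescale one example's gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize_gradients(per_example_grads: Sequence[Sequence[float]], clip_norm: float,
                        noise_multiplier: float, rng: random.Random) -> List[float]:
    """DP-SGD-style aggregation (hypothetical helper): clip, sum, add Gaussian noise, average.

    The noise standard deviation is noise_multiplier * clip_norm, i.e. scaled to the
    L2 sensitivity of the clipped sum when a single example is added or removed.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch for t in total]


def gaussian_rdp(alpha: float, noise_multiplier: float, steps: int) -> float:
    """RDP of order alpha for `steps` compositions of the Gaussian mechanism.

    One noisy release has RDP alpha / (2 * z^2) for noise multiplier z, and RDP adds up
    under composition. No subsampling amplification is applied, so this is a loose bound.
    """
    return steps * alpha / (2.0 * noise_multiplier ** 2)


def rdp_to_dp(noise_multiplier: float, steps: int, delta: float,
              orders: Sequence[float] = (1.5, 2, 3, 4, 5, 6, 8, 10, 16, 32, 64)) -> float:
    """Convert RDP to (epsilon, delta)-DP: eps = rdp(alpha) + log(1/delta) / (alpha - 1),
    minimised over the candidate orders alpha."""
    return min(gaussian_rdp(a, noise_multiplier, steps) + math.log(1.0 / delta) / (a - 1.0)
               for a in orders)


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in range(4)]
    print("generator input:", cgan_input(z, label=1, num_classes=3))
    grads = [[3.0, 4.0], [0.3, 0.4], [-1.0, 2.0]]
    print("noisy update:", privatize_gradients(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
    print("epsilon after 100 steps:", rdp_to_dp(noise_multiplier=1.1, steps=100, delta=1e-5))
```

Because this accountant ignores amplification by subsampling, the ε it reports is a conservative upper bound; a practical implementation would typically use a sampled-Gaussian RDP accountant, which yields much tighter guarantees for minibatch training.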
Keywords
Synthetic Data; CGAN; Differential Privacy; Rényi Differential Privacy; Data Privacy