Acknowledgement
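This research was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (Ministry of Science and ICT) in 2022 (No.2022-0-00937, Solving the problem of increasing the usability and usefulness of synthetic data techniques for statistical data).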
References
- Alaa A, Van Breugel B, Saveliev ES, and van der Schaar M (2022). How faithful is your synthetic data? Sample-level metrics for evaluating and auditing generative models, International Conference on Machine Learning, 290-306, PMLR.
- Arjovsky M, Chintala S, and Bottou L (2017). Wasserstein generative adversarial networks, International Conference on Machine Learning, 214-223, PMLR.
- Arthur D and Vassilvitskii S (2007). k-means++: The advantages of careful seeding, In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, USA, 1027-1035.
- Breiman L, Friedman JH, Olshen RA, and Stone CJ (2017). Classification and Regression Trees, Routledge, New York.
- Dhariwal P and Nichol A (2021). Diffusion models beat GANs on image synthesis, Advances in Neural Information Processing Systems, 34, 8780-8794.
- Drechsler J and Reiter JP (2009). Disclosure risk and data utility for partially synthetic data: An empirical study using the German IAB establishment survey, Journal of Official Statistics, 25, 589-603.
- El Emam K, Mosquera L, and Bass J (2020). Evaluating identity disclosure risk in fully synthetic health data: Model development and validation, Journal of Medical Internet Research, 22, e23139.
- Elliot M (2015). Final report on the disclosure risk associated with the synthetic data produced by the SYLLS team, Report 2015, 2.
- Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, and Courville AC (2017). Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems, 30, 1-11.
- Hilprecht B, Härterich M, and Bernau D (2019). Monte Carlo and reconstruction membership inference attacks against generative models, Proceedings on Privacy Enhancing Technologies, 2019, 232-249.
- Hu J and Savitsky TD (2018). Bayesian data synthesis and disclosure risk quantification: An application to the consumer expenditure surveys, Available from: arXiv preprint arXiv:1809.10074
- Ishwaran H and James LF (2001). Gibbs sampling methods for stick-breaking priors, Journal of the American Statistical Association, 96, 161-173. https://doi.org/10.1198/016214501750332758
- Karr AF, Kohnen CN, Oganian A, Reiter JP, and Sanil AP (2006). A framework for evaluating the utility of data altered to protect confidentiality, The American Statistician, 60, 224-232. https://doi.org/10.1198/000313006X124640
- Khamis H (2008). Measures of association: How to choose?, Journal of Diagnostic Medical Sonography, 24, 155-162. https://doi.org/10.1177/8756479308317006
- Kingma DP and Welling M (2013). Auto-encoding variational Bayes, Available from: arXiv preprint arXiv:1312.6114
- Kim HJ, Drechsler J, and Thompson KJ (2021). Synthetic microdata for establishment surveys under informative sampling, Journal of the Royal Statistical Society: Series A, 184, 255-281. https://doi.org/10.1111/rssa.12622
- Kim J and Park M-J (2019). Multiple imputation and synthetic data, The Korean Journal of Applied Statistics, 32, 83-97. https://doi.org/10.5351/KJAS.2019.32.1.083
- Kullback S and Leibler RA (1951). On information and sufficiency, The Annals of Mathematical Statistics, 22, 79-86. https://doi.org/10.1214/aoms/1177729694
- Lee Y (2013). Review on statistical methods for protecting privacy and measuring risk of disclosure when releasing information for public use, Journal of the Korean Data and Information Science Society, 24, 1029-1041. https://doi.org/10.7465/JKDI.2013.24.5.1029
- Lin Z, Khetan A, Fanti G, and Oh S (2018). The power of two samples in generative adversarial networks, Advances in Neural Information Processing Systems, 31, 1-10.
- Little RJA (1993). Statistical analysis of masked data, Journal of Official Statistics, 9, 407-426.
- Hittmeir M, Mayer R, and Ekelhart A (2020). A baseline for attribute disclosure risk in synthetic data, In Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy (CODASPY'20), March 16-18, 2020, New Orleans, LA, USA, ACM, New York, NY, USA, 11, Available from: https://doi.org/10.1145/3374664.3375722
- Murray JS and Reiter JP (2016). Multiple imputation of missing categorical and continuous values via Bayesian mixture models with local dependence, Journal of the American Statistical Association, 111, 1466-1479. https://doi.org/10.1080/01621459.2016.1174132
- Nowok B, Raab GM, and Dibben C (2016). Synthpop: Bespoke creation of synthetic data in R, Journal of Statistical Software, 74, 1-26. https://doi.org/10.18637/jss.v074.i11
- Park MJ, Kwon SP, and Shim KH (2013). Microdata masking for Survey of Household Finances and Living Conditions, Statistical Research Institute, Daejeon.
- Park M-J, Han J, and Park N (2020). Study on synthetic data generation methods with applications to Statistics Korea RDC data, Technical report, Statistical Research Institute.
- Raghunathan TE, Reiter JP, and Rubin DB (2003). Multiple imputation for statistical disclosure limitation, Journal of Official Statistics, 19, 1-16.
- Reiter JP (2003). Inference for partially synthetic, public use microdata sets, Survey Methodology, 29, 181-188.
- Reiter JP (2005). Using CART to generate partially synthetic public use microdata, Journal of Official Statistics, 21, 441-462.
- Rosenbaum PR and Rubin DB (1983). The central role of the propensity score in observational studies for causal effects, Biometrika, 70, 41-55. https://doi.org/10.1093/biomet/70.1.41
- Rubin DB (1993). Statistical disclosure limitation, Journal of Official Statistics, 9, 461-468.
- Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, and Williamson RC (2001). Estimating the support of a high-dimensional distribution, Neural Computation, 13, 1443-1471. https://doi.org/10.1162/089976601750264965
- Snoke J, Raab GM, Nowok B, Dibben C, and Slavkovic A (2018). General and specific utility measures for synthetic data, Journal of the Royal Statistical Society: Series A, 181, 663-688. https://doi.org/10.1111/rssa.12358
- Song Y and Ermon S (2019). Generative modeling by estimating gradients of the data distribution, Advances in Neural Information Processing Systems, 32, 11895-11907.
- Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, and Poole B (2020). Score-based generative modeling through stochastic differential equations, International Conference on Learning Representations, Available from: https://arxiv.org/abs/2011.13456
- Matwin S, Nin J, Sehatkar M, and Szapiro T (2015). A review of attribute disclosure control, Advanced Research in Data Privacy, 567, 41-61. https://doi.org/10.1007/978-3-319-09885-2_4
- Villani C (2008). Optimal Transport: Old and New, Springer, New York.
- Woo M-J, Reiter JP, Oganian A, and Karr AF (2009). Global measures of data utility for microdata masked for disclosure limitation, Journal of Privacy and Confidentiality, 1, 111-124. https://doi.org/10.29012/jpc.v1i1.568
- Xu L, Skoularidou M, Cuesta-Infante A, and Veeramachaneni K (2019). Modeling tabular data using conditional GAN, Advances in Neural Information Processing Systems, 32, 7333-7343.
- Yoon J, Jarrett D, and van der Schaar M (2019). Time-series generative adversarial networks, Advances in Neural Information Processing Systems, 32, 5509-5519.