A parametric bootstrap test for comparing differentially private histograms

Homogeneity test for differentially private histograms using parametric bootstrap

  • Received : 2021.08.11
  • Accepted : 2021.10.25
  • Published : 2022.02.28

Abstract

We propose a test of consistency for two differentially private histograms based on the parametric bootstrap. The test can be applied when the original raw histograms are unavailable and only the differentially private histograms and the privacy level α are known. We also extend the test to the case where the two histograms have different privacy levels. Resident population data for Korea and the U.S. in 2020 are used to demonstrate the efficacy of the proposed test procedure. The proposed test controls the type I error rate at the nominal level and has high power, whereas a conventional test procedure does not. Although the differential privacy framework formally controls the risk of privacy leakage, the utility of such a framework remains in question. This work also suggests that the power of a carefully designed test may be a viable measure of utility.

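In this paper, we propose a homogeneity test for two differentially private histograms using the parametric bootstrap. The proposed test can be used even when only the differentially private histograms and the applied privacy levels are available, and it remains applicable when the privacy levels applied to the two histograms being compared differ. To evaluate the performance of the test, we use age-specific population distribution data from the U.S. and Korea, and we confirm that the type I error probability is well controlled and that the test has high power.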
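The abstract does not spell out the test procedure, so what follows is only a minimal sketch of how a parametric bootstrap test of this kind might look, not the authors' method. It assumes that each histogram was privatized by adding independent Laplace noise with scale 1/α to every bin (a sensitivity of 1 is an assumption here), that the null hypothesis is that both histograms share the same cell probabilities, and that a chi-square-type distance serves as the test statistic. The function names `two_sample_stat` and `bootstrap_pvalue` are hypothetical.

```python
import numpy as np


def two_sample_stat(h1, h2):
    """Chi-square-type distance between two (possibly noisy) histograms."""
    n1, n2 = h1.sum(), h2.sum()
    pooled = (h1 + h2) / (n1 + n2)
    # Noise can push a pooled proportion to zero or below; keep it positive.
    pooled = np.clip(pooled, 1e-12, None)
    e1, e2 = n1 * pooled, n2 * pooled
    return float(((h1 - e1) ** 2 / e1).sum() + ((h2 - e2) ** 2 / e2).sum())


def bootstrap_pvalue(h1, h2, alpha1, alpha2, n_boot=2000, seed=0):
    """Parametric bootstrap p-value for H0: both histograms share cell probabilities.

    h1, h2         : differentially private (noisy) bin counts
    alpha1, alpha2 : privacy levels; Laplace scale 1/alpha is an assumption
    """
    rng = np.random.default_rng(seed)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    k = h1.size
    observed = two_sample_stat(h1, h2)

    # Plug-in estimates under H0, using only the noisy histograms.
    p_hat = np.clip(h1 + h2, 0.0, None)
    p_hat = p_hat / p_hat.sum()
    n1 = max(int(round(h1.sum())), 1)
    n2 = max(int(round(h2.sum())), 1)

    exceed = 0
    for _ in range(n_boot):
        # Regenerate raw counts under H0, then re-apply the privacy noise.
        b1 = rng.multinomial(n1, p_hat) + rng.laplace(scale=1.0 / alpha1, size=k)
        b2 = rng.multinomial(n2, p_hat) + rng.laplace(scale=1.0 / alpha2, size=k)
        if two_sample_stat(b1, b2) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)
```

Because the bootstrap replicates include the same noise mechanism as the observed data, the reference distribution accounts for the extra variability that a standard chi-square approximation ignores. Allowing `alpha1` and `alpha2` to differ mirrors the extension to unequal privacy levels mentioned in the abstract.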

Acknowledgement

This research was supported by the National Research Foundation of Korea (NRF) and the Ministry of Science and ICT (MSIT) (No. 2019R1A2C2002256).
