A Study on Differential Privacy-Based De-identification Techniques

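  • 정강수 (Database Laboratory, Department of Computer Engineering, Sogang University);
  • 박석 (Database Laboratory, Department of Computer Engineering, Sogang University)
  • Published: 2018.04.30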

Abstract

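Differential privacy is a mathematical model for preventing the inference of personal information from the results of queries executed on a statistical database. Since Dwork first introduced it in 2006, it has become established as the standard for privacy protection of statistical data. Differential privacy limits information disclosure by keeping the change in a query result caused by inserting, deleting, or modifying a record below a fixed level. To realize this, research has been carried out both on the mechanisms themselves (the Laplace mechanism and the exponential mechanism) and on applying differential privacy to various data analysis settings (histograms, regression analysis, decision trees, association inference, clustering, deep learning, and others). This paper covers the concept of differential privacy as Dwork originally proposed it, the progress of research up to the level at which Apple and Google apply differential privacy today, and topics for future research.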
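As a rough illustration of the idea in the abstract, the sketch below applies the Laplace mechanism to a counting query. It assumes the standard ε-differential privacy formulation, in which noise drawn from a Laplace distribution with scale Δf/ε is added to the true answer, where Δf is the largest change that adding or removing one record can cause. The function names `laplace_noise` and `private_count`, the sample data, and the choice ε = 0.5 are illustrative assumptions and do not come from the paper.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Keep the log argument strictly positive (it is 0 only when u == -0.5).
    tail = max(1.0 - 2.0 * abs(u), 1e-300)
    return -scale * math.copysign(1.0, u) * math.log(tail)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy the predicate.

    A counting query has sensitivity 1: adding or removing one record
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Assumption: random.Random is used only for a reproducible demo;
    # it is not a cryptographically secure noise source.
    rng = random.Random(0)
    ages = [23, 35, 41, 29, 52, 67, 38]  # hypothetical database
    neighbor = ages[:-1]                 # same database with one record removed

    def over_30(age):
        return age > 30

    print("noisy count (full):    ", private_count(ages, over_30, 0.5, rng))
    print("noisy count (neighbor):", private_count(neighbor, over_30, 0.5, rng))
```

A smaller ε gives a larger noise scale, so the answers on the two neighboring databases become harder to tell apart at the cost of accuracy. The exponential mechanism named in the abstract plays the analogous role for queries whose output is not numeric.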

Keywords
