http://dx.doi.org/10.36498/kbigdt.2020.5.2.43

Analysis of k Value from k-anonymity Model Based on Re-identification Time  

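Kim, Chaewoon (Graduate School of Information Security, Korea University)
Oh, Junhyoung (Graduate School of Information Security, Korea University)
Lee, Kyungho (Graduate School of Information Security, Korea University)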
Publication Information
The Journal of Bigdata / v.5, no.2, 2020, pp. 43-52
Abstract
With the development of data technology, the storing and sharing of data have increased, and with them the risk of privacy invasion. Although de-identification technology has been introduced to address this problem, it has been shown repeatedly that individuals can be re-identified from de-identified data. Even though de-identification cannot guarantee complete safety, a sufficient level of it is still necessary. However, current laws and regulations do not specify quantitatively how much de-identification should be performed. In this paper, we propose an appropriate de-identification criterion that considers the time required for re-identification. Among the various privacy models, we focus on the k-anonymity model. We analyzed how the time taken to re-identify data changes with the k value, using a re-identification method based on linkability. From this analysis, we determined an appropriate k value. If a generalized model can be developed from the results of this paper, it could be used to define the appropriate level of de-identification in various laws and regulations.
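To make the terms in the abstract concrete, the following is a minimal Python sketch of what a k value and a linkage-based match look like on a toy table. It is not the authors' experimental code. The table, the column names, and the helper functions `k_of` and `link_candidates` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_of(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same quasi-identifier values (the table's k value)."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())


def link_candidates(known, records, quasi_identifiers):
    """Return the released records whose quasi-identifiers match what
    an attacker already knows about a target (a linkage attempt)."""
    return [r for r in records if all(r[q] == known[q] for q in quasi_identifiers)]


# Hypothetical released table: age generalized to a decade, ZIP code truncated.
released = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "diabetes"},
]
QI = ["age", "zip"]  # assumed quasi-identifier columns

k = k_of(released, QI)
matches = link_candidates({"age": "30-39", "zip": "021**"}, released, QI)
print(k, len(matches))
```

In a k-anonymous table, every linkage attempt on the quasi-identifiers that matches anything returns at least k candidate records, so an attacker must do further work to single out one person. This gives an intuition for why re-identification effort may grow with k.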
Keywords
Data de-identification; Data privacy; Data security