http://dx.doi.org/10.9717/kmms.2017.20.11.1785

Development of a Privacy-Preserving Big Data Publishing System in Hadoop Distributed Computing Environments  

Kim, Dae-Ho (Dept. of Computer Science, Sangmyung University)
Kim, Jong Wook (Dept. of Computer Science, Sangmyung University)
Abstract
Generally, big data contains sensitive information about individuals, so releasing it directly for public use may violate existing privacy requirements. Privacy-preserving data publishing (PPDP) has therefore been actively researched as a way to share big data containing personal information for public use while protecting individuals' privacy with minimal data modification. Recently, with increasing demand for big data sharing in various areas, there has also been growing interest in developing software that supports privacy-preserving data publishing. In this paper, we develop a system that aims to support privacy-preserving data publishing effectively and efficiently. In particular, the developed system enables data owners to select an appropriate anonymization level by providing them with an information loss matrix. Furthermore, the system achieves high performance in data anonymization by using distributed Hadoop clusters.
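The abstract combines two ideas: choosing an anonymization level by weighing it against information loss, and computing anonymization on Hadoop. The abstract does not specify the system's actual algorithm or loss measure, so the following is only a minimal sketch under stated assumptions. It uses hypothetical toy records and an assumed full-domain generalization hierarchy over two quasi-identifiers (age, zipcode). For information loss, it swaps in a simplified NCP-style measure. The `map_phase`/`reduce_phase` functions are plain Python stand-ins that mimic the shape of a MapReduce job (emit a generalized key, then sum counts per key); they are not the Hadoop API. All names here (`generalize`, `choose_level`, `AGE_WIDTH`, `ZIP_DIGITS`, etc.) are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical toy microdata: (age, zipcode, disease). Age and zipcode are the
# quasi-identifiers; disease is the sensitive attribute and is never generalized.
RECORDS: List[Tuple[int, str, str]] = [
    (23, "13053", "flu"),
    (27, "13068", "cold"),
    (28, "13053", "flu"),
    (35, "14850", "cancer"),
    (38, "14853", "flu"),
    (36, "14850", "cold"),
]

# Assumed generalization hierarchy: level L buckets age into ranges of
# AGE_WIDTH[L] years and keeps ZIP_DIGITS[L] leading digits of the zipcode.
AGE_WIDTH = [1, 10, 20, 100]
ZIP_DIGITS = [5, 4, 3, 0]
AGE_RANGE = 99  # assumed age domain [0, 99], used to normalize the loss

Key = Tuple[Tuple[int, int], str]


def generalize(age: int, zipcode: str, level: int) -> Key:
    """Map one record's quasi-identifiers to their generalized form."""
    width = AGE_WIDTH[level]
    low = (age // width) * width
    digits = ZIP_DIGITS[level]
    return (low, low + width - 1), zipcode[:digits] + "*" * (5 - digits)


def map_phase(records, level) -> Iterable[Tuple[Key, int]]:
    """Mapper stand-in: emit (generalized quasi-identifier, 1) per record."""
    for age, zipcode, _disease in records:
        yield generalize(age, zipcode, level), 1


def reduce_phase(pairs) -> Dict[Key, int]:
    """Reducer stand-in: sum counts per key, giving equivalence-class sizes."""
    sizes: Counter = Counter()
    for key, count in pairs:
        sizes[key] += count
    return dict(sizes)


def is_k_anonymous(records, level, k) -> bool:
    """True if every equivalence class at this level has at least k records."""
    sizes = reduce_phase(map_phase(records, level))
    return min(sizes.values()) >= k


def information_loss(level: int) -> float:
    """Simplified NCP-style loss in [0, 1]: average of the normalized age-range
    width and the fraction of suppressed zipcode digits."""
    age_loss = (AGE_WIDTH[level] - 1) / AGE_RANGE
    zip_loss = (5 - ZIP_DIGITS[level]) / 5
    return (age_loss + zip_loss) / 2


def choose_level(records, k) -> Optional[int]:
    """Return the least-generalized level that satisfies k-anonymity, if any."""
    for level in range(len(AGE_WIDTH)):
        if is_k_anonymous(records, level, k):
            return level
    return None


if __name__ == "__main__":
    # Print a small "loss table" per level, as a data owner might inspect it.
    for level in range(len(AGE_WIDTH)):
        sizes = reduce_phase(map_phase(RECORDS, level))
        print(f"level {level}: smallest class = {min(sizes.values())}, "
              f"loss = {information_loss(level):.3f}")
    print("chosen level for k=2:", choose_level(RECORDS, 2))
```

Run as a script, this prints a smallest class size of 1, 1, 3, 6 and a loss of 0.000, 0.145, 0.296, 1.000 for levels 0 to 3, then selects level 2 for k = 2. Because the assumed hierarchy only coarsens as the level rises, loss grows monotonically with the level, so the first level that passes the k-anonymity check is also the cheapest one in this sketch. In a real Hadoop job the mapper and reducer would run on separate nodes over partitioned input. Here they are ordinary generator and function calls.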
Keywords
Privacy-Preserving Data Publishing; Hadoop; Distributed Computing; k-Anonymity
Citations & Related Records
Times Cited By KSCI: 2