Performance Analysis of Perturbation-based Privacy Preserving Techniques: An Experimental Perspective

  • Ritu Ratra (Department of Computer Science & Applications, Maharshi Dayanand University)
  • Preeti Gulia (Department of Computer Science & Applications, Maharshi Dayanand University)
  • Nasib Singh Gill (Department of Computer Science & Applications, Maharshi Dayanand University)
  • Received : 2023.10.05
  • Published : 2023.10.30

Abstract

Enormous amounts of data are now produced every second. These data often contain private information from sources such as media platforms, banking, finance, healthcare, and criminal records. Data mining is the process of searching and analyzing massive volumes of data to find usable information. Protecting personal data during data mining has become difficult, and privacy-preserving data mining (PPDM) is used to address this problem. Data perturbation is one of several techniques used in PPDM: the dataset is perturbed before mining so that personal information is protected, with the aim of preserving both data privacy and data accuracy. This paper explores and compares perturbation strategies that may be used to protect data privacy. Two perturbation techniques were used in the experiments, one based on random projection and one based on principal component analysis: Improved Random Projection Perturbation (IRPP) and the Enhanced Principal Component Analysis based Technique (EPCAT). The Naive Bayes classification algorithm is used as the data mining method, and the techniques are assessed in terms of precision, run time, and accuracy. The random projection-based technique (IRPP) is found to be the best perturbation method under Naive Bayes classification for both the cardiovascular and hypothyroid datasets.
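The general idea behind the two families of techniques compared in the abstract can be illustrated with a small sketch. The code below is not the authors' IRPP or EPCAT, whose details are not given here; it assumes plain Gaussian random projection and plain PCA as stand-ins, uses synthetic data rather than the cardiovascular or hypothyroid datasets, and includes a minimal hand-written Gaussian Naive Bayes classifier. All function and class names (`random_projection_perturb`, `pca_perturb`, `SimpleGaussianNB`, `accuracy`) are hypothetical.

```python
import numpy as np


def random_projection_perturb(X, k, rng):
    """Perturb X (n x d) by multiplying with a random Gaussian matrix (d x k)."""
    d = X.shape[1]
    # Scaling by 1/sqrt(k) keeps pairwise distances roughly preserved on average.
    R = rng.normal(0.0, 1.0, size=(d, k)) / np.sqrt(k)
    return X @ R


def pca_perturb(X, k):
    """Perturb X by projecting the centered data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


class SimpleGaussianNB:
    """Minimal Gaussian Naive Bayes (illustrative only, not a library class)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant avoids division by zero for constant features.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        # Per-class log-likelihood under independent Gaussians, shape (n, n_classes).
        diff = X[:, None, :] - self.mean_[None, :, :]
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff**2 / self.var_[None])
        scores = ll.sum(axis=2) + self.log_prior_[None]
        return self.classes_[np.argmax(scores, axis=1)]


def accuracy(X, y, seed=0):
    """Accuracy of SimpleGaussianNB on a 70/30 random train/test split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    split = int(0.7 * len(y))
    train, test = idx[:split], idx[split:]
    model = SimpleGaussianNB().fit(X[train], y[train])
    return float((model.predict(X[test]) == y[test]).mean())


# Synthetic two-class data (an assumption for illustration only).
rng = np.random.default_rng(42)
n, d, k = 600, 10, 5
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d)) + 2.0 * y[:, None]  # class 1 is shifted in every feature

X_rp = random_projection_perturb(X, k, rng)
X_pca = pca_perturb(X, k)

for name, data in [("original", X), ("random projection", X_rp), ("PCA", X_pca)]:
    print(f"{name}: accuracy = {accuracy(data, y):.3f}")
```

Comparing the accuracy on the original data with the accuracy on each perturbed version gives a rough sense of how much utility a perturbation costs. Run time and precision, which the paper also reports, could be measured in the same loop.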

Keywords

Acknowledgement

The authors are thankful to the Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India, for its support. The authors are also grateful to the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php) and Kaggle (https://www.kaggle.com/datasets) for providing the datasets used in the experiments.
