http://dx.doi.org/10.13089/JKIISC.2022.32.2.405

Utility Analysis of Federated Learning Techniques through Comparison of Financial Data Performance  

Jang, Jinhyeok (Soongsil University)
An, Yoonsoo (Soongsil University)
Choi, Daeseon (Soongsil University)
Abstract
Current AI technology improves quality of life through data-driven machine learning. Because transmitting distributed data and collecting it in one place carries a risk of privacy infringement, the data goes through a de-identification process before it is used for machine learning. De-identification damages and omits information, which degrades learning performance and complicates preprocessing. In response, Google announced federated learning in 2016, a method that learns without the process of de-identifying data and collecting it on a single server. Using actual financial data, this paper analyzes the utility of federated learning by comparing the learning performance obtained with data de-identified by k-anonymity and with differentially private synthetic data. As a result of the experiment, the accuracy of original data learning was 79% for k=2, 76% for k=5, 52% for k=7, 50% for 𝜖=1, 82% for 𝜖=0.1, and 86% for federated learning.
Keywords
Federated learning; credit data; K-anonymity; Differential privacy; synthetic data
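
The three approaches compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the column names in the usage note, and the toy values are hypothetical. The Laplace mechanism on a counting query stands in for differential privacy in general, not for the synthetic data generation used in the paper, and the averaging step is a simplified FedAvg-style aggregation of client model weights.

```python
import random
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two
    independent exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1).
    A smaller epsilon means a larger noise scale, i.e. stronger privacy."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def federated_average(client_weights, client_sizes):
    """Combine client weight vectors into one global vector, weighting
    each client by its local dataset size. Only weights are shared;
    the raw records never leave the clients."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

For example, a table with two rows sharing `("30-39", "A")` and one row with `("40-49", "B")` as quasi-identifiers is not 2-anonymous, because the second group contains a single record. Calling `federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])` returns `[2.5, 3.5]`, since the second client holds three times as much data as the first.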