http://dx.doi.org/10.13089/JKIISC.2022.32.5.817

Model Type Inference Attack Using Output of Black-Box AI Model  

An, Yoonsoo (Soongsil University)
Choi, Daeseon (Soongsil University)
Abstract
AI technology is being successfully adopted in many fields, and models deployed as a service are typically served in a black-box environment that does not expose the model's internal information, in order to protect intellectual property and training data. In a black-box environment, attackers attempt to steal the data or parameters used during training by exploiting the model's outputs. Noting that no existing attack infers the type of a deep learning model, this paper proposes a method for inferring the model type, which directly reveals the layer composition of the target model. Using ResNet, VGGNet, AlexNet, and a simple convolutional neural network trained on the MNIST dataset, we show that the model type can be inferred from each model's output values in both gray-box and black-box environments. In addition, when the relative-magnitude relationship feature proposed in this paper is trained together with the outputs, the model type is inferred with approximately 83% accuracy in the black-box environment; this shows that the model type can be inferred even when the attacker is given only partial information rather than the raw probability vectors.
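
The abstract does not spell out the attack procedure, but it describes a meta-classifier trained on the target model's output probability vectors, optionally encoded as relative-magnitude relationships when the raw probabilities are not available. The following is a minimal sketch of that idea, not the authors' implementation: the function names (ordering_features, build_attack_dataset) and the random placeholder data are assumptions introduced here for illustration; in the actual attack the softmax vectors would be collected by querying shadow models of each candidate architecture with the same probe inputs.

import numpy as np
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def ordering_features(prob_vec):
    # Encode pairwise "greater than" relations between output probabilities.
    # This is one plausible reading of the paper's relative-magnitude
    # relationship feature; the exact encoding used by the authors may differ.
    return np.array([float(prob_vec[i] > prob_vec[j])
                     for i, j in combinations(range(len(prob_vec)), 2)])

def build_attack_dataset(outputs_by_model):
    # outputs_by_model: dict mapping a model-type label (e.g. "resnet",
    # "vgg", "alexnet", "simple_cnn") to an array of softmax output vectors
    # obtained by querying that model.
    X, y = [], []
    for label, prob_vectors in outputs_by_model.items():
        for p in prob_vectors:
            X.append(ordering_features(p))
            y.append(label)
    return np.array(X), np.array(y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder data only: random Dirichlet vectors stand in for the
    # softmax outputs of shadow models, so no real signal is present here.
    fake_outputs = {name: rng.dirichlet(np.ones(10), size=500)
                    for name in ["resnet", "vgg", "alexnet", "simple_cnn"]}
    X, y = build_attack_dataset(fake_outputs)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    print("model-type inference accuracy:",
          accuracy_score(y_te, clf.predict(X_te)))

With outputs from real shadow models in place of the placeholder data, the classifier learns which architecture family produces a given output pattern; the ordering-based features matter because they remain usable when only ranked or truncated outputs, rather than full probability vectors, are exposed to the attacker.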
Keywords
AI security; Privacy; Exploratory Attack; Inference Attack