http://dx.doi.org/10.3837/tiis.2021.04.019

Defending and Detecting Audio Adversarial Example using Frame Offsets  

Gong, Yongkang (College of Information Science and Engineering, Ningbo University)
Yan, Diqun (College of Information Science and Engineering, Ningbo University)
Mao, Terui (College of Information Science and Engineering, Ningbo University)
Wang, Donghua (College of Information Science and Engineering, Ningbo University)
Wang, Rangding (College of Information Science and Engineering, Ningbo University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS), vol. 15, no. 4, 2021, pp. 1538-1552
Abstract
Machine learning models are vulnerable to adversarial examples generated by adding a deliberately designed perturbation to a benign sample. In particular, for an automatic speech recognition (ASR) system, a benign-sounding audio clip can be decoded as a harmful command under a potential adversarial attack. In this paper, we focus on countermeasures against audio adversarial examples. By analyzing the characteristics of ASR systems, we find that frame offsets, introduced by appending a silence clip at the beginning of an audio, can degrade adversarial perturbations into normal noise. For various scenarios, we exploit frame offsets through different strategies: defense, detection, and a hybrid of the two. Compared with previous methods, the proposed method defends against audio adversarial examples in a simpler, more generic, and more efficient way. Evaluated against three state-of-the-art adversarial attacks, each targeting a different ASR system, the experimental results demonstrate that the proposed method effectively improves the robustness of ASR systems.
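
The abstract only outlines the mechanism, but the detection strategy lends itself to a short sketch: transcribe the audio twice, once as-is and once with silence prepended, and flag a large transcription change as adversarial. The Python below is a minimal illustration of that idea; the transcribe callback, the 50 ms offset length, and the 0.2 distance threshold are illustrative assumptions, not parameters reported by the paper.

import numpy as np

def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance between two transcriptions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def detect_adversarial(audio, transcribe, sample_rate=16000,
                       offset_ms=50, threshold=0.2):
    # Prepend a short silence clip so every analysis frame is offset.
    silence = np.zeros(int(sample_rate * offset_ms / 1000), dtype=audio.dtype)
    shifted = np.concatenate([silence, audio])
    # Benign audio decodes almost identically after the offset; an
    # adversarial perturbation loses its frame alignment, so the decoded
    # text drifts far from the original (targeted) transcription.
    t0, t1 = transcribe(audio), transcribe(shifted)
    cer = edit_distance(t0, t1) / max(len(t0), 1)
    return cer > threshold

A defense-only variant of the same idea would simply return transcribe(shifted) in place of the original decoding, and a hybrid strategy could fall back to the shifted transcription whenever the detector fires.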
Keywords
Speech Recognition Safety; Adversarial Defense; Adversarial Detection; Audio Adversarial Example; ASR;