http://dx.doi.org/10.7472/jksii.2021.22.6.33

Cyber Threats Analysis of AI Voice Recognition-based Services with Automatic Speaker Verification  

Hong, Chunho (Department of Defense Science (Computer Engineering and Cyberwarfare Major), Korea National Defense University)
Cho, Youngho (Department of Defense Science (Computer Engineering and Cyberwarfare Major), Korea National Defense University)
Publication Information
Journal of Internet Computing and Services / v.22, no.6, 2021, pp. 33-40
Abstract
Automatic Speech Recognition (ASR) is a technology that converts human speech sounds into speech signals and then automatically transcribes them into character strings that humans can understand. Speech recognition technology has evolved from recognizing single words to recognizing sentences consisting of multiple words. In real-time voice conversation, a high recognition rate makes natural information delivery more convenient and expands the scope of voice-based applications. On the other hand, as speech recognition technology is applied more widely, concerns about related cyber attacks and threats are also increasing. Existing studies focus mainly on developing the technology itself, such as designing Automatic Speaker Verification (ASV) techniques and improving their accuracy. However, few studies analyze the attacks and threats in depth or across a variety of cases. In this study, we propose a cyber attack model that bypasses voice authentication by simply manipulating voice frequency and voice speed against AI voice recognition services equipped with automated identification technology, and we analyze the cyber threats by conducting extensive experiments on the automated identification systems of commercial smartphones. Through this, we aim to convey the seriousness of the related cyber threats and to raise interest in research on effective countermeasures.
Keywords
Voice Recognition; AI Voice Recognition Speaker; Automatic Speaker Verification; Voice Conversion