[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7776/ASK.2021.40.2.161

An acoustic Doppler-based silent speech interface technology using generative adversarial networks

Lee, Ki-Seung (Department of Electronic Engineering, Konkuk University)

Publication Information

The Journal of the Acoustical Society of Korea / v.40, no.2, 2021, pp. 161-168

Abstract

In this paper, a Silent Speech Interface (SSI) technology is proposed in which speech signals are synthesized from the Doppler frequency shifts of the reflected signal when a 40 kHz ultrasonic signal is directed at the speaker's mouth region. In SSI, mapping rules are constructed from features derived from non-speech signals to features derived from audible speech signals, and speech is then synthesized from the non-speech signals using these rules. In conventional SSI methods, the mapping rules are built by minimizing the overall error between the estimated and true speech parameters. In the present study, the mapping rules were instead constructed with Generative Adversarial Networks (GAN), so that the distribution of the estimated parameters resembles that of the true parameters. Experimental results on 60 Korean words showed that the proposed method outperformed conventional neural network-based methods in both objective and subjective evaluations.
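A minimal Python sketch of the two quantities the abstract contrasts, for illustration only. The Doppler part uses the standard two-way approximation Δf ≈ 2·v·f0/c for a reflector moving at speed v much smaller than c; the speed of sound (343 m/s) and the 0.1 m/s articulator velocity are assumed, illustrative values, not figures from the paper. The loss part contrasts a conventional mean-squared-error objective with the generic GAN objectives of Goodfellow et al. (binary cross-entropy discriminator, non-saturating generator). All function names are hypothetical, and the paper's actual features, network architecture, and any extra loss terms are not reproduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degC (assumed value)
CARRIER_HZ = 40_000.0   # 40 kHz ultrasonic carrier, as stated in the abstract


def doppler_shift(velocity_mps, carrier_hz=CARRIER_HZ, c=SPEED_OF_SOUND):
    """Two-way Doppler shift (Hz) of a tone reflected by a surface moving
    toward the transducer at velocity_mps; approximation valid for |v| << c."""
    return 2.0 * velocity_mps * carrier_hz / c


def mse_loss(estimated, target):
    """Conventional SSI objective: mean squared error between estimated
    and true speech parameters (hypothetical helper)."""
    n = len(estimated)
    return sum((e - t) ** 2 for e, t in zip(estimated, target)) / n


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy for the discriminator: push D(real) toward 1
    and D(fake) toward 0. Inputs are probabilities in (0, 1)."""
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: lower when the discriminator scores the
    estimated parameters as real, i.e. when their distribution resembles that
    of the true parameters."""
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # An articulator moving at 0.1 m/s (illustrative) shifts 40 kHz by ~23 Hz.
    print(round(doppler_shift(0.1), 1))        # prints 23.3
    print(mse_loss([1.0, 2.0], [1.5, 2.0]))    # prints 0.125
```

In this framing the generator is the feature-mapping network: the MSE objective only penalizes per-frame distance to the target, whereas the adversarial objective penalizes outputs the discriminator can tell apart from real speech parameters. Many GAN-based speech systems combine both terms in the generator loss; the abstract does not state whether this paper does so.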

Keywords

Silent speech interface (SSI); Generative adversarial networks (GAN); Ultrasonic Doppler; Speech synthesis
