http://dx.doi.org/10.7776/ASK.2021.40.2.161

An acoustic Doppler-based silent speech interface technology using generative adversarial networks  

Lee, Ki-Seung (Department of Electronic Engineering, Konkuk University)
Abstract
This paper proposes a Silent Speech Interface (SSI) technology that synthesizes speech signals from the Doppler frequency shifts of the reflected signal when a 40 kHz ultrasonic signal is incident on the speaker's mouth region. In an SSI, mapping rules are constructed from features derived from non-speech signals to features derived from audible speech signals, and speech signals are then synthesized from the non-speech signals using these rules. In conventional SSI methods, the mapping rules are built by minimizing the overall error between the estimated and true speech parameters. In the present study, the mapping rules are instead constructed using Generative Adversarial Networks (GAN) so that the distribution of the estimated parameters is similar to that of the true parameters. Experimental results on 60 Korean words showed that the proposed method outperformed conventional neural network-based methods in both objective and subjective evaluations.
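As a rough illustration of the quantities named in the abstract, the sketch below computes the two-way Doppler shift of a 40 kHz carrier and contrasts an error-minimizing objective with an adversarial one. It is not the paper's implementation. The speed of sound, the example articulator velocity, all function names, and the choice of the non-saturating generator loss are assumptions made for illustration only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C
CARRIER_HZ = 40_000.0       # 40 kHz carrier, as stated in the abstract


def doppler_shift_hz(velocity_m_s, carrier_hz=CARRIER_HZ, c=SPEED_OF_SOUND_M_S):
    """Two-way Doppler shift for a reflector approaching the transducer (v << c)."""
    return 2.0 * velocity_m_s * carrier_hz / c


def mse_loss(estimated, true):
    """Error-minimizing objective: mean squared error between parameter vectors."""
    return sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(estimated)


def generator_adversarial_loss(d_scores, eps=1e-12):
    """Non-saturating generator loss, -mean(log D(G(x))).

    d_scores are a (hypothetical) discriminator's probabilities in (0, 1)
    that each estimated parameter vector came from true speech.
    """
    return -sum(math.log(s + eps) for s in d_scores) / len(d_scores)


if __name__ == "__main__":
    # Hypothetical articulator speed of 0.1 m/s toward the sensor.
    print(f"Doppler shift: {doppler_shift_hz(0.1):.1f} Hz")
    print(f"MSE objective: {mse_loss([1.0, 2.0], [1.0, 4.0]):.3f}")
    print(f"Adversarial objective: {generator_adversarial_loss([0.5, 0.5]):.3f}")
```

Under these assumptions, slow articulator movements shift the carrier by only tens of hertz. The first objective penalizes per-parameter error directly, whereas the second rewards estimates that a discriminator cannot tell apart from true parameters, which is the distribution-matching idea the abstract describes.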
Keywords
Silent speech interface (SSI); Generative adversarial networks (GAN); Ultrasonic Doppler; Speech synthesis