Fig. 1. A block diagram of the proposed method.
Fig. 2. Band pass characteristics for each mel frequency band.
Fig. 3. Structure of MLP.
Fig. 4. Structure of LSTM RNN.
Fig. 5. Structure of LSTM cell.
Fig. 6. Configuration of the acoustic Doppler microphone.
Fig. 7. An example of the spectrograms of (a) the received ultrasonic signal and (b) the corresponding speech signal.
Fig. 8. RMSE of LSTM according to the number of hidden nodes.
Fig. 9. RMSE of LSTM and MLP according to the number of layers.
Fig. 10. RMSE of LSTM and MLP according to the number of ultrasonic Doppler signal channels.
Fig. 11. Comparison of feature variable estimation by MLP and LSTM.
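Figs. 4 and 5 show the LSTM structure used in the proposed method. For reference, the following is a minimal NumPy sketch of a single step of a standard LSTM cell with the usual input, forget, and output gates (the formulation of Hochreiter and Schmidhuber); the weight names `W`, `U`, `b` and the stacked-gate layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x      : input vector, shape (n_in,)
    h_prev : previous hidden state, shape (n_hid,)
    c_prev : previous cell state, shape (n_hid,)
    W, U   : gate weights stacked as [input; forget; output; candidate],
             shapes (4*n_hid, n_in) and (4*n_hid, n_hid)
    b      : stacked gate biases, shape (4*n_hid,)
    """
    n_hid = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # pre-activations for all four gates
    i = sigmoid(z[0 * n_hid:1 * n_hid])  # input gate
    f = sigmoid(z[1 * n_hid:2 * n_hid])  # forget gate
    o = sigmoid(z[2 * n_hid:3 * n_hid])  # output gate
    g = np.tanh(z[3 * n_hid:4 * n_hid])  # candidate cell update
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c
```

Because the hidden state is gated through a tanh, each component of `h` stays in (-1, 1), which matches the bounded cell output shown in Fig. 5.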
References
- B. Denby, T. Schultz, K. Honda, T. Hueber, J. M. Gilbert, and J. S. Brumberg, "Silent speech interfaces," Speech Comm. 52, 270-287 (2010). https://doi.org/10.1016/j.specom.2009.08.002
- K. S. Lee, "Prediction of acoustic feature parameters using myoelectric signals," IEEE Trans. Biomed. Eng. 51, 1587-1595 (2010).
- T. Toda and K. Shikano, "NAM-to-Speech conversion with Gaussian Mixture Models," Proc. Interspeech, 1957-1960 (2005).
- S. Li, J. Q. Wang, M. Niu, T. Liu, and X. J. Jing, "The enhancement of millimeter wave conduct speech based on perceptual weighting," Progress in Electromagnetics Research B, 9, 199-214 (2008). https://doi.org/10.2528/PIERB08063001
- K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech," Speech Comm. 54, 134-146 (2012). https://doi.org/10.1016/j.specom.2011.07.007
- K. Kalgaonkar and B. Raj, "An acoustic Doppler-based front end for hands-free spoken user interfaces," Proc. SLT, 158-161 (2006).
- K. Kalgaonkar and B. Raj, "Acoustic Doppler sonar for gait recognition," Proc. 2007 IEEE Conf. Advanced Video and Signal Based Surveillance, 27-32 (2007).
- K. Kalgaonkar and B. Raj, "One-handed gesture recognition using ultrasonic Doppler sonar," Proc. ICASSP, 1889-1892 (2009).
- S. Srinivasan, B. Raj, and T. Ezzat, "Ultrasonic sensing for robust speech recognition," Proc. ICASSP, 5102-5105 (2010).
- K. Livescu, B. Zhu, and J. Glass, "On the phonetic information in ultrasonic microphone signals," Proc. ICASSP, 4621-4624 (2009).
- A. R. Toth, B. Raj, K. Kalgaonkar, and T. Ezzat, "Synthesizing speech from Doppler signals," Proc. ICASSP, 4638-4641 (2010).
- K. S. Lee, "Speech synthesis using acoustic Doppler signal," J. Acoust. Soc. Kr. 35, 134-142 (2016). https://doi.org/10.7776/ASK.2016.35.2.134
- K. S. Lee, "Automatic speech recognition using acoustic Doppler signal," J. Acoust. Soc. Kr. 35, 74-82 (2016). https://doi.org/10.7776/ASK.2016.35.1.074
- K. S. Lee, "Speech enhancement using ultrasonic Doppler sonar," Speech Comm. 110, 21-32 (2019). https://doi.org/10.1016/j.specom.2019.03.008
- F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (Spartan Books, Washington DC, 1961), pp. 3-585.
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT Press, Cambridge, 1986), pp. 318-362.
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation 9, 1735-1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
- F. Gers, N. Schraudolph, and J. Schmidhuber, "Learning precise timing with LSTM recurrent networks," J. Machine Learning Research 3, 115-143 (2002).
- Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/, 2019.
- J. Turian, J. Bergstra, and Y. Bengio, "Quadratic features and deep architectures for chunking," Proc. NAACL HLT 2009, 245-248 (2009).
- ptb_word_lm.py, https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py