Estimating speech parameters for ultrasonic Doppler signal using LSTM recurrent neural networks

  • 주형길 (School of Electrical and Electronics Engineering, Konkuk University) ;
  • 이기승 (School of Electrical and Electronics Engineering, Konkuk University)
  • Received : 2019.05.15
  • Accepted : 2019.07.11
  • Published : 2019.07.31

Abstract

In this paper, a method of estimating speech parameters from ultrasonic Doppler signals reflected from the articulatory muscles using an LSTM (Long Short Term Memory) RNN (Recurrent Neural Network) was introduced and compared with a method using MLPs (Multi-Layer Perceptrons). The LSTM RNN was used to estimate the Fourier transform coefficients of the speech signal from the ultrasonic Doppler signal. The log energies of the Mel frequency bands and the Fourier transform coefficients, extracted from the ultrasonic Doppler signal and the speech signal respectively, were used as the input and the reference for training the LSTM RNN. The performance of the LSTM RNN and the MLP was evaluated and compared in experiments on test data, with the RMSE (Root Mean Squared Error) as the measure. The RMSE of the two methods was 0.5810 and 0.7380, respectively. This difference of about 0.1570 confirmed that the method using the LSTM RNN performs better.
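As a minimal sketch of the evaluation measure quoted above, the RMSE between estimated and reference coefficient vectors can be computed as follows; the vectors used here are illustrative placeholders, not the paper's data:

```python
import math

def rmse(estimated, reference):
    """Root mean squared error between two equal-length sequences
    of feature values (e.g., Fourier transform coefficients)."""
    assert len(estimated) == len(reference)
    se = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(se / len(estimated))

# Illustrative placeholder vectors (not the paper's data); a lower
# RMSE, as reported for the LSTM RNN (0.5810) versus the MLP
# (0.7380), means estimates closer to the reference.
lstm_err = rmse([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])  # 0.5
```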

Fig. 1. A block diagram of the proposed method.

Fig. 2. Band pass characteristics for each mel frequency band.
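The mel-band log energies used as network inputs can be sketched as follows. This is an illustrative sketch only: it uses the common HTK-style mel scale and simplified rectangular band summation, whereas the paper's filters have the band-pass shapes shown in Fig. 2, and the band count and sample rate below are assumed parameters.

```python
import math

def mel(f_hz):
    """HTK-style mel scale, a common choice for mel filterbanks."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_low, f_high, n_bands):
    """Edge frequencies of n_bands overlapping filters, equally
    spaced on the mel scale between f_low and f_high."""
    m_low, m_high = mel(f_low), mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    return [inv_mel(m_low + i * step) for i in range(n_bands + 2)]

def log_band_energies(power_spectrum, sample_rate, n_bands):
    """Log energy in each mel band of one frame's power spectrum.
    Band b spans edges[b]..edges[b+2], so adjacent bands overlap."""
    n_bins = len(power_spectrum)
    edges = mel_band_edges(0.0, sample_rate / 2.0, n_bands)
    hz_per_bin = (sample_rate / 2.0) / (n_bins - 1)
    energies = []
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 2]
        e = sum(p for k, p in enumerate(power_spectrum)
                if lo <= k * hz_per_bin < hi)
        energies.append(math.log(e + 1e-10))  # floor avoids log(0)
    return energies
```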

Fig. 3. Structure of MLP.

Fig. 4. Structure of LSTM RNN.

Fig. 5. Structure of LSTM cell.
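The LSTM cell in Fig. 5 follows the standard formulation of Refs. 17-19. As a minimal sketch (NumPy, with an assumed stacked weight layout that is this sketch's convention, not necessarily the paper's), one time step can be written as:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell.
    W: (4H, D) input weights, U: (4H, H) recurrent weights,
    b: (4H,) biases, stacked in the (assumed) gate order
    [input, forget, candidate, output]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c = f * c_prev + i * g       # updated cell state
    h = o * np.tanh(c)           # new hidden state / output
    return h, c
```

The cell state `c` carries information across time steps, which is what lets the LSTM RNN exploit the temporal continuity of the Doppler signal that a frame-by-frame MLP cannot.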

Fig. 6. Configuration of the acoustic Doppler microphone.

Fig. 7. An example of the spectrograms of (a) received ultrasonic signal, (b) corresponding speech signal.

Fig. 8. RMSE of LSTM according to the number of hidden nodes.

Fig. 9. RMSE of LSTM and MLP according to the number of layers.

Fig. 10. RMSE of LSTM and MLP according to the number of ultrasonic Doppler signal channels.

Fig. 11. Comparison of MLP and LSTM feature variables estimation.

References

  1. B. Denby, T. Schultz, K. Honda, T. Hueber, J. M. Gilbert, and J. S. Brumberg, "Silent speech interfaces," Speech Comm. 52, 270-287 (2010). https://doi.org/10.1016/j.specom.2009.08.002
  2. K. S. Lee, "Prediction of acoustic feature parameters using myoelectric signals," IEEE Trans. on Biomed. Eng. 51, 1587-1595 (2010).
  3. T. Toda and K. Shikano, "NAM-to-Speech conversion with Gaussian Mixture Models," Proc. Interspeech, 1957-1960 (2005).
  4. S. Li, J. Q. Wang, M. Niu, T. Liu, and X. J. Jing, "The enhancement of millimeter wave conduct speech based on perceptual weighting," Progress in Electromagnetics Research B, 9, 199-214 (2008). https://doi.org/10.2528/PIERB08063001
  5. K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech," Speech Comm. 54, 134-146 (2012). https://doi.org/10.1016/j.specom.2011.07.007
  6. K. Kalgaonkar and B. Raj, "An acoustic Doppler-based front end for hands-free spoken user interfaces," Proc. SLT, 158-161 (2006).
  7. K. Kalgaonkar and B. Raj, "Acoustic Doppler sonar for gait recognition," Proc. 2007 IEEE Conf. Advanced Video and Signal Based Surveillance, 27-32 (2007).
  8. K. Kalgaonkar and B. Raj, "One-handed gesture recognition using ultrasonic Doppler sonar," Proc. ICASSP, 1889-1892 (2009).
  9. S. Srinivasan, B. Raj, and T. Ezzat, "Ultrasonic sensing for robust speech recognition," Proc. ICASSP, 5102-5105 (2010).
  10. K. Livescu, B. Zhu, and J. Glass, "On the phonetic information in ultrasonic microphone signals," Proc. ICASSP, 4621-4624 (2009).
  11. A. R. Toth, B. Raj, K. Kalgaonkar, and T. Ezzat, "Synthesizing speech from Doppler signals," Proc. ICASSP, 4638-4641 (2010).
  12. K. S. Lee, "Speech synthesis using acoustic Doppler signal," J. Acoust. Soc. Kr. 35, 134-142 (2016). https://doi.org/10.7776/ASK.2016.35.2.134
  13. K. S. Lee, "Automatic speech recognition using acoustic Doppler signal," J. Acoust. Soc. Kr. 35, 74-82 (2016). https://doi.org/10.7776/ASK.2016.35.1.074
  14. K. S. Lee, "Speech enhancement using ultrasonic Doppler sonar," Speech Comm. 110, 21-32 (2019). https://doi.org/10.1016/j.specom.2019.03.008
  15. F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (Spartan Books, Washington DC, 1961), pp. 3-585.
  16. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT Press, Cambridge, 1986), pp. 318-362.
  17. S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation 9, 1735-1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
  18. F. Gers, N. Schraudolph, and J. Schmidhuber, "Learning precise timing with LSTM recurrent networks," J. Machine Learning Research 3, 115-143 (2002).
  19. Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed 2019).
  20. J. Turian, J. Bergstra, and Y. Bengio, "Quadratic features and deep architectures for chunking," Proc. NAACL HLT 2009, 245-248 (2009).
  21. Ptb_Word_lm.py, https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py