Non-Intrusive Speech Intelligibility Estimation Using Autoencoder Features with Background Noise Information

  • Jeong, Yue Ri (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
  • Choi, Seung Ho (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
  • Received : 2020.07.21
  • Accepted : 2020.08.01
  • Published : 2020.08.31

Abstract

This paper investigates non-intrusive speech intelligibility estimation in noisy environments when the bottleneck feature of an autoencoder is used as the input to a neural network. The bottleneck feature-based method suffers severe performance degradation when the noise environment changes. To overcome this problem, we propose a novel non-intrusive speech intelligibility estimation method that adds noise environment information, along with the bottleneck feature, to the input of a long short-term memory (LSTM) neural network. The network output is a short-time objective intelligibility (STOI) score, a standard intrusive measure of speech intelligibility that requires a reference speech signal. In experiments across various noise environments, the proposed method showed improved performance when the noise environment is the same. In particular, performance improved significantly over the conventional methods when the noise environments differ. We therefore conclude that the proposed method can be used successfully for non-intrusive speech intelligibility estimation in various noise environments.
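The input construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the noise environment information is encoded, so the one-hot vector, the noise labels in `NOISE_TYPES`, the feature and hidden dimensions, the use of the last LSTM state, and the sigmoid output layer are all assumptions made for illustration. The weights are random and untrained, so the resulting score has no meaning beyond showing the data flow.

```python
import numpy as np

# Hypothetical noise labels; the abstract does not list the environments used.
NOISE_TYPES = ["babble", "car", "street"]


def build_inputs(bottleneck, noise_type):
    """Append a one-hot noise-environment vector to every frame.

    bottleneck: array of shape (frames, bottleneck_dim)
    returns:    array of shape (frames, bottleneck_dim + len(NOISE_TYPES))
    """
    one_hot = np.zeros(len(NOISE_TYPES))
    one_hot[NOISE_TYPES.index(noise_type)] = 1.0
    tiled = np.tile(one_hot, (bottleneck.shape[0], 1))
    return np.concatenate([bottleneck, tiled], axis=1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_dim, hidden_dim, rng):
    """Random, untrained weights; a real system would fit them to STOI targets."""
    return {
        "W": rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim)),
        "U": rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
        "w_out": rng.normal(0.0, 0.1, hidden_dim),
        "b_out": 0.0,
    }


def estimate_score(inputs, params):
    """Run one LSTM layer over the frames and map the last hidden state to (0, 1)."""
    hidden_dim = params["U"].shape[1]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in inputs:
        gates = params["W"] @ x + params["U"] @ h + params["b"]
        i, f, g, o = np.split(gates, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    # Assumed output layer: a sigmoid keeps the estimate in the STOI-like range (0, 1).
    return float(sigmoid(params["w_out"] @ h + params["b_out"]))


rng = np.random.default_rng(0)
bottleneck = rng.normal(size=(50, 16))  # placeholder for autoencoder bottleneck features
inputs = build_inputs(bottleneck, "car")
params = init_params(inputs.shape[1], 32, rng)
score = estimate_score(inputs, params)
```

Here each of the 50 frames carries 16 placeholder bottleneck values plus 3 noise-indicator values, so the LSTM sees 19-dimensional inputs. Appending the indicator to every frame, instead of supplying it once per utterance, is one simple way to keep the environment visible at each time step.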
