Recovery of Lost Speech Segments Using Incremental Subspace Learning

  • Huang, Jianjun (Institute of Command Automation, PLA University of Science and Technology)
  • Zhang, Xiongwei (Institute of Command Automation, PLA University of Science and Technology)
  • Zhang, Yafei (Institute of Command Automation, PLA University of Science and Technology)
  • Received : 2011.09.22
  • Accepted : 2012.03.23
  • Published : 2012.08.30

Abstract

An incremental subspace learning scheme for recovering lost speech segments online is presented. The contributions of this work are twofold. First, the recovery problem is transformed into an interpolation problem over the time-varying gains obtained via nonnegative matrix factorization. Second, incremental nonnegative matrix factorization is employed to allow online processing and to track the evolution of speech statistics. Experimental results confirm the effectiveness of the proposed scheme.
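The first idea in the abstract, recasting recovery as interpolation of time-varying gains, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `nmf` and `recover_gap`, the use of plain batch multiplicative-update NMF in place of the incremental variant, and linear interpolation of the gains across the gap. A magnitude spectrogram `V` is factorized as `V ≈ W H` on the received frames only, each row of the gain matrix `H` is interpolated over the lost frames, and the missing spectra are rebuilt as `W H`.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Batch NMF with multiplicative updates (Euclidean cost): V ~= W @ H.

    Assumption: this stands in for the incremental NMF named in the
    abstract; it is not the authors' algorithm.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps # time-varying gains
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def recover_gap(V, lost, rank=4):
    """Fill the lost frames of a nonnegative spectrogram V.

    V    : (n_bins, n_frames) magnitude spectrogram; lost columns are ignored
    lost : boolean mask of length n_frames, True where a frame is missing
    """
    known = ~lost
    # Learn the basis and the gains from the received frames only.
    W, H_known = nmf(V[:, known], rank)
    t = np.arange(V.shape[1])
    H = np.empty((rank, V.shape[1]))
    for k in range(rank):
        # Assumption: linear interpolation of each gain trajectory.
        # np.interp holds the end values if the gap touches an edge.
        H[k] = np.interp(t, t[known], H_known[k])
    V_hat = V.copy()
    V_hat[:, lost] = W @ H[:, lost]        # rebuild only the missing spectra
    return V_hat

# Hypothetical usage on synthetic data: two bases with slowly varying gains.
t = np.arange(60)
gains = np.vstack([1 + 0.5 * np.sin(2 * np.pi * t / 40),
                   1 + 0.5 * np.cos(2 * np.pi * t / 25)])
basis = np.random.default_rng(1).random((32, 2))
V_true = basis @ gains
lost = np.zeros(60, dtype=bool)
lost[20:25] = True                         # five consecutive lost frames
V_obs = V_true.copy()
V_obs[:, lost] = 0.0
V_rec = recover_gap(V_obs, lost, rank=2)
```

The sketch recovers magnitude spectra only. Turning them back into a waveform needs a separate phase-estimation step, and an online system would update `W` and `H` frame by frame instead of refactorizing the whole signal.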
