Packet Loss Concealment Algorithm Using Pitch Harmonic Motion Estimation and Adaptive Signal Scale Estimation

Packet Loss Concealment Algorithm Using Pitch Harmonic Motion Prediction and Adaptive Signal Amplitude Prediction

  • Kim, Tae-Ha (School of Information Communication Eng., Research Institute for Computer and Information Communication, Chungbuk National University)
  • Lee, In-Sung (School of Information Communication Eng., Research Institute for Computer and Information Communication, Chungbuk National University)
  • Received : 2021.07.30
  • Accepted : 2021.08.10
  • Published : 2021.08.30

Abstract

In this paper, we propose a packet loss concealment (PLC) algorithm that uses pitch harmonic motion prediction and adaptive signal amplitude prediction. The spectral motion prediction method divides the spectrum of the previous usable frame into predetermined sub-bands and predicts the motion of each sub-band to restore the lost signal. The proposed algorithm first classifies the speech signal as voiced or unvoiced. For voiced speech, the spectrum is divided into pitch harmonics using the pitch frequency, and the pitch harmonic motion of the lost frame is predicted and restored; for unvoiced speech, the lost frame is restored with the spectral motion prediction method. When consecutive speech frames are lost, a least mean square (LMS) predictor estimates the signal amplitude from the gain information of previous frames and adjusts the gain of the output signal accordingly. The algorithm was evaluated with the objective PESQ (Perceptual Evaluation of Speech Quality) measure and showed an improvement of 0.1 MOS over the conventional method.
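
The gain adjustment for consecutive frame losses can be pictured with a small LMS predictor that learns from the gains of received frames and then extrapolates them across the lost frames. The sketch below is only an illustration under stated assumptions: the function name `lms_predict_gains`, the filter order, the step size `mu`, the use of a normalized LMS update, and the idea of one scalar gain per frame are all hypothetical choices, since the abstract does not give the predictor's actual configuration.

```python
def lms_predict_gains(history, n_lost, order=4, mu=0.5, eps=1e-8):
    """Predict gains for n_lost consecutive lost frames from past frame gains.

    history: gains (for example RMS values) of received frames, oldest first.
    order, mu and eps are illustrative assumptions, not values from the paper.
    """
    if len(history) <= order:
        raise ValueError("need more than `order` past gains")

    # Start from a predictor that simply repeats the most recent gain.
    weights = [0.0] * order
    weights[0] = 1.0

    # Adaptation pass over received frames (normalized LMS update).
    for n in range(order, len(history)):
        x = history[n - order:n][::-1]  # most recent gain first
        y = sum(w * xi for w, xi in zip(weights, x))
        err = history[n] - y
        norm = sum(xi * xi for xi in x) + eps
        weights = [w + mu * err * xi / norm for w, xi in zip(weights, x)]

    # Prediction pass: each predicted gain is fed back as the newest input.
    window = list(history[-order:])
    predicted = []
    for _ in range(n_lost):
        x = window[::-1]
        g = max(0.0, sum(w * xi for w, xi in zip(weights, x)))  # gains stay non-negative
        predicted.append(g)
        window = window[1:] + [g]
    return predicted
```

A call such as `lms_predict_gains([1.0, 0.9, 0.8, 0.75, 0.7, 0.66], n_lost=2)` returns one predicted gain per lost frame; in a concealment loop each value would scale the corresponding reconstructed frame. Feeding predictions back into the input window is one plausible way to handle runs of lost frames, chosen here for simplicity.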
