http://dx.doi.org/10.7776/ASK.2012.31.7.427

A Fast Normalized Cross-Correlation Computation for WSOLA-based Speech Time-Scale Modification  

Lim, Sangjun (Department of Electronics Eng., Pusan National University)
Kim, Hyung Soon (Department of Electronics Eng., Pusan National University)
Abstract
The waveform similarity overlap-add (WSOLA) method is known to be an efficient, high-quality algorithm for time-scale modification of speech signals. Its computational load is concentrated in the repeated normalized cross-correlation (NCC) calculation used to evaluate the similarity between two signal waveforms. To reduce the computational complexity of WSOLA, this paper proposes a fast NCC computation method in which NCC is obtained from pre-calculated sum tables, eliminating the redundancy of repeated NCC calculations over adjacent regions. The denominator of NCC has substantial redundancy irrespective of the time-scale factor. The numerator has less redundancy, and the amount depends on both the time-scale factor and the optimal shift value, so it requires a more sophisticated algorithm for fast computation. Simulation results show that the proposed method reduces WSOLA execution time by about 40%, 47%, and 52% for time-scale compression, two-fold expansion, and three-fold expansion, respectively, while maintaining exactly the same speech quality as conventional WSOLA.
Keywords
Time-scale modification; WSOLA; Fast normalized cross-correlation computation
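The sum-table idea for the NCC denominator described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`energy_table`, `ncc_direct`, `ncc_sum_table`), the parameters `start` and `max_shift`, and the use of a prefix-sum table of squared samples are assumptions made for illustration. The numerator is still computed directly here, since the abstract only states that it needs a more sophisticated scheme.

```python
from itertools import accumulate
from math import sqrt


def energy_table(sig):
    """Prefix sums of squared samples: table[i] == sum(x * x for x in sig[:i])."""
    return [0.0] + list(accumulate(x * x for x in sig))


def _ncc(cross, ref_energy, seg_energy):
    """Normalized cross-correlation value, 0.0 if either segment is silent."""
    denom = sqrt(max(ref_energy * seg_energy, 0.0))
    return cross / denom if denom > 0 else 0.0


def ncc_direct(ref, sig, start, max_shift):
    """Baseline: recompute the candidate segment energy for every shift."""
    n = len(ref)
    ref_energy = sum(x * x for x in ref)
    scores = []
    for k in range(max_shift + 1):
        seg = sig[start + k : start + k + n]
        seg_energy = sum(x * x for x in seg)  # O(n) work per shift
        cross = sum(r * s for r, s in zip(ref, seg))
        scores.append(_ncc(cross, ref_energy, seg_energy))
    return scores


def ncc_sum_table(ref, sig, start, max_shift, table):
    """Same scores, but segment energy comes from the precomputed table."""
    n = len(ref)
    ref_energy = sum(x * x for x in ref)
    scores = []
    for k in range(max_shift + 1):
        a = start + k
        seg_energy = table[a + n] - table[a]  # O(1) lookup per shift
        cross = sum(r * s for r, s in zip(ref, sig[a : a + n]))
        scores.append(_ncc(cross, ref_energy, seg_energy))
    return scores
```

The table is built once per input signal and reused for every search region, which is where the saving on the denominator would come from. The two functions should agree up to floating-point rounding.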