An Active Region Detection Method for The Speech Playback-speed Control  

Yoo, Deok-Hyeon (Department of Information and Communication Engineering, Dongguk University)
Kim, Dong-Hyeok (Department of Information and Communication Engineering, Dongguk University)
Jeon, Joon-Hyeon (Department of Information and Communication Engineering, Dongguk University)
Abstract
This paper describes a new method for high-quality speech playback-speed control. The proposed method uses adaptive threshold filtering to detect the active regions of a speech signal according to the playback speed. For a given playback speed, the threshold value is determined adaptively from the statistics (mean and standard deviation) of each speech frame and is used to select only the active blocks within the current frame. To minimize the quality degradation (i.e., pitch degradation) caused by high-speed playback, the threshold filtering first eliminates relatively low-activity blocks, both voiced and unvoiced. Simulation results show that the proposed scheme provides playback-speed control with higher quality than the SOLA (Synchronized Overlap-Add) method using pitch extraction of speech.
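The block-selection idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the threshold formula, so everything specific here is an assumption made for illustration: the activity measure (mean absolute amplitude per block), the threshold form (mean + k × standard deviation), the sweep of k from -3 to 3 until roughly 1/speed of the blocks remain, and the hypothetical function name `select_active_blocks`. It is not the authors' implementation.

```python
import statistics


def select_active_blocks(frame, block_size, speed):
    """Keep the most active blocks of one frame (illustrative sketch only).

    Assumptions, not taken from the paper: activity is the mean absolute
    amplitude of a block, and the threshold is mean + k * std, with k raised
    until about 1/speed of the blocks remain.
    """
    # Split the frame into non-overlapping blocks, dropping a partial tail.
    blocks = [frame[i:i + block_size]
              for i in range(0, len(frame) - block_size + 1, block_size)]
    activity = [sum(abs(s) for s in b) / block_size for b in blocks]

    mu = statistics.fmean(activity)
    sigma = statistics.pstdev(activity)
    if sigma == 0:
        # All blocks are equally active, so nothing can be ranked out.
        return blocks, mu

    # Number of blocks to keep for the requested playback speed.
    target = max(1, round(len(blocks) / speed))

    # Raise k in steps of 0.1 until few enough blocks pass the threshold.
    threshold = mu - 3.0 * sigma
    for step in range(61):
        threshold = mu + (-3.0 + 0.1 * step) * sigma
        if sum(a >= threshold for a in activity) <= target:
            break

    kept = [b for b, a in zip(blocks, activity) if a >= threshold]
    return kept, threshold


# Four blocks of two samples each; at 2x speed about half should remain.
frame = [0.0, 0.0, 1.0, -1.0, 0.1, -0.1, 0.8, 0.8]
kept, threshold = select_active_blocks(frame, block_size=2, speed=2.0)
print(kept, threshold)
```

In this sketch the threshold adapts to each frame through its own mean and standard deviation, so a quiet frame and a loud frame both give up their relatively least active blocks first. Joining the kept blocks smoothly (for example by overlap-add) is outside what the sketch covers.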
Keywords
playback speed control; active region detection; time-scale modification