
Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique  

장효종 (Department of Computer Science, Soongsil University)
최형일 (School of Media, Soongsil University)
Abstract
This paper describes a speech synthesis technique that uses the diphone as its unit phoneme. Speech synthesis is basically accomplished by concatenating unit phonemes, and its major problem is discontinuity at the junction between unit phonemes. To solve this problem, this paper proposes a new spectral smoothing technique that reflects not only formant trajectories but also the distribution characteristics of the spectrum and human acoustic characteristics. Specifically, the proposed technique decides the amount and extent of smoothing by considering human acoustic characteristics at the junction of unit phonemes, and then performs spectral smoothing using weights calculated along the time axis at the border of two diphones. The proposed technique reduces the discontinuity and minimizes the distortion caused by spectral smoothing. For performance evaluation, we tested the technique on five hundred diphones extracted from twenty sentences, using ETRI Voice DB samples and individually self-recorded samples.
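The idea of smoothing with time-dependent weights at a diphone border can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the weight formula, so the linear ramp below is an assumption, and the function name `smooth_boundary`, the frame representation (one list of spectral magnitudes per frame), and the fixed `extent` parameter are all hypothetical. In the paper the amount and extent of smoothing are derived from human acoustic characteristics; here `extent` is simply supplied by the caller.

```python
def smooth_boundary(left, right, extent):
    """Return smoothed copies of two spectral frame sequences around their join.

    left, right: lists of frames; each frame is a list of spectral values.
    extent: number of frames on each side of the border to adjust
            (assumed fixed here; the paper derives it perceptually).
    """
    extent = min(extent, len(left), len(right))
    left = [list(frame) for frame in left]     # copy so inputs stay untouched
    right = [list(frame) for frame in right]
    if extent <= 0:
        return left, right

    # Per-bin spectral jump across the border, measured before any adjustment.
    gap = [r - l for l, r in zip(left[-1], right[0])]

    for k in range(extent):
        # Assumed linear weight: largest next to the border, fading with distance.
        w = 0.5 * (1.0 - (k + 0.5) / extent)
        left_frame = left[len(left) - 1 - k]   # k frames before the border
        right_frame = right[k]                 # k frames after the border
        for b, g in enumerate(gap):
            left_frame[b] += w * g             # pull left side toward the right
            right_frame[b] -= w * g            # pull right side toward the left
    return left, right


if __name__ == "__main__":
    a = [[0.0, 4.0] for _ in range(4)]
    b = [[1.0, 0.0] for _ in range(4)]
    sa, sb = smooth_boundary(a, b, extent=2)
    for frame in sa + sb:
        print(frame)
```

With two constant segments, the abrupt step at the border becomes an evenly spaced ramp across the adjusted frames, while frames farther than `extent` from the border are left unchanged, which limits the distortion introduced by smoothing.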
Keywords
formant trajectory; spectral smoothing; acoustic characteristic; speech synthesis