http://dx.doi.org/10.3745/KIPSTB.2003.10B.6.665

Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing  

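Jang, Hyo-Jong (Department of Computing, Graduate School, Soongsil University)
Kim, Kwan-Jung (Department of Computer and Information, Hanseo University)
Kim, Gye-Young (School of Computing, Soongsil University)
Choi, Hyung-Il (School of Media, Soongsil University)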
Abstract
This paper describes a speech synthesis technique based on concatenating unit phonemes. A major problem with this approach is the discontinuity that arises at the junctions between unit phonemes, especially when the units were recorded by different speakers. To address this problem, the paper uses clustered diphones and proposes a spectral smoothing technique that draws on formant trajectories and the distribution characteristics of the spectrum and also reflects the acoustic characteristics of human hearing. Specifically, the proposed technique clusters unit phonemes according to the distribution characteristics of the spectrum at the junctions between units, decides the amount and extent of smoothing by considering the acoustic characteristics of human hearing at those junctions, and then performs spectral smoothing using weights calculated along the time axis at the boundary between two diphones. The proposed technique removes the discontinuity and minimizes the distortion that spectral smoothing can introduce. To evaluate performance, we test on five hundred diphones extracted from twenty sentences recorded by five speakers and present the experimental results.
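The idea of smoothing with weights calculated along the time axis at a diphone boundary can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the spectral envelopes are already available as NumPy arrays, uses a fixed smoothing width and linear weights, and ignores the clustering, formant-trajectory, and auditory criteria that the abstract describes. The function name `smooth_boundary` and its `width` parameter are hypothetical.

```python
import numpy as np

def smooth_boundary(left, right, width):
    """Pull the frames around a join toward a shared boundary spectrum.

    left, right: arrays of shape (frames, bins) with the spectral envelopes
    of two units; the join lies between left[-1] and right[0].
    width: frames modified on each side (hypothetical, fixed here; the
    paper decides the scope from perceptual criteria instead).
    """
    left = np.asarray(left, dtype=float).copy()
    right = np.asarray(right, dtype=float).copy()
    width = min(width, len(left), len(right))
    if width == 0:
        return left, right

    # Assumed target at the join: the mean of the two boundary frames.
    target = 0.5 * (left[-1] + right[0])
    d_left = target - left[-1]    # correction needed on the left side
    d_right = target - right[0]   # correction needed on the right side

    # Linear weights along the time axis: 1 at the join, 1/width at the
    # edge of the window, and implicitly 0 outside it.
    w = (width - np.arange(width)) / width

    # left[-width:] runs from far to near, so the weights are reversed.
    left[-width:] += w[::-1, None] * d_left
    right[:width] += w[:, None] * d_right
    return left, right
```

Calling `smooth_boundary(np.ones((5, 3)), 3 * np.ones((5, 3)), 2)` returns two arrays whose boundary frames coincide, while frames outside the two-frame window on each side are unchanged. A practical system would more likely interpolate a perceptually motivated representation than raw spectral magnitudes; the raw-array form is used here only to keep the sketch short.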
Keywords
Diphone Clustering; Formant Trajectory; Spectral Smoothing; Acoustic Characteristic; Speech Synthesis;