[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTB.2003.10B.6.665

Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing

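Jang, Hyo-Jong (Department of Computer Science, Graduate School, Soongsil University)
Kim, Kwan-Jung (Department of Computer and Information Science, Hanseo University)
Kim, Gye-Young (School of Computing, Soongsil University)
Choi, Hyung-Il (School of Media, Soongsil University)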

Publication Information

The KIPS Transactions: Part B / v.10B, no.6, 2003, pp. 665-672

Abstract

This paper describes a speech synthesis technique that concatenates unit phonemes. A major problem with this approach is the discontinuity that arises at the junction between unit phonemes, especially when the units were recorded by different speakers. To address it, the paper uses clustered diphones and proposes a spectral smoothing technique that draws on formant trajectories and the distribution characteristics of the spectrum while also reflecting human acoustic characteristics. Specifically, the proposed technique clusters unit phonemes by the distribution characteristics of the spectrum at the junction between units, determines the amount and extent of smoothing by taking human acoustic characteristics at the junction into account, and then performs spectral smoothing with weights calculated along the time axis at the boundary between two diphones. The technique removes the discontinuity while minimizing the distortion that spectral smoothing can introduce. For performance evaluation, we test on five hundred diphones extracted from twenty sentences recorded by five speakers and present the experimental results.
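
The idea of smoothing with weights calculated along the time axis at a diphone boundary can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the weighting formula, so the linear ramp, the midpoint target spectrum, and the names `smooth_join` and `span` are all assumptions made for illustration. In the paper, the amount and extent of smoothing are chosen from formant trajectories and human acoustic characteristics, which this sketch replaces with a fixed `span`.

```python
def smooth_join(left, right, span):
    """Pull the frames nearest a diphone join toward a shared spectrum.

    left, right: lists of spectral frames (each a list of floats),
                 in time order, for the two diphones being joined.
    span:        number of frames on each side to smooth (assumed to be
                 given; the paper derives the smoothing extent itself).
    Returns new frame lists; the inputs are left unmodified.
    """
    if span < 1 or span > min(len(left), len(right)):
        raise ValueError("span must be between 1 and the shorter unit length")

    # Assumed target: the midpoint of the two frames that meet at the join.
    target = [(a + b) / 2.0 for a, b in zip(left[-1], right[0])]

    def blend(frame, weight):
        # weight = 0 keeps the frame, weight = 1 replaces it with the target.
        return [(1.0 - weight) * x + weight * t for x, t in zip(frame, target)]

    new_left = [list(frame) for frame in left]
    new_right = [list(frame) for frame in right]
    for distance in range(span):
        # Linear ramp along the time axis: strongest at the join,
        # fading out with distance from it (an illustrative choice).
        weight = (span - distance) / span
        new_left[-1 - distance] = blend(left[-1 - distance], weight)
        new_right[distance] = blend(right[distance], weight)
    return new_left, new_right


if __name__ == "__main__":
    a = [[0.0, 0.0] for _ in range(4)]
    b = [[4.0, 2.0] for _ in range(4)]
    smoothed_a, smoothed_b = smooth_join(a, b, span=2)
    print(smoothed_a)
    print(smoothed_b)
```

Frames outside `span` are left untouched, which is one simple way to limit the distortion that smoothing itself introduces; a larger `span` gives a more gradual transition at the cost of altering more of each unit.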

Keywords

Diphone Clustering; Formant Trajectory; Spectral Smoothing; Acoustic Characteristic; Speech Synthesis

