• Title/Summary/Keyword: Formant synthesis

Search Result 34, Processing Time 0.03 seconds

Implementation of Formant Speech Analysis/Synthesis System (포만트 분석/합성 시스템 구현)

  • Lee, Joon-Woo;Son, Ill-Kwon;Bae, Keuo-Sung
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.295-314
    • /
    • 1997
  • In this study, we will implement a flexible formant analysis and synthesis system. In the analysis part, the two-channel (i.e., speech & EGG signals) approach is investigated for accurate estimation of formant information. The EGG signal is used for extracting exact pitch information that is needed for the pitch synchronous LPC analysis and closed phase LPC analysis. In the synthesis part, Klatt formant synthesizer is modified so that the user can change synthesis parameters arbitarily. Experimental results demonstrate the superiority of the two-channel analysis method over the one-channel(speech signal only) method in analysis as well as in synthesis. The implemented system is expected to be very helpful for studing the effects of synthesis parameters on the quality of synthetic speech and for the development of Korean text-to-speech(TTS) system with the formant synthesis method.

  • PDF

A Study on the Text-to-Speech Conversion Using the Formant Synthesis Method (포만트 합성방식을 이용한 문자-음성 변환에 관한 연구)

  • Choi, Jin-San;Kim, Yin-Nyun;See, Jeong-Wook;Bae, Geun-Sune
    • Speech Sciences
    • /
    • v.2
    • /
    • pp.9-23
    • /
    • 1997
  • Through iterative analysis and synthesis experiments on Korean monosyllables, the Korean text-to-speech system was implemented using the phoneme-based formant synthesis method. Since the formants of initial and final consonants in this system showed many variations depending on the medial vowels, the database for each phoneme was made up of formants depending on the medial vowels as well as duration information of transition region. These techniques were needed to improve the intelligibility of synthetic speech. This paper investigates also methods of concatenating the synthesis units to improve the quality of synthetic speech.

  • PDF

Formant Synthesis of Haegeum Sounds Using Cepstral Envelope (캡스트럼 포락선을 이용한 해금 소리의 포만트 합성)

  • Hong, Yeon-Woo;Cho, Sang-Jin;Kim, Jong-Myon;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.6
    • /
    • pp.526-533
    • /
    • 2009
  • This paper proposes a formant synthesis method of Haegeum sounds using cepstral envelope for spectral modeling. Spectral modeling synthesis (SMS) is a technique that models time-varying spectra as a combination of sinusoids (the "deterministic" part), and a time-varying filtered noise component (the "stochastic" part). SMS is appropriate for synthesizing sounds of string and wind instruments whose harmonics are evenly distributed over whole frequency band. Formants extracted from cepstral envelope are parameterized for synthesis of sinusoids. A resonator by Impulse Invariant Transform (IIT) is applied to synthesize sinusoids and the results are bandpass filtered to adjust magnitude. The noise is calculated by first generating the sinusoids with formant synthesis, subtracting them from the original sound, and then removing some harmonics remained. Linear interpolation is used to model noise. The synthesized sounds are made by summing sinusoids, which are shown to be similar to the original Haegeum sounds.

An Alteration Rule of Formant Transition for Improvement of Korean Demisyllable Based Synthesis by Rule (한국어 반음절단위 규칙합성의 개선을 위한 포만트천이의 변경규칙)

  • Lee, Ki-Young;Choi, Chang-Seok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.4
    • /
    • pp.98-104
    • /
    • 1996
  • This paper propose the alteraton rule to compensate a formant trasition of several connected vowels for improving an unnatural synthesized continuous speech which is concatenated by each demisyllable without coarticulated formant transition for use in dmisyllable based synthesis by rule. To fullfill each formant transition part, the database of 42 stationary vowels which are segmented from the stable part of each vowels is appended to the one of Korean demisyllables, and the resonance circuit used in formant synthesis is employed to change the formant frequency of speech signals. To evaluate the synthesied speech by this rule, we carried out the alteration rule for connected vowels of the synthesized speech based on demisyllable, and compare spectrogram and MOS tested scores with the original and the demisyllable based synthesized speech without this rule. The result shows that this proposed rule can synthesize the more natural speech.

  • PDF

A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting (음성 하모닉스 스펙트럼의 피크-피팅을 이용한 피치검출에 관한 연구)

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.85-95
    • /
    • 2003
  • In speech signal processing, it is very important to detect the pitch exactly in speech recognition, synthesis and analysis. If we exactly pitch detect in speech signal, in the analysis, we can use the pitch to obtain properly the vocal tract parameter. It can be used to easily change or to maintain the naturalness and intelligibility of quality in speech synthesis and to eliminate the personality for speaker-independence in speech recognition. In this paper, we proposed a new pitch detection algorithm. First, positive center clipping is process by using the incline of speech in order to emphasize pitch period with a glottal component of removed vocal tract characteristic in time domain. And rough formant envelope is computed through peak-fitting spectrum of original speech signal infrequence domain. Using the roughed formant envelope, obtain the smoothed formant envelope through calculate the linear interpolation. As well get the flattened harmonics waveform with the algebra difference between spectrum of original speech signal and smoothed formant envelope. Inverse fast fourier transform (IFFT) compute this flattened harmonics. After all, we obtain Residual signal which is removed vocal tract element. The performance was compared with LPC and Cepstrum, ACF. Owing to this algorithm, we have obtained the pitch information improved the accuracy of pitch detection and gross error rate is reduced in voice speech region and in transition region of changing the phoneme.

  • PDF

Spectral Modeling Synthesis of Haegeum using GPU (GPU를 이용한 해금의 스펙트럼 모델링)

  • Islam, Md Shohidul;Islam, Md Rashedul;Farid, Fahmid Al;Kim, Jong-Myon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2014.01a
    • /
    • pp.5-8
    • /
    • 2014
  • This paper presents a parallel approach of formant synthesis method for haegeum on graphics processing units (GPU) using spectral modeling. Spectral modeling synthesis (SMS) is a technique that models time-varying spectra as a combination of sinusoids and a time-varying filtered noise component. A second-order digital resonator by the impulse-invariant transform (IIT) is applied to generate deterministic components and the results are band-pass filtered to adjust magnitude. The noise is calculated by first generating the sinusoids with formant synthesis, subtracting them from the original sound, and then removing some harmonics remained. The synthesized sounds are consequently by adding sinusoids, which are shown to be similar to the original Haegeum sounds. Furthermore, GPU accelerates the synthesis process enabling- real time music synthesis system development, supporting more sound effect, and multiple musical sound compositions.

  • PDF

A Study on the Pitch Alteration Technique by Subband Scaling in Speech Signal (서브밴드 스케일링에 의한 음성신호의 피치변경법에 관한 연구)

  • Kim, Young-Kyu;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.137-147
    • /
    • 2003
  • Speech synthesis can classify by synthesis way, that is waveform coding, source coding and mixture coding. Specially, waveform coding is suitable for high quality synthesis. However, it is not desirable by synthesis techniques of syllable or phoneme unit because it do not separate and handles excitation and formant part. Therefore, there is a need for pitch alteration method applied in synthesis by the rule in waveform coding. This study propose about pitch alteration method that use spectrum scaling after do to flatten spectra by subband linear approximation to minimize spectrum distortion. This paper show evaluation whether show excellency of some measure compared with LPC, Cepstrum, lifter function and method that propose. estimation method seeks distribution of each flattened signal and measured degree of flattened spectra Signal flattened is normalized, So that highest point amounts to zero, and distribution of signal ,whose average is zero, is calculated. this show result that measure the spectrum distortion rate to estimate performance of method that propose. The average spectrum distortion rate was kept below the average 2.12%, so the method that propose is superiors than existent method.

  • PDF

Perceptual Experiment on Number Production for Speaker Identification

  • Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.7-19
    • /
    • 2001
  • The acoustic parameters of nine Korean numbers were analyzed by Praat, a speech analysis software, and synthesized by SenSynPPC, a Klatt formant synthesizer. The overall intensity, pitch and formant values of the numbers were modified dynamically by a step of 1 dB, 1 Hz and 2.5% respectively. The study explored the sensitivity of listeners to changes in the three acoustic parameters. Twelve subjects (male and female) listened to 390 pairs of synthesized numbers and judged whether the given pair sounded the same or different. Results showed that subjects perceived the same sound quality within the range of 6.6 dB of intensity variation, 10.5 Hz of pitch variation and 5.9% of the first three formant variations. The male and female groups showed almost the same perceptual ranges. Also, an asymmetrical structure of high and low boundary was observed. The ranges may be applicable to the development of a speaker identification system while the method of synthesis modification may apply to its evaluation data.

  • PDF

Discrimination of Synthesized English Vowels by American and Korean Listeners

  • Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.13 no.1
    • /
    • pp.7-27
    • /
    • 2006
  • This study explored the discrimination of synthesized English vowel pairs by twenty-seven American and Korean, male and female listeners. The average formant values of nine monophthongs produced by ten American English male speakers were employed to synthesize the vowels. Then, subjects were instructed explicitly to respond to AX discrimination tasks in which the standard vowel was followed by another one with the increment or decrement of the original formant values. The highest and lowest formant values of the same vowel quality were collected and compared to examine patterns of vowel discrimination. Results showed that the American and Korean groups discriminated the vowel pairs almost identically and their center formant frequency values of the high and low boundary fell almost exactly on those of the standards. In addition, the acceptable range of the same vowel quality was similar among the language and gender groups. The acceptable thresholds of each vowel formed oval to maintain perceptual contrast from adjacent vowels. The results suggested that nonnative speakers with high English proficiency could match native speakers' performance in discriminating vowel pairs with a shorter inter-stimulus interval. Pedagogical implications of those findings are discussed.

  • PDF

On a Pitch Alteration Technique by Cepstrum Analysis of Flattened Excitation Spectrum (평탄화된 여기 스펙트럼에서 켑스트럼 피치 변경법에 관한 연구)

  • 조왕래
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06c
    • /
    • pp.159-162
    • /
    • 1998
  • Speech synthesis coding is classified into three categories: waveform coding, source coding and hybrid coding. To obtain the synthetic speech with high quality, the synthesis by waveform coding is desired. However, it is difficult to apply waveform coding to synthesis by syllable or phoneme unit, because it does not divide the speech into excitation and formant component. Thus it is required to alter the excitation in waveform coding for applying waveform coding to synthesis by rule. In this paper we propose a new pitch alteration method that minimizes the spectrum distortion by using the behavior of cepstrum. This method splits the spectrum of speech signal into excitation spectrum and formant spectrum and transforms the excitation spectrum into cepstrum domain. The pitch of excitation cepstrum is altered by zero insertion or zero deletion and the pitch altered spectrum is reconstructed in spectrum domain. As a result of performance test, the average spectrum distortion was below 2.29%.

  • PDF