• Title/Summary/Keyword: 포만트 합성

Search Result 26, Processing Time 0.023 seconds

A Study on the Phoneme Based Analysis of Korean Initial Plosives Using Statistical Method and Perception Tests (통계적 방법과 인지실험을 통한 한국어 초성파열음의 음소단위 분석에 관한 연구)

  • Jo Cheol-Woo;Lee Woo-Sun;Lee Cyu-Ho;Kim Jong-Ahn;Lim Gwang-Il;Lee Tae-Won
    • The Journal of the Acoustical Society of Korea
    • /
    • v.8 no.5
    • /
    • pp.78-85
    • /
    • 1989
  • This paper describes a statistical methods and perception test for extracting the parameters to be used for the synthesis-by-rule of Korean plosives. Formant synthesizer is chosen for the synthesis of the phonemes. Speech materials for the analysis consists of 72 CV monosyllables from the single male speaker. The analysis is done mainly focused on the variation of parameters in time and frequency domain, then perception tests are executed to estimate the effects of variations of the formant transitions.

  • PDF

Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique (새로운 스펙트럼 완만화에 의한 합성 음질 개선)

  • 장효종;최형일
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.11
    • /
    • pp.1037-1043
    • /
    • 2003
  • This paper describes a speech synthesis technique using a diphone as an unit phoneme. Speech synthesis is basically accomplished by concatenating unit phonemes, and it's major problem is discontinuity at the connection part between unit phonemes. To solve this problem, this paper proposes a new spectral smoothing technique which reflects not only formant trajectories but also distribution characteristics of spectrum and human's acoustic characteristics. That is, the proposed technique decides the quantity and extent of smoothing by considering human's acoustic characteristics at the connection part of unit phonemes, and then performs spectral smoothing using weights calculated along a time axis at the border of two diphones. The proposed technique reduces the discontinuity and minimizes the distortion which is caused by spectral smoothing. For the purpose of performance evaluation, we tested on five hundred diphones which are extracted from twenty sentences using ETRI Voice DB samples and individually self-recorded samples.

Spectral Modeling of Haegeum Using Cepstral Analysis (캡스트럼 분석을 이용한 해금의 스펙트럼 모델링)

  • Hong, Yeon-Woo;Kang, Myeong-Su;Cho, Sang-Jin;Kim, Jong-Myon;Lee, Jung-Chul;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.4
    • /
    • pp.243-250
    • /
    • 2010
  • This paper proposes a spectral modeling of Korean traditional instrument, Haegeum, using cepstral analysis to naturally describe Haegeum sounds varying with time. To get a precise result of cepstral analysis, we set the frame size to 3 periods of input signal and more cepstral coefficients are used to extract formants. The performance is enhanced by flexibly controlling the cutoff frequency of bandpass filter depending on the resonances in the synthesis process of sinusoidal components and the deleting peaks remained in the residual signal. To detect the change of pitch, we divide the input frames into silence, attack, and sustain region and determine which region the current frame is involved in. Then, the proposed method readjusts the frame size according to the fundamental frequency in the case of the current frame is in attack region and corrects the extraction errors of the fundamental frequency for the frames in sustain region. With these processes, the synthesized sounds are much more similar to the originals. The evaluation result through the listening test by a Haegeum player says that the synthesized sounds are almost similar to originals (96~100 % similar to the original sounds).

A Study on the Pitch Extraction Improvement Using LSP for the Synthesis of High Speech Quality (고음질 음성합성을 위한 LSP를 이용한 피치검출 성능향상에 관한 연구)

  • Seo, Ji-Ho;Kim, Jong-Kuk;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.1
    • /
    • pp.69-75
    • /
    • 2010
  • In this paper, the pitch is detected after the elimination of formant ingredients by flattening the spectrum in frequency domain. In order to remove impact of formant and transition frequency in the signal spectrum, formant envelop is made by linear interpolation with any points each sub-band and the spectrum of speech signal is compensated by the reverse of the envelop interpolated linearly after we divide frequency band into several segment based on LSP and detect the points. The experimental result showed the proposed method appeared an outstanding performance in compared with LPC, Cepstrum, Lifter methods. The method reduced the gross error rate 1.30% than the LPC method which appeared a good performance except the proposed method. Also, the proposed method showed low error rate in noise environment.

Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing (다이폰 군집화와 개선된 스펙트럼 완만화에 의한 음성합성)

  • Jang, Hyo-Jong;Kim, Kwan-Jung;Kim, Gye-Young;Choi, Hyung-Il
    • The KIPS Transactions:PartB
    • /
    • v.10B no.6
    • /
    • pp.665-672
    • /
    • 2003
  • This paper describes a speech synthesis technique by concatenating unit phoneme. At that time, a major problem is that discontinuity is happened from connection part between unit phonemes, especially from connection part between unit phonemes recorded by different persons. To solve the problem, this paper uses clustered diphone, and proposes a spectral smoothing technique, not only using formant trajectory and distribution characteristic of spectrum but also reflecting human's acoustic characteristic. That is, the proposed technique performs unit phoneme clustering using distribution characteristic of spectrum at connection part between unit phonemes and decides a quantity and a scope for the smoothing by considering human's acoustic characteristic at the connection part of unit phonemes, and then performs the spectral smoothing using weights calculated along a time axes at the border of two diphones. The proposed technique removes the discontinuity and minimizes the distortion which can be occurred by spectrum smoothing. For the purpose of the performance evaluation, we test on five hundred diphones which are extracted from twenty sentences recorded by five persons, and show the experimental results.

A Study on the Technique of Spectrum Flattening for Improved Pitch Detection (개선된 피치검출을 위한 스펙트럼 평탄화 기법에 관한 연구)

  • 강은영;배명진;민소연
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.310-314
    • /
    • 2002
  • The exact pitch (fundamental frequency) extraction is important in speech signal processing like speech recognition, speech analysis and synthesis. However the exact pitch extraction from speech signal is very difficult due to the effect of formant and transitional amplitude. So in this paper, the pitch is detected after the elimination of formant ingredients by flattening the spectrum in frequency region. The effect of the transition and change of phoneme is low in frequency region. In this paper we proposed the new flattening method of log spectrum and the performance was compared with LPC method and Cepstrum method. The results show the proposed method is better than conventional method.

A Study on the Pulse-Train Code Excited Linear Prediction Coder: PT-CELP (Pulse-Train code 여기 선형 예측 (PT-CELP) 부호화기에 관한 연구)

  • 김흥국
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1995.06a
    • /
    • pp.246-249
    • /
    • 1995
  • 4.16kbps의 전송률을 갖는 음성 부호화기 구조에 관하여 기술한다. 제안된 음성 부호화기는 개방 회로 피치 검출기와 이로부터 생성된 pulse train을 코드북으로 갖는 CELP 부호화기이다. Pulse-Train codebook은 분석 프레임별로 부호화 및 복호화 양단에서 생성되며 음성의 피치 및 포만트 정보를 내포하고 있다. 구현된 PT-CELP는 random codebook 방식의 CELP에 비해 적은 크기로 codebook을 만들 수 있으며 음성의 특징을 충분히 반영하므로 합성된 음성의 음질을 향상시킬 수 있다.

  • PDF

Branch Algorithm for Phoneme Segmentation in Korean Speech Recognition System (한국어 음성인식 시스템에서 음소 경계 검출을 위한 Branch 알고리즘)

  • 서영완;한승진;장흥종;이정현
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.04b
    • /
    • pp.357-359
    • /
    • 2000
  • 음소 단위로 구축된 음성 데이터는 음성인식, 합성 및 분석 등의 분야에서 매우 중요하다. 일반적으로 음소는 유성음과 무성음으로 구분되어 진다. 이러한 유성음과 무성음은 많은 특징적 차이가 있지만, 기존의 음소 경계추출 알고리즘은 이를 고려하지 않고 시간 축을 기준으로 이전 프레임과 매개변수 (스펙트럼) 비교만을 통하여 음소의 경계를 결정한다. 본 논문에서는 음소 경계 추출을 위하여 유성음과 무성음의 특징적 차이를 고려한 블록기반의 Branch 알고리즘을 설계하였다. Branch 알고리즘을 사용하기 위한 스펙트럼 비교 방법은 MFCC(Mel-Frequency Cepstrum Coefficient)를 기반으로 한 거리 측정법을 사용하였고, 유성음과 무성음의 구분은 포만트 주파수를 이용하였다. 실험 결과 3~4음절 고립단어를 대상으로 약 78%의 정확도를 얻을수 있었다.

  • PDF

A Study on the Text-to-Speech Conversion Using the Formant Synthesis Method (포만트 합성방식을 이용한 문자-음성 변환에 관한 연구)

  • Choi, Jin-San;Kim, Yin-Nyun;See, Jeong-Wook;Bae, Geun-Sune
    • Speech Sciences
    • /
    • v.2
    • /
    • pp.9-23
    • /
    • 1997
  • Through iterative analysis and synthesis experiments on Korean monosyllables, the Korean text-to-speech system was implemented using the phoneme-based formant synthesis method. Since the formants of initial and final consonants in this system showed many variations depending on the medial vowels, the database for each phoneme was made up of formants depending on the medial vowels as well as duration information of transition region. These techniques were needed to improve the intelligibility of synthetic speech. This paper investigates also methods of concatenating the synthesis units to improve the quality of synthetic speech.

  • PDF

Implementation of Formant Speech Analysis/Synthesis System (포만트 분석/합성 시스템 구현)

  • Lee, Joon-Woo;Son, Ill-Kwon;Bae, Keuo-Sung
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.295-314
    • /
    • 1997
  • In this study, we will implement a flexible formant analysis and synthesis system. In the analysis part, the two-channel (i.e., speech & EGG signals) approach is investigated for accurate estimation of formant information. The EGG signal is used for extracting exact pitch information that is needed for the pitch synchronous LPC analysis and closed phase LPC analysis. In the synthesis part, Klatt formant synthesizer is modified so that the user can change synthesis parameters arbitarily. Experimental results demonstrate the superiority of the two-channel analysis method over the one-channel(speech signal only) method in analysis as well as in synthesis. The implemented system is expected to be very helpful for studing the effects of synthesis parameters on the quality of synthetic speech and for the development of Korean text-to-speech(TTS) system with the formant synthesis method.

  • PDF