• Title/Summary/Keyword: Speech spectrum

Search Result 307, Processing Time 0.026 seconds

A Study on the Synthesis of Korean Speech by Formant VOCODER (포르만트 VOCODER에 의한 한국어 음성합성에 관한 연구)

  • 허강인;이대영
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.14 no.6
    • /
    • pp.699-712
    • /
    • 1989
  • This paper describes a method of Korean speech synhes is using format VOCODER. The parameters of speech synthes is are a follows, 1) format F1, F2, and F3 by spectrum moment method and F4, F5 using the length of vocal tract. 2) pitch frequencies obtained by optimu, Comb method using AMDF. 3) short time average energy and short time mean amplitude. 4) The decision method of bandwidth reportd by Fant. 5) voicde/unvoiced discrimination using zerocrossing. 6) excitation wave reported by Rosenberg. 7) gaussian white noise. Synthesis results are in fairly good agreement with original speech.

  • PDF

Enhanced Maximum Voiced Frequency Estimation Scheme for HTS Using Two-Band Excitation Model

  • Park, Jihoon;Hahn, Minsoo
    • ETRI Journal
    • /
    • v.37 no.6
    • /
    • pp.1211-1219
    • /
    • 2015
  • In a hidden Markov model-based speech synthesis system using a two-band excitation model, a maximum voiced frequency (MVF) is the most important feature as an excitation parameter because the synthetic speech quality depends on the MVF. This paper proposes an enhanced MVF estimation scheme based on a peak picking method. In the proposed scheme, both local peaks and peak lobes are picked from the spectrum of a linear predictive residual signal. The average of the normalized distances of local peaks and peak lobes is calculated and utilized as a feature to estimate an MVF. Experimental results of both objective and subjective tests show that the proposed scheme improves the synthetic speech quality compared with that of a conventional one in a mobile device as well as a PC environment.

Long Term Average Spectrum Characteristics of Speaking Voice of Western Operatic Singers (Long Term Average Spectrum을 이용한 성악가들의 Speaking Voice 분석)

  • Lee, Kyung-Chul;Hong, Seok-Jin;Jin, Sung-Min
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.15 no.2
    • /
    • pp.122-127
    • /
    • 2004
  • Background and Objectives : Many studies have described and analyzed singer's formant and it has been shown that the epilaryngeal tube in the human airway is responsible for vocal ring, or the singer's formant. A similar phenomenon produced by trained singers in their speech led some authors to examine the speaker's ring. This study was designed to analyze the speaking voice of the singers and speaker's ring. Baterials and Methods : Ten tenors, fifteen baritones, fifteen sopranos and ten mezzo sopranos attending the music college, department of vocal music were chosen for this study. Fifteen male and fifteen female untrained normal speakers were chosen for control group. Each subject was asked to produce a sample of a sustained spoken vowel /ah/ sound for at least five seconds and read sentence 'Kaeul'. The sound data was analyzed using the Fast Fourier Transform(FFT) - based power spectrum, Long term average(LTA) power spectrum using the FFT algorithm of the Computerized Speech Lab(CSL, Kay elemetrics, Model 4300B, USA). Statistical analysis was performed using the Mann-Whitney test of the Statistical Package for Social Sciences(SPSS). Results : For LTA Power spectrum of/ah/ sound, a significant increase was seen in the 2,500-3,500Hz region(p<0.01) in four trained singer group compared with untrained speaker group, and a significant increase in the 9,000-10,000Hz region(p<0.01) in soparano group. Similarly, in sentence 'Kaeul', there was a significant increase in energy in the tenor, baritone, mezzo soprano group compared with the untrained speaker group in the 2,500-3,500Hz region(p<0.01), and a significant increase in all frequency region(p<0.01) in the soprano group. Conclusions : The LTA power spectrum suggests that trained singers group show more energy concentration in the 'singer's formant' region in the speaking voice, and authors believe this region to be the 'speaker's ring'. Further research is needed on the effect of singing training on the resonance of the speaking voice.

  • PDF

A Study on the Pitch Extraction Improvement Using LSP for the Synthesis of High Speech Quality (고음질 음성합성을 위한 LSP를 이용한 피치검출 성능향상에 관한 연구)

  • Seo, Ji-Ho;Kim, Jong-Kuk;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.1
    • /
    • pp.69-75
    • /
    • 2010
  • In this paper, the pitch is detected after the elimination of formant ingredients by flattening the spectrum in frequency domain. In order to remove impact of formant and transition frequency in the signal spectrum, formant envelop is made by linear interpolation with any points each sub-band and the spectrum of speech signal is compensated by the reverse of the envelop interpolated linearly after we divide frequency band into several segment based on LSP and detect the points. The experimental result showed the proposed method appeared an outstanding performance in compared with LPC, Cepstrum, Lifter methods. The method reduced the gross error rate 1.30% than the LPC method which appeared a good performance except the proposed method. Also, the proposed method showed low error rate in noise environment.

On a Pitch Alteration Method by Time-axis Scaling Compensated with the Spectrum for High Quality Speech Synthesis (고음질 합성용 스펙트럼 보상된 시간축조절 피치 변경법)

  • Bae, Myung-Jin;Lee, Won-Cheol;Im, Sung-Bin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.4
    • /
    • pp.89-95
    • /
    • 1995
  • The waveform coding technique has concerned with simply preserving the waveform shape of speech signal through a redundancy reduction process. In the case of speech synthesis, the waveform coding with high sound quality is mainly used to the synthesis by analysis. However, since the parameters of this coding are not classified into either excitation or vocal tract parameters, it is difficult to applying the waveform coding to the synthesis by rule. In order to apply the waveform coding to the synthesis by rule, the pitch alteration technique is required in prosody control. In this paper, we propose a new pitch alteration method that can change the pitch period in waveform coding by scaling the time-axis and compensating the spectrum. This is relevant to the time-frequency domain method were the phase components of the waveform is preserved with a little spectrum distortion of 2.5 % and less for 50% pitch change.

  • PDF

Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing (다이폰 군집화와 개선된 스펙트럼 완만화에 의한 음성합성)

  • Jang, Hyo-Jong;Kim, Kwan-Jung;Kim, Gye-Young;Choi, Hyung-Il
    • The KIPS Transactions:PartB
    • /
    • v.10B no.6
    • /
    • pp.665-672
    • /
    • 2003
  • This paper describes a speech synthesis technique by concatenating unit phoneme. At that time, a major problem is that discontinuity is happened from connection part between unit phonemes, especially from connection part between unit phonemes recorded by different persons. To solve the problem, this paper uses clustered diphone, and proposes a spectral smoothing technique, not only using formant trajectory and distribution characteristic of spectrum but also reflecting human's acoustic characteristic. That is, the proposed technique performs unit phoneme clustering using distribution characteristic of spectrum at connection part between unit phonemes and decides a quantity and a scope for the smoothing by considering human's acoustic characteristic at the connection part of unit phonemes, and then performs the spectral smoothing using weights calculated along a time axes at the border of two diphones. The proposed technique removes the discontinuity and minimizes the distortion which can be occurred by spectrum smoothing. For the purpose of the performance evaluation, we test on five hundred diphones which are extracted from twenty sentences recorded by five persons, and show the experimental results.

Performance Improvement of CPSP Based TDOA Estimation Using the Preemphasis (프리엠퍼시스를 이용한 CPSP 기반의 도달시간차이 추정 성능 개선)

  • Kwon, Hong-Seok;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.461-470
    • /
    • 2009
  • We investigate and analyze the problems encountered in frame-based estimation of TDOA (Time Difference of Arrival) using CPSP function. Spectral leakage occurring in framing of a speech signal by a rectangular window could make estimation of CPSP spectrum inaccurate. Framing with other windows to reduce the spectral leakage distorts the signal due to the asynchronous weighting around the frame specifically both ends of the frame. These problems degrade the performance of the CPSP-based TDOA estimation. In this paper, we propose a method to alleviate those problems by pre-emphasis of the speech signal. It reduces the influence of the spectral leakage by reducing dynamic range of the spectrum of a speech signal with pre-emphasis. To validate the proposed method of pre-emphasis, we carry out TDOA estimation experiments in various noise and reverberation conditions, Experimental results have shown that the framing of pre-emphasized microphone output by a rectangular window achieves higher success rate of TDOA estimation than any other framing methods.

A Study on Estimation of Formants and Articulatory Motion Trajectories using RLSL Adaptive Linear Prediction Filter (RLSL 적응선형예측필터를 이용한 형성음 및 조음운동궤적 추정에 관한 연구)

  • 김동준;송영수
    • Journal of Biomedical Engineering Research
    • /
    • v.14 no.1
    • /
    • pp.1-8
    • /
    • 1993
  • In this study, the extractions of formants and articulatory motion trajectories for Korean complex vowels are performed by using the RLSL adaptive linear prediction filter. This enables us to extract accurate spectrum in transition of speech signal. This study shows that the RLSL algorithm is superior to the Levinson algorithm, specially in transition part of speech.

  • PDF

Pitch Extraction of Speech Signals by the Harmonics analysis (고조파 분석에 의한 음성신호의 피치 검출)

  • Kim, Kee-Hee;Choi, Jung-Ah;Bae, Myung-Jin;Ann, Sou-Guil
    • Proceedings of the KIEE Conference
    • /
    • 1987.07b
    • /
    • pp.1610-1614
    • /
    • 1987
  • The harmonies of the fundamental frequency in speech signal make a minute line spectrum in frequency domain. In this paper, we propose a new algorithm to detect a pitch interval in voiced sound based on the fact that the number of harmonies can represent the period of the pitch in the time domain.

  • PDF

A New Formulation of Multichannel Blind Deconvolution: Its Properties and Modifications for Speech Separation

  • Nam, Seung-Hyon;Jee, In-Nho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.4E
    • /
    • pp.148-153
    • /
    • 2006
  • A new normalized MBD algorithm is presented for nonstationary convolutive mixtures and its properties/modifications are discussed in details. The proposed algorithm normalizes the signal spectrum in the frequency domain to provide faster stable convergence and improved separation without whitening effect. Modifications such as nonholonomic constraints and off-diagonal learning to the proposed algorithm are also discussed. Simulation results using a real-world recording confirm superior performanceof the proposed algorithm and its usefulness in real world applications.