• Title/Summary/Keyword: speech waveform

Search Result 135, Processing Time 0.021 seconds

The Smoothing Method of the Concatenation Parts in Speech Waveform by using the Forward/Backward LPC Technique (전, 후방향 LPC법에 의한 음성 파형분절의 연결부분 스므딩법)

  • 이미숙
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1991.06a
    • /
    • pp.15-20
    • /
    • 1991
  • In a text-to-speech system, sound units (e. q., phonemes, words, or phrases) can be concatenated together to produce required utterance. The quality of the resulting speech is dependent on factors including the phonological/prosodic contour, the quality of basic concatenation units, and how well the units join together. Thus although the quality of each basic sound unit is high, if occur the discontinuity in the concatenation part then the quality of synthesis speech is decrease. To solve this problem, a smoothing operation should be carried out in concatenation parts. But a major problem is that, as yet, no method of parameter smoothing is availalbe for joining the segment together.

  • PDF

A Study on Implementation of Emotional Speech Synthesis System using Variable Prosody Model (가변 운율 모델링을 이용한 고음질 감정 음성합성기 구현에 관한 연구)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.8
    • /
    • pp.3992-3998
    • /
    • 2013
  • This paper is related to the method of adding a emotional speech corpus to a high-quality large corpus based speech synthesizer, and generating various synthesized speech. We made the emotional speech corpus as a form which can be used in waveform concatenated speech synthesizer, and have implemented the speech synthesizer that can be generated various synthesized speech through the same synthetic unit selection process of normal speech synthesizer. We used a markup language for emotional input text. Emotional speech is generated when the input text is matched as much as the length of intonation phrase in emotional speech corpus, but in the other case normal speech is generated. The BIs(Break Index) of emotional speech is more irregular than normal speech. Therefore, it becomes difficult to use the BIs generated in a synthesizer as it is. In order to solve this problem we applied the Variable Break[3] modeling. We used the Japanese speech synthesizer for experiment. As a result we obtained the natural emotional synthesized speech using the break prediction module for normal speech synthesize.

A Study on Approximation-Synthesis of Transition Segment in Speech Signal (음성신호에서 천이구간의 근사합성에 관한 연구)

  • Lee See-Woo
    • The Journal of the Korea Contents Association
    • /
    • v.5 no.3
    • /
    • pp.167-173
    • /
    • 2005
  • In a speech coding system using excitation source of voiced and unvoiced, it would be involved a distortion of speech quality in case coexist with a voiced and unvoiced consonants in a frame. So, I propose TSIUVC(Transition Segment Including Unvoiced Consonant) extraction method by using pitch pulses and Zero Crossing Rate in order to unexistent with a voiced and unvoiced consonants in a frame. And this paper present a TSIUVC approximate-synthesis method by using frequency band division. As a result, this method obtains a high quality approximation-synthesis waveform within TSIUVC by using frequency information of 0.547kHz below and 2.813kHz above. And the TSIUVC extraction rate was $91\%$ for female voice and $96.2\%$ for male voice respectively This method has the capability of being applied to a new speech coding of Voiced/Silence/TSIUVC, speech analysis, and speech synthesis.

  • PDF

Hoarse Speech Analysis Using Dissymmetric Four-Mass Model of Vocal Cords (비대칭 4 질량 성대 모델에 의한 쉰목소리 분석)

  • Jiang, Gan-Yi;Chen, Hui-Fang;Choi, Tae-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.5
    • /
    • pp.94-101
    • /
    • 1995
  • In this paper, a new vocal cords model, called a four-mass model, is proposed for a hoarse speech mechanism. Pathological changes of vocal cords cause hoarse speech and glottal waveform reflects motion states of vocal cords. From these facts, we assumed that the morbid vocal cords be dissymmetric and take the four-mass type. The glottal waveforms and the model parameters of normal and hoarse speech signals are analyzed, and some relations bet ween the model parameters and the hoarse pathology are discussed. Experimental results show that the new research method of hoarse speech can reveal relations between the acoustic features of hoarse speech and the hoarse pathology, and be used to diagnose laryngeal diseases and to improve tone quality of hoarse speech.

  • PDF

Speech Recognition of the Korean Vowel 'ㅜ' Based on Time Domain Bulk Indicators (시간 영역 벌크 지표에 기반한 한국어 모음 'ㅜ'의 음성 인식)

  • Lee, Jae Won
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.11
    • /
    • pp.591-600
    • /
    • 2016
  • Computing technologies are increasingly applied to most casual human environment networks, as computing technologies are further developed. In addition, the rapidly increasing interest in IoT has led to the wide acceptance of speech recognition as a means of HCI. In this study, we present a novel method for recognizing the Korean vowel 'ㅜ', as a part of a phoneme based Korean speech recognition system. The proposed method involves analyses of bulk indicators calculated in the time domain instead of analysis in the frequency domain, with consequent reduction in the computational cost. Four elementary algorithms for detecting typical waveform patterns of 'ㅜ' using bulk indicators are presented and combined to make final decisions. The experimental results show that the proposed method can achieve 90.1% recognition accuracy, and recognition speed of 0.68 msec per syllable.

A New Speech Waveform Coding Based on the Nonuniform Sampling Method with Separated to High-Low Band (대역분리-비균일표본화 방법을 이용한 새로운 음성신호의 파형부호화 연구)

  • Bae, Myung-Jin;Lee, Joo-Hun;Im, Sung-Bin;Lee, Won-Cheol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.5
    • /
    • pp.89-93
    • /
    • 1995
  • To reduce the redundancy within samples that resulted from uniform sampling method, nonuniform sampling or nonredundant-sample coding methods can be considered. However, it is well known that when conventional nonuniform sampling methods are applied directly to speech signal, the required amount of data is comparable to or mure than that by uniform sampling method like PCM. To overcome this problem, a new nonuniform sampling method is proposed, in which nonuniform sampling is applied to the low-pass filtered speech signal and higher band is compensated by 8 colored Gaussian random noise with various noise levels. By this method, speech signal waveform can be encoded by 1.8 times larger compression ratio than the conventional nonuniform sampling method.

  • PDF

A Real-time Performance Evaluation System for Speech Waveform Coders (음성파형 부호화기의 실시간 성능측정 시스템)

  • 김용철;은종관
    • The Journal of the Acoustical Society of Korea
    • /
    • v.3 no.1
    • /
    • pp.43-54
    • /
    • 1984
  • 본 논문에서는 음성파형 부호화기의 성능을 실시간 측정하기 위한 시스템의 구현에 관하여 연구 하였다. 본 장비는 "bit slice" 마이크로프로세서로 설계되었다. 개발된 시스템으로 세 개의 codec의 성능 을 측정하였으며 이 결과를 distortion analyzer로 측정한 결과와 비교하였다. 개발된 장비는 음성 부호 화기의 성능시험을 위한 주관적 청취시험 과정을 피할 수 있게 되었다.

  • PDF

Cancelation of Baseline Wandering of Electroglottograph Signal using Empirical Mode Decomposition (경험적 모드 재구성 방법을 이용한 성문파형 신호의 기계선 변동 제거)

  • Jang, Seung-Jin;Kim, Hyo-Min;Park, Young-Cheol;Choi, Hong-Shik;Yoon, Young-Ro
    • Proceedings of the KIEE Conference
    • /
    • 2007.10a
    • /
    • pp.475-476
    • /
    • 2007
  • Electroglottography (EGG) is a technique used to register laryngeal behavior indirectly by a measuring the change in electrical impedance across the throat during speaking. However, EGG waveform is affected by laryngeal muscles which fluctuate the vocal cords, and which result in baseline wander. It is required to reduce baseline wander in EGG waveform, because EGG waveform is used for input signal of nonlinear speech synthesizer in next chapter. In vocal cords, the abduction-adduction of glottis is mainly controlled by the posterior cricoarytenoid (abductor) and interarytenoid (adductor) muscles respectively. Empirical Mode Decomposition method was adopted in cancellation of EGG waveform baseline wandering, and showd better performance than that of high pass filter with 500 order.

  • PDF

Glottal Characteristics of Word-initial Vowels in the Prosodic Boundary: Acoustic Correlates (운율경계에 위치한 어두 모음의 성문 특성: 음향적 상관성을 중심으로)

  • Sohn, Hyang-Sook
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.47-63
    • /
    • 2010
  • This study provides a description of the glottal characteristics of the word-initial low vowels /a, $\ae$/ in terms of a set of acoustic parameters and discusses glottal configuration as their acoustic correlates. Furthermore, it examines the effect of prosodic boundary on the glottal properties of the vowels, seeking an account of the possible role of prosodic structure based on prosodic theory. Acoustic parameters reported to indicate glottal characteristics were obtained from the measurements made directly from the speech spectrum on recordings of Korean and English collected from 45 speakers. They consist of two separate groups of native Korean and native English speakers, each including both male and female speakers. Based on the three acoustic parameters of open quotient (OQ), first-formant bandwidth (B1), and spectral tilt (ST), comparisons were made between the speech of males and females, between the speech of native Korean and native English speakers, and between Korean and English produced by native Korean speakers. Acoustic analysis of the experimental data indicates that some or all glottal parameters play a crucial role in differentiating the speech groups, despite substantial interspeaker variations. Statistical analysis of the Korean data indicates prosodic strengthening with respect to the acoustic parameters B1 and OQ, suggesting acoustic enhancement in terms of the degree of glottal abduction and the glottal closure during a vibratory cycle.

  • PDF

A Study on PCFBD-MPC in 8kbps (8kbps에 있어서 PCFBD-MPC에 관한 연구)

  • Lee, See-woo
    • Journal of Internet Computing and Services
    • /
    • v.18 no.5
    • /
    • pp.17-22
    • /
    • 2017
  • In a MPC coding using excitation source of voiced and unvoiced, it would be a distortion of speech waveform. This is caused by normalization of synthesis speech waveform of voiced in the process of restoration the multi-pulses of representation section. This paper present PCFBD-MPC( Position Compensation Frequency Band Division-Multi Pulse Coding ) used V/UV/S( Voiced / Unvoiced / Silence ) switching, position compensation in a multi-pulses each pitch interval and Unvoiced approximate-synthesis by using specific frequency in order to reduce distortion of synthesis waveform. Also, I was implemented that the PCFBD-MPC( Position Compensation Frequency Band Division-Multi Pulse Coding ) system and evaluate the SNRseg of PCFBD-MPC in coding condition of 8kbps. As a result, SNRseg of PCFBD-MPC was 13.4dB for female voice and 13.8dB for male voice respectively. In the future, I will study the evaluation of the sound quality of 8kbps speech coding method that simultaneously compensation the amplitude and position of multi-pulse source. These methods are expected to be applied to a method of speech coding using sound source in a low bit rate such as a cellular phone or a smart phone.