• Title/Summary/Keyword: speech quality

Search Result 807, Processing Time 0.025 seconds

Corpus Based Unrestricted vocabulary Mandarin TTS (코퍼스 기반 무제한 단어 중국어 TTS)

  • Yu Zheng;Ha Ju-Hong;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.175-179
    • /
    • 2003
  • In order to produce a high quality (intelligibility and naturalness) synthesized speech, it is very important to get an accurate grapheme-to-phoneme conversion and prosody model. In this paper, we analyzed Chinese texts using a segmentation, POS tagging and unknown word recognition. We present a grapheme-to-phoneme conversion using a dictionary-based and rule-based method. We constructed a prosody model using a probabilistic method and a decision tree-based error correction method. According to the result from the above analysis, we can successfully select and concatenate exact synthesis unit of syllables from the Chinese Synthesis DB.

  • PDF

Time-Domain Quantization and Interpolation of Pitch Cycle Waveform

  • Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.1E
    • /
    • pp.11-16
    • /
    • 2008
  • In this paper, a pitch cycle waveform (PCW) is extracted, quantized, and interpolated in a time domain to synthesize high-quality speech at low bit rates. The pre-alignment technique is proposed for the accurate and efficient PCW extraction, which predicts the current PCW position from the previous PCW position assuming that pitch periods evolve slowly. Since the pitch periods are different frame by frame, the original PCW is converted into the fixed-dimension PCW using the dimension-conversion method, and subsequently quantized by code-excited linear predictive (CELP) coding. The excitation signal for the linear predictive coding (LPC) synthesis filter is generated using the time-domain interpolation and interlink of the quantized PCW's. The coder operates at 4.2 kbit/s and 3.2 kbit/s depending on the pitch period. Informal listening test demonstrates the effectiveness of the proposed coding scheme.

The Revised Transform Algorithm from LSF to LPC (LSF에서 LPC 계수를 구하는 개선된 알고리즘)

  • Kim, Hyang-Jin;Lee, Ki-Tae;Ham, Young-Hee;Kim, Hyoung-Jun;Lim, Jae-Yun
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.679-682
    • /
    • 1999
  • This paper proposes the LSF or LSP that is the method of using to transfer the speech parameters after processed the speech to LPC, which is digital coding transferring efficiently, for the best quality and the lowest bit rate of parameters. The new revised transform algorithm between LSF and LPC coefficients is proposed. The proposed algorithm eliminates all multiplications, computes fewer operations, and reduces memory buffer sizes.

  • PDF

Screening of Voice Disorder using Source Parameter Model and Artificial Neural Network (음원 파라미터 모델과 인공신경망을 이용한 음성장애 검출)

  • Chytil, Pavel;Jo, Cheol-Woo;Pavel, Misha
    • Speech Sciences
    • /
    • v.15 no.2
    • /
    • pp.89-97
    • /
    • 2008
  • There is a number of clinical conditions that affect directly or indirectly the physical properties of the vocal folds and thereby the pressure waveforms of elicited sounds. If the relationships between the clinical conditions and the voice quality are sufficiently reliable, it should be possible to detect these diseases or disorders. The focus of this paper is to determine the set of features and their values that would characterize the speaker's state of vocal folds. To the extent that these features can capture the anatomical, physiological, and neurological aspects of the speaker they can be potentially used to mediate an unobtrusive approach to diagnosis. We will show a new approach to this problem supported with results obtained from two disordered voice corpora.

  • PDF

Phonation types of Korean fricatives and affricates

  • Lee, Goun
    • Phonetics and Speech Sciences
    • /
    • v.9 no.4
    • /
    • pp.51-57
    • /
    • 2017
  • The current study compared the acoustic features of the two phonation types for Korean fricatives (plain: /s/, fortis : /s'/) and the three types for affricates (aspirated : /$ts^h$/, lenis : /ts/, and fortis : /ts'/) in order to determine the phonetic status of the plain fricative /s/. Considering the different manners of articulation between fricatives and affricates, we examined four acoustic parameters (rise time, intensity, fundamental frequency, and Cepstral Peak Prominence (CPP) values) of the 20 Korean native speakers' productions. The results showed that unlike Korean affricates, F0 cannot distinguish two fricatives, and voice quality (CPP values) only distinguishes phonation types of Korean fricatives and affricates by grouping non-fortis sibilants together. Therefore, based on the similarity found in /$ts^h$/ and /ts/ and the idiosyncratic pattern found in /s/, this research concludes that non-fortis fricative /s/ cannot be categorized as belonging to either phonation type.

Transcoding Algorithm for SMV and G.723.1 Vocoders via Direct Parameter Transformation (SMV와 G.723.1 음성부호화기를 위한 파라미터 직접 변환 방식의 상호부호화 알고리듬)

  • 서성호;장달원;이선일;유창동
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2228-2231
    • /
    • 2003
  • In this paper, a transcoding algorithm for the Selectable Mode Vocoder (SMV) and the G.723.1 speech coder via direct parameter transformation is proposed. In contrast to the conventional tandem transcoding algorithm, the proposed algorithm converts the parameters of one coder to the Other Without going through the decoding md encoding process. The proposed algorithm is composed of four parts: the parameter decoding, line spectral pair (LSP) conversion, pitch period conversion, excitation conversion and rate selection. The evaluation results show that the proposed algorithm achieves equivalent speech quality to that of tandem transcoding with reduced computational complexity and delay.

  • PDF

Encoding of Speech Spectral Parameters Using Adaptive Vector-Scalar Quantization Methods for Mobile Communication Systems

  • Lee, In-Sung;Kim, Jong-Hark
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.4E
    • /
    • pp.35-40
    • /
    • 1998
  • In this paper, an efficient quantization method of line spectrum pairs(LSP) with cascaded structure of vector quantizer and scalar quantizer is proposed. First, input LSP parameters is vector-quantized using a codebook a with a moderate number of entries. In the second stage of quantization, the components of residual vector are individually quantized by the scalar quantizer. The utilization of ordering property of LSP parameters and the inclusion of interframe prediction improve the quantizer performance and remove the stability check routine after quantization procedure. The new vector-scalar hybrid quantizer using 26 bits/frame shows a transparent quality of speech that an average spectral distortion is 1 dB and the frame proportion with above 2 dB spectral distortion is less than 2%. The performances of proposed quantization method is evaluated in the transmission errors.

  • PDF

Robust Speech Decoding Using Channel-Adaptive Parameter Estimation.

  • Lee, Yun-Keun;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.1E
    • /
    • pp.3-6
    • /
    • 1999
  • In digital mobile communication system, the transmission errors affect the quality of output speech seriously. There are many error concealment techniques using a posteriori probability which provides information about any transmitted parameter. They need knowledge about channel transition probability as well as the 1st order Markov transition probability of codec parameters for estimation of transmitted parameters. However, in applications of mobile communication systems, the channel transition probability varies depending on nonstationary channel characteristics. The mismatch of designed channel transition probability of the estimator to actual channel transition probability degrades the performance of the estimator. In this paper, we proposed a new parameter estimator which adapts to the channel characteristics using short time average of maximum a posteriori probabilities(MAPs). The proposed scheme, when applied to the LSP parameter estimation, performed better than the conventional estimator which do not adapt to the channel characteristics.

  • PDF

Fat Injection of Functional Velopharyngeal Insufficiency as the Supportive Treatment (기능성 연구개 인두부전증의 보조 치료로서의 지방 삽입술)

  • Ahn, Cheol-Min;Kim, Yong-Woo
    • Speech Sciences
    • /
    • v.3
    • /
    • pp.18-25
    • /
    • 1998
  • Background: The results of treatment in functional velopharyngeal insufficiency (VPI) was not good compared to physician's common practice. Objectives: Authors conducted this study to evaluate the efficacy of fat injection on posterior pharyngeal wall in the functional velopharyngeal insufficiency as the supportive treatment. Materials and Methods: The preoperative assessment includes history of patients, the perceptual analysis of patient's voice, nasopharyngoscopic finding of velopharyngeal movements, nasometer, movement findings of soft palate during phonation and swalling. Fat which was taken from umbilical area was injected in 5 patients with conducted functional velopharyngeal insufficiency. Results: All 5 patients had good results in voice quality after fat injection. Conclusions: Fat injection is a good treatment method in functional velopharyngeal insufficiency as a supportive method.

  • PDF

Ambiguity Types of the Homonymic & Heterographic Units for Improving Korean Voice Recognition System - a Preliminary Research (한국어 음성인식 시스템 향상을 위한 동음이철 단위의 중의성 유형 분류)

  • Yoon, Ae-Sun;Kang, Mi-Young
    • Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.67-81
    • /
    • 2008
  • The accuracy rate of P2G (Phoneme-to-Grapheme) is one of the important factors determining the quality of unlimited voice recognition (VR) systems. Few studies were, however, conducted to reduce ambiguities of a phoneme string which can be segmented into a variety of different linguistic units (i.e. morphemes, words, eo-jeols), thus be transformed into more than one grapheme string. This paper is a preliminary research for building a large knowledge base of those homonymic & heterographic units(HHUs), which will provide unlimited Korean VR systems with more accurate P2G information. This paper analyzes 2 main factors generating HHUs: (1) boundary determination of the prosodic unit; (2) its segmentation into linguistic units. In this paper, linguistic characteristics determining variable boundaries of a prosodic unit are investigated, and the ambiguity types of HHUs are classified in accordance with their morphological and syntactic structures as well as with the phonological rules governing them.

  • PDF