• Title/Summary/Keyword: candidate's Speech

A DB Pruning Method in a Large Corpus-Based TTS with Multiple Candidate Speech Segments (대용량 복수후보 TTS 방식에서 합성용 DB의 감량 방법)

  • Lee, Jung-Chul;Kang, Tae-Ho
    • The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.572-577 / 2009
  • Large corpus-based concatenative Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. To prune redundant speech segments from a large speech segment DB, we can use the decision-tree based triphone clustering algorithm widely used in speech recognition. However, conventional methods have problems in representing the acoustic transitional characteristics of phones and in applying context questions with hierarchical priority. In this paper, we propose a new clustering algorithm to downsize the speech DB. First, three 13th-order MFCC vectors from the first, medial, and final frames of a phone are combined into a 39-dimensional vector to represent the transitional characteristics of the phone. Then three hierarchically grouped question sets are used to construct the triphone trees. For the performance test, we used the DTW algorithm to calculate the acoustic similarity between the target triphone and the triphone returned by the tree search. Experimental results show that the proposed method reduces the size of the speech DB by 23% and selects better phones with higher acoustic similarity. The proposed method can therefore be applied to build a small-sized TTS system.
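The feature construction and the DTW comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `transitional_vector` and `dtw_distance` are hypothetical, the choice of the middle frame as `n_frames // 2` and the Euclidean local cost are assumptions, and the MFCC frames are assumed to be computed elsewhere.

```python
import numpy as np

def transitional_vector(mfcc_frames):
    """Stack the first, medial, and final 13-dim MFCC frames of one phone
    into a single 39-dim vector (assumed layout: first | medial | final)."""
    frames = np.asarray(mfcc_frames, dtype=float)
    if frames.ndim != 2 or frames.shape[1] != 13:
        raise ValueError("expected an array of shape (n_frames, 13)")
    mid = frames.shape[0] // 2  # assumption: medial frame = middle index
    return np.concatenate([frames[0], frames[mid], frames[-1]])

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two frame sequences,
    using the Euclidean distance as the local cost (an assumption)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return cost[n, m]
```

A lower `dtw_distance` between the target triphone and a candidate would indicate higher acoustic similarity under these assumptions.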

Optimal Feature Parameters Extraction for Speech Recognition of Ship's Wheel Orders (조타명령의 음성인식을 위한 최적 특징파라미터 검출에 관한 연구)

  • Moon, Serng-Bae;Chae, Yang-Bum;Jun, Seung-Hwan
    • Journal of the Korean Society of Marine Environment & Safety / v.13 no.2 s.29 / pp.161-167 / 2007
  • The goal of this paper is to develop a speech recognition system that can control a ship's autopilot. Feature parameters that predict the speaker's intention were extracted from sample wheel orders written in SMCP (IMO Standard Marine Communication Phrases). We designed a post-recognition procedure based on these parameters that makes a final decision from the list of candidate words. To evaluate the effectiveness of the parameters and the procedure, a basic experiment was conducted with a total of 525 wheel orders. The experimental results show that the proposed pattern recognition procedure yielded an improvement of about 42.3% over the pre-recognition procedure.

A Pre-Selection of Candidate Units Using Accentual Characteristic In a Unit Selection Based Japanese TTS System (일본어 악센트 특징을 이용한 합성단위 선택 기반 일본어 TTS의 후보 합성단위의 사전선택 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Kwang-Hyoung;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.159-165 / 2007
  • In this paper, we propose a new pre-selection method for candidate units that is suitable for a unit selection based Japanese TTS system. General pre-selection methods calculate a context-dependent cost within an IP (Intonation Phrase). Unlike other languages, however, Japanese has an accent represented as the height of a relative pitch, and several words form a single accentual phrase (AP). Also, the prosody in Japanese changes in accentual phrase units. By reflecting such prosodic change in pre-selection, the quality of synthesized speech can be improved. Furthermore, calculating the context-dependent cost within an accentual phrase rather than within an intonation phrase can improve synthesis speed. The proposed method defines the AP, analyzes the AP in context, and performs pre-selection using accentual phrase matching, which calculates the CCL (connected context length) of the candidates for each phoneme to be synthesized in each accentual phrase. The baseline system used in the proposed method is VoiceText, a synthesizer from Voiceware. Evaluations were made on perceptual error (intonation error, concatenation mismatch error) and synthesis time. Experimental results showed that the proposed method improved the quality of synthesized speech and shortened the synthesis time.

Adaptive Wavelet Denoising For Speech Recognition in Car Interior Noise

  • Kim, E. Jae;Yang, Sung-Il;Kwon, Y.;Jarng, Soon S.
    • The Journal of the Acoustical Society of Korea / v.21 no.4E / pp.178-182 / 2002
  • In this paper, we propose an adaptive wavelet method for car interior noise cancellation. For this purpose, we use a node-dependent threshold that minimizes the Bayesian risk. We propose a noise estimation method based on spectral entropy, using a histogram of intensity and a candidate best basis instead of Donoho's best bases. We also modify the hard threshold function. Experimental results show that the proposed algorithm is more efficient than the conventional one, especially for heavily noisy signals.
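For orientation, the standard hard threshold function that this abstract takes as its starting point can be sketched as below. This shows only the conventional, unmodified rule applied to a one-level Haar transform; the authors' modified threshold function, their node-dependent Bayesian threshold, and their best-basis search are not specified in the abstract and are not reproduced here. All function names are hypothetical, and the Haar wavelet is an assumption chosen for brevity.

```python
import numpy as np

def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length signal."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def hard_threshold(coeffs, thr):
    """Conventional hard thresholding: keep coefficients whose magnitude
    exceeds thr unchanged and set all others to zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) > thr, coeffs, 0.0)

def denoise(x, thr):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, hard_threshold(detail, thr))
```

In a node-dependent scheme, `thr` would differ per wavelet packet node; here a single scalar is used purely for illustration.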

An Acoustical Study of Korean 's' (국어 'ㅅ' 음가에 대한 음향학적 연구)

  • Mun Seung-Jae
    • MALSORI / no.33_34 / pp.11-22 / 1997
  • The degrees of aspiration in Korean [ㅅ] and [ㅆ] were measured in terms of VOT. The measurements were compared to the aspiration in Korean stops and affricates. It was shown that [ㅅ] should be classified as an 'aspirated' sound together with the Korean aspirated stops and affricates [$p^h,\; t^h,\; k^h,\; t\int$], contrary to the traditional classification of the sound as unaspirated. [ㅆ] was confirmed to be in the same group as other Korean 'tense' sounds. It was pointed out that there is a gap in the typology of Korean consonants, created by the lack of an unaspirated counterpart of [ㅅ]. It was suggested that an extinct Korean sound [$\triangle$] be considered as a possible candidate for the gap. A perception test was also suggested for further acoustical analysis of Korean [ㅅ] and [ㅆ].

On a Study of the Improvement of Speaker Recognition with Characteristics of High Order Reflection Coefficients (고차 반사계수 특성을 이용한 화자인식의 성능 향상에 관한 연구)

  • 이윤주;오세영;함명규;배명진
    • Proceedings of the IEEK Conference / 1999.06a / pp.667-670 / 1999
  • As the number of reference patterns increases in text-dependent speaker recognition, the recognition performance of the system degrades. Conversely, if the number of reference patterns is decreased, a high recognition rate can be obtained, because the speaker recognition achieves higher discrimination. In this paper, to decrease the number of reference patterns, we choose candidate reference patterns for pattern matching with the test pattern using the high-order components of the reflection coefficients of the uttered speech signal. Consequently, the total recognition rate of the proposed method is about 2% higher than that of the conventional method.
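As background for this abstract, reflection coefficients are commonly obtained from a frame's autocorrelation with the Levinson-Durbin recursion. The sketch below shows that standard recursion only; how the paper uses the high-order coefficients to choose candidate reference patterns is not described in the abstract and is not implemented. The function name `reflection_coefficients` is hypothetical, and the sign convention (prediction polynomial A(z) = 1 + Σ a_j z^-j) is an assumption that differs between textbooks.

```python
import numpy as np

def reflection_coefficients(signal, order):
    """Reflection (PARCOR) coefficients k_1..k_order of one speech frame,
    computed with the Levinson-Durbin recursion (autocorrelation method)."""
    x = np.asarray(signal, dtype=float)
    # Unnormalized autocorrelation for lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]              # prediction error energy
    ks = np.zeros(order)
    for i in range(1, order + 1):
        if err <= 0.0:      # silent or degenerate frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        ks[i - 1] = k
        prev = a.copy()
        # Order update: a_j <- a_j + k * a_{i-j} for j = 1..i-1.
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return ks
```

For example, `reflection_coefficients(frame, 12)[8:]` would return the four highest-order coefficients of a 12th-order analysis; the order and the cut-off index here are illustrative choices, not values from the paper.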
