• Title/Summary/Keyword: PSOLA

Voice quality transform using jitter synthesis (Jitter 합성에 의한 음질변환에 관한 연구)

  • Jo, Cheolwoo
    • Phonetics and Speech Sciences / v.10 no.4 / pp.121-125 / 2018
  • This paper describes procedures for changing and measuring voice quality in terms of jitter. A jitter synthesis method was applied to the TD-PSOLA analysis system of the Praat software. The jitter component is synthesized based on a Gaussian random noise model. The TD-PSOLA resynthesis process is used to synthesize the modified voice with artificial jitter. Various vocal jitter parameters are used to measure the change in quality caused by the artificial, systematic change in jitter. Synthetic vowels, natural vowels, and short sentences are used to check the change in voice quality through the synthesizer model. The results show that the suggested method is useful for voice quality control in a limited way and can be used to alter the jitter component of voice.
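
The idea of a Gaussian jitter model feeding a pitch-synchronous resynthesis can be illustrated with a minimal Python sketch. It is not the paper's Praat procedure: the function names, the `jitter_sd` parameter (noise standard deviation as a fraction of the nominal period), and the clamping at 0.5 are assumptions made for illustration. The jitter measure follows the common "local jitter" definition, the mean absolute difference between consecutive periods divided by the mean period.

```python
import random


def jittered_periods(f0_hz, n_periods, jitter_sd, seed=0):
    """Return pitch periods (seconds) perturbed by zero-mean Gaussian noise.

    jitter_sd is a hypothetical parameter: the noise standard deviation
    expressed as a fraction of the nominal period 1 / f0_hz.
    """
    rng = random.Random(seed)
    t0 = 1.0 / f0_hz
    periods = []
    for _ in range(n_periods):
        # Clamp the factor so an extreme draw cannot give a non-positive period.
        factor = max(0.5, 1.0 + rng.gauss(0.0, jitter_sd))
        periods.append(t0 * factor)
    return periods


def local_jitter(periods):
    """Mean absolute difference of consecutive periods over the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods[:-1], periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def pitch_marks(periods, start=0.0):
    """Accumulate periods into pitch-mark times for a PSOLA-style resynthesis."""
    marks, t = [start], start
    for p in periods:
        t += p
        marks.append(t)
    return marks


if __name__ == "__main__":
    periods = jittered_periods(f0_hz=100.0, n_periods=200, jitter_sd=0.01, seed=1)
    print("local jitter:", local_jitter(periods))
    print("first marks:", pitch_marks(periods)[:4])
```

The perturbed pitch marks would then replace the original ones in the overlap-add step, so that the resynthesized voice carries the artificial jitter.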

A Study on an Implementation of Gentle Phone's Function by using PSOLA Algorithm (PSOLA 알고리즘을 이용한 친절전화기능의 구현에 관한 연구)

  • Jung HyunUk;Kim JongKuk;Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.93-96 / 2004
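  • This paper proposes a new method that applies digital speech processing to the other party's voice heard through a telephone receiver, so that it is heard as a soft sound without strong intonation. For a real-time implementation of the gentle phone, it proposes a way to manage memory occupancy by controlling the duration of the speech signal, allowing efficient software and hardware implementation. Features are extracted from the voice signal so that the speaker's characteristic information is preserved while the way the message comes across is made gentler. Voice conversion methods, such as producing a slowed-down voice by adjusting duration or handling the lengthening of voiced and unvoiced intervals differently, are implemented in a telephone, yielding a phone with an added gentle function that makes the other party's voice sound kind.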

Multiple-Speech Synthesis System according to Various Utterance (다양한 발성에 따른 다중음성 합성 시스템)

  • Park, Hyun-Young;Kim, Myoung;Bae, Myoung-Jin
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2003.11a / pp.151-154 / 2003
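  • Speech synthesis is defined as the automatic generation of speech waveforms using mechanical devices, electronic circuits, or computer simulation. It was studied earlier than other speech-related technologies. As PCs have become widespread and the telecommunications market has grown, the applications of speech synthesizers have steadily expanded, and a variety of synthesis techniques are being studied. Speech in natural conversation or in reading aloud generally carries prosodic information such as pitch, duration, and energy. Reflecting this prosodic information in synthesized sentences therefore allows clearer delivery of meaning and a variety of utterance conversions. In this paper, we implement a multiple-speech synthesizer for various utterance conversions using pitch modification and duration modification based on time-domain PSOLA synthesis.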
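
Pitch and duration modification by time-domain PSOLA, as named in the abstract above, can be sketched in Python as follows. This is a simplified illustration, not the authors' synthesizer. It assumes a fully voiced signal whose analysis pitch marks are already known, and the function name `td_psola` and its parameters `pitch_scale` and `time_scale` are hypothetical.

```python
import math


def td_psola(signal, marks, pitch_scale=1.0, time_scale=1.0):
    """Simplified TD-PSOLA for a fully voiced signal.

    signal: list of float samples.
    marks: increasing analysis pitch-mark sample indices (assumed given).
    pitch_scale > 1 raises the pitch; time_scale > 1 lengthens the output.
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch marks")
    out_len = int(len(signal) * time_scale)
    out = [0.0] * out_len
    t_syn = marks[0] * time_scale  # first synthesis pitch mark
    while t_syn < out_len:
        # Map the synthesis instant back to the analysis time axis and
        # select the nearest analysis pitch mark.
        t_ana = t_syn / time_scale
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t_ana))
        # Local pitch period at that mark.
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        # Overlap-add a Hann-windowed grain two periods long, centred on
        # the analysis mark, at the synthesis mark.
        center = int(round(t_syn))
        for n in range(-period, period + 1):
            src, dst = marks[k] + n, center + n
            if 0 <= src < len(signal) and 0 <= dst < out_len:
                w = 0.5 * (1.0 + math.cos(math.pi * n / period))
                out[dst] += w * signal[src]
        # Synthesis marks are spaced by the local period over pitch_scale.
        t_syn += period / pitch_scale
    return out


if __name__ == "__main__":
    period = 50
    signal = [math.sin(2.0 * math.pi * n / period) for n in range(1000)]
    marks = list(range(0, 1000, period))
    higher = td_psola(signal, marks, pitch_scale=1.5)
    longer = td_psola(signal, marks, time_scale=2.0)
    print(len(higher), len(longer))
```

With both scales at 1.0 and a constant period, the overlapping Hann windows sum to one, so the interior of the signal is reproduced unchanged. Changing `pitch_scale` alters the grain spacing, while `time_scale` repeats or skips grains to change duration without changing pitch.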

SWAPPING NATIVE AND NON-NATIVE SPEAKERS' PROSODY USING THE PSOLA ALGORITHM

  • Yoon Kyu-Chul
    • Proceedings of the KSPS conference / 2006.05a / pp.77-81 / 2006
  • This paper presents a technique of imposing the prosodic features of a native speaker's utterance onto the same sentence uttered by a non-native speaker. Three acoustic aspects of the prosodic features were considered: the fundamental frequency (F0) contour, segmental durations, and the intensity contour. The fundamental frequency contour and the segmental durations of the native speaker's utterance were imposed on the non-native speaker's utterance by using the PSOLA (pitch-synchronous overlap and add) algorithm [1] implemented in Praat[2]. The intensity contour transfer was also done in Praat. The technique of transferring one or more of these prosodic features was elaborated and its implications in the area of language education were discussed.
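
The arithmetic behind imposing one speaker's segmental durations on another can be sketched in Python. This is only an illustration under stated assumptions, not the Praat procedure used in the paper: both utterances are assumed to be segmented into the same sequence of segments, and the helper names `duration_factors` and `warp_time` are hypothetical.

```python
def duration_factors(model_bounds, learner_bounds):
    """Per-segment factors that stretch learner segments to the model durations.

    Both arguments are lists of segment boundary times in seconds for the
    same segment sequence (an assumption of this sketch).
    """
    if len(model_bounds) != len(learner_bounds):
        raise ValueError("both utterances need the same number of boundaries")
    factors = []
    for i in range(len(model_bounds) - 1):
        model_dur = model_bounds[i + 1] - model_bounds[i]
        learner_dur = learner_bounds[i + 1] - learner_bounds[i]
        factors.append(model_dur / learner_dur)
    return factors


def warp_time(t, src_bounds, dst_bounds):
    """Piecewise-linear map of time t from one segmentation to the other."""
    for i in range(len(src_bounds) - 1):
        s0, s1 = src_bounds[i], src_bounds[i + 1]
        if s0 <= t <= s1:
            d0, d1 = dst_bounds[i], dst_bounds[i + 1]
            return d0 + (t - s0) / (s1 - s0) * (d1 - d0)
    raise ValueError("t lies outside the segmentation")


if __name__ == "__main__":
    native = [0.0, 0.2, 0.5]   # native speaker's segment boundaries
    learner = [0.0, 0.4, 0.6]  # non-native speaker's segment boundaries
    print(duration_factors(native, learner))
    # Place a native F0 point measured at 0.35 s on the learner's time axis.
    print(warp_time(0.35, native, learner))
```

The factors would drive a PSOLA duration change of the non-native utterance. The time warp is useful when only the F0 contour is transferred and the non-native durations are kept.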

A Study on an Analysis and Comparison of Preprocessing Technique for the Speech Compression (음성압축을 위한 전처리기법의 비교 분석에 관한 연구)

  • Jang, Kyung-A;Min, So-Yeon;Bae, Myung-Jin
    • Speech Sciences / v.10 no.4 / pp.125-136 / 2003
  • Speech coding techniques have been studied to reduce complexity and bit rate while also improving sound quality. CELP-type vocoders, used as one of the standards, provide good sound quality even at low bit rates. In this paper, unlike in conventional vocoders, the input speech is preprocessed to reduce the bit rate. Different kinds of parameters can be used for the preprocessing, so this paper compares these parameters to find the one most appropriate for the vocoder. The parameters are used to synthesize the speech, not to encode or decode it, so we propose a simple algorithm that does not affect the processing time or the computation time. The parameters used in the preprocessing step are speaking rate, duration, and the PSOLA technique.

Glottal Closure Interval Extrapolation Technique Based Pitch modification (성문 닫힘 구간 가변에 의한 피치 변경)

  • 강동규
    • Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.231-234 / 1998
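  • To adjust the pitch of voiced speech in the time domain, we propose a method that estimates parameters characterizing the glottal closure interval within one pitch period and uses them to linearly extend or shorten the signal continuing from that interval, so that pitch can be adjusted freely while high sound quality is maintained. Because the proposed method minimizes the effects of windowing and signal overlap found in the PSOLA technique, clearer synthesized speech was obtained.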

The Contribution of Prosody to the Foreign Accent of Chinese Talkers' English Speech

  • Liu, Xing;Lee, Joo-Kyeong
    • Phonetics and Speech Sciences / v.4 no.3 / pp.59-73 / 2012
  • This study attempts to investigate the contribution of prosody to the foreign accent in Chinese speakers' English production by examining the synthesized speech of crossing native and non-native talkers' prosody and segments. For the stimuli of the foreign accent ratings, we transplanted gender-matched native speakers' prosody onto non-native talkers' segments and vice versa, utilizing the TD-PSOLA algorithm. Eight English native listeners participated in judging foreign accent and comprehensibility of the transplanted stimuli. Results showed that the synthesized stimuli were perceived as stronger foreign accent regardless of speakers' proficiency when English speakers' prosody was crossed with Chinese speakers' segments. This suggests that segments contribute more than prosody to native listeners' evaluation of foreign accent. When transplanted with English speakers' segments, Chinese speakers' prosody showed a difference in duration rather than pitch between high and low proficiency such that stronger foreign accent was detected when low proficient Chinese speakers' duration was crossed with English speakers' segments. This indicated that prosody, more specifically duration, plays a role though the prosodic role is not overall as significant as segments. According to the post acoustic analysis, the temporal features contributing to making the duration parameter prominent as opposed to pitch were found out to be speaking rate, pause duration and pause frequency. Finally, foreign accent and comprehensibility showed no significant correlation such that native listeners had no difficulty listening to highly foreign accented speech.

A Study on Real Time Pitch Alteration of Speech Signal (음성신호의 실시간 피치변경에 관한 연구)

  • 김종국;박형빈;배명진
    • The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.82-89 / 2004
  • This paper describes how to reduce the effect of the occupation threshold by controlling the transformation of HMM mixture components in a hierarchical tree structure, so as to prevent over-adaptation. To reduce correlations between data elements and to remove elements with low variance, we employ PCA (principal component analysis) and ICA (independent component analysis), which give as good a representation as possible and lessen the effect of over-adaptation. When the occupation threshold is lowered and the number of transformation functions is increased, the ordinary MLLR adaptation algorithm yields a lower recognition rate than SI models, whereas the proposed MLLR adaptation algorithm improves the word recognition rate by more than 2% over SI models.

Cyber Character Implementation with Recognition and Synthesis of Speech/Image (음성/영상의 인식 및 합성 기능을 갖는 가상캐릭터 구현)

  • Choe, Gwang-Pyo;Lee, Du-Seong;Hong, Gwang-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI / v.37 no.5 / pp.54-63 / 2000
  • In this paper, we implemented a cyber character capable of speech recognition, speech synthesis, motion tracking, and 3D animation. For speech recognition, we used a discrete HMM algorithm with K-means 128-level vector quantization and MFCC feature vectors. For speech synthesis, we used a demi-syllable TD-PSOLA algorithm. For PC-based motion tracking, we present a fast optical-flow-like method. To animate the 3D model, we used vertex interpolation with Direct3D retained mode. Finally, we integrated these systems into a cyber character that plays a multiplication-table game with the user and always looks at the user by means of the motion tracking system.

Speech synthesis system using Korean prosodic rules (한국어 운율규칙을 이용한 음성합성시스템)

  • 이기영
    • Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.356-359 / 1998
  • This paper proposes a speech synthesis method using Korean prosodic rules as an important technique for Korean speech synthesis. The prosodic model for speech synthesis is composed of accentual phrases and intonational phrases, which are derived from the hierarchical structure of prosody. This prosodic model controls the duration, intonation, and pauses of the synthesized speech. The synthesis units consist of demi-syllables and VCV triphones, which can produce an unlimited vocabulary, and TD-PSOLA is used as the synthesis method.
