• Title/Abstract/Keywords: speech resynthesis

Search results: 4 (processing time: 0.02 s)

Harmonic Structure Features for Robust Speaker Diarization

  • Zhou, Yu;Suo, Hongbin;Li, Junfeng;Yan, Yonghong
    • ETRI Journal
    • /
    • Vol. 34, No. 4
    • /
    • pp.583-590
    • /
    • 2012
  • In this paper, we present a new approach to speaker diarization. First, we use the prosodic information calculated on the original speech to resynthesize new speech data using a spectrum modeling technique. The resynthesized data is modeled with sinusoids based on pitch, vibration amplitude, and phase bias. Then, we extract cepstral features from the resynthesized speech and integrate them with the cepstral features from the original speech for speaker diarization. Finally, we show how the two streams of cepstral features can be combined to improve the robustness of speaker diarization. Experiments carried out on standardized datasets (the US National Institute of Standards and Technology Rich Transcription 04-S multiple distant microphone conditions) show a significant improvement in diarization error rate compared to a system based only on the feature stream from the original speech.
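
The sinusoidal resynthesis step described in this abstract can be illustrated with a minimal sketch. It assumes a purely harmonic model in which one frame is a sum of cosines at integer multiples of the pitch. The function name, the sample rate, and the frame length are hypothetical choices for illustration, not details taken from the paper.

```python
import math


def resynthesize_frame(f0_hz, amplitudes, phases, sample_rate=16000, n_samples=160):
    """Resynthesize one frame as a sum of harmonics of f0_hz.

    amplitudes[k] and phases[k] describe harmonic number k + 1.
    This is an assumed harmonic model, not the paper's exact method.
    """
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0_hz * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        )
        frame.append(sample)
    return frame


# Two harmonics of a 100 Hz pitch, 20 ms at 16 kHz.
frame = resynthesize_frame(100.0, [1.0, 0.5], [0.0, 0.0], n_samples=320)
```

Cepstral features would then be extracted from such frames with a separate front end, which this sketch does not cover.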

Voice quality transform using jitter synthesis

  • 조철우
    • Phonetics and Speech Sciences
    • /
    • Vol. 10, No. 4
    • /
    • pp.121-125
    • /
    • 2018
  • This paper describes procedures for changing and measuring voice quality in terms of jitter. A jitter synthesis method was applied to the TD-PSOLA analysis system of the Praat software. The jitter component is synthesized based on a Gaussian random noise model. The TD-PSOLA resynthesis process is used to synthesize the modified voice with artificial jitter. Various vocal jitter parameters are used to measure the change in quality caused by the artificial, systematic change in jitter. Synthetic vowels, natural vowels, and short sentences are used to check the change in voice quality through the synthesizer model. The results show that the suggested method is useful for voice quality control in a limited way and can be used to alter the jitter component of a voice.
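
As a rough illustration of the Gaussian jitter model mentioned in this abstract, the sketch below perturbs a sequence of pitch periods with zero-mean Gaussian noise and then measures local jitter as the mean absolute difference between consecutive periods divided by the mean period. The function names and the `relative_std` parameter are assumptions made for this example. The TD-PSOLA resynthesis itself is not reproduced here.

```python
import random


def add_gaussian_jitter(periods, relative_std, seed=0):
    """Scale each pitch period by (1 + Gaussian noise); hypothetical helper."""
    rng = random.Random(seed)
    # Clamp to a tiny positive value so a period can never become negative.
    return [max(1e-6, p * (1.0 + rng.gauss(0.0, relative_std))) for p in periods]


def local_jitter(periods):
    """Mean absolute consecutive-period difference over the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


clean = [0.01] * 200                       # steady 100 Hz voice, periods in seconds
jittered = add_gaussian_jitter(clean, 0.01)
print(local_jitter(clean), local_jitter(jittered))
```

In a full pipeline the perturbed periods would drive the placement of pitch marks before overlap-add resynthesis.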

F0 Contour Model based on Temporal Decomposition

  • 변효진;김연준;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 18, No. 8
    • /
    • pp.75-83
    • /
    • 1999
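  • This paper proposes a new F0 contour model for intonation control in speech synthesis. The proposed model decomposes the F0 contour of an uttered sentence into overlap-added events and models each event with a Gaussian bell-shaped event function. A parameter estimation algorithm for the proposed model is also presented. The model does not rely on any particular phonological knowledge and can be used in both the analysis and synthesis stages of the F0 contour. To evaluate the model, a corpus of 500 sentences of various types was drawn from diverse genres and read aloud by a professional announcer. In experiments on the resulting speech corpus, the root mean square error between the F0 contour of the original speech and the F0 contour synthesized by the proposed model was 7.87 Hz.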
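
The idea of an F0 contour built from overlap-added Gaussian events, together with the root mean square error used for evaluation, can be sketched as follows. The event parameterization (amplitude, center, width), the constant `baseline` term, and all numeric values are assumptions for illustration. The paper's parameter estimation algorithm is not shown.

```python
import math


def gaussian_event(t, amplitude, center, width):
    """Bell-shaped event function evaluated at time t (seconds)."""
    return amplitude * math.exp(-0.5 * ((t - center) / width) ** 2)


def f0_contour(times, events, baseline=0.0):
    """Overlap-add all events on top of an assumed constant baseline (Hz)."""
    return [baseline + sum(gaussian_event(t, *e) for e in events) for t in times]


def rmse(a, b):
    """Root mean square error between two equally long contours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


times = [i * 0.01 for i in range(100)]              # 1 s sampled every 10 ms
events = [(30.0, 0.3, 0.05), (20.0, 0.6, 0.08)]     # hypothetical events
contour = f0_contour(times, events, baseline=120.0)
```

Comparing such a synthesized contour with a measured one through `rmse` gives a figure in Hz, the same unit as the error reported in the abstract.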

Perceptual Boundary on a Synthesized Korean Vowel /o/-/u/ Continuum by Chinese Learners of Korean Language

  • 윤지현;김은경;성철재
    • Phonetics and Speech Sciences
    • /
    • Vol. 7, No. 4
    • /
    • pp.111-121
    • /
    • 2015
  • The present study examines the auditory boundary between Korean /o/ and /u/ on a synthesized vowel continuum for Chinese learners of Korean. Previous studies have reported that Chinese learners have difficulty pronouncing the Korean monophthongs /o/ and /u/. In this experiment, a nine-step continuum was resynthesized using Praat from a vowel token recorded by a male announcer who produced it in isolation. F1 and F2 were shifted synchronously in equal steps in qtone (quarter tone), while the F3 and F4 values were held constant across all stimuli. A forced-choice identification task was performed by advanced learners whose native language is Mandarin Chinese. Their experimental data were compared with those of a native Korean group. ROC (Receiver Operating Characteristic) analysis and logistic regression were performed to estimate the perceptual boundary. The results indicated that the learner group has a different auditory criterion on the continuum from the native Korean group. This suggests that more importance should be placed on hearing and listening training in order to acquire the phoneme categories of the two vowels.
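
Two calculations implied by this abstract can be sketched briefly: shifting a formant frequency by quarter tones (24 per octave, so a shift of n quarter tones scales the frequency by 2^(n/24)), and reading a perceptual boundary off a fitted logistic curve as the point where the response probability is 0.5. The starting formant values, the step size, and the regression coefficients below are hypothetical and are not taken from the study.

```python
def shift_quarter_tones(freq_hz, steps):
    """Shift a frequency by a number of quarter tones (24 per octave)."""
    return freq_hz * 2.0 ** (steps / 24.0)


def formant_continuum(f1_start, f2_start, n_steps=9, qtones_per_step=-1.0):
    """Return (F1, F2) pairs shifted together in equal quarter-tone steps."""
    return [
        (
            shift_quarter_tones(f1_start, i * qtones_per_step),
            shift_quarter_tones(f2_start, i * qtones_per_step),
        )
        for i in range(n_steps)
    ]


def logistic_boundary(intercept, slope):
    """Stimulus value at which a logistic model predicts probability 0.5."""
    return -intercept / slope


continuum = formant_continuum(400.0, 800.0)    # hypothetical starting formants
boundary = logistic_boundary(-5.0, 1.0)        # hypothetical fitted coefficients
```

The intercept and slope would come from fitting identification responses against the step number, which is outside the scope of this sketch.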