• Title/Abstract/Keywords: resynthesized speech

6 search results (processing time: 0.018 s)

Harmonic Structure Features for Robust Speaker Diarization

  • Zhou, Yu; Suo, Hongbin; Li, Junfeng; Yan, Yonghong
    • ETRI Journal / Vol. 34, No. 4 / pp. 583-590 / 2012
  • In this paper, we present a new approach to speaker diarization. First, we use the prosodic information calculated on the original speech to resynthesize new speech data using a spectrum modeling technique. The resynthesized data is modeled with sinusoids based on pitch, vibration amplitude, and phase bias. Then, we extract cepstral features from the resynthesized speech data and integrate them with the cepstral features from the original speech for speaker diarization. Finally, we show how the two streams of cepstral features can be combined to improve the robustness of speaker diarization. Experiments carried out on standardized datasets (the US National Institute of Standards and Technology Rich Transcription 04-S multiple distant microphone conditions) show a significant improvement in diarization error rate compared to a system based only on the feature stream from the original speech.
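
The sinusoidal resynthesis step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that the data is modeled with sinusoids based on pitch, amplitude, and phase, so the harmonic-sum form, the function name `resynthesize_harmonics`, the 16 kHz sample rate, and the example values are all assumptions made for illustration.

```python
import math

def resynthesize_harmonics(f0_hz, amplitudes, phases,
                           sample_rate=16000, duration_s=0.02):
    """Build one voiced frame as a sum of harmonics of a constant pitch.

    Hypothetical helper: harmonic k has frequency k * f0_hz, with its own
    amplitude and phase offset. All parameter values are illustrative.
    """
    n_samples = int(round(sample_rate * duration_s))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = 0.0
        # Harmonics are numbered from 1 (the fundamental).
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            value += amp * math.cos(2.0 * math.pi * k * f0_hz * t + phase)
        samples.append(value)
    return samples

# Illustrative frame: 120 Hz pitch, three harmonics, zero phase.
frame = resynthesize_harmonics(120.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0])
```

In a two-stream setup like the one the abstract describes, cepstral features would then be extracted from frames such as `frame` and combined with features from the original signal; that part is not sketched here.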

운율 변조 양상에 따른 청자의 연령 지각 (Listener's Age Estimation by Prosody Manipulation)

  • 김지연; 성철재
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 6, No. 2 / pp. 81-88 / 2014
  • The normal aging process affects speech production, and these changes are perceived by listeners. This study examined whether age perception by normal listeners changed under various conditions of prosodic manipulation, comparing the prosodic changes according to age and sex in adulthood. The older and younger voices were resynthesized by manipulating the speaking rate and pitch to shift the perceived age of the groups toward each other. Two-way repeated-measures ANOVAs were conducted to determine whether the prosodic type of the resynthesized cue resulted in a significant shift in the perceived age of young and old voices. The manipulation of the speaking rate resulted in a significant shift in perceived age for the older and younger groups. A significant shift in age estimates was not observed for the younger male group when pitch was manipulated. There were significant gender-by-age group interactions for prosodic manipulation type. Age-related changes in the prosodic properties of speech may ultimately influence speech perception.

말속도와 강도 변조에 따른 경도 마비말장애 환자의 말 용인도 변화 (The Change of Acceptability for the Mild Dysarthric Speakers' Speech due to Speech Rate and Loudness Manipulation)

  • 김지연; 성철재
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 7, No. 1 / pp. 47-55 / 2015
  • This study examined whether speech acceptability changed under various conditions of prosodic manipulation. Both speech rate and voice loudness are reportedly associated with acceptability and intelligibility. Speech samples from twelve speakers with mild dysarthria were recorded. Speech rate and loudness were changed by digitally manipulating habitual sentences. Three loudness levels (70, 75, and 80 dB) and four speech rates (normal, 20% faster, 20% slower, and 40% slower) were presented to 12 speech-language pathologists (SLPs). The SLPs rated sentence acceptability on a 7-point Likert scale. Repeated-measures ANOVAs were conducted to determine whether the prosodic type of the resynthesized cue resulted in a significant change in speech acceptability. A faster speech rate (20% faster), compared with the habitual and slower rates (20% and 40% slower), resulted in a significant improvement in acceptability ratings (p < .001). Increased vocal loudness (up to 80 dB) resulted in a significant improvement in acceptability ratings (p < .05). Changes in speech rate and loudness, as prosodic properties of speech, may contribute to improved acceptability.

/오/-/우/ 합성모음 연속체에 대한 중국인 한국어 학습자의 청지각적 경계 (Perceptual Boundary on a Synthesized Korean Vowel /o/-/u/ Continuum by Chinese Learners of Korean Language)

  • 윤지현; 김은경; 성철재
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 7, No. 4 / pp. 111-121 / 2015
  • The present study examines the auditory boundary between Korean /o/ and /u/ on a synthesized vowel continuum for Chinese learners of Korean. Previous studies have reported that Chinese learners have difficulty pronouncing the Korean monophthongs /o/ and /u/. In this experiment, a nine-step continuum was resynthesized using Praat from a vowel token recorded in isolation by a male announcer. F1 and F2 were shifted synchronously in equal steps in qtone (quarter tone), while F3 and F4 values were held constant across all stimuli. A forced-choice identification task was performed by advanced learners whose native language is Mandarin Chinese. Their experimental data were compared with those of a native Korean group. ROC (Receiver Operating Characteristic) analysis and logistic regression were performed to estimate the perceptual boundary. The results indicated that the learner group has a different auditory criterion on the continuum from the native Korean group. This suggests that more importance should be placed on hearing and listening training in order to acquire the phoneme categories of the two vowels.
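
The equal quarter-tone stepping described in the abstract above can be sketched as follows. A quarter tone is half a semitone, so there are 24 per octave and equal quarter-tone steps are equal steps on a log2 frequency scale. The endpoint formant values, the 100 Hz reference, and the function names are hypothetical; the abstract does not give the actual frequencies or the Praat procedure used.

```python
import math

def hz_to_qtone(freq_hz, ref_hz=100.0):
    """Convert Hz to quarter tones relative to an assumed reference."""
    return 24.0 * math.log2(freq_hz / ref_hz)

def qtone_to_hz(qtone, ref_hz=100.0):
    """Convert quarter tones (relative to ref_hz) back to Hz."""
    return ref_hz * 2.0 ** (qtone / 24.0)

def formant_continuum(start_hz, end_hz, n_steps=9):
    """Return n_steps frequencies equally spaced on the quarter-tone scale."""
    q_start = hz_to_qtone(start_hz)
    q_end = hz_to_qtone(end_hz)
    step = (q_end - q_start) / (n_steps - 1)
    return [qtone_to_hz(q_start + i * step) for i in range(n_steps)]

# Hypothetical endpoints, chosen only to illustrate the stepping.
f1_steps = formant_continuum(400.0, 300.0)
f2_steps = formant_continuum(800.0, 700.0)
```

Because the spacing is logarithmic, consecutive steps differ by a constant frequency ratio rather than a constant number of Hz.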

The continuous or categorical effects for HH vs. HL and HH vs. LH in lexical pitch accent contrasts of Korean

  • Kim, Jungsun
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 6, No. 4 / pp. 53-65 / 2014
  • The current research examines whether pitch contour shapes in North Kyungsang pitch accent contrasts provide a phonetic dimension for phonological discreteness in a mimicry task. Two resynthesized pitch accent continua were created, one for HH vs. HL and one for HH vs. LH. To confirm a phonetic dimension that accounts for pitch accent categories in North Kyungsang Korean, the mimicries of speakers of two dialects (i.e., North Kyungsang and South Cholla) were compared. One finding was that, for North Kyungsang speakers, the range of mean f0 peak times was a phonetic dimension undergoing a continuous shift within a stimulus continuum for both HH vs. HL and HH vs. LH. For South Cholla speakers, on the other hand, there were no apparent shifts around categorical boundaries for either HH vs. HL or HH vs. LH. Individual mimicries varied considerably in f0 peak timing. For HH vs. LH, three North Kyungsang speakers showed a discrete pattern reflecting a shift in phonological categories, but for HH vs. HL, there was no such categorical shift, though there were statistically significant differences for two speakers. Interestingly, one of the North Kyungsang speakers showed a continuous phonetic dimension for both HH vs. HL and HH vs. LH. Lastly, the f0 valley timing did not exhibit a discrete or gradient phonetic dimension for speakers of either dialect. On the basis of these results, a tonal target such as the high tone in North Kyungsang pitch accent categories within the autosegmental-metrical (AM) theory may be realized within individual cognitive systems for representing the interaction of perception and production.

도호쿠 일본어의 폐쇄음 지각에 있어서 voice onset time(VOT)과 후속모음 fundamental frequency(F0)의 역할 (The role of voice onset time (VOT) and post-stop fundamental frequency (F0) in the perception of Tohoku Japanese stops)

  • 변희경
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 15, No. 1 / pp. 35-45 / 2023
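  • Traditional word-initial stops in Japanese fall into two categories: voiced stops, which are accompanied by vocal fold vibration before the release, and voiceless stops, which are accompanied by slight aspiration after the release. In the Tohoku region, by contrast, voiced stops are realized in every generation as devoiced stops without vocal fold vibration before the release, unlike in other regions. Devoiced voiced stops have positive voice onset time (VOT) values, which then collide with the VOT of the existing voiceless stops and affect the category distinction. Several studies have confirmed that, in production, Tohoku speakers, unlike speakers from other regions, actively use the fundamental frequency (F0) of the following vowel to distinguish stops. This study examines F0 together with VOT to determine whether F0 also plays an important role in distinguishing stops in perception. Several perception experiments under different conditions were conducted with Tohoku listeners, using stimuli in which VOT and F0 were resynthesized. The results showed that the regional difference (Tohoku vs. Chubu) was not significant for nonsense words, whereas for real words there was a significant difference in the use of F0 depending on the word, and this difference was found to stem from a few listeners who actively use F0. The discussion regards these listeners as innovative listeners and considers the possibility that, led by them, the role of F0 in stop perception will become generalized and F0 will be established as a perceptual cue.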