• Title/Summary/Keyword: speaker variation

Performance Improvement of Connected Digit Recognition by Considering Phonemic Variations in Korean Digit and Speaking Styles (한국어 숫자음의 음운변화 및 화자 발성특성을 고려한 연결숫자 인식의 성능향상)

  • 송명규;김형순
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.401-406 / 2002
  • Each Korean digit consists of a single syllable, so automatic recognizers, and even Koreans themselves, often have difficulty recognizing it. When digit strings are spoken, the pronunciation of each digit changes considerably because of co-articulation. In addition, distortion from varying channels and noise degrades recognition of Korean connected digit strings. This paper presents techniques to improve recognition performance: defining a set of phone-like units (PLUs) that reflects phonemic variations in Korean digits, and constructing a recognizer that handles speakers' various speaking styles. In speaker-independent connected digit recognition experiments on telephone speech, the proposed techniques with 1 Gaussian per state gave a string accuracy of 83.2%, i.e., a 7.2% error rate reduction relative to the baseline system. With 11 Gaussians per state, the highest string accuracy of 91.8% was achieved, i.e., a 4.7% error rate reduction.
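
The abstract reports string accuracies together with error rate reductions but does not state the baseline accuracies. The sketch below shows how the two quantities relate if the reduction is read as a relative reduction in string error rate; that reading is an assumption, and the function names are hypothetical. Any baseline figure it produces is an inference under that assumption, not a number reported by the paper.

```python
def relative_error_reduction(baseline_accuracy_pct, new_accuracy_pct):
    """Relative reduction (in %) of the error rate when accuracy improves."""
    baseline_error = 100.0 - baseline_accuracy_pct
    new_error = 100.0 - new_accuracy_pct
    return 100.0 * (baseline_error - new_error) / baseline_error


def implied_baseline_accuracy(new_accuracy_pct, relative_reduction_pct):
    """Invert the formula above: baseline accuracy implied by a relative reduction."""
    new_error = 100.0 - new_accuracy_pct
    baseline_error = new_error / (1.0 - relative_reduction_pct / 100.0)
    return 100.0 - baseline_error


# Figures quoted in the abstract: (string accuracy %, error rate reduction %).
for accuracy, reduction in [(83.2, 7.2), (91.8, 4.7)]:
    baseline = implied_baseline_accuracy(accuracy, reduction)
    print(f"accuracy {accuracy}% with {reduction}% reduction "
          f"-> implied baseline {baseline:.1f}% (assumes relative reduction)")
```

If the reduction were instead an absolute difference in error rate, the implied baselines would be lower, so the output should be treated only as one possible reading.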

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.681-686 / 2012
  • This paper studies speech parameters that are less affected by human emotion, with the aim of developing a speech recognition system that is robust to emotional speech. To this end, the effect of emotion on a speech recognition system and the robustness of candidate speech parameters were examined using a speech database containing various emotions. The feature parameters were mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and frequency-warped mel-cepstral coefficients obtained with the vocal tract length normalization method. CMS (Cepstral Mean Subtraction) and SBR (Signal Bias Removal) were used as signal bias removal techniques. Experimental results showed that an HMM-based speaker-independent word recognizer performed best when using frequency-warped RASTA mel-cepstral coefficients from the vocal tract length normalization method, their derivatives, and CMS for signal bias removal.
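
As a rough illustration of the CMS step named in the abstract, the sketch below subtracts the per-coefficient mean over an utterance from every frame. It is the textbook form of cepstral mean subtraction, not the paper's implementation; the function name and the toy values are made up for the example.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient utterance mean from every cepstral frame.

    `frames` is a list of equal-length cepstral vectors, one per frame.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


# Toy utterance: three frames of two cepstral coefficients (made-up values).
clean = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]

# The same utterance with a constant offset added in the cepstral domain,
# which is how a fixed channel (convolutional) distortion appears there.
offset = [10.0, -5.0]
biased = [[c + o for c, o in zip(frame, offset)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_biased = cepstral_mean_subtraction(biased)
print(normalized_clean)
print(normalized_biased)
```

Both calls yield the same normalized frames up to floating-point rounding, which is why CMS is commonly used to remove a stationary channel bias.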

Tonal development and voice quality in the stops of Seoul Korean

  • Yu, Hye Jeong
    • Phonetics and Speech Sciences / v.10 no.4 / pp.91-99 / 2018
  • Korean stops are currently undergoing a tonogenetic sound change, as found in the Seoul dialect, in which the merged VOT of aspirated and lax stops makes F0 the primary cue for distinguishing the two stops, with lax stops having lower F0 than aspirated stops. In tonal languages, low tone is produced with a breathy voice. This study investigated whether voice quality is changing along with the tonogenetic sound change of Korean stops. Two age groups speaking the Seoul dialect participated in this study: five females and six males born in the 1940s and 1950s, and nine females and eight males born in the 1980s and 1990s. This study replicated previous findings on VOT and F0 and further examined H1-H2, H1-A1, and H1-A2 to see how they correlate with the sound change. In both the older and younger generations, H1-H2, H1-A1, and H1-A2 were significantly lower after the tense stops than after the aspirated and lax stops, but they did not differ significantly between the aspirated and lax stops. However, the younger females exhibited somewhat different results for H1-H2 and H1-A2 than the older generation. In the younger females, the H1-H2 mean was higher after the aspirated stops than it was after the lax stops at the vowel onset, and the H1-H2 difference increased at the vowel midpoint. Although there was inter-speaker variation in the results for H1-H2 and H1-A1, analyses of individual speakers showed that H1-H2 and H1-A1 were higher after the lax stops than after the aspirated stops in the younger female speakers. These results indicate that lax stops tend to be breathier than aspirated stops in the younger female speakers. They also indicate that changes in voice quality are accompanying the tonal sound change in Korean stops but are still developing.

Robust Speech Recognition Algorithm of Voice Activated Powered Wheelchair for Severely Disabled Person (중증 장애우용 음성구동 휠체어를 위한 강인한 음성인식 알고리즘)

  • Suk, Soo-Young;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.250-258 / 2007
  • Current speech recognition technology has achieved high performance with the development of hardware devices; however, it remains insufficient for applications that require high reliability, such as voice control of powered wheelchairs for disabled persons. A system that aims to operate a powered wheelchair safely by voice in a real environment must reject non-voice inputs such as the user's coughing, breathing, and spark-like mechanical noise, and must recognize speech commands affected by disability, which have distinctive pronunciation speed and frequency characteristics. In this paper, we propose a non-voice rejection method that performs voice/non-voice classification in preprocessing using both YIN-based fundamental frequency (F0) extraction and reliability. We adopted a multi-template dictionary and acoustic-model-based speaker adaptation to cope with the pronunciation variation of inarticulately uttered speech. In recognition tests on data collected in a real environment, the proposed YIN-based fundamental frequency extraction achieved a recall-precision rate of 95.1%, compared with 62% for the cepstrum-based method. A recognition test of the new system with the multi-template dictionary and MAP adaptation also showed a much higher accuracy of 99.5%, compared with 78.6% for the baseline system.
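
To make the YIN-based voice/non-voice idea more concrete, here is a minimal single-frame sketch of the core YIN steps (difference function, cumulative-mean normalization, absolute threshold). It is not the authors' system: the function name, the pitch range, the 0.15 threshold, and the use of the normalized-difference value as a rough reliability score are all assumptions made for illustration, and the parabolic interpolation step of full YIN is omitted.

```python
import math
import random


def yin_f0(frame, sample_rate, f_min=60.0, f_max=500.0, threshold=0.15):
    """Return (f0_hz or None, aperiodicity) for one frame of samples.

    A low aperiodicity suggests a periodic (voiced) frame; None means no lag
    fell below the threshold, so the frame is treated as non-voice.
    """
    tau_min = max(1, int(sample_rate / f_max))
    tau_max = min(int(sample_rate / f_min), len(frame) // 2)
    window = len(frame) - tau_max  # same number of terms for every lag

    # Step 1: squared-difference function d(tau).
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))

    # Step 2: cumulative-mean-normalized difference d'(tau).
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Step 3: first lag below the threshold, followed down to its local minimum.
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau, cmnd[tau]

    # No periodic lag found: report the best (still high) aperiodicity.
    return None, min(cmnd[tau_min:tau_max + 1])


sample_rate = 8000
# Synthetic voiced frame: a pure 200 Hz tone (50 ms).
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
# Synthetic non-voice frame: seeded uniform noise standing in for a cough or click.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]

tone_f0, tone_aperiodicity = yin_f0(tone, sample_rate)
noise_f0, noise_aperiodicity = yin_f0(noise, sample_rate)
print(tone_f0, tone_aperiodicity)
print(noise_f0, noise_aperiodicity)
```

A frame-level rejector along these lines would discard frames whose F0 is None or whose aperiodicity is high; how the paper actually combines F0 and reliability is not specified in the abstract.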