• Title/Summary/Keyword: Intra-speaker variation

Search Result 7, Processing Time 0.02 seconds

An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition (음성인식에서 화자 내 정규화를 위한 진폭 변경 방법)

  • Kim Dong-Hyun;Hong Kwang-Seok
    • Journal of Internet Computing and Services
    • /
    • v.4 no.3
    • /
    • pp.9-14
    • /
    • 2003
  • The method of vocal tract normalization is a successful method for improving the accuracy of inter-speaker normalization. In this paper, we present an intra-speaker warping factor estimation based on pitch alteration utterance. The feature space distributions of untransformed speech from the pitch alteration utterance of intra-speaker would vary due to the acoustic differences of speech produced by glottis and vocal tract. The variation of utterance is two types: frequency and amplitude variation. The vocal tract normalization is frequency normalization among inter-speaker normalization methods. Therefore, we have to consider amplitude variation, and it may be possible to determine the amplitude warping factor by calculating the inverse ratio of input to reference pitch. k, the recognition results, the error rate is reduced from 0.4% to 2.3% for digit and word decoding.

  • PDF

Speaker Verification System Based on HMM Robust to Noise Environments (잡음환경에 강인한 HMM기반 화자 확인 시스템에 관한 연구)

  • 위진우;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.7
    • /
    • pp.69-75
    • /
    • 2001
  • Intra-speaker variation, noise environments, and mismatch between training and test conditions are the major reasons for the speaker verification system unable to use it practically. In this study, we propose robust end-point detection algorithm, noise cancelling with the microphone property compensation technique, and inter-speaker discriminate technique by weighting cepstrum for robust speaker verification system. Simulation results show that the average speaker verification rate is improved in the rate of 17.65% with proposed end-point detection algorithm using LPC residue and is improved in the rate of 36.93% with proposed noise cancelling and microphone property compensation algorithm. The proposed weighting function for discriminating inter-speaker variations also improves the average speaker verification rate in the rate of 6.515%.

  • PDF

Statistical Patterns in Consonant Cluster Simplification in Seoul Korean: Within-dialect Interspeaker and Intraspeaker Variation

  • Cho, Tae-Hong;Kim, Sa-Hyang
    • Phonetics and Speech Sciences
    • /
    • v.1 no.1
    • /
    • pp.33-40
    • /
    • 2009
  • This study examines how young speakers of Seoul Korean produce tri-consonantal clusters /1kt/ and /1pt/ as in palk-ta ('to be bright') and palp-ta ('to step on'). Production data were collected from 20 speakers of Seoul Korean. The results of narrow transcription of the data showed that simplification is not obligatory as some speakers often preserve all three consonants. When simplified, there was a clear asymmetry between /1kt/ and /1pt/. Speakers showed no clear preference for either C1 preservation (C1=/1/) or C2 preservation (C2=/k/ in /1kt/ and /p/ in /1pt/) in production of /1kt/, but in production of /1pt/, strong preference was found for C1-preserved to C2-preserved variant. When compared with production data in Cho (1999), simplification patterns appear to have changed over the past 10 years, in a direction to preserve the first member of the cluster (/1/) more often, especially with /1kt/. There was no substantial between-item variation, indicating that simplification patterns are not lexically specified. Finally, the results suggest that the process of tri-consonantal simplification has not been fully phonologized in the grammar of the language as evident in substantial inter- and intra-speaker variation.

  • PDF

Text-dependent Speaker Recognition System Using DTW & VQ (VQ와 DTW를 이용한 문장 의존형 화자인식 시스템)

  • Jung JongSoon;Oh SeYoung;Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.97-103
    • /
    • 2001
  • The speaker recognition method using DTW algorithm has the problem that is reducing the performance of the speaker recognition system as the time variation. So there are many proposed algorithms to solve these problems. This paper proposes the new method If make the reference pattern that is acceptable to intra-speaker variation by reference pattern normalization. And to avoid reducing performance of speaker recognition system, we use the modified reference pattern to recognize the system user. The used methods in this paper are VQ and DTW. As the result of simulation we can obtain the $97.5\%$ of recognition accuracy rate.

  • PDF

A Study on Adaptive Model Updating and a Priori Threshold Decision for Speaker Verification System (화자 확인 시스템을 위한 적응적 모델 갱신과 사전 문턱치 결정에 관한 연구)

  • 진세훈;이재희;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.5
    • /
    • pp.20-26
    • /
    • 2000
  • In speaker verification system the HMM(hidden Markov model) parameter updating using small amount of data and the priori threshold decision are crucial factor for dealing with long-term variability in people voices. In the paper we present the speaker model updating technique which can be adaptable to the session-to-intra speaker variability and the priori threshold determining technique. The proposed technique decreases verification error rates which the session-to-session intra-speaker variability can bring by adapting new speech data to speaker model parameter through Baum Welch re-estimation. And in this study the proposed priori threshold determining technique is decided by a hybrid score measurement which combines the world model based technique and the cohen model based technique together. The results show that the proposed technique can lead a better performance and the difference of performance is small between the posteriori threshold decision based approach and the proposed priori threshold decision based approach.

  • PDF

An acoustic study on the duration of the morn in Japanese (일본어 특수박의 지속시간에 관한 음향음성학적 분석)

  • Kim Seonhi
    • MALSORI
    • /
    • no.38
    • /
    • pp.113-124
    • /
    • 1999
  • It is well known that Japanese prosodic structure assumes mora below the syllable tier. Syllables with V or CV structure are counted as having one morn whereas those with coda consonants /-pp, -tt, -kk, -ss, -N/ or long vowels are counted as having two morns in Japanese. This study measured the acoustic duration of these special moras ('tokusyuhaku') produced by Tokyo dialect speakers to see if they are isochronic with V or CV. It also examined the production of Korean(Seoul/Kyungsang dialect) and Chinese native speakers loaming Japanese as a second language to examine how the learners' first language influence their second language. Finally, it examined how speakers of the Akita dialect, which is blown as a syllabeme dialect in Japanese, produced them. The results showed that intra-speaker variation as well as inter-speaker variation was observed in the production by Akita dialect speakers. Production of native speakers of Chinese and Kyungsang dialect of Korean -- which have vowel length contrast in their phonological systems -- showed a similar result to Tokyo dialect speakers, which implies the influence of the learners' first language on the acquisition of the second language.

  • PDF

Electromyographic evidence for a gestural-overlap analysis of vowel devoicing in Korean

  • Jun, Sun-A;Beckman, M.;Niimi, Seiji;Tiede, Mark
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.153-200
    • /
    • 1997
  • In languages such as Japanese, it is very common to observe that short peripheral vowel are completely voiceless when surrounded by voiceless consonants. This phenomenon has been known as Montreal French, Shanghai Chinese, Greek, and Korean. Traditionally this phenomenon has been described as a phonological rule that either categorically deletes the vowel or changes the [+voice] feature of the vowel to [-voice]. This analysis was supported by Sawashima (1971) and Hirose (1971)'s observation that there are two distinct EMG patterns for voiced and devoiced vowel in Japanese. Close examination of the phonetic evidence based on acoustic data, however, shows that these phonological characterizations are not tenable (Jun & Beckman 1993, 1994). In this paper, we examined the vowel devoicing phenomenon in Korean using data from ENG fiberscopic and acoustic recorders of 100 sentences produced by one Korean speaker. The results show that there is variability in the 'degree of devoicing' in both acoustic and EMG signals, and in the patterns of glottal closing and opening across different devoiced tokens. There seems to be no categorical difference between devoiced and voiced tokens, for either EMG activity events or glottal patterns. All of these observations support the notion that vowel devoicing in Korean can not be described as the result of the application of a phonological rule. Rather, devoicing seems to be a highly variable 'phonetic' process, a more or less subtle variation in the specification of such phonetic metrics as degree and timing of glottal opening, or of associated subglottal pressure or intra-oral airflow associated with concurrent tone and stricture specifications. Some of token-pair comparisons are amenable to an explanation in terms of gestural overlap and undershoot. However, the effect of gestural timing on vocal fold state seems to be a highly nonlinear function of the interaction among specifications for the relative timing of glottal adduction and abduction gestures, of the amplitudes of the overlapped gestures, of aerodynamic conditions created by concurrent oral tonal gestures, and so on. In summary, to understand devoicing, it will be necessary to examine its effect on phonetic representation of events in many parts of the vocal tracts, and at many stages of the speech chain between the motor intent and the acoustic signal that reaches the hearer's ear.

  • PDF