• Title/Summary/Keyword: phonetic level

A Study on Regression Class Generation of MLLR Adaptation Using State Level Sharing (상태레벨 공유를 이용한 MLLR 적응화의 회귀클래스 생성에 관한 연구)

  • 오세진;성우창;김광동;노덕규;송민규;정현열
    • The Journal of the Acoustical Society of Korea / v.22 no.8 / pp.727-739 / 2003
  • In this paper, we propose a method for generating regression classes for adaptation in the HM-Net (Hidden Markov Network) system. MLLR (Maximum Likelihood Linear Regression) adaptation is applied to the HM-Net speech recognition system so that speaker characteristics are represented effectively and HM-Net can be used in various tasks. For state-level sharing, we adopt the contextual-domain state splitting of the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, which clusters in both the contextual and time domains. In each contextual-domain state, the desired phoneme classes are determined by splitting the context information (classes) that includes the target speaker's speech data. The number of adaptation parameters, such as means and variances, is controlled automatically by the contextual-domain state splitting of PDT-SSS, depending on the context information and the amount of adaptation utterances from a new speaker. Experiments were performed on the KLE (Center for Korean Language Engineering) 452 data and the YNU (Yeungnam Univ.) 200 data to verify the effectiveness of the proposed method. The results show that the accuracies of the phone, word, and sentence recognition systems increased by 34-37%, 9%, and 20%, respectively. When performance is compared across lengths of adaptation utterances, it also improves significantly even with short adaptation utterances. We therefore argue that the proposed regression class method is well suited to an HM-Net speech recognition system employing MLLR speaker adaptation.
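As a rough illustration of the MLLR idea in the abstract above, the following sketch applies one shared affine transform per regression class to Gaussian mean vectors. It is a minimal sketch, not the paper's method: the class assignments, the transforms, and all names (`mllr_adapt_mean`, `adapt_all`, `class_of`) are hypothetical, the transforms are written by hand rather than estimated by maximum likelihood, and the PDT-SSS splitting that produces the regression classes is not modeled.

```python
def mllr_adapt_mean(mean, A, b):
    """Apply the affine transform A * mean + b to one mean vector."""
    return [sum(a * m for a, m in zip(row, mean)) + bias
            for row, bias in zip(A, b)]


def adapt_all(means, class_of, transforms):
    """Adapt every mean with the transform of its regression class."""
    return {name: mllr_adapt_mean(mu, *transforms[class_of[name]])
            for name, mu in means.items()}


# Hypothetical speaker-independent means for three phone models.
means = {"a": [1.0, 2.0], "i": [0.0, 1.0], "k": [2.0, 2.0]}
# Hypothetical regression classes: both vowels share one transform.
class_of = {"a": "vowel", "i": "vowel", "k": "consonant"}
# Hypothetical (A, b) pairs; in MLLR these are estimated from adaptation data.
transforms = {
    "vowel": ([[2.0, 0.0], [0.0, 1.0]], [0.5, 0.0]),
    "consonant": ([[1.0, 0.0], [0.0, 1.0]], [-1.0, 1.0]),
}

adapted = adapt_all(means, class_of, transforms)
```

Sharing a transform across a class is what lets a small amount of adaptation speech update many Gaussians at once; fewer classes mean fewer parameters to estimate.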

A Longitudinal Case Study of Late Babble and Early Speech in Southern Mandarin

  • Chen, Xiaoxiang
    • Cross-Cultural Studies / v.20 / pp.5-27 / 2010
  • This paper studies the relation between canonical/variegated babble (CB/VB) and early speech in an infant acquiring Mandarin Chinese from 9 to 17 months. The infant was audio- and video-taped in her home almost every week. The data analyzed here come from 1,621 utterances extracted from 23 sessions, each lasting from 30 minutes to one hour, from age 00:09;07 to 01:05;27. The data were digitized, and segments from the 23 sessions were transcribed in narrow IPA and coded for analysis. Babble was coded from age 00:09;07 to 01:00;00, and words were coded from 01:00;00 to 01:05;27. Proto-words appeared at 11 months, and some babble was still present after 01:10;00. A total of 3,821 segments were counted in CB/VB utterances, plus the segments found in 899 word tokens. The transcription was completed and checked by the author and rechecked by two other researchers who majored in Chinese phonetics to ensure reliability; agreement reached 95.65%. Mandarin Chinese is phonetically very rich in consonants, especially affricates: it has aspirated and unaspirated stops at labial, alveolar, and velar places of articulation; affricates and fricatives at alveolar, retroflex, and palatal places; /f/; labial, alveolar, and velar nasals; a lateral; [h]; and labiovelar and palatal glides. In the child's pre-speech phonetic repertoire, 7 different consonants and 10 vowels were transcribed at 00:09;07. By 00:10;16, the number of phones had more than doubled (17 consonants, 25 vowels), but the rate of increase slowed after 11 months of age. The phones from babbling remained active throughout the child's early and subsequent speech. The rank order of occurrence of the major class types for both CB and early speech was: stops, approximants, nasals, affricates, fricatives, and lateral. As expected, unaspirated stops outnumbered aspirated stops, and front stops and nasals were more frequent than back sounds in both types of utterances.
The fact that affricates outnumbered fricatives in the child's late babble indicates the pre-speech influence of the ambient language. The analysis of the data also showed that: 1) the phonetic characteristics of CB/VB and early meaningful speech are extremely similar, and these similarities prove that the two are deeply related; 2) the infant demonstrated similar preferences for certain types of sounds in the two stages; 3) the infant's babbling was patterned at the segmental level, and this regularity was similarly evident in the early speech of children. Three combination types (coronal plus front vowel, labial plus central vowel, and dorsal plus back vowel) showed much overlap in the phonetic forms of CB/VB and early speech. The child's CB/VB at this stage thus already shared the basic architecture, composition, and representation of early speech. The similarity between CB/VB and early speech leaves no doubt that phones present in CB/VB are indeed precursors to early speech.

Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation (자동 음성 분할을 위한 음향 모델링 및 에너지 기반 후처리)

  • Park Hyeyoung;Kim Hyungsoon
    • MALSORI / no.43 / pp.137-150 / 2002
  • Speech segmentation at the phoneme level is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods to improve the performance of an automatic speech segmentation system based on the Hidden Markov Model (HMM). We compare monophone and triphone models and evaluate several model training approaches. In addition, we employ an energy-based postprocessing scheme to correct frequent boundary location errors between silence and speech sounds. Experimental results show that our system places 71.3% and 84.2% of boundaries correctly given tolerances of 10 ms and 20 ms, respectively.

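The tolerance-based figures reported in the abstract above can be illustrated with a short sketch. It assumes reference and predicted boundaries are already aligned one-to-one and given in milliseconds; the function name `boundary_accuracy` and the sample values are hypothetical and are not taken from the paper.

```python
def boundary_accuracy(reference_ms, predicted_ms, tolerance_ms):
    """Fraction of predicted boundaries within tolerance_ms of the reference."""
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must be aligned one-to-one")
    if not reference_ms:
        return 0.0
    hits = sum(1 for r, p in zip(reference_ms, predicted_ms)
               if abs(r - p) <= tolerance_ms)
    return hits / len(reference_ms)


# Hypothetical boundary times in milliseconds (errors: 5, 12, 15, 25 ms).
reference = [120.0, 340.0, 515.0, 790.0]
predicted = [125.0, 352.0, 500.0, 815.0]

acc_10 = boundary_accuracy(reference, predicted, 10.0)  # 1 of 4 within 10 ms
acc_20 = boundary_accuracy(reference, predicted, 20.0)  # 3 of 4 within 20 ms
```

A looser tolerance can only keep or raise the score, which is why the 20 ms figure in the abstract is higher than the 10 ms figure.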

Variation of Word-Initial Length by Age in Seoul Dialect (서울말 장단의 연령별 변이)

  • Kim Seoncheol;Kwon Mi-yeong;Hwang Yoen-Shin
    • MALSORI / no.50 / pp.1-22 / 2004
  • The aim of this paper is to identify the sociolinguistic variables of word-initial length loss in Seoul dialect. 350 people were asked to pronounce 40 words. Among the informants, 152 were male and 198 were female. In terms of age, 49 were in their twenties, 70 in their thirties, 69 in their forties, 71 in their fifties, and 91 in their sixties or older. According to our statistics, 18 words show sociolinguistic variation by age, and sex was not a variable. We can therefore conclude that Seoul dialect is undergoing length loss that varies at least by age. However, we need to enlarge the number of words and informants, and we also need to adopt other variables such as social level and education, for a better understanding of Seoul dialect.

The Locus of the Word Frequency Effect in Speech Production: Evidence from the Picture-word Interference Task (말소리 산출에서 단어빈도효과의 위치 : 그림-단어간섭과제에서 나온 증거)

  • Koo, Min-Mo;Nam, Ki-Chun
    • MALSORI / no.62 / pp.51-68 / 2007
  • Two experiments were conducted to determine the exact locus of the frequency effect in speech production. Experiment 1 addressed the question of whether the word frequency effect arises at the stage of lemma selection. A picture-word interference task was performed to test the significance of interactions between the effects of target frequency, distractor frequency, and semantic relatedness. There was a significant interaction between distractor frequency and semantic relatedness, and between target frequency and distractor frequency. Experiment 2 examined whether the word frequency effect is attributable to the lexeme level, which represents the phonological information of words. The methodological logic applied to Experiment 2 was the same as that of Experiment 1. There was no significant interaction between distractor frequency and phonological relatedness. These results demonstrate that word frequency influences the processes involved in selecting the correct lemma corresponding to an activated lexical concept in speech production.

Comparison of the recognition performance of Korean connected digit telephone speech depending on channel compensation methods and feature parameters (채널보상기법 및 특징파라미터에 따른 한국어 연속숫자음 전화음성의 인식성능 비교)

  • Jung Sung Yun;Kim Min Sung;Son Jong Mok;Bae Keun Sung;Kim Sang Hun
    • Proceedings of the KSPS conference / 2002.11a / pp.201-204 / 2002
  • As a preliminary study for improving the recognition performance of connected digit telephone speech, we investigate feature parameters as well as channel compensation methods for telephone speech. CMN and RTCN are examined for telephone channel compensation, and MFCC, DWFBA, SSC, and their delta features are examined as feature parameters. Recognition experiments with a database we collected show that, at the feature level, DWFBA is better than MFCC and that, for channel compensation, RTCN is better than CMN. The DWFBA+Delta_Mel-SSC feature shows the highest recognition rate.

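One of the channel compensation methods named in the abstract above can be sketched briefly, assuming CMN denotes the usual cepstral mean normalization: the per-coefficient mean over an utterance is subtracted from every frame, which removes a constant offset that a stationary channel adds in the cepstral domain. The function name and the toy feature values are hypothetical; RTCN and the DWFBA and SSC features are not modeled here.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient utterance mean from every frame."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient over the whole utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Hypothetical utterance: 3 frames, 2 cepstral coefficients each.
frames = [[1.0, 4.0], [3.0, 8.0], [5.0, 0.0]]
normalized = cepstral_mean_normalize(frames)
```

After normalization each coefficient has zero mean over the utterance, so two recordings that differ only by a fixed channel offset yield the same features.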

Accurate Speech Detection based on Sub-band Selection for Robust Keyword Recognition (강인한 핵심어 인식을 위해 유용한 주파수 대역을 이용한 음성 검출기)

  • Ji Mikyong;Kim Hoirin
    • Proceedings of the KSPS conference / 2002.11a / pp.183-186 / 2002
  • Speech detection is one of the important problems in real-time speech recognition. Accurate detection of speech boundaries is crucial to the performance of a speech recognizer. In this paper, we propose a speech detector based on Mel-band selection through training. To demonstrate the effectiveness of the proposed algorithm, we compare it with a conventional one, the so-called EPD-VAA (EndPoint Detector based on Voice Activity Detection). The proposed speech detector is trained to extract keyword speech better than other speech. EPD-VAA usually works well at high SNR but no longer works well at low SNR. The proposed algorithm, in contrast, pre-selects useful bands through keyword training and decides the speech boundary according to the energy level of the previously selected sub-bands. The experimental results show that the proposed algorithm outperforms EPD-VAA.

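The boundary decision described in the abstract above, which uses the energy of pre-selected sub-bands, might look like the following sketch. It is an assumption-laden illustration rather than the paper's detector: the selected bands and the threshold are fixed by hand instead of learned through keyword training, and the name `detect_speech` and all values are hypothetical.

```python
def detect_speech(band_energies, selected_bands, threshold):
    """Return (start, end) frame indices of the speech region, or None.

    band_energies: one list of per-band energies for each frame.
    selected_bands: indices of the bands judged useful for detection.
    """
    active = []
    for i, frame in enumerate(band_energies):
        # Only the selected bands contribute to the decision.
        energy = sum(frame[b] for b in selected_bands)
        if energy > threshold:
            active.append(i)
    if not active:
        return None
    return active[0], active[-1]


# Hypothetical 5 frames x 3 bands; band 1 carries noise, not speech.
band_energies = [
    [0.1, 5.0, 0.1],
    [0.2, 4.0, 0.1],
    [2.0, 0.5, 1.5],
    [3.0, 0.2, 2.0],
    [0.1, 6.0, 0.2],
]
boundary = detect_speech(band_energies, selected_bands=[0, 2], threshold=1.0)
```

Because the noisy band is excluded, frames 0, 1, and 4 stay below the threshold even though their total energy is high, which is the intuition behind band selection at low SNR.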

Against Phonological Ambisyllabicity (음운적 양음절성의 허상)

  • 김영석
    • Korean Journal of English Language and Linguistics / v.1 no.1 / pp.19-38 / 2001
  • The question of how /...VCV.../ sequences should be syllabified is a much discussed, yet unresolved, issue in English phonology. While most researchers recognize an overall universal tendency towards open syllables, there seem to be at least two different views on the analysis of /...VCV.../ when the second vowel is unstressed: ambisyllabicity (e.g., Kahn 1976) and resyllabification (e.g., Borowsky 1986). We adopt the latter view and present further evidence in its favor. This does not exclude low-level “phonetic” ambisyllabification, however. Following Nespor and Vogel (1986), we also assume that the domain of syllabification or resyllabification is the phonological word. With this new conception of the syllable structure of English, we attempt a reanalysis of Aitken's Law as well as æ-tensing in New York City and Philadelphia.

Lexical Encoding of L2 Suprasegmentals: Evidence from Korean Learners' Acquisition of Japanese Vowel Length Distinctions

  • Han, Jeong-Im
    • Phonetics and Speech Sciences / v.1 no.4 / pp.17-27 / 2009
  • Despite many studies on the production and perception of L2 phonemes, studies on how such phonemes are encoded lexically remain scarce. The aim of this study is to examine whether L2 learners have a perceptual problem with L2 suprasegmentals that are not present in their L1, or whether they can perceive them but cannot encode them in their lexicon. Specifically, Korean learners were tested to see if they could discriminate vowel length differences in Japanese at the psychoacoustic level through a simple AX discrimination task. Then, a speeded lexical decision task with high phonetic variability was conducted to see whether they could use such contrasts lexically. The results showed that Korean learners of Japanese have no difficulty discriminating Japanese vowel length contrasts, but they are unable to encode such contrasts in their phonological representation, even with long L2 exposure.

On the Simple Speaker Verification System Using Tolerance Interval Analysis Without Background Speaker Models (Tolerance Interval Analysis를 이용한 배경화자 없는 간단한 화자인증시스템에 관한 연구)

  • Choi, Hong-Sub
    • MALSORI / no.56 / pp.147-158 / 2005
  • In this paper, we focus on developing a simplified speaker verification algorithm without background speaker models, to be adopted in speaker verification systems built into portable terminals such as mobile phones and PMPs. According to tolerance interval analysis, the population of a person's speaker model can be represented by a suitable number of selected independent samples of the speaker model. We can therefore construct a representative speaker model and threshold under a specified confidence level and coverage. With the proposed algorithm and 40 samples, the experiments show a false rejection rate of 3.0% and a false acceptance rate of 4.3%, which compare favorably with the conventional method's results of 5.4% and 5.5%, respectively. The next step of this research will address suitable adaptation methods to overcome speech variation caused by aging and operating environments.

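The two error rates quoted in the abstract above can be computed as in the following sketch, assuming a verification score where higher means a better match and a single fixed threshold. The scores, the threshold, and the name `error_rates` are hypothetical; the tolerance interval analysis that the paper uses to set the threshold is not modeled.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold."""
    # A genuine speaker scoring below the threshold is wrongly rejected.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    # An impostor scoring at or above the threshold is wrongly accepted.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


# Hypothetical verification scores.
genuine = [0.9, 0.8, 0.4, 0.7]
impostor = [0.1, 0.6, 0.3, 0.2, 0.2]

frr, far = error_rates(genuine, impostor, threshold=0.5)
```

Raising the threshold trades false acceptances for false rejections, so the two rates are normally reported together at one operating point, as in the abstract.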