• Title/Summary/Keyword: Phoneme set

Search Result 38, Processing Time 0.028 seconds

A Study on the Korean Syllable As Recognition Unit (인식 단위로서의 한국어 음절에 대한 연구)

  • Kim, Yu-Jin;Kim, Hoi-Rin;Chung, Jae-Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.3
    • /
    • pp.64-72
    • /
    • 1997
  • In this paper, study and experiments are performed for finding recognition unit fit which can be used in large vocabulary recognition system. Specifically, a phoneme that is currently used as recognition unit and a syllable in which Korean is well characterized are selected. From comparisons of recognition experiments, the study is performed whether a syllable can be considered as recognition unit of Korean recognition system. For report of an objective result of the comparison experiment, we collected speech data of a male speaker and processed them by hand-segmentation for phoneme boundary and labeling to construct speech database. And for training and recognition based on HMM, we used HTK (HMM Tool Kit) 2.0 of commercial tool from Entropic Co. to experiment in same condition. We applied two HMM model topologies, 3 emitting state of 5 state and 6 emitting state of 8 state, in Continuous HMM on training of each recognition unit. We also used 3 sets of PBW (Phonetically Balanced Words) and 1 set of POW(Phonetically Optimized Words) for training and another 1 set of PBW for recognition, that is "Speaker Dependent Medium Vocabulary Size Recognition." Experiments result reports that recognition rate is 95.65% in phoneme unit, 94.41% in syllable unit and decoding time of recognition in syllable unit is faster by 25% than in phoneme.

  • PDF

A Study on the Pitch Contour Generator with Neural Network in the Isolated Words (신경망을 이용한 고립단어에서의 피치변화곡선 발생기에 관한 연구)

  • Lim Unchun;Kwak Jingu;Chang Sokwang
    • Proceedings of the KSPS conference
    • /
    • 1996.02a
    • /
    • pp.137-155
    • /
    • 1996
  • The purpose of this paper is to generate a pitch contour which is affected by tile phonetic environment and the number of syllables in each Korean isolated word using a neural network. To do this, we analyzed a set of 513 Korean isolated words, consisting of 1-4 syllables and extracted the pitch contour and the duration of each phoneme in all the words. The total number of phonemes we analyzed is about 3800. After that we approximated the pitch contour with a 1st order polynominal by a regression analysis. We could get the slope, the initial pitch and the duration of each phoneme. We used these 3 parameters as the target pattern of the neural network and let the neural network learn the rule of the variation of the pitch and duration, which was affected by the phonetic environment of each phoneme. We used 7 consecutive phoneme strings as an input pattern for a neural network to make the network learn the effect of phonetic environment around the center phoneme. In the learning phase, we used 3545 items(463 words) as target patterns which contained the phonetic environment of front and rear 3 phonemes and the neural network showed the correctness rate of 98.43%, 98.59%, 97.7% in the estimation of the duration, the slope, the initial pitch. In the recall phase, we tested the performance of tile neural network with 251 items(50 words) which weren't need as learning data and we could get the good correctness rate of 97.34%, 95.45%, 96.3% in the generation of the duration, the slope, and the initial pitch of each phoneme.

  • PDF

Phoneme Segmentation based on Volatility and Bulk Indicators in Korean Speech Recognition (한국어 음성 인식에서 변동성과 벌크 지표에 기반한 음소 경계 검출)

  • Lee, Jae Won
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.10
    • /
    • pp.631-638
    • /
    • 2015
  • Today, the demand for speech recognition systems in mobile environments is increasing rapidly. This paper proposes a novel method for Korean phoneme segmentation that is applicable to a phoneme based Korean speech recognition system. First, the input signal constitutes blocks of the same size. The proposed method is based on a volatility indicator calculated for each block of the input speech signal, and the bulk indicators calculated for each bulk in blocks, where a bulk is a set of adjacent samples that have the same sign as that of the primitive indicators for phoneme segmentation. The input signal vowels, voiced consonants, and voiceless consonants are sequentially recognized and the boundaries among phonemes are found using three devoted recognition algorithms that combine the two types of primitive indicators. The experimental results show that the proposed method can markedly reduce the error rate of the existing phoneme segmentation method.

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser;Almasganj, Farshad
    • ETRI Journal
    • /
    • v.35 no.1
    • /
    • pp.100-108
    • /
    • 2013
  • In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.

Performance of Korean spontaneous speech recognizers based on an extended phone set derived from acoustic data (음향 데이터로부터 얻은 확장된 음소 단위를 이용한 한국어 자유발화 음성인식기의 성능)

  • Bang, Jeong-Uk;Kim, Sang-Hun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences
    • /
    • v.11 no.3
    • /
    • pp.39-47
    • /
    • 2019
  • We propose a method to improve the performance of spontaneous speech recognizers by extending their phone set using speech data. In the proposed method, we first extract variable-length phoneme-level segments from broadcast speech signals, and convert them to fixed-length latent vectors using an long short-term memory (LSTM) classifier. We then cluster acoustically similar latent vectors and build a new phone set by choosing the number of clusters with the lowest Davies-Bouldin index. We also update the lexicon of the speech recognizer by choosing the pronunciation sequence of each word with the highest conditional probability. In order to analyze the acoustic characteristics of the new phone set, we visualize its spectral patterns and segment duration. Through speech recognition experiments using a larger training data set than our own previous work, we confirm that the new phone set yields better performance than the conventional phoneme-based and grapheme-based units in both spontaneous speech recognition and read speech recognition.

The methods of recognition of consonants(voiced stops) by Neural Network (신경망에 의한 초성자음(ㄱ, ㄷ, ㅂ)의 인식방법)

  • 김석동
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1991.06a
    • /
    • pp.73-77
    • /
    • 1991
  • As the basic analysis to solve the stop consonants in phoneme based speech recognition using Back Propagation learning algorithm, changes in hidden units, training set and iteration. Also we propose an efficient processing method of separation between consonants and vowels.

  • PDF

Consonant Inventories of the Better Cochlear Implant Children in Korea (말지각 능력이 우수한 인공와우 착용 아동들의 조음 특성 : 정밀전사 분석 방법을 중심으로)

  • Chang, Son-A;Kim, Soo-Jin;Shin, Ji-Young
    • MALSORI
    • /
    • no.62
    • /
    • pp.33-49
    • /
    • 2007
  • The purpose of this study is 1) to investigate the phoneme inventories and phonological processes of cochlear implant(CI) children and 2) to describe their utterances using narrow phonetic transcription method. All ten subjects had more than 2 year-experience with CI and showed more than 85 % open-set sentence perception abilities. Average consonant accuracy was 81.36 % and it was improved up to 87.41% when distortion errors were not counted. They showed similar phonological processing patterns to HA or normal hearing children in some way as well as different phonological processing patterns from HA or normal hearing children. The prominent distortion error pattern was weakening of consonants. Every subject had his/her idiosyncratic error pattern that demanded his/her own individualized therapy program.

  • PDF

An Automatic Segmentation System Based on HMM and Correction Algorithm (HMM 및 보정 알고리즘을 이용한 자동 음성 분할 시스템)

  • Kim, Mu-Jung;Kwon, Chul-Hong
    • Speech Sciences
    • /
    • v.9 no.4
    • /
    • pp.265-274
    • /
    • 2002
  • In this paper we propose an automatic segmentation system that outputs the time alignment information of phoneme boundary using Viterbi search with HMM (Hidden Markov Model) and corrects these results by an UVS (unvoiced/voiced/silence) classification algorithm. We selecte a set of 39 monophones and a set of 647 extended phones for HMM models. For the UVS classification we use the feature parameters such as ZCR (Zero Crossing Rate), log energy, spectral distribution. The result of forced alignment using the extended phone set is 11% better than that of the monophone set. The UVS classification algorithm shows high performance to correct the segmentation results.

  • PDF

A Revising Method using Phoneme Comparison for Databases with Korean Character Set (데이터베이스상의 한글 자모단위 비교를 통한 데이터 정정기법)

  • 김대환;백두권
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.532-534
    • /
    • 2003
  • 코드로써 관리되어있지 않은 데이터베이스 내의 다양한 속성들이 시간이 흐름에 따라 정보로써 가치를 갖게 되면서. 비코드성 한글 데이터의 정형화에 대한 요구가 증가하고 있다. 정형화에 있어 한글의 특수성 중에 하나는 한글자료의 경우 KSC5601, CP949등을 사용하여 음절단위의 문자셋을 사용하여 음절단위로 저장 관리한다. 그런데 입력 시정에서는 자판기등을 이용하여 음소단위로 데이터를 입력하면서 발생하는 오류 및 비정형 데이터의 유입의 문제 등을 내포하고 있다. 이러한 문제를 해결하기 위하여 데이터의 저장단위인 음절이 아닌 음소 단위의 비교를 통하여 데이터를 정정하는 기법을 제안하고자 한다.

  • PDF

Phonetic Transcription based Speech Recognition using Stochastic Matching Method (확률적 매칭 방법을 사용한 음소열 기반 음성 인식)

  • Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.5
    • /
    • pp.696-700
    • /
    • 2007
  • A new method that improves the performance of the phonetic transcription based speech recognition system is presented with the speaker-independent phonetic recognizer. Since SI phoneme HMM based speech recognition system uses only the phoneme transcription of the input sentence, the storage space could be reduced greatly. However, the performance of the system is worse than that of the speaker dependent system due to the phoneme recognition errors generated from using SI models. A new training method that iteratively estimates the phonetic transcription and transformation vectors is presented to reduce the mismatch between the training utterances and a set of SI models using speaker adaptation techniques. For speaker adaptation the stochastic matching methods are used to estimate the transformation vectors. The experiments performed over actual telephone line shows that a reduction of about 45% in the error rates could be achieved as compared to the conventional method.