• Title/Summary/Keyword: phonemes

Korean Word Recognition Using Diphone-Level Hidden Markov Model (Diphone 단위의 hidden Markov model을 이용한 한국어 단어 인식)

  • Park, Hyun-Sang;Un, Chong-Kwan;Park, Yong-Kyu;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.1
    • /
    • pp.14-23
    • /
    • 1994
  • In this paper, speech units appropriate for recognition of Korean have been studied. For better speech recognition, co-articulatory effects within an utterance should be considered when selecting a recognition unit. One way to model such effects is to use larger units of speech. The diphone was found to be a good recognition unit because it models transitional regions explicitly. When diphones are used, stationary phoneme models may be inserted between them. Computer simulations for isolated word recognition were done with a seven-word database spoken by seven male speakers. The best performance was obtained when transition regions between phonemes were modeled by two-state HMMs and stationary phoneme regions by one-state HMMs, excluding /b/, /d/, and /g/. By merging rarely occurring diphone units, the recognition rate increased from 93.98% to 96.29%. In addition, a local interpolation technique was used to smooth a poorly modeled HMM with a well-trained HMM; with this technique we obtained a recognition rate of 97.22% after merging some diphone units.
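As a rough illustration of the unit inventory the abstract describes, the sketch below expands a phoneme sequence into diphone (transition) units, optionally inserting a one-state stationary model between consecutive diphones. The function name and unit labels are hypothetical, not taken from the paper:

```python
def to_diphone_units(phonemes, stationary=None):
    """Expand a phoneme sequence into diphone units (transition regions),
    inserting stationary phoneme models between diphones for phonemes
    in the `stationary` set (the paper excludes /b/, /d/, /g/)."""
    stationary = stationary or set()
    units = []
    last = phonemes[-1]
    for a, b in zip(phonemes, phonemes[1:]):
        units.append(f"{a}-{b}")       # transition: two-state HMM
        if b in stationary and b != last:
            units.append(b)            # stationary region: one-state HMM
    return units

# e.g. to_diphone_units("kam", stationary=set("aeiou"))
# yields ["k-a", "a", "a-m"]
```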

Speaker Verification System Using Continuants and Multilayer Perceptrons (지속음 및 다층신경망을 이용한 화자증명 시스템)

  • Lee, Tae-Seung;Park, Sung-Won;Hwang, Byong-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2003.10a
    • /
    • pp.1015-1020
    • /
    • 2003
  • Among techniques that protect private information using biometrics, speaker verification is expected to be widely used owing to its convenience and low implementation cost. A speaker verification system should achieve a high degree of reliability in the verification score, flexibility in speech text usage, and efficiency in system complexity. Continuants have excellent speaker-discriminant power and a modest number of phonemes in the category, while multilayer perceptrons (MLPs) have superior recognition ability and fast operation speed. The two therefore provide viable ways for a speaker verification system to obtain the above properties. This paper implements a system to which continuants and MLPs are applied and evaluates it using a Korean speech database. The experimental results show that continuants and MLPs enable the system to acquire the three properties.

A PHONEMIC ANALYSIS OF THE UNWRITTEN LANGUAGE OF THE PULANG TRIBE

  • Kang, Su-Hee
    • Proceedings of the KSPS conference
    • /
    • 2000.07a
    • /
    • pp.166-177
    • /
    • 2000
  • The purpose of this study was to create letters for the nonliterate Pulang tribe in Thailand, who immigrated from China. The illiterate Pulang hand down their tradition through a primary oral culture, so it cannot be recorded and preserved and may disappear over the course of history; the study is therefore also expected to serve the campaign against illiteracy. The subjects of this research were a minority people who inhabit northern Machan, Chiangrai, in Thailand. The work is not only an analysis of the language but also an effort toward the eradication of illiteracy, grounded in linguistics, ethnolinguistics, and primary oral culture. Five Pulang people living in the area were chosen for creating the letters. Using the IPA, the pronunciation of each word was listened to, transcribed, and the process repeated several times; material objects and parts of the human body were pointed at in front of the speakers, while other words were elicited by gesture. For the final transcription, a number of people listened in turn to the sounds of words, phrases, and sentences. In the first stage, the segmentals of Pulang were analyzed: vocoids, contoids, and diphthongs were described with sample syllables and words. In the second stage, the suprasegmentals were studied through the intonation and juncture of words: pairs of words were compared, and the different meanings carried by their intonation and juncture were shown; at the end of this part, the phonemic or morphophonemic representation of juncture within words was described for each case. In the third stage, minimal pairs of vowels and consonants were analyzed, and free variation was described on the basis of words. In the last stage, open and closed syllable structures were studied, and each syllable structure was analyzed with samples.
Thirty-two phonemes were found in Pulang: seven vocoids (a, i, e, o, u, æ, ʌ); one diphthong (wu); and twenty-four contoids (b, c, d, f, g, h, j, k, k, l, m, n, ŋ, pʰ, p, p, r, s, s, sh, t, t, w, and y). The pronunciations of p, s, d, pʰ, j, and t are frequently used in speech and are unique in triphthongs. Moreover, most words use initial and final consonant clusters.

Utilization of Syllabic Nuclei Location in Korean Speech Segmentation into Phonemic Units (음절핵의 위치정보를 이용한 우리말의 음소경계 추출)

  • 신옥근
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.5
    • /
    • pp.13-19
    • /
    • 2000
  • The blind segmentation method, which segments input speech into recognition units without any prior knowledge, plays an important role in continuous speech recognition systems and corpus generation. As no prior knowledge is required, the method is rather simple to implement, but in general it performs worse than knowledge-based segmentation. In this paper, we introduce a method to improve the performance of blind segmentation of Korean continuous speech by postprocessing the segment boundaries it produces. In the preprocessing stage, candidate boundaries are extracted by a clustering technique based on the GLR (generalized likelihood ratio) distance measure. In the postprocessing stage, the final phoneme boundaries are selected from the candidates by exploiting simple a priori knowledge of the syllabic structure of Korean: the number of phonemes between any two consecutive nuclei is bounded. The experimental results were rather promising: the proposed method yields a 25% reduction in insertion error rate compared to blind segmentation alone.
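The postprocessing idea above can be sketched as follows: between each pair of consecutive syllabic nuclei, keep only a bounded number of the strongest boundary candidates. This is an illustrative simplification under assumed data shapes ((time, score) pairs), not the paper's actual selection procedure:

```python
def prune_boundaries(candidates, nuclei, max_between=3):
    """Postprocess blind-segmentation output: between each pair of
    consecutive nuclei positions, keep at most `max_between` candidate
    boundaries, preferring the highest-scoring ones.
    `candidates` is a list of (time, score) pairs."""
    kept = []
    for left, right in zip(nuclei, nuclei[1:]):
        inside = [c for c in candidates if left < c[0] < right]
        inside.sort(key=lambda c: c[1], reverse=True)   # strongest first
        kept.extend(sorted(c[0] for c in inside[:max_between]))
    return kept
```

A real system would score candidates with the GLR distance itself; here the scores are taken as given.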

A Study on Phoneme Likely Units to Improve the Performance of Context-dependent Acoustic Models in Speech Recognition (음성인식에서 문맥의존 음향모델의 성능향상을 위한 유사음소단위에 관한 연구)

  • 임영춘;오세진;김광동;노덕규;송민규;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.5
    • /
    • pp.388-402
    • /
    • 2003
  • In this paper, we carried out word, four-continuous-digit, continuous-speech, and task-independent word recognition experiments to verify the effectiveness of re-defined phoneme-likely units (PLUs) for phonetic-decision-tree-based HM-Net (Hidden Markov Network) context-dependent (CD) acoustic modeling in Korean. In the 48-PLU set, the phonemes /ㅂ/, /ㄷ/, /ㄱ/ are separated into initial-sound, medial-vowel, and final-consonant variants, and the consonants /ㄹ/, /ㅈ/, /ㅎ/ are separated into initial-sound and final-consonant variants, according to their position in the syllable, word, and sentence. In this paper, therefore, we re-define 39 PLUs by unifying each phoneme's separated initial, medial, and final variants of the 48 PLUs so as to construct CD acoustic models effectively. In word recognition experiments with context-independent (CI) acoustic models, the 48 PLUs gave an average of 7.06% higher recognition accuracy than the 39 PLUs. But in speaker-independent word recognition experiments with CD acoustic models, the 39 PLUs gave an average of 0.61% better recognition accuracy. In the four-continuous-digit recognition experiments, which involve liaison phenomena, the 39 PLUs also gave an average of 6.55% higher recognition accuracy, and in continuous speech recognition experiments the 39 PLUs gave an average of 15.08% better recognition accuracy than the 48 PLUs. Finally, although both unit sets show lower accuracy in the task-independent word recognition experiments owing to unknown contextual factors, the 39 PLUs still showed an average of 1.17% higher recognition accuracy than the 48 PLUs. Through these experiments, we verified the effectiveness of the re-defined 39 PLUs compared to the 48 PLUs for constructing CD acoustic models.
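The unification of position-dependent units into a smaller PLU set can be illustrated as a label-mapping step. The position tags ("i", "m", "f") and label format below are hypothetical stand-ins for the paper's actual unit names:

```python
def build_merge_table(separated):
    """Map each position-tagged variant of a phoneme (e.g. 'b_i' for
    initial, 'b_f' for final) to its unified label."""
    table = {}
    for base, variants in separated.items():
        for v in variants:
            table[f"{base}_{v}"] = base
    return table

def merge_plus(units, table):
    """Rewrite a unit sequence, leaving untagged units unchanged."""
    return [table.get(u, u) for u in units]

# e.g. unify a phoneme split into initial/medial/final variants:
table = build_merge_table({"b": ["i", "m", "f"], "l": ["i", "f"]})
```

Training transcriptions relabeled this way share data across positions, which is the point of the smaller unit set.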

A Study on the Multilingual Speech Recognition using International Phonetic Language (IPA를 활용한 다국어 음성 인식에 관한 연구)

  • Kim, Suk-Dong;Kim, Woo-Sung;Woo, In-Sung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.7
    • /
    • pp.3267-3274
    • /
    • 2011
  • Recently, speech recognition technology has developed dramatically, with the growing user base of various mobile devices and the influence of a variety of speech recognition software. However, in multi-language speech recognition, a lack of understanding of multi-language lexical models and the limited capacity of systems hold back improvement of the recognition rate. It is not easy to embody speech in multiple languages in a single acoustic model, and systems that use several acoustic models lower the speech recognition rate. It is therefore necessary to research and develop a multi-language speech recognition system that embodies speech in various languages in a single acoustic model. This paper studies a system that can recognize Korean and English via the International Phonetic Language (IPA), based on research on using a multi-language acoustic model in mobile devices. Focusing on finding an IPA model that satisfies both Korean and English phonemes, we obtained recognition rates of 94.8% for Korean and 95.36% for English.
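The core idea, mapping each language's phone labels onto a shared IPA inventory so that one acoustic model covers both, can be sketched with toy lookup tables. These tables are illustrative only; a real system uses full pronunciation lexica per language:

```python
# Toy phone-to-IPA tables (assumed, not from the paper).
KOREAN_TO_IPA = {"ㅂ": "p", "ㅏ": "a", "ㅁ": "m"}
ENGLISH_TO_IPA = {"p": "p", "a": "a", "m": "m"}

def to_ipa(phones, table):
    """Map language-specific phone labels onto the shared IPA
    inventory, so a single acoustic model can serve both languages."""
    return [table[p] for p in phones]

# The shared inventory is the intersection of the two mappings' outputs;
# phones outside it would need language-specific models.
shared = set(KOREAN_TO_IPA.values()) & set(ENGLISH_TO_IPA.values())
```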

Phoneme Segmentation based on Volatility and Bulk Indicators in Korean Speech Recognition (한국어 음성 인식에서 변동성과 벌크 지표에 기반한 음소 경계 검출)

  • Lee, Jae Won
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.10
    • /
    • pp.631-638
    • /
    • 2015
  • Today, the demand for speech recognition systems in mobile environments is increasing rapidly. This paper proposes a novel method for Korean phoneme segmentation that is applicable to a phoneme-based Korean speech recognition system. First, the input signal is divided into blocks of equal size. The proposed method is based on a volatility indicator calculated for each block of the input speech signal, and on bulk indicators calculated for each bulk within a block, where a bulk is a maximal run of adjacent samples sharing the same sign; these serve as the primitive indicators for phoneme segmentation. Vowels, voiced consonants, and voiceless consonants in the input signal are recognized sequentially, and the boundaries among phonemes are found using three dedicated recognition algorithms that combine the two types of primitive indicators. The experimental results show that the proposed method markedly reduces the error rate of the existing phoneme segmentation method.
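The two primitive indicators can be sketched as follows. The exact definitions are assumptions for illustration (the paper's formulas may differ): volatility is taken here as mean absolute sample-to-sample change, and bulks as same-sign runs.

```python
def volatility(block):
    """Volatility indicator for a block: mean absolute
    sample-to-sample change (assumed definition)."""
    return sum(abs(b - a) for a, b in zip(block, block[1:])) / (len(block) - 1)

def bulks(block):
    """Split a block into bulks: maximal runs of adjacent
    samples sharing the same sign."""
    runs, cur = [], [block[0]]
    for x in block[1:]:
        if (x >= 0) == (cur[-1] >= 0):
            cur.append(x)              # same sign: extend current bulk
        else:
            runs.append(cur)           # sign change: close the bulk
            cur = [x]
    runs.append(cur)
    return runs
```

Vowels tend to give large, regular bulks and moderate volatility; voiceless consonants give small bulks and high volatility, which is the kind of contrast the three recognition algorithms combine.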

Analysis of the typology of errors in French pronunciation by Arabic speakers and propositions for phonetic correction: Based on Younes's research paper (아랍어권 학습자들에 의한 프랑스어 발음 오류의 유형 분류와 개선 방안: Younes의 논문을 중심으로)

  • JUNG, Il-Young
    • Cross-Cultural Studies
    • /
    • v.27
    • /
    • pp.7-29
    • /
    • 2012
  • This study aimed to analyze, focusing on the thesis of Younes, the pronunciation errors most commonly made by Arabic speakers learning French, and to suggest effective study plans and methods to reduce such errors. The first part examines how the Arabic and French pronunciation systems differ, especially by comparing and analyzing their systems of graphemes and phonemes; we focused on the fact that Arabic is a consonant-centered language, while French is a vowel-centered language. The second part discusses the causes and types of errors occurring when Arabic speakers learn French pronunciation. We divided the errors into consonants and vowels, and considered possible teaching methods focusing on /b/, /v/, /p/, /b/ among the consonants and /y/, /ø/ among the vowels, which do not exist in the Arabic pronunciation system. One of the difficulties professors face in teaching French to Arabic-speaking learners is how to address, on a phonetic basis, speaking and reading ability, which belong to oral skill, among the critical factors of foreign-language education: listening, speaking, reading, and writing. In fact, the problems that arise in learning a foreign language are encountered not only by Arab learners but by any learners whose mother tongue's pronunciation system differs markedly from that of the target language.
The important fact professors should recognize regarding the study of pronunciation is that they should encourage learners to reach an acceptable level for proper communication rather than push them to match native speakers. Even though the methods suggested in this study cannot be said to have absolute influence in reducing errors when learning the French pronunciation system, I hope they can be at least a small help.

Speech Feature Extraction based on Spikegram for Phoneme Recognition (음소 인식을 위한 스파이크그램 기반의 음성 특성 추출 기술)

  • Han, Seokhyeon;Kim, Jaewon;An, Soonho;Shin, Seonghyeon;Park, Hochong
    • Journal of Broadcast Engineering
    • /
    • v.24 no.5
    • /
    • pp.735-742
    • /
    • 2019
  • In this paper, we propose a method of extracting speech features for phoneme recognition based on a spikegram. Fourier-transform-based features are widely used in phoneme recognition, but they are not extracted in a biologically plausible way and cannot achieve high temporal resolution because of their frame-based operation. For better phoneme recognition, it is therefore desirable to have a new method of extracting speech features that analyzes the speech signal at high temporal resolution, following the model of the human auditory system. In this paper, we analyze the speech signal based on a spikegram, which models feature extraction and transmission in the auditory system, and then propose a method of extracting features from the spikegram for phoneme recognition. We evaluate the proposed features using a DNN-based phoneme recognizer and confirm that they provide better performance than Fourier-transform-based features for short phonemes. This result verifies the feasibility of new speech features extracted from an auditory model for phoneme recognition.

Performance Comparison of Korean Dialect Classification Models Based on Acoustic Features

  • Kim, Young Kook;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.10
    • /
    • pp.37-43
    • /
    • 2021
  • From the acoustic features of speech, important social and linguistic information about the speaker can be obtained; one key piece of such information is the speaker's dialect. A speaker's use of a dialect is a major barrier to interaction with a computer. Dialects can be distinguished at various levels such as phonemes, syllables, words, phrases, and sentences, but it is difficult to distinguish them by identifying these units one by one. Therefore, in this paper, we propose a lightweight Korean dialect classification model using only MFCC features of the speech data. We study the optimal way to utilize MFCC features on Korean conversational voice data and compare the classification performance for five Korean dialects (Gyeonggi/Seoul, Gangwon, Chungcheong, Jeolla, and Gyeongsang) across eight machine learning and deep learning classification models. The performance of most classification models improved when the MFCCs were normalized: accuracy improved by 1.07% and F1-score by 2.04% compared with the best-performing classification model before normalization.
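The normalization step credited with the improvement above is commonly done as cepstral mean-variance normalization (CMVN): each MFCC coefficient is standardized across the frames of an utterance. A minimal pure-Python sketch (the paper's exact normalization scheme is not specified in the abstract, so this is an assumption):

```python
def cmvn(frames):
    """Cepstral mean-variance normalization: scale each MFCC
    coefficient to zero mean and unit variance across frames.
    `frames` is a list of equal-length MFCC vectors."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [(sum((f[d] - means[d]) ** 2 for f in frames) / n) ** 0.5 or 1.0
            for d in range(dims)]  # guard against zero variance
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)]
            for f in frames]
```

Standardizing per coefficient removes channel and speaker offsets, so the classifier sees dialect-related variation rather than recording conditions.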