Search | Korea Science

A VQ Codebook Design Based on Phonetic Distribution for Distributed Speech Recognition (분산 음성인식 시스템의 성능향상을 위한 음소 빈도 비율에 기반한 VQ 코드북 설계)

Oh Yoo-Rhee;Yoon Jae-Sam;Lee Gil-Ho;Kim Hong-Kook;Ryu Chang-Sun;Koo Myoung-Wa
- Proceedings of the KSPS conference
- /
- 2006.05a
- /
- pp.37-40
- /
- 2006
In this paper, we propose a VQ codebook design for speech recognition feature parameters in order to improve the performance of a distributed speech recognition system. For context-dependent HMMs, a VQ codebook should be correlated with the phonetic distribution of the HMM training data. Thus, for efficient VQ codebook design, we focus on a method for selecting training data based on phonetic distribution instead of using all the training data. In speech recognition experiments on the Aurora 4 database, the distributed speech recognition system employing a VQ codebook designed by the proposed method reduced the word error rate (WER) by 10% compared with a system using a VQ codebook trained on the whole training data.
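
The abstract above reports its result as a word error rate (WER). As a minimal sketch of how WER is commonly computed (word-level edit distance divided by the number of reference words), assuming whitespace-tokenized transcripts; the function name `word_error_rate` is illustrative and is not taken from the paper:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy sentences, not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The abstract does not say whether the 10% reduction is relative or absolute, so this sketch makes no attempt to reproduce that figure.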

Phoneme distribution and phonological processes of orthographic and pronounced phrasal words in light of syllable structure in the Seoul Corpus (음절구조로 본 서울코퍼스의 글 어절과 말 어절의 음소분포와 음운변동)

Yang, Byunggon
- Phonetics and Speech Sciences
- /
- v.8 no.3
- /
- pp.1-9
- /
- 2016
This paper investigated the phoneme distribution and phonological processes of orthographic and pronounced phrasal words in light of syllable structure in the Seoul Corpus in order to provide linguists and phoneticians with a clearer understanding of the Korean language system. To this end, the phrasal words were extracted from the transcribed label scripts of the Seoul Corpus using Praat. The onsets, peaks, codas, and syllable types of the phrasal words were then analyzed using an R script. Results revealed that k0 was the most frequently used onset in both orthographic and pronounced phrasal words. Also, aa was the most favored vowel in the Korean syllable peak, with fewer phonological processes in its pronounced form. Diphthongs accounted for 8.8% of the peaks in the orthographic phrasal words, almost double the proportion found in the pronounced phrasal words. For the codas, nn accounted for 34.4% of the total pronounced phrasal words and was the varied form. In the syllable type classification of the corpus, CV was the most frequent type in the orthographic forms, followed by CVC, V, and VC. Overall, onsets were more prevalent than codas in the pronounced forms. From the results, this paper concluded that an analysis of phoneme distribution and phonological processes in light of syllable structure can contribute greatly to the understanding of the phonology of spoken Korean.
https://doi.org/10.13064/KSSS.2016.8.3.001
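
The syllable type classification described in the abstract above (CV, CVC, V, VC) can be illustrated with a short sketch. The study itself used Praat and an R script; the Python below is only an illustration under the assumption that each syllable is already split into onset, peak, and coda labels. The function names and the toy syllables are hypothetical, and only the labels k0, aa, and nn come from the abstract.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Syllable = Tuple[str, str, str]  # (onset, peak, coda); empty string = absent


def syllable_type(onset: str, peak: str, coda: str) -> str:
    """Map one syllable to a type string such as 'CV' or 'CVC'."""
    if not peak:
        raise ValueError("a syllable needs a peak (vowel)")
    return ("C" if onset else "") + "V" + ("C" if coda else "")


def type_distribution(syllables: Iterable[Syllable]) -> Dict[str, float]:
    """Return the relative frequency of each syllable type."""
    counts = Counter(syllable_type(*s) for s in syllables)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


if __name__ == "__main__":
    # Toy input, not corpus data.
    toy = [("k0", "aa", ""), ("k0", "aa", "nn"), ("", "aa", ""), ("", "aa", "nn")]
    print(type_distribution(toy))
```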

Phoneme Recognition based on Two-Layered Stereo Vision Neural Network (2층 구조의 입체 시각형 신경망 기반 음소인식)

Kim, Sung-Ill;Kim, Nag-Cheol
- Journal of Korea Multimedia Society
- /
- v.5 no.5
- /
- pp.523-529
- /
- 2002
The present study describes neural networks for stereoscopic vision, which are applied to identifying human speech. In speech recognition based on stereoscopic vision neural networks (SVNN), similarities are first obtained by comparing input vocal signals with standard models. They are then passed to a dynamic process in which both competitive and cooperative processes are conducted among neighboring similarities. Through this dynamic process, a single winner neuron is finally detected. In a comparative study, the recognition accuracy of the two-layered SVNN was 7.7% higher than that of the hidden Markov model (HMM). The evaluation results thus indicate that the SVNN outperformed the existing HMM recognizer.

Automatic Speech Recognition Research at Fujitsu (후지쯔에 있어서의 음성 자동인식의 현상과 장래)

Nara, Yasuhiro;Kimura, Shinta;Loken-Kim, K.H.
- The Journal of the Acoustical Society of Korea
- /
- v.10 no.1
- /
- pp.82-91
- /
- 1991
The history of automatic speech recognition research and the current and future speech products at Fujitsu are introduced here. Speech recognition research at Fujitsu started in 1970. Our research efforts have resulted in the production of a speaker-dependent 12,000-word discrete/connected word recognizer (F2360) and a speaker-independent 17-word discrete word recognizer (F2355L/S). Currently, we are working on a larger-vocabulary speech recognizer, in which an input utterance will be matched with networks representing possible phonemic variations. Its application to text input is also discussed.

Efficient Vocabulary Optimization Management using VCOR (VCOR를 이용한 효율적인 어휘 최적화 관리)

Oh, Sang-Yeob
- Journal of Korea Multimedia Society
- /
- v.13 no.10
- /
- pp.1436-1443
- /
- 2010
Vocabulary recognition systems have difficulty processing vocabulary that contains unseen triphones, and without normalization they cannot obtain a distribution for the confidence measure. To address this problem, this paper proposes the VCOR (Version Control for Out-of Rejection) system, which applies an out-of-vocabulary rejection algorithm together with vocabulary management optimization and supports phone data search. The VCOR system provides users with vocabulary information efficiently by using an extended facet classification, which improves the vocabulary measure management function and the accuracy of vocabulary recognition. In the experiments, the proposed system achieved a vocabulary-dependent recognition rate of 97.56% and a vocabulary-independent recognition rate of 96.23%.

Aerodynamic Characteristics of Korean Bilabial Stop Consonant as a Function of Phonemic Position in a Syllable (음절내 음소 출현 위치에 따른 한국어 양순 파열음의 공기역학적인 특징)

Park, Sang-Hee;Jeong, Haeng-Im;Jeong, Ok-Ran;Seok, Dong-Il
- Speech Sciences
- /
- v.9 no.4
- /
- pp.59-75
- /
- 2002
An aerodynamic analysis was performed on 14 normal subjects (2 males, 12 females) using nonsense syllables composed of the Korean bilabial stops (/p, p', $p^{h}$/) and their preceding and/or following vowels, /i, a, u/. That is, [pi, p'i, $p^{h}i$, pa, p'a, $p^{h}a$, pu, p'u, $p^{h}u$, ipi, apa, upu, $ip^{h}i$, $ap^{h}a$, $up^{h}u$, ip'i, ap'a, up'u]. All measures were taken and analyzed using the Aerophone II voice function analyzer and included peak air pressure, mean air pressure, maximum flow rate, volume, mean SPL, and phonatory SPL. A t-test and one-way ANOVA were employed for analysis. A post-hoc analysis was performed with the Scheffe and Bonferroni methods. The results were as follows: First, MSPL and MAP of /p, p', $p^{h}$/ were significantly different in different positions (initial and medial position). In addition, different vowel environments also produced significantly different aerodynamic characteristics for those consonants. In particular, the lax consonant /p/ differed significantly across the /i, a, u/ vowel environments. The tense consonant /p'/ differed significantly only in the /i/ vowel environment.

Improving the Performance of the Continuous Speech Recognition by Estimating Likelihoods of the Phonetic Rules (음소변동규칙의 적합도 조정을 통한 연속음성인식 성능향상)

Na, Min-Soo;Chung, Min-Hwa
- Proceedings of the KSPS conference
- /
- 2006.11a
- /
- pp.80-83
- /
- 2006
The purpose of this paper is to build a pronunciation lexicon with estimated likelihoods of the phonetic rules based on the phonetic realizations, and thereby to improve the performance of continuous speech recognition (CSR) using the dictionary. In the baseline system, the phonetic rules and their application probabilities are defined using knowledge of Korean phonology and experimental tuning. The advantage of this approach is that the phonetic rules are easy to implement and give stable results on general domains. However, a possible drawback is that it is hard to reflect the characteristics of the phonetic realizations in a specific domain. In order to make the system reflect phonetic realizations, the likelihood of the phonetic rules is reestimated based on the statistics of the realized phonemes using a forced-alignment method. In our experiment, we generate new lexica that include pronunciation variants created by the reestimated phonetic rules, and their performance is tested with 12-Gaussian-mixture HMMs and back-off bigrams. The proposed method reduced the WER by 0.42%.

A System of English Vowel Transcription Based on Acoustic Properties (영어 모음음소의 표기체계에 관한 연구)

김대원
- Proceedings of the KSLP Conference
- /
- 2003.11a
- /
- pp.170-173
- /
- 2003
There are more than five systems for transcribing English vowels. Because of this diversity, teachers and students of English face considerable problems with the English vowel symbols used in English-Korean dictionaries, English textbooks, and books on phonetics and phonology. This study was designed to suggest criteria for the phonemic transcription of English vowels on the basis of the phonetic properties of the vowels, and a system of English vowel transcription based on those criteria, in order to minimize the problems caused by inter-system differences. A speaker (phonetician) of RP English uttered a series of isolated minimal pairs containing the vowels in question. The suggested vowel symbols are as follows: 1) Simple vowels: /i:/ in beat, /I/ bit, /ɛ/ bet, /æ/ bat, /a:/ father, /ɒ‖a/ bod, /ɔ:/ bawd, /u/ put, /u:/ boot, /ʌ/ but, /ə/ about, and /ɜ:‖ɜ:r/ bird. 2) Diphthongs: /aI/ in bite, /au/ bout, /ɔI/ boy, /ɜu‖ou/ boat, /eI/ bait, /eə‖eər/ air, /uə‖uər/ poor, and /iə‖iər/ beer. Where two symbols are shown corresponding to the vowel in a single word, the first is appropriate for most speakers of British English and the second for most speakers of American English.

A study on the Automatic Generation of the Freehand Style Fonts with parameters (매개변수를 가지는 한글 필기체 폰트의 자동 생성에 관한 연구)

Lee, D.R.;Lee, D.H.;Park, H.S.;Cho, H.G.
- Annual Conference on Human and Language Technology
- /
- 1992.10a
- /
- pp.581-590
- /
- 1992
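The rapid development of high-quality input/output devices and the emergence of electronic publishing systems have created a demand for a wider variety of typefaces. Apart from Myeongjo and Gothic, the Hangul typefaces used on computers are mostly produced as decorative fonts. In this paper, we implemented a variety of handwriting-style fonts using cubic B-spline curves, and generated individualized, realistic fonts by assigning parameters according to the characteristics of handwriting (the degree of cursiveness, the slant of the characters, the size of each phoneme, the uniformity of the character sizes, and so on). Each combination of parameters is encrypted and assigned as an individual's font. That is, an individual's unique font is assigned through a font password, and the control parameters are selected by the hash value of the font password. If these control parameters are hidden from the users, each user's font is unique and secure, so a certain degree of stability is considered to be guaranteed. The fonts implemented in this study are also meaningful in that they provide varied data for measuring the accuracy of handwritten Hangul character recognition.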
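
The abstract above builds handwriting strokes from cubic B-spline curves and lists character slant as one of the control parameters. As a minimal sketch of that idea, assuming a uniform cubic B-spline and modeling slant as a simple horizontal shear: the function names, the `slant` parameter, and the sample control points are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bspline_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_stroke(control: Sequence[Point], per_segment: int = 8,
                  slant: float = 0.0) -> List[Point]:
    """Sample a stroke and shear it horizontally by `slant` (assumed model)."""
    if len(control) < 4:
        raise ValueError("need at least 4 control points")
    points: List[Point] = []
    for i in range(len(control) - 3):
        seg = control[i:i + 4]
        for s in range(per_segment):
            points.append(cubic_bspline_point(*seg, s / per_segment))
    # Close the curve with the end point of the last segment.
    points.append(cubic_bspline_point(*control[-4:], 1.0))
    # Shear: x' = x + slant * y leaves y unchanged and tilts the stroke.
    return [(x + slant * y, y) for x, y in points]


if __name__ == "__main__":
    stroke = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 0.0), (4.0, 1.0)]
    for pt in sample_stroke(stroke, per_segment=4, slant=0.2):
        print(pt)
```

Varying `slant` or the control points per writer would give different stroke shapes from the same skeleton, which is the general direction the abstract describes.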

A Study on the Development of Korea Telecom Automatic Voice Recognition System (음성인식에 의한 연구센타 부서안내 시스팀 개발에 관한 연구)

Koo, Myoung-Wan;Sohn, Il-Hyun;Doh, Sam-Joo;Lee, Jong-Rak
- Annual Conference on Human and Language Technology
- /
- 1992.10a
- /
- pp.185-192
- /
- 1992
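This paper describes KARS (Korea Telecom Automatic voice Recognition System), a research center department guidance system based on speech recognition technology. The system is basically similar to a voice response system, but differs in that it uses speech instead of push buttons for command input. When a user enters a voice command through a microphone, the system recognizes the command and provides a brief introduction, the telephone number, and the location of each department in the research center. The system is a speaker-independent isolated-word recognition system based on HMMs (hidden Markov models) and can recognize 123 words, consisting of 116 department names and 7 control words. It uses phoneme-like Korean subwords as the basic HMM units, and a recognition rate of 98.6% was obtained in recognition experiments.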

