Search | Korea Science

Performance Improvement of Continuous Digits Speech Recognition using the Transformed Successive State Splitting and Demi-syllable pair (반음절쌍과 변형된 연쇄 상태 분할을 이용한 연속 숫자음 인식의 성능 향상)

Kim Dong-Ok;Park No-Jin
- Journal of the Korea Institute of Information and Communication Engineering, v.9 no.8, pp.1625-1631, 2005
This paper describes an optimization of a language model and an acoustic model that improves speech recognition of Korean unit digits. Recognition errors of the language model are reduced by analyzing the grammatical features of Korean unit digits and building FSN nodes from disyllables. The acoustic model uses demi-syllable pairs to reduce recognition errors caused by inaccurate segmentation of phones and syllables, which arises from monosyllabic words, short pronunciations, and articulation. We used the k-means clustering algorithm with transformed successive state splitting at the feature level to model the features of the recognition unit efficiently. In the experiments, the proposed language model raised the recognition rate by $10.5\%$, the demi-syllable pair acoustic model raised it by $12.5\%$, and transformed successive state splitting improved it by $1.5\%$.
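The abstract names k-means clustering at the feature level but gives no details. As a minimal sketch of plain k-means only (not the transformed successive state splitting procedure), the code below clusters small feature vectors. The function name `kmeans`, the deterministic initialization from the first `k` points, and the toy data are assumptions for illustration.

```python
def kmeans(points, k, iters=20):
    """Plain k-means on a list of equal-length numeric tuples.

    Assumption: centroids start at the first k points so the result is
    deterministic; real systems usually use a smarter initialization.
    """
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters


# Toy two-dimensional "feature vectors" (hypothetical data).
features = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centroids, clusters = kmeans(features, k=2)
```

In a recognizer, the points would be acoustic feature vectors assigned to a state, and each centroid would seed one mixture component or split state.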

LFMMI-based acoustic modeling by using external knowledge (External knowledge를 사용한 LFMMI 기반 음향 모델링)

Park, Hosung;Kang, Yoseb;Lim, Minkyu;Lee, Donghyun;Oh, Junseok;Kim, Ji-Hwan
- The Journal of the Acoustical Society of Korea, v.38 no.5, pp.607-613, 2019
This paper proposes LF-MMI (Lattice-Free Maximum Mutual Information)-based acoustic modeling that uses external knowledge for speech recognition. Here, external knowledge refers to text data other than the training data used for the acoustic model. LF-MMI, an objective function for training a DNN (Deep Neural Network), performs well in discriminative training. In LF-MMI, a phoneme probability is used as the prior probability for predicting the posterior probability of the DNN-based acoustic model. We propose using external knowledge to train the prior probability model in order to improve the DNN-based acoustic model. The proposed model achieves a relative improvement of 14 % over the conventional LF-MMI-based model.
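As a rough illustration of what training a phoneme prior on extra text could look like, the sketch below estimates a smoothed phoneme bigram model from phoneme sequences and compares a model trained on transcripts alone with one that also sees external text. The bigram order, add-one smoothing, the function name `train_bigram`, and the toy sequences are all assumptions; the abstract does not specify the form of the prior model.

```python
from collections import Counter, defaultdict


def train_bigram(sequences, vocab):
    """Return prob(prev, cur) for an add-one smoothed phoneme bigram model."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pad with sentence-start and sentence-end markers.
        for prev, cur in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[prev][cur] += 1
    targets = sorted(vocab) + ["</s>"]

    def prob(prev, cur):
        total = sum(counts[prev].values())
        return (counts[prev][cur] + 1) / (total + len(targets))

    return prob


vocab = {"a", "b", "c"}
transcripts = [["a", "b"]]                  # hypothetical acoustic training text
external = [["a", "b"], ["a", "c"]]         # hypothetical external text

prior_base = train_bigram(transcripts, vocab)
prior_ext = train_bigram(transcripts + external, vocab)
```

With the external text added, the transition `a -> c`, unseen in the transcripts, receives more probability mass than smoothing alone would give it.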
https://doi.org/10.7776/ASK.2019.38.5.607

A COMPARATIVE STUDY ON AUDITORY ATTENTION AND PHONEME DIFFERENTIAL ABILITY AMONG CHILDREN WITH READING DISABILITY AND WITH ATTENTION DEFICIT/HYPERACTIVITY (읽기 장애와 주의력 결핍/과잉 운동 장애아동의 주의력 과제와 음소 변별 과제 수행 비교 - 청각 과제를 중심으로 -)

Lee, Kyung-Hee;Shin, Min-Sup;Kim, Boong-Nyun;Cho, Soo-Churl
- Journal of the Korean Academy of Child and Adolescent Psychiatry, v.14 no.2, pp.197-208, 2003
Objective: In this study, we hypothesized that a deficit in processing rapid linguistic stimuli is at the heart of Reading Disability (RD) and that a deficit in response inhibition is at the heart of Attention Deficit/Hyperactivity Disorder (ADHD). We conducted experiments to identify the core cognitive characteristics of children with RD, with ADHD, or with both, using attentional tasks and phoneme differential tests. Method: In Study 1, 28 children with ADHD and 16 children with RD+ADHD were individually administered visual/auditory performance tests. The two groups' performance on the attentional tasks was then compared while controlling for IQ. In Study 2, 13 children with RD+ADHD/RD, 13 children with ADHD, and 13 normal children were administered computerized phoneme differential tests. Result: Visual attentional tasks did not distinguish the ADHD group from the RD+ADHD group. With auditory attentional tasks, however, the comorbid group showed significantly more difficulties, with a large variance in reaction time. The RD, RD+ADHD, and ADHD groups made more errors in the phoneme differential tests than the normal control group, and each group showed a distinctive performance pattern. Discussion: The ADHD group had difficulty with response inhibition and sustained attention, and the auditory attentional difficulties were magnified in children who had RD along with ADHD. Even though children with RD had more trouble responding correctly to target stimuli, their responses were not significantly different from those of children with ADHD.

A Study on the Rejection Capability Based on Anti-phone Modeling (반음소 모델링을 이용한 거절기능에 대한 연구)

김우성;구명완
- The Journal of the Acoustical Society of Korea, v.18 no.3, pp.3-9, 1999
This paper presents a study of rejection capability based on anti-phone modeling for a vocabulary-independent speech recognition system. The rejection system detects and rejects out-of-vocabulary words, that is, words not included in the candidate words defined when the speech recognizer is built. Rejection systems fall into two categories by implementation method: keyword spotting and utterance verification. The keyword spotting method uses an extra filler model as a candidate word alongside the keyword models. The utterance verification method constructs anti-models for all phonemes and then uses the anti-model of each phoneme to calculate a confidence score. We implemented an utterance verification algorithm that can be used with a vocabulary-independent speech recognizer. We compared three kinds of means for calculating the confidence score and found that the geometric mean gave the best result. The confidence score is usually normalized with a sigmoid function; we compared the effect of the weight constant of the sigmoid function and determined its optimal value. We also compared the effect of the size of the cohort set, and the results showed that a larger set gave better results. Finally, we found the optimal confidence score threshold. With this threshold, the overall recognition rate including rejection errors was about 76%. These results will be applied to a speech-recognition-based stock information system that Korea Telecom currently provides as an experimental service.
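The confidence-score pipeline described above (sigmoid normalization of phone-level scores, a mean over phones, then a threshold) can be sketched as follows. This is a sketch under stated assumptions: the abstract does not name the three means or give the sigmoid form, so the arithmetic, geometric, and harmonic means, the form `1 / (1 + exp(-alpha * x))`, the weight `alpha`, the threshold `0.5`, and the input log-likelihood ratios are all illustrative.

```python
import math


def sigmoid(x, alpha=1.0):
    """Map a raw phone score to (0, 1); alpha is the assumed weight constant."""
    return 1.0 / (1.0 + math.exp(-alpha * x))


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Computed in the log domain; inputs must be strictly positive.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


def accept(phone_llrs, threshold=0.5, alpha=1.0):
    """Accept the utterance if its geometric-mean confidence passes the threshold."""
    confidences = [sigmoid(v, alpha) for v in phone_llrs]
    return geometric_mean(confidences) >= threshold


# Hypothetical phone-level log-likelihood ratios (phone model vs. anti-model).
in_vocab = [2.0, 3.0, 1.0]
out_of_vocab = [-2.0, -3.0, -1.0]
print(accept(in_vocab), accept(out_of_vocab))
```

The geometric mean is pulled down more strongly by a single low phone score than the arithmetic mean, which is one plausible reason it could separate out-of-vocabulary words better.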

A Study on the Temporal Assignment of the Three Yin and Three Yang (삼음삼양(三陰三陽)의 시간배속(時間配屬)에 대한 연구 - 關于三陰三陽時間配屬硏究)

Lee, Yong-Beom
- Journal of Korean Medical classics, v.19 no.2 s.33, pp.46-52, 2006
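The Huangdi Neijing (黃帝內經) contains the theory of Korean medicine, but its account of the three yin and three yang (三陰三陽) is very difficult. The reason is that this account is highly complex within the text; in particular, the temporal assignment of the three yin and three yang differs from chapter to chapter. This paper studies the temporal assignment of the three yin and three yang in the Huangdi Neijing and reaches the following conclusions. The order of the host qi (主氣) in the doctrine of circuits and qi (運氣) is the order of the five phases (wood, fire, earth, metal, water), which shows that changes in the host qi follow mainly changes in form (形). The order of the guest qi (客氣) is the order of the magnitude of qi (first yang → second yang → third yang → first yin → second yin → third yin), which shows that changes in the guest qi follow mainly changes in qi. In the Suwen chapter "Maijie" (素問 脈解), the monthly assignment of the three yin and three yang is yin guan (陰關, 11th month: Taiyin) → yang guan (陽關, 1st month: Taiyang) → yin he (陰闔, 3rd month: Jueyin) → yang he (陽闔, 5th month: Yangming) → yin shu (陰樞, 7th month: Shaoyin) → yang shu (陽樞, 9th month: Shaoyang). Taiyang and Shaoyin are the foundation of the yin and yang forces, so they form an exterior-interior pair. Shaoyang and Shaoyin are the midpoint of the yin and yang forces, so they form an exterior-interior pair. Taiyin and Jueyin are the peak of the yin and yang forces, so they form an exterior-interior pair. This exterior-interior relationship is reflected in the flow of the meridians: the Taiyin and Yangming meridians run along the front of the body, the Taiyang and Shaoyin meridians along the back, and the Shaoyang and Jueyin meridians along the sides. In the Lingshu chapter "Jingmai" (靈樞 經脈), the flow order of the twelve meridians is organized mainly by exterior-interior meridian pairs: Taiyin-Yangming → Shaoyin-Taiyang → Jueyin-Shaoyang. The meridian qi flows first through the Taiyin and Yangming meridians because the acquired essence of water and grain combines with inhaled air to form the qi of the meridians. It then flows through the Shaoyin and Taiyang meridians, which shows that the acquired nutrients circulating in the Taiyin and Yangming meridians pass into the cold-heat meridians and become fuel. Next it circulates through the Jueyin and Shaoyang meridians, because the fire qi of the cold-heat meridians is regulated by the wind qi of the Jueyin and Shaoyang meridians, which in turn regulates the dryness-dampness meridians. By analogy with cooking rice, the fire is the Shaoyin and Taiyang meridians, the wind is the Jueyin and Shaoyang meridians, and the rice in the pot is the Taiyin and Yangming meridians. In the Lingshu chapter "Yinyang Xi Riyue" (靈樞 陰陽繫日月), the monthly assignment of the three yin and three yang follows mainly the order shu, guan, he, he, guan, shu. This ordering reflects the process of opening and closing a door. In spring and summer, the meridian qi of the yang meridians circulates in the order shu, guan, he, he, guan, shu, like a door opening and closing; in autumn and winter, the meridian qi of the yin meridians circulates in the same order.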

Creation and labeling of multiple phonotopic maps using a hierarchical self-organizing classifier (계층적 자기조직화 분류기를 이용한 다수 음성자판의 생성과 레이블링)

Chung, Dam;Lee, Kee-Cheol;Byun, Young-Tai
- The Journal of Korean Institute of Communications and Information Sciences, v.21 no.3, pp.600-611, 1996
Recently, neural network-based speech recognition has been studied to exploit the adaptivity and learnability of neural network models. However, conventional neural network models have difficulty with co-articulation processing and with boundary detection between similar phonemes in Korean speech. In addition, when a single phonotopic map is used, learning time may increase dramatically and inaccuracies may arise, because one homogeneous learning and recognition method must be applied to heterogeneous data. Hence, in this paper, a neural net typewriter has been designed using a hierarchical self-organizing classifier (HSOC), and the related algorithms are presented. During its learning stage, the HSOC distributes phoneme data over hierarchically structured multiple phonotopic maps, using Kohonen's self-organizing feature maps (SOFM). The paper presents and tests algorithms for deciding the number of maps, the map sizes, the selection of phonemes and their placement per map, and an appropriate learning and preprocessing method per map. If maps were divided according to a priori linguistic knowledge, it would be difficult to acquire that knowledge and to decide how to apply it (e.g., when processing extended phonemes). In contrast, our HSOC has the advantage that multiple phonotopic maps suited to the given input data are self-organized. The resulting three Korean phonotopic maps are optimally labelled, have their own optimal preprocessing schemes, and conform to conventional linguistic knowledge.

Statistical Analysis of Korean Phonological Variations Using a Grapheme-to-phoneme System (발음열 자동 생성기를 이용한 한국어 음운 변화 현상의 통계적 분석)

이경님;정민화
- The Journal of the Acoustical Society of Korea, v.21 no.7, pp.656-664, 2002
We present a statistical analysis of Korean phonological variations using a Grapheme-to-Phoneme (GTP) system. The GTP system used for the experiments generates pronunciation variants by applying rules that model obligatory and optional phonemic changes and allophonic changes. These rules are derived from morphophonological analysis and the government's standard pronunciation rules. The GTP system is optimized for continuous speech recognition by generating phonetic transcriptions for training and constructing a pronunciation dictionary for recognition. In this paper, we describe Korean phonological variations by analyzing the statistics of phonemic change rule applications for the 60,000 sentences in the Samsung PBS Speech DB. Our results show that the most frequent obligatory phonemic variations are, in order, liaison, tensification, aspiration, and nasalization of obstruents, and that the most frequent optional phonemic variations are, in order, initial consonant h-deletion, insertion of a final consonant with the same place of articulation as the next consonant, and deletion of a final consonant with the same place of articulation as the next consonant. These statistics can be used to improve the performance of speech recognition systems.

Performance Comparison of Out-Of-Vocabulary Word Rejection Algorithms in Variable Vocabulary Word Recognition (가변어휘 단어 인식에서의 미등록어 거절 알고리즘 성능 비교)

김기태;문광식;김회린;이영직;정재호
- The Journal of the Acoustical Society of Korea, v.20 no.2, pp.27-34, 2001
Utterance verification is used in variable vocabulary word recognition to reject words that are out of vocabulary or incorrectly recognized. It is an important technology for designing a user-friendly speech recognition system. We propose a new utterance verification algorithm for a no-training utterance verification system based on the minimum verification error. First, using the PBW (Phonetically Balanced Words) DB (445 words), we create no-training anti-phoneme models that include many PLUs (Phoneme-Like Units), so that the anti-phoneme models have the minimum verification error. Then, for OOV (Out-Of-Vocabulary) rejection, the phoneme-based confidence measure, which uses the likelihood between the phoneme model (null hypothesis) and the anti-phoneme model (alternative hypothesis), is normalized by the null hypothesis, which tends to make it more robust for OOV rejection. The word-based confidence measure, which is built on the phoneme-based confidence measure, has been shown to improve detection of near-misses in speech recognition and to discriminate better between in-vocabulary words and OOVs. Using the proposed anti-model and confidence measure, we achieve a significant performance improvement: CA (Correctly Accept for In-Vocabulary) is about 89%, and CR (Correctly Reject for OOV) is about 90%, an improvement of about 15-21% in ERR (Error Reduction Rate).
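One possible reading of a likelihood-ratio confidence measure "normalized by the null hypothesis" is sketched below, together with the usual definition of error reduction rate. The exact formula is not given in the abstract, so the normalization by the magnitude of the phoneme-model log-likelihood, the averaging into a word score, the function names, and the numbers are assumptions.

```python
def phone_cm(logp_phone, logp_anti, eps=1e-8):
    """Log-likelihood ratio of phoneme model vs. anti-phoneme model,
    divided by the magnitude of the phoneme-model (null) log-likelihood.
    This normalization is an assumed form, not taken from the paper."""
    return (logp_phone - logp_anti) / max(abs(logp_phone), eps)


def word_cm(phone_scores):
    """Average the phone-level measures into a word-level measure."""
    values = [phone_cm(p, a) for p, a in phone_scores]
    return sum(values) / len(values)


def error_reduction_rate(baseline_error, new_error):
    """Relative reduction in error with respect to a baseline."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical (phoneme model, anti-model) log-likelihoods for one word.
word = [(-100.0, -120.0), (-50.0, -60.0)]
score = word_cm(word)                      # positive: phoneme models fit better
err = error_reduction_rate(0.20, 0.16)     # hypothetical error rates
```

Dividing by the null-hypothesis likelihood makes segments of different length or overall likelihood level comparable before they are averaged.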

Improvement of Keyword Spotting Performance Using Normalized Confidence Measure (정규화 신뢰도를 이용한 핵심어 검출 성능향상)

Kim, Cheol;Lee, Kyoung-Rok;Kim, Jin-Young;Choi, Seung-Ho;Choi, Seung-Ho
- The Journal of the Acoustical Society of Korea, v.21 no.4, pp.380-386, 2002
Conventional post-processing, such as the confidence measure (CM) proposed by Rahim, calculates phone-level CMs from the likelihood between the phoneme model and the anti-model, and then obtains the word-level CM by averaging the phone-level CMs[1]. In the conventional method, the CMs of some specific keywords are very low, so those keywords are usually rejected. The reason is that the statistics of phone-level CMs are not consistent: each phone, and especially each tri-phone, has a different CM probability density function (pdf). To overcome this problem, we propose a normalized confidence measure (NCM). Our approach transforms the CM pdf of each tri-phone to the same pdf under the assumption that the CM pdfs are Gaussian. To evaluate the method, we use a common keyword spotting system in which context-dependent HMM models model keyword utterances and context-independent HMM models model non-keyword utterances. The experimental results show that the proposed NCM reduced the FAR (false alarm rate) from 0.44 to 0.33 FA/KW/HR (false alarms/keyword/hour) at an MDR of about 8%, a 25% improvement in FAR.
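Under the Gaussian assumption stated above, one simple way to map every tri-phone's CM distribution to a common pdf is to standardize each score with that tri-phone's own mean and standard deviation. The sketch below illustrates this idea only; the choice of a standard normal target, the function names, the tri-phone labels, and the sample values are assumptions.

```python
from statistics import mean, pstdev


def fit_stats(cm_samples):
    """Estimate (mean, std) of the confidence measure for each tri-phone."""
    return {tp: (mean(vals), pstdev(vals)) for tp, vals in cm_samples.items()}


def normalize(triphone, cm, stats, eps=1e-8):
    """Standardize one phone-level CM with its tri-phone's statistics."""
    mu, sigma = stats[triphone]
    return (cm - mu) / max(sigma, eps)


def word_ncm(phones, stats):
    """Word-level normalized CM: average of the standardized phone-level CMs."""
    return mean([normalize(tp, cm, stats) for tp, cm in phones])


# Hypothetical training CMs: the second tri-phone scores low on a raw scale.
training = {"a-b+c": [1.0, 3.0], "b-c+d": [-10.0, -6.0]}
stats = fit_stats(training)

keyword = [("a-b+c", 3.0), ("b-c+d", -6.0)]
raw_average = mean([cm for _, cm in keyword])   # dominated by the low-scale tri-phone
normalized = word_ncm(keyword, stats)           # both phones are 1 std above their mean
```

Here the raw average is negative even though both phones score well relative to their own distributions, which is the inconsistency the normalization is meant to remove.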

A Pre-Selection of Candidate Units Using Accentual Characteristic In a Unit Selection Based Japanese TTS System (일본어 악센트 특징을 이용한 합성단위 선택 기반 일본어 TTS의 후보 합성단위의 사전선택 방법)

Na, Deok-Su;Min, So-Yeon;Lee, Kwang-Hyoung;Lee, Jong-Seok;Bae, Myung-Jin
- The Journal of the Acoustical Society of Korea, v.26 no.4, pp.159-165, 2007
In this paper, we propose a new method for pre-selecting candidate units that is suited to a unit-selection-based Japanese TTS system. General pre-selection methods calculate a context-dependent cost within an IP (Intonation Phrase). Unlike other languages, however, Japanese has an accent represented as the relative height of the pitch, and several words form a single accentual phrase (AP). Moreover, prosody in Japanese changes in units of accentual phrases. Reflecting this prosodic change in pre-selection can improve the quality of synthesized speech. Furthermore, calculating the context-dependent cost within an accentual phrase is faster than calculating it within an intonation phrase. The proposed method defines the AP, analyzes the AP in context, and performs pre-selection using accentual phrase matching, which calculates the CCL (connected context length) of the candidates for each phoneme to be synthesized in each accentual phrase. The baseline system is VoiceText, a synthesizer from Voiceware. The method was evaluated on perceptual errors (intonation errors and concatenation mismatch errors) and on synthesis time. The experimental results showed that the proposed method improved the quality of synthesized speech and shortened the synthesis time.
https://doi.org/10.7776/ASK.2007.26.4.159

