Search | Korea Science

Auditory Neural Information Processing Modeling for Speech Recognition (음성인식을 위한 청각신경 정보처리 모델링)

Lee, Hee-Kyu;Lee, Kwang-Hyung
- The Journal of the Acoustical Society of Korea / v.9 no.3 / pp.42-47 / 1990
A neural auditory system is studied with the aim of building better speech recognition systems. Cochlear mechanics is described, and an IIR digital filter model of the basilar membrane is discussed for speech recognition. A multi-layer model of consonant recognition is constructed using phoneme detection filters and discriminant functions for feature estimation. This model achieves a consonant recognition rate of more than 90%.

Objective parameter extraction in perceptual dysphonia assessment (청지각적 음성장애평가에서의 객관적인 파라미터 추출)

Jang, Seung-Jin;Choe, Ye-Rin;Kim, Eun-Yeon;Kim, Won-Sik
- Proceedings of the Korean Society for Emotion and Sensibility Conference / 2009.05a / pp.181-182 / 2009
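The GRBAS (G: grade, R: rough, B: breathy, A: asthenic, S: strained) voice disorder assessment is widely used as a scale for evaluating patients with vocal fold abnormalities or dysarthria. However, because the assessment relies on subjective judgments by human raters, many problems have been raised, and there have been many attempts to develop objective perceptual voice disorder assessment tools based on automated algorithms. The purpose of this study is to obtain objective parameters for phoneme classification and agreement judgment, which generally have to be established before such tools can be developed.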

The Design and Implementation of Korean Text-to-Speech Conversion System on a Rule-Based Framework (한국어(韓國語) 규칙(規則) 음성(音聲) 합성(合成) 시스템의 구현(具現))

Son, Yung-Taek;Kim, Yong-Kap;Matsumoto, Tatsuro
- Annual Conference on Human and Language Technology / 1993.10a / pp.141-148 / 1993
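This paper introduces the overall processing flow of a formant-based speech synthesis method that converts input text containing mixed Hangul and Chinese characters into speech, namely Korean rule-based speech synthesis (hereafter KTTS [Korean Text To Speech System]). In particular, it focuses on three functions: conversion of Chinese characters and various symbols in the input text into Hangul, analysis of the input sentence and setting of syntactic boundaries needed to extract grammatical information for speech output, and phoneme symbol conversion together with parameter value generation and modification. The results of a listening test evaluation carried out upon completion of the system are also reported.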

A Study on Hangeul Mobile Handwriting Practice and Analyzing Application Development Based on Deep Learning (딥러닝 기반 한글 전자 필기 연습 및 분석 앱 개발에 대한 연구)

Ko, Ju-Eun;Oh, Jee-Eun;Min, Kyoung-Won
- Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.322-325 / 2022
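As the coronavirus spread worldwide, non-face-to-face activities increased, along with the use of digital handwriting and the consumption of related products. As demand for digital handwriting grows, interest in correcting digital handwriting style is also increasing. This paper proposes a deep learning algorithm that extracts syllable and phoneme regions from digital handwriting images, analyzes the handwriting, and uses the result to identify points for improvement in the user's handwriting. It introduces a study on a deep-learning-based Hangul digital handwriting practice and analysis app for tablet PCs, which gives specific feedback on the user's writing so that the user can effectively acquire the desired handwriting style through the proposed algorithm.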
https://doi.org/10.3745/PKIPS.y2022m05a.322

A study on extraction of the frames representing each phoneme in continuous speech (연속음에서의 각 음소의 대표구간 추출에 관한 연구)

박찬응;이쾌희
- Journal of the Korean Institute of Telematics and Electronics B / v.33B no.4 / pp.174-182 / 1996
In a continuous speech recognition system, an unlimited number of words can be handled by using a limited number of phonetic units such as phonemes. Dividing continuous speech into a string of phoneme terms before the recognition process can lower the complexity of the system. However, because of coarticulation between neighboring phonemes, it is very difficult to extract their boundaries exactly. In this paper, we propose an algorithm that extracts short terms representing each phoneme instead of extracting the phoneme boundaries. Short terms of lower spectral change and of higher spectral change are detected first. Phoneme changes are then detected by applying a distance measure to the lower spectral change terms, and the higher spectral change terms are regarded as transition terms or short phoneme terms. Finally, the lower spectral change terms and the mid-term of the higher spectral change terms are taken as the terms representing each phoneme. Cepstral coefficients and the weighted cepstral distance are used as the speech feature and the distance measure because of their low computational complexity, and the speech data used in this experiment were recorded in silent and ordinary indoor environments. In the experiments, the proposed algorithm showed higher performance with less computational complexity than conventional segmentation algorithms, and it can be usefully applied to phoneme-based continuous speech recognition.

Acoustic Model Improvement and Performance Evaluation of the Variable Vocabulary Speech Recognition System (가변 어휘 음성 인식기의 음향모델 개선 및 성능분석)

이승훈;김회린
- The Journal of the Acoustical Society of Korea / v.18 no.8 / pp.3-8 / 1999
Previous variable vocabulary speech recognition systems with context-independent acoustic modeling could not represent the effect of neighboring phonemes. To solve this problem, we use an allophone-based context-dependent acoustic model. This paper describes a method for improving the acoustic model of the system effectively. The acoustic model is improved by an allophone clustering technique that uses entropy as a similarity measure, and the optimal allophone model is generated by varying the number of allophones. We evaluate the performance of the improved system using the Phonetically Optimized Words (POW) DB and the PC commands (PC) DB. As a result, the allophone model composed of six hundred allophones improved the recognition rate by 13% over the original context-independent model on the POW test DB.

Fast Speech Recognition System using Classification of Energy Labeling (에너지 라벨링 그룹화를 이용한 고속 음성인식시스템)

Han Su-Young;Kim Hong-Ryul;Lee Kee-Hee
- Journal of the Korea Society of Computer and Information / v.9 no.4 s.32 / pp.77-83 / 2004
In this paper, the Classification of Energy Labeling is proposed. Energy parameters extracted from each phoneme of the input signal are labeled, and label groups are determined according to the detected energies of the input signal. DTW is then performed only within the selected label group, which makes DTW processing faster than in the previous algorithm. Because this method presupposes accurate detection of parameters in the speech duration detection and energy parameter detection steps, variable windows determined by the pitch period are used. The pitch period is detected first; the window size is then set between 200 frames and 300 frames. The proposed method makes it possible to cancel the influence of windowing and reduces the computational complexity by 25%.
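The idea of restricting DTW to one label group can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how energies are labeled, so the `energy_label` quantizer, its `step` parameter, and the `recognize` helper are hypothetical, and the features are plain one-dimensional sequences rather than per-phoneme energy parameters.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


def energy_label(seq, step=1.0):
    """Hypothetical labeling: quantize the mean energy into integer bins."""
    energy = sum(x * x for x in seq) / len(seq)
    return int(energy // step)


def recognize(query, templates, step=1.0):
    """Run DTW only against templates that share the query's energy label.

    Falls back to all templates when the query's label group is empty.
    """
    groups = {}
    for name, seq in templates.items():
        groups.setdefault(energy_label(seq, step), []).append(name)
    candidates = groups.get(energy_label(query, step), list(templates))
    return min(candidates, key=lambda name: dtw_distance(query, templates[name]))
```

Because only the templates in the matching group are compared, the number of DTW evaluations shrinks with the number of groups; how much is saved in practice depends on how evenly the labels split the vocabulary.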

Morphological Parafoveal Preview Benefit Effects in Reading Korean (우리글 읽기에서 형태소정보의 미리보기 효과)

Lee, Sangeun;Choo, Hyeree;Koh, Sungryong
- Korean Journal of Cognitive Science / v.31 no.2 / pp.25-54 / 2020
While there is no evidence for morphological parafoveal processing in alphabetic languages such as English and Finnish, there is some evidence that morphological information is processed in syllabic languages like Chinese. The Korean writing system, Hangul, may provide morphological preview benefit effects since it is an "alphabetic syllabary" that has both alphabetic and syllabic features. This study explored morphological parafoveal preview benefit effects in reading Korean using irregular verbs, whose fundamental and conjugated forms differ phonologically and orthographically. In the experiment, the target word was an irregular conjugated form, and there were four preview conditions: identical (e.g. 구워), fundamental form (e.g. 굽다), orthographically related (e.g. 굼다), and unrelated control (e.g. 죨어). Reading measures were shortest for the identical preview, followed by the morphological, orthographic, and unrelated previews. Moreover, first-pass reading measures for the morphological preview were significantly shorter than those for the unrelated control preview. These results support the hypothesis of morphological preview benefit effects in Korean. The implications of the results are discussed.
https://doi.org/10.19066/cogsci.2020.31.2.1

A Study on the Spoken Korean Citynames Using Multi-Layered Perceptron of Back-Propagation Algorithm (오차 역전파 알고리즘을 갖는 MLP를 이용한 한국 지명 인식에 대한 연구)

Song, Do-Sun;Lee, Jae-Gheon;Kim, Seok-Dong;Lee, Haing-Sei
- The Journal of the Acoustical Society of Korea / v.13 no.6 / pp.5-14 / 1994
This paper reports an experiment on speaker-independent automatic recognition of spoken Korean words using a Multi-Layered Perceptron and the error back-propagation algorithm. The target words are 50 city names of D.D.D local numbers, of which 43 have 2 syllables and the remaining 7 have 3 syllables. The words were not segmented into syllables or phonemes; instead, feature components extracted from the words at equal intervals were applied to the neural network. This made the result independent of speech duration, and the PARCOR coefficients calculated from the frames by linear predictive analysis were used as feature components. The paper tries to find the optimum conditions through 4 different experiments: a comparison between total and pre-classified training, the dependency of the recognition rate on the number of frames and the PARCOR order, the change in recognition with the number of neurons in the hidden layer, and a comparison of output pattern composition methods for the output neurons. As a result, a recognition rate of 89.6% was obtained.

Improvement of the Linear Predictive Coding with Windowed Autocorrelation (윈도우가 적용된 자기상관에 의한 선형예측부호의 개선)

Lee, Chang-Young;Lee, Chai-Bong
- The Journal of the Korea institute of electronic communication sciences / v.6 no.2 / pp.186-192 / 2011
In this paper, we propose a new procedure for improving linear predictive coding. To reduce the error power incurred by the coding, we interchanged the order of the two procedures of windowing the signal and linear prediction. This scheme corresponds to LPC extraction with windowed autocorrelation. The proposed method requires more computation time because it necessitates matrix inversion on more parameters than the conventional technique, where the efficient Levinson-Durbin recursion is applicable with fewer parameters. Experimental tests over various speech phonemes showed, however, that our procedure yields about 5% less power distortion than the conventional technique. Consequently, the proposed method is thought to be preferable to the conventional technique as far as fidelity is concerned. In a separate speaker-dependent speech recognition test on 50 isolated words pronounced by 40 people, our approach also yielded better performance.
https://doi.org/10.13067/JKIECS.2011.6.2.186
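For orientation, the conventional pipeline that the entry above takes as its baseline (window the signal, compute the autocorrelation, then run the Levinson-Durbin recursion) can be sketched as below. This is a minimal sketch of the standard textbook technique only; it does not implement the paper's windowed-autocorrelation variant, and the Hamming window and the function names (`hamming`, `autocorrelation`, `levinson_durbin`, `lpc`) are assumptions chosen for illustration.

```python
import math


def hamming(n):
    """Hamming window of length n (assumed window type for this sketch)."""
    if n < 2:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err) where a[0] = 1 and the prediction error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # degenerate (e.g. all-zero) frame
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order):
    """Conventional LPC: window the signal first, then predict."""
    windowed = [x * w for x, w in zip(frame, hamming(len(frame)))]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

The recursion exploits the Toeplitz structure of the autocorrelation matrix, which is why it is cheaper than the general matrix inversion the abstract says the proposed method requires.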

Search Result 86, Processing Time 0.027 seconds
