Search | Korea Science

Effective Recognition of Velopharyngeal Insufficiency (VPI) Patient's Speech Using Simulated Speech Model (모의 음성 모델을 이용한 효과적인 구개인두부전증 환자 음성 인식)

Sung, Mee Young;Kwon, Tack-Kyun;Sung, Myung-Whun;Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering, v.19 no.5, pp.1243-1250, 2015
This paper presents an effective method for recognizing the speech of patients with velopharyngeal insufficiency (VPI), intended for a VPI speech reconstruction system. A speaker adaptation technique is employed to improve VPI speech recognition. The paper proposes using simulated speech to generate the initial model for speaker adaptation, so that the small amount of available VPI speech is used effectively for model adaptation. Applying MLLR for speaker adaptation yields 83.60% average word accuracy. The proposed speaker adaptation method using the simulated speech model brings a 6.38% improvement in average accuracy. The experimental results demonstrate that the proposed method is highly effective for developing a recognition system for VPI speech, for which a large speech database is difficult to construct.
https://doi.org/10.6109/jkiice.2015.19.5.1243

Ultrastructure of the Digestive Diverticulum of Saxidomus purpuratus (Bivalvia: Veneridae) (개조개, Saxidomus purpuratus 소화맹낭의 미세구조)

Ju, Sun-Mi;Lee, Jung-Sick
- The Korean Journal of Malacology, v.27 no.3, pp.159-165, 2011
The anatomy and ultrastructure of the digestive diverticulum of Saxidomus purpuratus were described using light and electron microscopy. The digestive diverticulum, dark green in color, is situated on the gonad and connected to the stomach by a primary duct. The digestive diverticulum is composed of numerous digestive tubules. The epithelial layer of the digestive tubule is simple and is composed of basophilic cells and digestive cells. Basophilic cells are columnar in shape, and their electron density is higher than that of the digestive cells. Their cytoplasm has a well-developed endoplasmic reticulum, tubular mitochondria, a Golgi complex and membrane-bounded granules of high electron density. Digestive cells are columnar in shape, with microvilli developed on the free surface. Pinocytic vesicles, lysosomes and numerous mitochondria were observed in the apical cytoplasm of digestive cells. The results of this study suggest that the basophilic cells and digestive cells of the digestive tubule are specialized for extracellular and intracellular digestion, respectively.
https://doi.org/10.9710/kjm.2011.27.3.159

Improvement of the Linear Predictive Coding with Windowed Autocorrelation (윈도우가 적용된 자기상관에 의한 선형예측부호의 개선)

Lee, Chang-Young;Lee, Chai-Bong
- The Journal of the Korea institute of electronic communication sciences, v.6 no.2, pp.186-192, 2011
In this paper, we propose a new procedure for improving linear predictive coding (LPC). To reduce the error power incurred by the coding, we interchange the order of two steps: windowing the signal and linear prediction. This scheme corresponds to LPC extraction with windowed autocorrelation. The proposed method requires more computation time, because it requires matrix inversion over more parameters than the conventional technique, in which the efficient Levinson-Durbin recursion is applicable with fewer parameters. Experimental tests over various speech phonemes showed, however, that our procedure yields about 5% less power distortion than the conventional technique. Consequently, the proposed method is considered preferable to the conventional technique as far as fidelity is concerned. In a separate speaker-dependent speech recognition test on 50 isolated words pronounced by 40 people, our approach also yielded better performance.
https://doi.org/10.13067/JKIECS.2011.6.2.186
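
For orientation, the conventional baseline that the abstract above compares against (the autocorrelation method solved with the Levinson-Durbin recursion) can be sketched as follows. This is a minimal illustration, not the authors' proposed windowed-autocorrelation procedure; the function names `autocorrelation` and `levinson_durbin` are assumptions introduced here.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of one finite frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the predictor coefficients.

    Returns (a, err), where the prediction is
    x_hat[n] = sum(a[k] * x[n - 1 - k] for k in range(order))
    and err is the final prediction error power.
    Assumes r[0] > 0 and len(r) > order.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for this stage.
        acc = r[i + 1] - sum(a[k] * r[i - k] for k in range(i))
        k_i = acc / err
        # Update the lower-order coefficients, then append the new one.
        prev = a[:i]
        for k in range(i):
            a[k] = prev[k] - k_i * prev[i - 1 - k]
        a[i] = k_i
        # Error power shrinks by (1 - k_i^2) at each stage.
        err *= 1.0 - k_i * k_i
    return a, err
```

The recursion costs on the order of `order**2` operations, which is why the abstract describes it as efficient relative to a general matrix inversion.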

A Study on the Improvement of Automatic Text Recognition of Road Signs Using Location-based Similarity Verification (위치기반 유사도 검증을 이용한 도로표지 안내지명 자동인식 개선방안 연구)

Chong, Kyusoo
- The Journal of The Korea Institute of Intelligent Transport Systems, v.18 no.6, pp.241-250, 2019
Road signs are guidance facilities for road users, and the Ministry of Land, Infrastructure and Transport has established and operates a system to make managing these road signs more convenient. The role of road signs will diminish with future autonomous driving, but they will continue to be needed. For accurate machine recognition of the text on road signs, automatic road sign recognition equipment has been developed that applies image-based text recognition technology. Yet there are many cases of misrecognition due to irregular specifications and external environmental factors such as manual manufacturing, illumination, light reflection, and rainfall. The purpose of this study is to derive location-based destination names for finding misrecognition errors that cannot be overcome by image analysis, and to improve the automatic recognition of destination names on road signs by using a Levenshtein similarity verification method based on phoneme separation.
https://doi.org/10.12815/kits.2019.18.6.241
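
The idea of Levenshtein similarity over separated phonemes, as mentioned in the abstract above, might be sketched as below. This is an illustration under assumptions, not the paper's implementation: Hangul syllables are split into conjoining jamo with Unicode NFD normalization, the similarity is normalized by the longer jamo string, and the helper names (`to_jamo`, `jamo_similarity`, `best_match`) and the candidate list are hypothetical.

```python
import unicodedata


def to_jamo(text):
    """Split precomposed Hangul syllables into conjoining jamo (NFD)."""
    return unicodedata.normalize("NFD", text)


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def jamo_similarity(a, b):
    """Similarity in [0, 1] computed on jamo sequences (1.0 = identical)."""
    ja, jb = to_jamo(a), to_jamo(b)
    longest = max(len(ja), len(jb))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ja, jb) / longest


def best_match(recognized, candidates):
    """Pick the candidate name closest to the recognized text."""
    return max(candidates, key=lambda name: jamo_similarity(recognized, name))
```

Comparing at the jamo level means a single misread vowel costs one edit out of five jamo rather than one out of two syllables, so near misses score higher than unrelated names. Note that NFD encodes initial and final consonants as different code points, so the same consonant in different syllable positions counts as a mismatch in this sketch.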

A Study of Morphophonemic Processes of Korean using Neural Networks (인공신경망을 이용한 한국어 형태음운현상 연구)

Lee, Chan-Do
- The Transactions of the Korea Information Processing Society, v.2 no.2, pp.215-228, 1995
Despite the importance of words in language, there have been relatively few computational studies of how words are understood. This paper describes how neural networks can learn to perceive and produce words. Most traditional linguistic theories presuppose abstract underlying representations (URs) and a set of explicit rules for obtaining the surface realization. There are, however, a number of questions that can be raised regarding this approach: (1) the assumption of URs, (2) the formation of rules, and (3) the interaction of rules. In this paper, it is hypothesized that rules emerge as the generalizations the network abstracts in the process of learning to associate forms with the meanings of words. Employing a simple recurrent network, a series of simulations on different types of morphophonemic processes was run. The results of the simulations show that this network is capable of learning to perceive whether words are in basic form or in inflected form, given only forms, and to produce words in the right form, given arbitrary meanings, thus eliminating the need to presuppose abstract URs and rules.

The suppression of noise-induced speech distortions for speech recognition (음성인식을 위한 잡음하의 음성왜곡제거)

Chi, Sang-Mun;Oh, Yung-Hwan
- Journal of the Korean Institute of Telematics and Electronics S, v.35S no.12, pp.93-102, 1998
In noisy environments, human speech production is influenced by noise (the Lombard effect), and speech signals are contaminated. These distortions dramatically reduce the performance of speech recognition systems. This paper proposes a method of Lombard effect compensation and noise suppression to improve speech recognition performance in noisy environments. To estimate the intensity of the Lombard effect, a nonlinear distortion that depends on the ambient noise level, the speaker, and the phonetic unit, we formulate a measure of the Lombard effect level based on the acoustic speech signal, and this measure is used to compensate for the Lombard effect. The distortions of speech under noisy environments are cancelled out as follows. First, spectral subtraction and band-pass filtering are used to cancel out noise. Second, energy normalization is proposed to cancel out the variation in vocal intensity caused by the Lombard effect. Finally, the Lombard effect level controls the transform that converts the Lombard speech cepstrum to the clean speech cepstrum. The proposed method was validated on a 50-word Korean recognition task. Average recognition rates were 82.6%, 95.7%, and 97.6% with the proposed method, compared with 46.3%, 75.5%, and 87.4% without any compensation, at SNRs of 0, 10, and 20 dB, respectively.
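
The first step listed in the abstract above, spectral subtraction, can be illustrated in the magnitude domain. This is a generic sketch, not the paper's configuration: the over-subtraction factor `alpha`, the spectral `floor`, and the choice to estimate noise from the leading frames are all assumptions, and the inputs are taken to be per-frame magnitude spectra that some earlier step has already computed.

```python
def estimate_noise(frames, num_noise_frames):
    """Average the magnitude spectra of the leading frames.

    Assumes the first num_noise_frames frames contain noise only.
    """
    head = frames[:num_noise_frames]
    bins = len(head[0])
    return [sum(frame[k] for frame in head) / len(head) for k in range(bins)]


def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from one frame's spectrum.

    Each bin is clamped to floor * noisy magnitude so that the result
    never goes negative.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        diff = y - alpha * n
        cleaned.append(max(diff, floor * y))
    return cleaned
```

The clamp matters because the noise estimate is an average: in any single frame the instantaneous noise can be smaller than the estimate, and an unclamped subtraction would produce negative magnitudes.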

The syllable recovery rule-based system and the application of a morphological analysis method for the post-processing of continuous speech recognition (연속음성인식 후처리를 위한 음절 복원 rule-based 시스템과 형태소분석기법의 적용)

박미성;김미진;김계성;최재혁;이상조
- Journal of the Korean Institute of Telematics and Electronics C, v.36C no.3, pp.47-56, 1999
Various phonological alterations occur in continuously spoken Korean. These alterations are one of the major reasons that speech recognition of Korean is difficult. This paper presents a rule-based system that converts a character string produced by speech recognition into a text-based character string. The recovery results are morphologically analyzed, and only a correct text string is generated. Recovery is executed according to four kinds of rules: a syllable-boundary final-consonant/initial-consonant recovery rule, a vowel-process recovery rule, a last-syllable final-consonant recovery rule, and a monosyllable process rule. We use x-clustering information for efficient recovery and postfix-syllable frequency information to restrict the recovery candidates passed to the morphological analyzer. Because the system is rule-based, it does not require a large pronunciation dictionary or a phoneme dictionary, and it has the advantage that an existing text-based morphological analyzer can be used.

Classification of Consonants by SOM and LVQ (SOM과 LVQ에 의한 자음의 분류)

Lee, Chai-Bong;Lee, Chang-Young
- The Journal of the Korea institute of electronic communication sciences, v.6 no.1, pp.34-42, 2011
In an effort toward the practical realization of a phonetic typewriter, we concentrate in this paper on the classification of consonants. Since many consonants do not show periodic behavior in the time domain, the validity of Fourier analysis for them is not convincing. Vector quantization (VQ) via LBG clustering is therefore first performed to check whether the MFCC and LPCC feature vectors are meaningful for consonants at all. The experimental results of VQ showed that it is not easy to draw a clear-cut conclusion as to the validity of Fourier analysis for consonants. For classification, two kinds of neural networks are employed in our study: the self-organizing map (SOM) and learning vector quantization (LVQ). Results from the SOM revealed that some pairs of phonemes are not resolved. Though LVQ is inherently free from this difficulty, its classification accuracy was found to be low. This suggests that, as far as consonant classification by LVQ is concerned, feature vectors of types other than MFCC should be deployed in parallel. However, the MFCC/LVQ combination was not found to be inferior to phoneme classification by a language-model based approach. In all of our work, LPCC performed worse than MFCC.
https://doi.org/10.13067/JKIECS.2011.6.1.034

Evaluation of Word Recognition System For Mobile Telephone (이동전화를 위한 단어 인식기의 성능평가)

Kim Min-Jung;Hwang Cheol-Jun;Chung Ho-Youl;Chung Hyun-Yeol
- Proceedings of the Acoustical Society of Korea Conference, spring, pp.92-95, 1999
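In this paper, as a preliminary experiment toward implementing a voice-operated mobile telephone, we recorded word data frequently used on mobile telephones, carried out word recognition experiments, and evaluated the performance of the recognizer. The word database used in the recognition experiments was built from the utterances of 600 speakers: 360 Seoul speakers (180 male, 180 female) and 240 Gyeongsang-do speakers (120 male, 120 female). The vocabulary consisted of 55 words, including the main function and control words commonly used on mobile telephones as well as digits, and each speaker uttered each word three times. The data were collected in an office environment with some noise, at a sampling rate of 8 kHz. The basic recognition unit was a set of 48 phoneme-like units (PLUs), and the feature parameters were the mel-cepstrum as static features and regression coefficients as dynamic features. The OPDP (One Pass Dynamic Programming) algorithm was used in the recognition experiments. Two kinds of models were built: models trained separately for each region and a model trained regardless of region. Training was carried out by adapting an existing 16 kHz initial model to the data collected at 8 kHz. The region-specific models and the region-independent model were each evaluated on test data both per region and regardless of region. The experiments yielded a relatively high recognition rate of over 90%, confirming the effectiveness of the recognition system.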

Fast Speech Recognition System using Classification of Energy Labeling (에너지 라벨링 그룹화를 이용한 고속 음성인식시스템)

Han Su-Young;Kim Hong-Ryul;Lee Kee-Hee
- Journal of the Korea Society of Computer and Information, v.9 no.4 s.32, pp.77-83, 2004
In this paper, a classification of energy labeling is proposed. Energy parameters of the input signal, extracted from each phoneme, are labeled, and label groups are determined according to the detected energies of the input signal. DTW is then run only within the selected label group, which makes DTW processing faster than the previous algorithm. Because this method presupposes accurate parameter detection in the steps of detecting speech duration and detecting energy parameters, variable windows determined by the pitch period are used. The pitch period is detected first; the window size is then set between 200 and 300 frames. The proposed method makes it possible to cancel the influence of windows and reduces the computational complexity by 25%.
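
The speed-up described in the abstract above comes from running DTW only against templates that share the query's energy label. A minimal sketch of that idea follows, assuming one-dimensional feature sequences and an absolute-difference local cost; `group_of` and the layout of `templates_by_group` are hypothetical stand-ins, since the abstract does not say how labels are assigned.

```python
def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two 1-D sequences (absolute difference)."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(query, templates_by_group, group_of):
    """Run DTW only against templates in the query's label group.

    templates_by_group maps a group label to a list of (word, sequence)
    pairs; group_of maps a query sequence to its group label.
    """
    candidates = templates_by_group[group_of(query)]
    return min(candidates, key=lambda item: dtw_distance(query, item[1]))[0]
```

Each DTW comparison costs on the order of `len(seq_a) * len(seq_b)` operations, so skipping the templates outside the selected group reduces the total work in proportion to the number of templates excluded.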
