Search | Korea Science

Extension of K-L Dynamic Parameter for Connected Digit Recognition (숫자음 인식을 위한 K-L 동적 특징파라미터의 확장)

김주곤
- Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.257-261 / 1998
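To improve recognition accuracy for connected digits, whose recognition rates are generally low, this paper examines an extension of K-L dynamic features. The effectiveness of the extended K-L dynamic features for digit recognition is verified through recognition experiments on four-digit connected strings. The speech data are four-digit connected strings recorded by the Korean Language Engineering Center (국어공학센터). K-L dynamic features were extracted by extending the feature parameters, both for single feature parameters (mel-cepstrum, regression coefficients, and K-L dynamic coefficients) and for combinations of them, and four-digit connected recognition experiments were carried out. The basic recognition units were 48 phoneme-like units used as phoneme models, and recognition used the OPDP method with syntactic control by a finite state automaton. In the experiments, the recognition rate was 67.5% with mel-cepstrum as a single feature parameter and 78.2% with the K-L dynamic coefficients extended from it. For combined feature parameters, mel-cepstrum with regression coefficients gave 78.4%, and extending this combination with K-L dynamic coefficients gave 82.3%, confirming the effectiveness of the extended K-L dynamic feature parameters.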

A Study on Pitch Extraction Method using FIR-STREAK Digital Filter (FIR-STREAK 디지털 필터를 사용한 피치추출 방법에 관한 연구)

Lee, Si-U
- The Transactions of the Korea Information Processing Society / v.6 no.1 / pp.247-252 / 1999
Pitch information is a useful parameter for realizing speech coding at low bit rates. When average pitch information is extracted from continuous speech, pitch errors appear in frames where a consonant and a vowel coexist, at the boundaries between adjoining frames, and at the beginning or end of a sentence. In this paper, I propose an Individual Pitch (IP) extraction method that uses the residual signals of the FIR-STREAK digital filter in order to limit these pitch extraction errors. The method does not average pitch intervals, so that it can accommodate the changes in each pitch interval. As a result, with the IP extraction method using the FIR-STREAK digital filter, I found no pitch errors in frames where a consonant and a vowel coexist, at the boundaries between adjoining frames, or at the beginning or end of a sentence. The method can be applied to many fields, such as speech coding, speech analysis, speech synthesis, and speech recognition.

Improved speech enhancement of multi-channel Wiener filter using adjustment of principal subspace vector (다채널 위너 필터의 주성분 부공간 벡터 보정을 통한 잡음 제거 성능 개선)

Kim, Gibak
- The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.490-496 / 2020
We present a method to improve the performance of the multi-channel Wiener filter in noisy environments. To build a subspace-based multi-channel Wiener filter in the case of a single target source, the target speech component can be effectively estimated in the principal subspace of the speech correlation matrix. The speech correlation matrix can be estimated by subtracting the noise correlation matrix from the signal correlation matrix, based on the assumption that the cross-correlation between speech and interfering noise is negligible compared with the speech correlation. However, this assumption is not valid in the presence of strong interfering noise, and significant error can be induced in the principal subspace as a result. In this paper, we propose to adjust the principal subspace vector using the speech presence probability and the steering vector for the desired speech source. The multi-channel speech presence probability is derived in the principal subspace and applied to adjust the principal subspace vector. Simulation results show that the proposed method improves the performance of the multi-channel Wiener filter in noisy environments.
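The baseline estimate described in this abstract (subtract the noise correlation matrix from the signal correlation matrix, then take the principal subspace) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the matrices are made-up real-valued 2x2 examples (practical multi-channel correlation matrices are complex-valued), the function names are hypothetical, and the proposed adjustment using speech presence probability and the steering vector is not shown.

```python
import math

def subtract(a, b):
    """Element-wise difference of two equally sized square matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def principal_vector(matrix, iterations=100):
    """Dominant unit eigenvector of a symmetric matrix via power iteration.

    Assumes the start vector is not orthogonal to the dominant eigenvector.
    """
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical correlation matrices for a two-microphone array.
signal_corr = [[5.0, 2.0], [2.0, 2.0]]  # noisy-signal correlation
noise_corr = [[1.0, 0.0], [0.0, 1.0]]   # noise correlation

# Speech correlation estimate, valid only if speech/noise cross-correlation is negligible.
speech_corr = subtract(signal_corr, noise_corr)
u = principal_vector(speech_corr)
```

In this toy case the speech correlation estimate is rank one, as expected for a single target source, so `u` points along the direction (2, 1).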
https://doi.org/10.7776/ASK.2020.39.5.490

An On-line Speech and Character Combined Recognition System for Multimodal Interfaces (멀티모달 인터페이스를 위한 음성 및 문자 공용 인식시스템의 구현)

석수영;김민정;김광수;정호열;정현열
- Journal of Korea Multimedia Society / v.6 no.2 / pp.216-223 / 2003
In this paper, we present SCCRS (Speech and Character Combined Recognition System) for speaker- and writer-independent on-line multimodal interfaces. In general, the CHMM (Continuous Hidden Markov Model) is known to be a very useful method for both speech recognition and on-line character recognition. In the proposed method, the same CHMM is applied to both speech and character recognition, so as to construct a combined system. For this purpose, 115 CHMMs having 3 states and 9 transitions are constructed using the MLE (Maximum Likelihood Estimation) algorithm. Different features are extracted for speech and character recognition: MFCC (Mel Frequency Cepstrum Coefficient) is used for speech in the preprocessing, while a position parameter is used for cursive characters. At the recognition step, the proposed SCCRS employs OPDP (One Pass Dynamic Programming), so as to be a practical combined recognition system. Experimental results show that the recognition rates for voice phoneme, voice word, cursive character grapheme, and cursive character word are 51.65%, 88.6%, 85.3%, and 85.6%, respectively, when no language model is used. This demonstrates the efficiency of the proposed system.

A Method of Recognizing and Validating Road Name Address from Speech-oriented Text (음성 기반 도로명 주소 인식 및 주소 검증 기법)

Lee, Keonsoo;Kim, Jung-Yeon;Kang, Byeong-Gwon
- Journal of Internet Computing and Services / v.22 no.1 / pp.31-39 / 2021
Obtaining delivery addresses from calls is one of the most important processes in the TV home shopping business. Automating this process can increase the operational efficiency of TV home shopping. In this paper, a method of recognizing and validating road name addresses, the address system of South Korea, from speech-oriented text is proposed. Speech-oriented text poses three challenges. The first is that numbers are represented in the form of their pronunciation. The second is that the recorded address contains noise caused by repeated pronunciation of the same address or by address elements spoken out of order. The third is the readability of the resulting address. To resolve these problems, the proposed method enhances the existing address databases provided by Korea Post and the Ministry of the Interior and Safety. Various ways of pronouncing addresses are added, and heuristic rules for dividing ambiguous pronunciations are employed. The processed address is then validated by checking that it exists in the official address database. Although the proposed method targets the STT result of a spoken address, it can also be used by any third-party service that needs to validate road name addresses. The proposed method is robust to noise such as changes in the position of elements or their omission.
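The first challenge, numbers written out as they are pronounced, can be illustrated with a small converter from Sino-Korean numerals to integers. This is a hedged sketch and not the paper's method: the function name and lookup tables are hypothetical, it handles values below 10,000 only, and it ignores zero, native Korean numerals, and pronunciation variants.

```python
# Sino-Korean digit and unit syllables (illustrative subset).
DIGITS = {"일": 1, "이": 2, "삼": 3, "사": 4, "오": 5,
          "육": 6, "칠": 7, "팔": 8, "구": 9}
UNITS = {"십": 10, "백": 100, "천": 1000}

def sino_korean_to_int(text):
    """Convert a Sino-Korean numeral such as '삼십이' to an integer (< 10,000)."""
    total = 0
    pending = 0  # digit waiting for a unit syllable, or the final ones digit
    for ch in text:
        if ch in DIGITS:
            if pending:
                raise ValueError(f"two digits in a row in {text!r}")
            pending = DIGITS[ch]
        elif ch in UNITS:
            # A unit with no preceding digit means one of that unit, e.g. '십' = 10.
            total += (pending or 1) * UNITS[ch]
            pending = 0
        else:
            raise ValueError(f"unsupported character {ch!r} in {text!r}")
    return total + pending
```

For example, `sino_korean_to_int("삼십이")` yields 32, so a transcribed building number can be normalized to digits before it is looked up in an address database.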
https://doi.org/10.7472/jksii.2021.22.1.31

Traffic Management of Integrated Services using ATM Networks (ATM 망을 이용한 통합서비스의 트래픽 관리)

Kim, Hoon;Park, Jong-Dae;Nam, Sang-Shic;Park, Kwang-Chae
- Proceedings of the Korea Information Processing Society Conference / 2001.10b / pp.1477-1480 / 2001
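This paper focuses on concrete approaches by which incumbent telecommunications carriers can respond to a rapidly changing telecommunications market; the premise is the question of how the existing network must be improved, as communication technology changes, so that profitability is not harmed. First, the evolution of the network in response to changes in communication technology should move toward packetization of voice, lower operating costs through a simpler and more integrated network structure, and easy accommodation of future new services. In this paper, we give an overview of a method for using network bandwidth efficiently by carrying voice and data over the same packet network in an ATM-based next-generation switching network, and of a flexible bandwidth management method that improves effective bandwidth utilization. On this basis, we propose a model with which bandwidth allocation protocols can be analyzed, and we use the update interval and the number of slots reserved for voice and data traffic to optimize the system parameters subject to the requirements and constraints of the given voice and data traffic. The analytical model captures the effect of traffic types on performance and also provides dynamic allocation and bandwidth management methods for mixed-traffic systems.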

Connected Korean Digit Speech Recognition Using Vowel String and Number of Syllables (음절수와 모음 열을 이용한 한국어 연결 숫자 음성인식)

Youn, Jeh-Seon;Hong, Kwang-Seok
- The KIPS Transactions:PartA / v.10A no.1 / pp.1-6 / 2003
In this paper, we present a new Korean connected digit recognition method based on the vowel string and the number of syllables. Digit candidates are reduced in two steps. The first is to determine the number and intervals of the digits. Once these are determined, the second is to recognize the vowel string in the digit string. The digit candidates that match the vowel string are then recognized using CV (consonant-vowel), VCCV, and VC unit HMMs. The proposed method copes effectively with coarticulation effects and recognizes connected digit speech very well.
https://doi.org/10.3745/KIPSTA.2003.10A.1.001

The Recognition of Korean Single vowels by Use of the Diffusion Filter Bank as a Pre-processor (확산필터뱅크를 전처리기로 사용한 한국어 단모음인식)

Huh, Man-Tak;Kim, Jae-Chang
- The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.81-87 / 1997
In this paper, a new pre-processing method for the recognition of single vowels using the spectrum envelope is presented. We use a new method of extracting the spectrum envelope with the diffusion filter bank. By dividing the analysis band of the diffusion filter bank into subbands, we decreased the number of diffusion processes, and by increasing the number of differences, we obtained higher selectivity. As a result, we reduced the total processing time and improved discrimination. Computer simulation gave an average recognition rate of 88.3% for single vowels of natural voice, confirming that the method is useful for speech recognition that relies on spectrum analysis of voice signals with many frequency components.

Korean isolated word recognizer using new time alignment method of speech signal (새로운 시간축 정규화 방법을 이용한 한국어 고립단어 인식기)

Nam, Myeong-U;Park, Gyu-Hong;No, Seung-Yong
- Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.5 / pp.567-575 / 2001
This paper suggests a new method for obtaining fixed-size parameters from voice signals of different lengths. The efficiency of a speech recognizer is determined by how it compares the similarity (the distance between patterns) of the parameters extracted from voice signals. However, variation in the voice signal and differences in speaking rate make it difficult to extract fixed-size parameters from the voice signal. The method suggested in this paper normalizes the parameters to a fixed size by applying the two-dimensional DCT (Discrete Cosine Transform) after representing the parameters as a spectrogram. To validate the suggested method, parameters extracted from a 32-channel auditory filter bank (which estimates auditory nerve firing probabilities) are processed by the two-dimensional DCT and used as the input to a neural network. For comparison, we used one of the conventional methods that solve the time alignment problem. The results show better performance and faster recognition than the conventional method in both speaker-dependent and speaker-independent isolated word recognition.
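The idea of normalizing a variable-length spectrogram to a fixed size with a two-dimensional DCT can be sketched as below. This is an illustrative reading of the abstract, not the authors' code: the function names, the choice of the DCT-II, the division by length, and the number of retained coefficients are all assumptions.

```python
import math

def dct_1d(values, num_coeffs):
    """First num_coeffs DCT-II coefficients of a sequence, divided by its length.

    Dividing by the length keeps the scale independent of utterance duration.
    Requires num_coeffs <= len(values).
    """
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values)) / n
        for k in range(num_coeffs)
    ]

def fixed_size_dct2(spectrogram, time_coeffs, freq_coeffs):
    """Map a (frames x bands) spectrogram to a time_coeffs x freq_coeffs block."""
    # DCT along the frequency axis of every frame.
    rows = [dct_1d(frame, freq_coeffs) for frame in spectrogram]
    # DCT along the time axis for each retained frequency coefficient.
    cols = [dct_1d([row[j] for row in rows], time_coeffs) for j in range(freq_coeffs)]
    # Transpose so the result is indexed [time_coeff][freq_coeff].
    return [[cols[j][i] for j in range(freq_coeffs)] for i in range(time_coeffs)]

# Two hypothetical utterances of different lengths (5 and 9 frames, 4 bands each).
short_utt = [[1.0] * 4 for _ in range(5)]
long_utt = [[1.0] * 4 for _ in range(9)]
a = fixed_size_dct2(short_utt, 3, 2)
b = fixed_size_dct2(long_utt, 3, 2)
```

Both `a` and `b` are 3 x 2 blocks regardless of the number of input frames, which is what lets a fixed-input classifier such as a neural network consume them.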

Difference State Number of CHMM Model to Improve the Performance of SCCRS (한국어 음성/문자 공용인식기의 성능향상을 위한 가변 상태수 CHMM모델의 구성)

Suk Soo-Young;Kim Min-Jung;Kim Kwang-Soo;Jung Ho-Youl;Chung Hyun-Yeol
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.95-98 / 2002
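CHMMs (Continuous Hidden Markov Models) used for character or speech recognition generally have a fixed-state-number structure, in which the number of states in each model is set to a constant value. This does not take the characteristics of individual recognition units into account, and a variable-state-number model that does so can be expected to improve the recognition rate. Representative methods for determining a suitable number of states for each recognition unit are the parameter histogram method and the BIC (Bayesian Information Criterion) method. These methods improve only the likelihood of individual recognition units, which is not directly proportional to the overall recognition rate. This paper therefore compares the recognition rates of fixed-state-number models with those obtained as the number of states of each recognition unit is varied, and on this basis proposes a method for constructing variable-state-number CHMMs in which the number of states differs from model to model. To verify its effectiveness, the proposed model was applied to handwritten character recognition in the speech/character combined recognizer. The variable-state-number model constructed with the proposed LM (Local Maximum) method maintained nearly the same recognition rate as the models constructed with MLE and BIC, while reducing the total number of states by 31% relative to the MLE model and by 22% relative to the BIC model, which confirms the effectiveness of the proposed model.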
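As background for the BIC method named in this abstract, the sketch below selects a state count for one recognition unit by maximizing a BIC score. It is a generic illustration under stated assumptions, not the paper's procedure: the log-likelihoods, the parameter count per state, and the function names are hypothetical, and the proposed LM (Local Maximum) method is not shown.

```python
import math

def bic_score(log_likelihood, num_params, num_samples):
    """BIC in its 'larger is better' form: log L - (k / 2) * log N."""
    return log_likelihood - 0.5 * num_params * math.log(num_samples)

def select_num_states(log_likelihoods, params_per_state, num_samples):
    """Pick the state count with the highest BIC score.

    log_likelihoods maps a candidate state count to the training
    log-likelihood of a model trained with that many states.
    """
    return max(
        log_likelihoods,
        key=lambda s: bic_score(log_likelihoods[s], s * params_per_state, num_samples),
    )

# Made-up values for one recognition unit.
candidate_ll = {2: -1200.0, 3: -1100.0, 4: -1090.0}
best = select_num_states(candidate_ll, params_per_state=10, num_samples=500)
```

Here the fourth state raises the likelihood only slightly, so the complexity penalty outweighs the gain and three states are selected.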

