Search | Korea Science

Performance Evaluation of Speech Recognition Using the Reconstructed Feature Parameter with Voiced-Unvoiced Measure (유ㆍ무성음 척도를 포함한 재구성 특징 파라미터의 음성 인식 성능평가)

이광석;한학용;고시영;허강인
- Journal of the Korea Institute of Information and Communication Engineering / v.7 no.2 / pp.177-182 / 2003
In this study, we investigate robust speech recognition of syllable and phoneme units for confusable words, using feature parameters that include voiced-unvoiced measures. To this end, we propose a measure that represents the degree of voicing, based on the HPS (Harmonic Product Spectrum) information used in pitch detection. The measure is built from the sharpness, peak count, and peak height of the HPS. We reconstructed the feature parameters to include these measures, then performed speech recognition experiments and compared the results with typical feature parameters on a DB of confusable CVC-type syllables.
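The abstract does not define the three HPS measures, so the following is only a minimal sketch of the general idea. It assumes the standard HPS definition (the product of the magnitude spectrum at integer multiples of each bin); the function names, the 10% peak threshold, and the specific formulas for height, sharpness, and peak count are illustrative assumptions, not the authors' definitions.

```python
import cmath
import math
import random

def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]

def harmonic_product_spectrum(mag, harmonics=3):
    """HPS for bins k >= 1: product of |X(r*k)| for r = 1..harmonics."""
    top = (len(mag) - 1) // harmonics
    return [math.prod(mag[r * k] for r in range(1, harmonics + 1))
            for k in range(1, top + 1)]

def voicing_measures(frame, harmonics=3, rel_threshold=0.1):
    """Illustrative (assumed) height, sharpness and peak-count measures.

    Assumes a non-silent frame, so the HPS is not identically zero.
    """
    hps = harmonic_product_spectrum(magnitude_spectrum(frame), harmonics)
    peak = max(hps)
    i = hps.index(peak)
    mean = sum(hps) / len(hps)
    neighbours = hps[max(i - 1, 0):i + 2]  # the peak plus its adjacent bins
    count = sum(1 for j in range(1, len(hps) - 1)
                if hps[j] > hps[j - 1] and hps[j] >= hps[j + 1]
                and hps[j] >= rel_threshold * peak)
    return {
        "peak_bin": i + 1,                     # hps[0] corresponds to bin 1
        "height": peak / mean,                 # peak relative to the average
        "sharpness": peak / sum(neighbours),   # share of energy in the peak
        "peak_count": count,                   # local maxima near the peak level
    }

# Synthetic test frames: a harmonic (voiced-like) tone and white noise.
N, F0_BIN = 256, 8
voiced_frame = [sum(a * math.sin(2 * math.pi * r * F0_BIN * i / N)
                    for r, a in ((1, 1.0), (2, 0.6), (3, 0.4)))
                for i in range(N)]
rng = random.Random(0)
noise_frame = [rng.gauss(0.0, 1.0) for _ in range(N)]

voiced = voicing_measures(voiced_frame)
unvoiced = voicing_measures(noise_frame)
print(voiced)
print(unvoiced)
```

A harmonic frame concentrates the HPS in a single bin at the fundamental, while a noise-like frame spreads it across many bins, which is why measures of this kind can be appended to a feature vector as a voicing cue.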

A Study-on Context-Dependent Acoustic Models to Improve the Performance of the Korea Speech Recognition (한국어 음성인식 성능향상을 위한 문맥의존 음향모델에 관한 연구)

황철준;오세진;김범국;정호열;정현열
- Journal of the Institute of Convergence Signal Processing / v.2 no.4 / pp.9-15 / 2001
In this paper, we investigate context-dependent acoustic models to improve the performance of Korean speech recognition. The approach uses Korean phonological rules and a decision tree. With the Successive State Splitting (SSS) algorithm, the Hidden Markov Network (HM-Net), an efficient representation of phoneme-context-dependent HMMs, can be generated automatically. SSS is a powerful technique for designing the topologies of tied-state HMMs, but it does not adequately handle contexts that are unseen in the training phoneme contexts. In addition, it has some problems in the contextual-domain procedure. In this paper, we adopt a new state-clustering algorithm for SSS, called Phonetic Decision Tree-based SSS (PDT-SSS), which includes context splits based on Korean phonological rules. This method combines the advantages of decision tree clustering and SSS, and can generate a highly accurate HM-Net that can express any context. To verify the effectiveness of the adopted method, experiments were carried out using the KLE 452-word database and the YNU 200-sentence database. Through Korean phoneme, word, and sentence recognition experiments, we showed that the new state-clustering algorithm produces better phoneme, word, and continuous speech recognition accuracy than conventional HMMs.

Voice Conversion using Generative Adversarial Nets conditioned by Phonetic Posterior Grams (Phonetic Posterior Grams에 의해 조건화된 적대적 생성 신경망을 사용한 음성 변환 시스템)

Lim, Jin-su;Kang, Cheon-seong;Kim, Dong-Ha;Kim, Kyung-sup
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.369-372 / 2018
This paper proposes a non-parallel voice conversion network that converts voice between an unmapped pair of source and target voices. Conventional voice conversion research has used learning methods that minimize the distance error between spectrograms. These methods lose spectrogram resolution because they average pixels, and they also rely on parallel data, which is hard to collect. This research uses PPGs, which carry the phonetic information of the input voice, together with a GAN learning method to generate clearer voices. To evaluate the proposed method, we conducted an MOS test against a GMM-based model. We found that performance improved compared with the conventional methods.

Adaptive Noise Suppression system based on Human Auditory Model (인간의 청각모델에 기초한 잡음환경에 적응된 잡음억압 시스템)

Choi, Jae-Seung
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.05a / pp.421-424 / 2008
This paper proposes an adaptive noise suppression system based on a human auditory model to enhance speech signals degraded by various background noises. The proposed system detects voiced and unvoiced sections for each frame and implements an adaptive auditory process, then reduces noise in the speech signal using a neural network that includes amplitude and phase components. Based on signal-to-noise ratio measurements, experiments confirm that the proposed system is effective for speech signals degraded by various noises.
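The abstract does not say which signal-to-noise ratio variant was measured. As a rough illustration only, the sketch below computes the common whole-signal SNR in decibels against a clean reference, and the improvement gained by an enhanced signal over the noisy input. The function name `snr_db` and the toy sample values are assumptions made for this example.

```python
import math

def snr_db(clean, estimate):
    """SNR in dB of `estimate` measured against the `clean` reference."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if noise_power == 0:
        return math.inf  # estimate is identical to the reference
    return 10 * math.log10(signal_power / noise_power)

# Toy data: the "enhanced" signal has a smaller residual error than the noisy one.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [s + 0.5 for s in clean]
enhanced = [s + 0.05 for s in clean]

snr_in = snr_db(clean, noisy)
snr_out = snr_db(clean, enhanced)
print(f"input SNR {snr_in:.2f} dB, output SNR {snr_out:.2f} dB, "
      f"improvement {snr_out - snr_in:.2f} dB")
```

Reporting the difference between output and input SNR, rather than the output SNR alone, makes results comparable across noise conditions with different starting levels.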

Intelligent Speech Web Considering User Inclination (사용자의 성향을 고려하는 지능형 음성 웹)

Kwon, Hyeong-Joon;Hong, Kwang-Seok
- The KIPS Transactions:PartB / v.15B no.4 / pp.347-354 / 2008
In this paper, we propose a method for making the speech Web personalized and intelligent. The proposed system records previously requested information as transactions, mines association rules from those transactions, and discovers itemsets from frequent requests. Based on these frequent itemsets, the method recommends relevant information to users whose inclinations are similar to those of previous users. By implementing the proposed system and running verification experiments, we confirmed that it can recommend previously and frequently requested information as relevant information.
https://doi.org/10.3745/KIPSTB.2008.15-B.4.347
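The idea of recommending from frequent itemsets in the entry above can be sketched as follows. This is a minimal illustration, not the paper's system: it counts itemsets by brute force rather than with a specific mining algorithm, and the function names, the `min_support` value, and the request topics are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset up to max_size and keep those seen often enough."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

def recommend(requested, frequent):
    """Rank items that frequently co-occur with the requested item."""
    scores = Counter()
    for itemset, count in frequent.items():
        if len(itemset) == 2 and requested in itemset:
            other = itemset[0] if itemset[1] == requested else itemset[1]
            scores[other] += count
    return [item for item, _ in scores.most_common()]

# Hypothetical request history: each transaction is one user's past requests.
history = [
    ["weather", "news"],
    ["weather", "traffic"],
    ["weather", "news", "stocks"],
    ["news", "stocks"],
    ["weather", "news"],
]
frequent = frequent_itemsets(history, min_support=2)
print(recommend("weather", frequent))  # items co-requested with "weather"
print(recommend("news", frequent))
```

Brute-force counting is fine for a handful of transactions; a larger request log would normally call for a pruning algorithm such as Apriori.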

Robust Speaker Identification using Independent Component Analysis (독립성분 분석을 이용한 강인한 화자식별)

Jang, Gil-Jin;Oh, Yung-Hwan
- Journal of KIISE:Software and Applications / v.27 no.5 / pp.583-592 / 2000
This paper proposes a feature parameter transformation method using independent component analysis (ICA) for speaker identification. The proposed method assumes that cepstral vectors from speech under various channel conditions are constructed as a linear combination of some characteristic functions with random channel noise added, and transforms them into new vectors using ICA. The resulting vector space can emphasize the repetitive speaker information and suppress the random channel distortions. Experimental results show that the transformation method is effective in improving speaker identification systems.

Development of Continuous Spoken Digit Recognition System using Statistical Model (통계적 모델에 의한 연속 숫자음의 인식 기술개발)

Lee, G.S.;Ann, T.O.;Kim, S.H.
- Annual Conference on Human and Language Technology / 1989.10a / pp.154-158 / 1989
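This study concerns the recognition of continuous spoken digits using a statistical model, with experiments on four-digit continuous strings. The system is broadly divided into two parts: an acoustic-phonetic processing part and a lexical decoding part. The acoustic-phonetic processing part extracts 12th-order LPC cepstrum coefficients as feature vectors from the input speech and performs frame labelling and phone labelling. Frame labelling uses Bayes classification, and phone labelling is derived from the frame labels and posterior probabilities. The lexical decoding part takes phone units as input and passes them through a phone network built from phonological rules to produce the continuous digit output. The experiment used 35 four-digit continuous strings spoken by three speakers as recognition targets. It obtained a recognition rate of 80% with the four-digit string as the evaluation unit and 95% with the syllable of each digit as the unit, demonstrating the effectiveness of the proposed algorithm.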

Noise Suppression Algorithm using Neural Network based Amplitude and Phase Spectrum (진폭 및 위상스펙트럼이 도입된 신경회로망에 의한 잡음억제 알고리즘)

Choi, Jae-Seung
- Journal of the Korea Institute of Information and Communication Engineering / v.13 no.4 / pp.652-657 / 2009
This paper proposes an adaptive noise suppression system based on a human auditory model to enhance speech signals degraded by various background noises. The proposed system detects voiced, unvoiced, and silent sections for each frame and implements an adaptive auditory process, then reduces noise in the speech signal using a neural network that includes amplitude and phase components. Based on signal-to-noise ratio measurements, experiments confirm that the proposed system is effective for speech signals degraded by various noises.
https://doi.org/10.6109/JKIICE.2009.13.4.652

Robust estimation of HMM parameters Based on the State-Dependent Source-Quantization for Speech Recognition (상태의존 소스 양자화에 기반한 음성인식을 위한 은닉 마르코프 모델 파라미터의 견고한 추정)

최환진;박재득
- The Journal of the Acoustical Society of Korea / v.17 no.1 / pp.66-75 / 1998
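Hidden Markov models have recently been used as a representative method for speech recognition, and their performance depends on the acoustic modelling method used to represent the characteristics of speech. In this paper, to estimate the output probability in a state robustly, we propose a state-dependent source quantization modelling method that represents the output probability in a state as an output distribution weighted by the distributions of the sources and their frequencies. The method is based on the fact that feature parameters within one state have similar characteristics and that their variation is small compared with feature parameters in other states. In the experiments, the proposed method improved on the existing baseline system by 2.7% in word recognition rate and by 3.6% in sentence recognition rate. From these results, we judge that the proposed SDSQ-DHMM is effective in improving the recognition rate and can serve as an alternative for robust estimation of per-state output probabilities in HMMs.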

Analysis and Recognition of Korean Fricatives and Affricates (한국어 마찰음 및 파찰음의 분석과 인식)

정석재;정현열;이무영
- The Journal of the Acoustical Society of Korea / v.10 no.5 / pp.27-35 / 1991
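As a basic study toward implementing a small-scale speech recognition system that uses the phoneme as the basic unit of recognition, we analysed the characteristics of the fricatives (/ㅅ, ㅆ, ㅎ/) and affricates (/ㅈ, ㅉ, ㅊ/) using duration, mean pattern, and variance ratio, extracted parameters that are effective for discrimination within each phoneme group, and carried out recognition experiments. Analysis of the distributions of duration, mean pattern, and variance ratio showed that fricatives and affricates can be discriminated with cepstrum coefficients of only about six dimensions, and that for time-direction information it is optimal to use the features of about 14 frames from the onset of the speech as the recognition parameters. In the recognition experiments based on these findings, the recognition rate by manner of pronunciation was higher than the recognition rate for each phoneme within the phoneme groups classified by manner of articulation, which shows that discriminating the individual phonemes within the same phoneme group is more difficult. With the feature parameter length set to about 14 frames from the onset of the speech, the best recognition rates were obtained: an average of 81.1% by manner of articulation and 97.9% by manner of pronunciation. Increasing the feature parameter length beyond 14 frames did not change the recognition rate much, which is consistent with the analysis results.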

