Integrated Search | Korea Science

A Study on Environment Parameter Compensation Method for Robust Speech Recognition

홍미정;이호웅
- 한국ITS학회 논문지 / Vol. 5, No. 2 / pp. 1-10 / 2006
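Among the model parameter transformation techniques used for robust speech recognition, this paper experiments in given noise environments with the recent VTS (Vector Taylor Series) algorithm proposed by Moreno at Carnegie Mellon University (1996). To evaluate the performance of the VTS algorithm, CMN (Cepstral Mean Normalization), an existing noise-handling technique, is adopted as a baseline, and recognition rates are compared when white noise and street noise, set at various decibel levels, are applied as environmental noise. The recognition results from the experimental environment originally proposed by Moreno are also compared with and analyzed against the results of this paper. A discrete HMM (Hidden Markov Model), which can be implemented in real time, is used as the recognition algorithm.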
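As an illustration of the CMN baseline named in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes cepstral features are already available as a NumPy array of shape (frames, coefficients); the function name and the example values are hypothetical.

```python
import numpy as np

def cepstral_mean_normalization(cepstra: np.ndarray) -> np.ndarray:
    """Subtract the per-utterance mean of each cepstral coefficient.

    cepstra: array of shape (num_frames, num_coeffs).
    """
    # A stationary convolutional channel adds a constant offset in the
    # cepstral domain, so removing the time average cancels that offset.
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Hypothetical example: 4 frames of 3 cepstral coefficients.
clean = np.array([[1.0, 2.0, 3.0],
                  [2.0, 0.0, 1.0],
                  [0.0, 1.0, 2.0],
                  [1.0, 1.0, 2.0]])
channel_offset = np.array([0.5, -0.3, 0.1])  # assumed constant channel bias
observed = clean + channel_offset
normalized = cepstral_mean_normalization(observed)
```

In this toy case the normalized features are the same whether or not the constant offset is present, which is the property that makes CMN a common baseline for channel mismatch. It does not model additive noise, which is the case VTS is designed to handle.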

Multi-channel Speech Enhancement Using Blind Source Separation and Cross-channel Wiener Filtering

Jang, Gil-Jin;Choi, Chang-Kyu;Lee, Yong-Beom;Kim, Jeong-Su;Kim, Sang-Ryong
- The Journal of the Acoustical Society of Korea / Vol. 23, No. 2E / pp. 56-67 / 2004
Despite abundant research on blind source separation (BSS) in many types of simulated environments, its performance is still not satisfactory for real environments. The major obstacles appear to be the finite filter length of the assumed mixing model and nonlinear sensor noise. This paper presents a two-step speech enhancement method with multiple microphone inputs. The first step performs a frequency-domain BSS algorithm to produce multiple outputs without any prior knowledge of the mixed source signals. The second step further removes the remaining cross-channel interference by a spectral cancellation approach using a probabilistic source absence/presence detection technique. The desired primary source is detected in every frame of the signal, and the secondary source is estimated in the power spectral domain using the other BSS output as a reference interfering source. The estimated secondary source is then subtracted to reduce the cross-channel interference. Our experimental results show good separation enhancement performance on real recordings of speech and music signals compared to conventional BSS methods.
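The subtraction in the second step can be pictured with a simple power-domain sketch. This is an assumption-laden simplification, not the paper's method: it omits the probabilistic source absence/presence detection, and the parameters `alpha` and `floor` are hypothetical tuning values that do not come from the abstract.

```python
import numpy as np

def subtract_interference(primary_power, reference_power, alpha=1.0, floor=0.01):
    """Subtract a scaled reference power spectrum from the primary one.

    alpha (over-subtraction factor) and floor (fraction of the primary
    power kept as a lower bound) are hypothetical tuning parameters.
    """
    primary_power = np.asarray(primary_power, dtype=float)
    reference_power = np.asarray(reference_power, dtype=float)
    cleaned = primary_power - alpha * reference_power
    # Power cannot be negative, so clamp each bin to a small spectral floor.
    return np.maximum(cleaned, floor * primary_power)

# Hypothetical power spectra for one frame (three frequency bins).
primary = np.array([4.0, 1.0, 9.0])    # BSS output holding the desired source
reference = np.array([1.0, 2.0, 0.0])  # other BSS output, used as interference
enhanced = subtract_interference(primary, reference)
```

The floor keeps bins from going negative when the interference estimate exceeds the observed power, a common safeguard in spectral subtraction schemes.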

Implementation of a Real-time SIFT Pitch Detector

이종석;이상욱
- 대한전자공학회논문지 / Vol. 23, No. 1 / pp. 101-113 / 1986
In this paper, a real-time pitch detector for an LPC vocoder, implemented on a high-speed digital signal processor, the NEC 7720, is described. The pitch detector is based mainly on the SIFT algorithm. The SIFT pitch detector consists primarily of a digital low-pass filter, an inverse filter, autocorrelation computation, a peak picker, interpolation, a V/UV decision, and a final pitch smoother. In our approach, modifications, mainly to the V/UV decision and the final pitch smoother, were made to estimate pitch more accurately. 16-bit fixed-point arithmetic was employed for all necessary computation, and the simulated results were compared with pitches detected by eye from real speech data. The pitch detector occupies 98.8% of the instruction ROM, 37% of the data ROM, and 94% of the internal RAM, and takes 15.2 ms to estimate a pitch when an analysis frame consists of 128 speech samples. The test results agreed well with the computer simulation results.
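The autocorrelation and peak-picking stages listed in this abstract can be sketched in floating point as follows. This is only an illustrative simplification: it omits the low-pass filter, inverse filter, interpolation, V/UV decision, and smoother, and it is not the fixed-point NEC 7720 implementation. The sample rate, search range, and test tone are assumed values.

```python
import numpy as np

def autocorr_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate pitch (Hz) from the autocorrelation peak of one frame."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Restrict the peak search to lags within the assumed pitch range.
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Hypothetical test signal: a 125 Hz tone, 128 samples at 2 kHz.
sample_rate = 2000
n = np.arange(128)
tone = np.sin(2 * np.pi * 125.0 * n / sample_rate)
pitch = autocorr_pitch(tone, sample_rate)
```

Without interpolation the estimate is quantized to integer lags, which is one reason a detector of this kind includes an interpolation stage after peak picking.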

A STUDY ON THE SIMULATED ANNEALING OF SELF ORGANIZED MAP ALGORITHM FOR KOREAN PHONEME RECOGNITION

Kang, Myung-Kwang;Ann, Tae-Ock;Kim, Lee-Hyung;Kim, Soon-Hyob
- 한국음향학회:학술대회논문집 / Proceedings of the 11th Workshop on Speech Communication and Signal Processing, Acoustical Society of Korea, 1994 (SCAS Vol. 11, No. 1) / pp. 407-410 / 1994
In this paper, we describe a new unsupervised learning algorithm, SASOM. It addresses a defect of the conventional SOM, namely that the state of the network may fail to converge to the minimum point. The proposed algorithm uses an objective function that evaluates the state of the network during learning and adjusts the learning rate flexibly according to that evaluation. We implement simulated annealing, applied to the conventional network, using the objective function and the learning rate. As a result, the proposed algorithm can make the state of the network converge to the global minimum. Using two-dimensional input vectors with a uniform distribution, we graphically compared the ordering ability of SOM with that of SASOM. We carried out recognition experiments with the new algorithm on all Korean phonemes and some continuous speech.

Design of Intelligent Emotion Recognition Model

김이곤;김서영;하종필
- 한국지능시스템학회:학술대회논문집 / Proceedings of the 2001 Fall Conference of the Korea Fuzzy Logic and Intelligent Systems Society / pp. 46-50 / 2001
Voice is one of the most efficient communication media, and it carries several kinds of information about the speaker, context, emotion, and so on. Human emotion is expressed in speech, gesture, and physiological phenomena (breathing, pulse rate, etc.). In this paper, a method for recognizing emotion from a person's voice signal is presented and simulated using a neuro-fuzzy model.

Design of Intelligent Emotion Recognition Model

Kim, Yi-gon
- 한국지능시스템학회논문지 / Vol. 11, No. 7 / pp. 611-614 / 2001
Voice is one of the most efficient communication media, and it carries several kinds of information about the speaker, context, emotion, and so on. Human emotion is expressed in speech, gesture, and physiological phenomena (breathing, pulse rate, etc.). In this paper, an emotion recognition model that uses a neuro-fuzzy approach to recognize emotion from the voice signal is presented and simulated.

Double Compensation Framework Based on GMM For Speaker Recognition

김유진;정재호
- 대한음성학회지:말소리 / No. 45 / pp. 93-105 / 2003
In this paper, we present a single framework based on GMM for speaker recognition. The proposed framework can simultaneously minimize environmental variations under mismatched conditions and adapt the bias-free, speaker-dependent characteristics of claimant utterances to the background GMM to create a speaker model. We compare closed-set speaker identification for the conventional method and the proposed method on both TIMIT and NTIMIT. In several sets of experiments, we show recognition rate improvements of 7.2% and 27.4% on a simulated channel and a telephone channel condition, respectively.

Design of Channel Coding Combined with 2.4kbps EHSX Coder

이창환;김영준;이인성
- 한국콘텐츠학회논문지 / Vol. 10, No. 9 / pp. 88-96 / 2010
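This paper proposes a channel coding method combined with the 2.4 kbps EHSX speech coder. The code rate of the channel coder is set to 1/2, and this rate is obtained by puncturing a rate-1/3 convolutional coder. The puncturing that raises the channel coder's rate from 1/3 to 1/2 takes the importance of the source-coded bits into account, so that channel coding yields a performance gain. The importance of the coded bits of the EHSX coder, used as the source coder, was analyzed using speech from four female and male speakers, and the experiments showed that the coded bits output by the EHSX coder have different bit error sensitivities. Simulations to evaluate the performance of the channel coder combined with the source coder were carried out over Rayleigh fading and AWGN channels, and the proposed method achieved an improvement of about 0.25 to 0.35 MOS.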
https://doi.org/10.5392/JKCA.2010.10.9.088
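The rate change from 1/3 to 1/2 described in this abstract follows from simple counting: over two information bits the mother code emits six bits, and keeping four of them gives 2/4 = 1/2. The sketch below shows only that bookkeeping. The puncturing pattern and the placeholder bit values are hypothetical and are not taken from the paper, whose pattern depends on the measured bit sensitivities.

```python
def puncture(coded_bits, pattern):
    """Keep the coded bits at positions where the repeating pattern has a 1."""
    return [bit for i, bit in enumerate(coded_bits) if pattern[i % len(pattern)]]

def code_rate(num_info_bits, num_sent_bits):
    """Ratio of information bits to transmitted bits."""
    return num_info_bits / num_sent_bits

# Hypothetical pattern over two information bits (six mother-code bits):
# four of six bits are kept, so rate 2/6 = 1/3 becomes 2/4 = 1/2.
PATTERN = [1, 1, 0, 1, 0, 1]

info_bits = 4
# Placeholder output of a rate-1/3 encoder for 4 information bits (12 bits).
mother_output = [1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1]
sent = puncture(mother_output, PATTERN)
rate = code_rate(info_bits, len(sent))
```

An importance-aware scheme of the kind the abstract describes would choose which positions to drop so that the more error-sensitive source bits retain more of their parity protection; the fixed pattern here makes no such distinction.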

Design of an Efficient VLSI Architecture and Verification using FPGA-implementation for HMM (Hidden Markov Model)-based Robust and Real-time Lip Reading

이지근;김명훈;이상설;정성태
- 한국컴퓨터정보학회논문지 / Vol. 11, No. 2 / pp. 159-167 / 2006
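Lip reading has been proposed as a way to improve the performance of speech recognition systems in noisy environments. Whereas previous papers propose software lip-reading methods, this paper proposes a hardware design for real-time lip reading. For real-time processing and ease of implementation, the lip-reading system is divided into three modules: an image acquisition module, a feature vector extraction module, and a recognition module. The image acquisition module captures input images with a CMOS image sensor. The feature vector extraction module extracts feature vectors from the input images using a parallel block matching algorithm; it was coded for an FPGA and simulated. The recognition module applies an HMM-based recognition algorithm to the extracted feature vectors to recognize the spoken word; it was coded for a DSP and simulated. The simulation results show that a real-time lip-reading system can be implemented in hardware.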

On-line Vector Quantizer Design Using Stochastic Relaxation

송근배;이행세
- 전자공학회논문지CI / Vol. 38, No. 5 / pp. 27-36 / 2001
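This paper proposes new on-line learning algorithms based on the stochastic relaxation (SR) method by applying the SR concept to on-line vector quantizer design. This alleviates the problem of convergence to a local minimum that affects the traditional Kohonen learning algorithm (KLA). Applications of the SR method fall into two groups depending on whether the simulated annealing (SA) concept is used. To distinguish them, the SR algorithm that uses the SA concept is called OLVQ-SA, and the algorithm that does not is called OLVQ-SR. The proposed methods are combined with the KLA and are designed to preserve its characteristics, which improves their convergence speed and stability. To demonstrate the superiority of the proposed methods, vector quantization experiments were performed on a Gauss-Markov source and on speech and image data, and the results show that the proposed methods consistently generate better codebooks than the KLA.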
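For context, a plain on-line codebook update of the kind the KLA baseline performs can be sketched as below. This is a winner-only simplification under stated assumptions: there is no neighborhood function, no stochastic relaxation, and no annealing schedule, and the codebook, samples, and learning rate are hypothetical values rather than anything taken from the paper.

```python
import numpy as np

def online_vq_step(codebook, sample, learning_rate):
    """Move the nearest codeword toward the sample (winner-only update)."""
    distances = np.sum((codebook - sample) ** 2, axis=1)
    winner = int(np.argmin(distances))
    # In-place update of the winning codeword only.
    codebook[winner] += learning_rate * (sample - codebook[winner])
    return winner

# Hypothetical data: two well-separated clusters presented alternately.
codebook = np.array([[1.0, 1.0], [9.0, 9.0]])
samples = np.array([[0.0, 0.0], [10.0, 10.0]] * 20)
for x in samples:
    online_vq_step(codebook, x, learning_rate=0.5)
```

Because each step only pulls the current winner toward the current sample, the final codebook depends on the initial codewords, which is the local-minimum sensitivity that stochastic relaxation is meant to reduce.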

Search results: 70 items, processing time 0.023 s