Search | Korea Science

Speech enhancement based on reinforcement learning (강화학습 기반의 음성향상기법)

Park, Tae-Jun;Chang, Joon-Hyuk
- Proceedings of the Korea Information Processing Society Conference
- /
- 2018.05a
- /
- pp.335-337
- /
- 2018
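Speech enhancement removes noise and reverberation from speech. Because speech captured by a microphone is distorted by noise and reverberation, it is a core technology for speech signal processing applications such as speech recognition and voice communication. Earlier work relied mainly on statistical-model-based speech enhancement, which exploits statistical information about the speech and noise signals, but its performance degrades sharply in non-stationary noise environments, unlike in stationary ones. Recently, deep neural networks (DNNs), a machine learning technique, have been introduced and achieve strong performance in speech enhancement. DNN-based speech enhancement uses multiple hidden layers and hidden nodes to model the nonlinear relationship between noisy speech and clean speech. This paper applies reinforcement learning, one way to improve DNN-based speech enhancement, and improves performance over the existing DNN. Reinforcement learning, best known for its use in Google's AlphaGo, trains over a very large number of cases to select the optimal action: in a given state, which action to take under which policy to move to the next state so as to receive the highest reward. In this paper, the reward is designed on the basis of a composite measure, which raises speech recognition performance compared with a technique whose reward is based on PESQ (Perceptual Evaluation of Speech Quality).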
https://doi.org/10.3745/PKIPS.y2018m05a.335

A Cross-talk Cancelling Technique for Improved 3-Dimensional Audio Reproduction (개선된 3차원 오디오 재생을 위한 크로스토크 제거 기법)

오승수;김기만
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.5 no.1
- /
- pp.8-13
- /
- 2001
It is well known that a cross-talk canceller for 3D audio over loudspeakers depends on the listener's position, called the sweet spot. A cross-talk canceller was therefore previously proposed that increases robustness to perturbations such as head movement, reverberation, and different head shapes; it uses a three-loudspeaker structure that combines symmetric and asymmetric speaker geometries. In this paper, we propose a new cross-talk canceller using two loudspeakers that has the same efficiency as the existing three-loudspeaker cross-talk canceller. The results are verified through listening tests, and the cross-talk cancelling method for improved 3-D audio reproduction is presented in detail.
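
The abstract does not give the canceller's design. As general background only, two-loudspeaker cross-talk cancellation is commonly formulated as inverting the 2x2 matrix of speaker-to-ear transfer functions at each frequency. The sketch below does this for a single frequency bin; the transfer-function values and the function names `invert_2x2` and `matmul_2x2` are made up for illustration and are not taken from the paper.

```python
import cmath

def invert_2x2(h):
    """Invert a 2x2 complex matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical symmetric setup at one frequency bin:
# direct paths have unit gain, cross paths are attenuated and delayed.
cross = 0.4 * cmath.exp(-0.5j)
H = [[1.0 + 0j, cross],
     [cross, 1.0 + 0j]]

C = invert_2x2(H)          # canceller applied before the loudspeakers
total = matmul_2x2(H, C)   # speaker-to-ear paths combined with the canceller
```

With this choice `total` is the identity matrix up to rounding, meaning each ear would receive only its intended channel at that frequency. A practical design would also have to handle ill-conditioned frequencies and listener movement, which is the robustness issue the abstract raises.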

Single Ping Clutter Reduction Algorithm Using Statistical Features of Peak Signal to Improve Detection in Active Sonar System (능동소나 탐지 성능 향상을 위한 피크 신호의 통계적 특징 기반 단일 핑 클러터 제거 기법)

Seo, Iksu;Kim, Seongweon
- The Journal of the Acoustical Society of Korea
- /
- v.34 no.1
- /
- pp.75-81
- /
- 2015
In active sonar systems, clutter degrades target detection and tracking performance and overwhelms sonar operators in ASW (antisubmarine warfare). Conventional clutter reduction algorithms that use the consistency of local peaks have been studied for multi-ping data, and tracking filters for active sonar have also been researched. However, these algorithms cannot distinguish targets from clutter in single-ping data. This paper proposes a single-ping clutter reduction approach for mid-frequency active sonar systems that uses echo shape features. The algorithm is tested on real sea-trial data from a heavily cluttered environment. The results confirm that the number of clutter detections was reduced by about 80% relative to the conventional algorithm while target detection was retained.
https://doi.org/10.7776/ASK.2015.34.1.075

A Study on the Subband Acoustic Echo Canceller Using Weighted Overlap-Add SSB and QMF Filter Banks (중첩가산방식의 SSB 필터뱅크와 QMF 필터뱅크를 이용한 서브밴드 음향 반향 신호 제거기에 관한 연구)

차경환;심동연;김천덕
- Journal of the Korean Institute of Telematics and Electronics S
- /
- v.36S no.4
- /
- pp.93-100
- /
- 1999
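Echo cancellers used in loudspeaker conferencing systems need a long time to update their filter coefficients as conditions change in rooms with long reverberation times, which has been identified as a problem for real-time processing. To enable real-time processing by reducing the computational load, this paper proposes a subband adaptive signal processing method that uses a weighted overlap-add SSB (Single Side Band) filter bank. The method divides the input and output spectra into several frequency bands and adaptively processes each band with the ES-NLMS (Exponential Step-Normalized Least Mean Square) algorithm. In simulations, the weighted overlap-add SSB filter bank performed well: with an ERLE (Echo Return Loss Enhancement) about 1 to 2 dB lower than that of the full-band approach, its computational load was about 95% lower than the full-band approach and about 50% lower than the QMF (Quadrature Mirror Filter) filter bank.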
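
For readers unfamiliar with the metric, ERLE is conventionally the ratio, in dB, of the echo power at the microphone to the residual echo power after cancellation. The sketch below computes it over a whole signal; the function name `erle_db` and the test signals are illustrative assumptions, not the paper's evaluation setup.

```python
import math

def erle_db(mic, residual, eps=1e-12):
    """ERLE in dB: mean power of the echo at the microphone
    divided by mean power of the residual after cancellation."""
    p_mic = sum(s * s for s in mic) / len(mic)
    p_res = sum(s * s for s in residual) / len(residual)
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((p_mic + eps) / (p_res + eps))

# Made-up example: the canceller leaves 10% of the echo amplitude.
echo = [math.sin(0.1 * n) for n in range(1000)]
residual = [0.1 * s for s in echo]
value = erle_db(echo, residual)
```

An amplitude ratio of 0.1 corresponds to an ERLE of 20 dB, so a difference of 1 to 2 dB between two cancellers is a modest change in residual echo level.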

Robust Blind Source Separation to Noisy Environment For Speech Recognition in Car (차량용 음성인식을 위한 주변잡음에 강건한 브라인드 음원분리)

Kim, Hyun-Tae;Park, Jang-Sik
- The Journal of the Korea Contents Association
- /
- v.6 no.12
- /
- pp.89-95
- /
- 2006
The performance of blind source separation (BSS) using independent component analysis (ICA) declines significantly in reverberant environments. This paper proposes a post-processing method designed to remove the residual component precisely. The method uses a modified NLMS (normalized least mean square) filter in the frequency domain to estimate the cross-talk path that causes residual cross-talk components. The residual cross-talk components in one channel correspond to the direct components in the other channel, so the cross-talk path can be estimated by using the other channel's signal as the adaptive filter input. In the conventional NLMS filter the step size is normalized by the input signal power, whereas in the modified NLMS filter it is normalized by the sum of the input signal power and the error signal power. This prevents misadjustment of the filter weights. The estimated residual cross-talk components are then removed by non-stationary spectral subtraction. Computer simulations with speech signals show that the proposed method improves the noise reduction ratio (NRR) by approximately 3 dB over conventional FDICA.
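
The step-size normalization described in the abstract can be illustrated with a minimal sketch. The paper works in the frequency domain; the version below is a simplified real-valued, time-domain variant, and the function name `modified_nlms`, the tap count, the step size, and the toy cross-talk path are all assumptions made for illustration.

```python
import random

def modified_nlms(x, d, num_taps=4, mu=0.5, eps=1e-8):
    """Adaptive FIR filter estimating the path from x to d.

    The update is normalized by input power plus squared error,
    following the abstract's description of the modified NLMS filter.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps              # recent input samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        input_power = sum(b * b for b in buf)
        # Conventional NLMS would use input_power + eps only.
        norm = input_power + e * e + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors

# Toy usage: recover a made-up 4-tap path from white noise.
random.seed(0)
true_path = [0.6, -0.3, 0.1, 0.05]
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(true_path[k] * x[n - k] for k in range(4) if n - k >= 0)
     for n in range(len(x))]
w_est, err = modified_nlms(x, d)
```

Adding the squared error to the denominator shrinks the update when the error is large, for example when the desired signal contains components the filter cannot model, and the rule reduces to conventional NLMS as the error approaches zero.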

Implementation of Spatial Sound Localization System and Subjective Test (3차원 음상정위 시스템의 구현과 주관 평가)

이동우
- Proceedings of the Acoustical Society of Korea Conference
- /
- 1998.06e
- /
- pp.43-46
- /
- 1998
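This paper implements a sound localization system that positions a virtual sound image at an arbitrary location through headphones and stereo loudspeakers, and examines its localization performance through subjective evaluation. The system consists of a convolution stage that controls the sense of direction, a reverberation stage that handles the sense of space and distance, and a transaural filter stage that removes the crosstalk arising when sound is reproduced through stereo loudspeakers. Localization performance was tested with speech and metronome sounds recorded in a listening room, covering azimuth/elevation, stationary/moving sources, and distance perception, over both headphones and loudspeakers. Azimuth perception was better with headphone playback than with loudspeaker playback, moving sources were perceived better than stationary ones, and elevation perception was better in the left-right (90° to 270°) direction than in the front-back (0° to 360°) direction.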

3D Sound Player with various resampled HRTF′s (HRTF(머리전달함수)의 샘플링를 변환에 따른 입체음향 플레이어)

오재경;이동재;임철수;최범석;이원돈
- Proceedings of the KAIS Fall Conference
- /
- 2001.05a
- /
- pp.199-202
- /
- 2001
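This paper implements a sound localization module by convolving the source signal with an HRTF (head-related transfer function), a representative 3D sound generation method, adds a reverberation effect to give a sense of sound field, and adds a transaural filter to remove crosstalk. Using sampling rate conversion, decimation and interpolation are applied to downsample or upsample HRTF coefficients given at a 44.1 kHz sampling rate, and the convolution is performed with the resampled HRTF. The paper presents and implements an algorithm that minimizes the computation needed for 3D sound generation, so that the sampling-rate-converted data can be processed with the computing power of an ordinary PC.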
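
The convolution step can be sketched as follows: a mono source is convolved with a left-ear and a right-ear impulse response to produce a two-channel signal. This covers only the localization stage, not the sampling rate conversion, reverberation, or transaural filter; the impulse-response values and the function names `convolve` and `render_binaural` are made up for illustration.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution.

    Output length is len(signal) + len(impulse_response) - 1.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve one mono source with a pair of head-related impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Made-up responses for a source on the listener's left:
mono = [1.0, 0.5, 0.0, -0.5]
hrir_left = [0.9, 0.2]          # near ear: earlier and louder
hrir_right = [0.0, 0.4, 0.1]    # far ear: delayed and attenuated
left, right = render_binaural(mono, hrir_left, hrir_right)
```

The cost of direct convolution grows with the product of signal length and filter length, which is one reason a shorter, downsampled HRTF reduces the processing load on an ordinary PC.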

3D Sound Application to N channel Sound File (다채널 음악파일에의 입체음향 적용)

Kim, Yong-Jin;Song, Jang-Ho;Lee, Dong-Jae;Lee, Won-Don
- Proceedings of the Korea Information Processing Society Conference
- /
- 2002.11a
- /
- pp.15-18
- /
- 2002
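This paper develops a system that applies 3D sound effects to music files with various numbers of channels. To that end, a sound localization module was implemented by convolving the source signal with an HRTF (head-related transfer function), the best-known 3D sound technique; a reverberation effect was added to give a sense of sound field, and a transaural filter was added to remove crosstalk. These techniques were applied to multichannel music files to build a simulator that produces multichannel 3D sound effects. The system was designed to handle an arbitrary number of channels rather than a fixed one, and a basic experiment on five MIDI-based channels was carried out to demonstrate this.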

Efficient Primary-Ambient Decomposition Algorithm for Audio Upmix (오디오 업믹스를 위한 효율적인 Primary-Ambient 분리 알고리즘)

Baek, Yong-Hyun;Lee, Keun-Sang;Jeon, Se-Woon;Lee, Seokpil;Park, Young-Choel
- Proceedings of the Korean Society of Broadcast Engineers Conference
- /
- 2012.07a
- /
- pp.160-163
- /
- 2012
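Upmixing is a channel-format conversion technology for playing stereo sources, which make up most content, in multichannel loudspeaker environments such as home theaters. As a preprocessing step for upmixing, the primary component panned to a particular direction must be separated from ambient components such as reverberation and background sound. Inter-channel correlation, adaptive filters, and principal component analysis (PCA) are widely used to separate primary and ambient components. This paper separates the signal using PCA, which is known to separate primary and ambient components relatively accurately, and proposes an improved primary-ambient decomposition algorithm that resolves the problems of PCA. The separation performance of the proposed algorithm is unaffected by the panning angle of the primary component, and by removing the residual ambient component mixed into the primary component it separates primary and ambient more accurately than conventional PCA and reflects the uncorrelated nature of the ambient component more faithfully.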
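
As a point of reference, the plain PCA baseline that the abstract sets out to improve can be sketched as below. This is not the proposed algorithm: it applies one PCA to the whole signal in the time domain, and the function name `pca_primary_ambient` and the test signal are illustrative assumptions.

```python
import math

def pca_primary_ambient(left, right):
    """Split a stereo pair into primary and ambient parts with plain PCA.

    The primary part is the projection onto the principal axis of the
    2x2 channel covariance; the ambient part is whatever remains.
    """
    r_ll = sum(l * l for l in left)
    r_rr = sum(r * r for r in right)
    r_lr = sum(l * r for l, r in zip(left, right))
    # Angle of the principal eigenvector of [[r_ll, r_lr], [r_lr, r_rr]].
    theta = 0.5 * math.atan2(2.0 * r_lr, r_ll - r_rr)
    u_l, u_r = math.cos(theta), math.sin(theta)
    primary, ambient = [], []
    for l, r in zip(left, right):
        p = l * u_l + r * u_r                  # coordinate on principal axis
        primary.append((p * u_l, p * u_r))
        ambient.append((l - p * u_l, r - p * u_r))
    return theta, primary, ambient

# Made-up test: one source amplitude-panned with gains 0.8 (L) and 0.6 (R).
source = [math.sin(0.05 * n) for n in range(2000)]
left = [0.8 * s for s in source]
right = [0.6 * s for s in source]
theta, primary, ambient = pca_primary_ambient(left, right)
```

For a purely panned source with no ambience, the recovered angle equals the panning angle and the ambient output is zero up to rounding. With real ambience present, part of it falls on the principal axis and leaks into the primary output, which is the kind of residual the abstract says the proposed algorithm removes.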

Modeling of Acoustic Echo Canceller Using Subband Adaptive Signal Processing (서브밴드 적응신호처리를 이용한 음향 에코제거기의 모델링)

Kim, Chun-Duck;Sim, Dong-Youn;Chung, Ho-Moon;Lee, Jun-Ku;Cha, Kyung-Hwan
- The Journal of the Acoustical Society of Korea
- /
- v.16 no.5
- /
- pp.43-49
- /
- 1997
In general, echo cancellers for TV or audio conference systems have difficulty operating in real time in closed rooms with long reverberation times, because the system needs considerable time to adapt its filter coefficients to environmental changes. This paper therefore proposes a new subband adaptive filtering method that uses the polyphase filter banks of the MPEG (Moving Picture Experts Group) audio system to solve this problem. The method divides the input and output signal spectra into several frequency bands, and each band is adaptively filtered using the ES-NLMS (Exponential Step-Normalized Least Mean Square) algorithm. The optimal number of subbands is determined by computer simulation. According to the simulation results, the ERLE of the subband model is 2 dB lower than that of the general full-band model, while its computational load is reduced by about 88%.

