Integrated Search | Korea Science

On Effective Dual-Channel Noise Reduction for Speech Recognition in Car Environment

Ahn, Sung-Joo;Kang, Sun-Mee;Ko, Han-Seok
- 음성과학 (Speech Sciences) / Vol. 11, No. 1 / pp. 43-52 / 2004
This paper concerns an effective dual-channel noise reduction method to increase the performance of speech recognition in a car environment. While various single-channel methods have already been developed and dual-channel methods have been studied to some extent, their effectiveness in real environments, such as cars, has not yet been formally demonstrated in terms of achieving an acceptable level of performance. Our aim is to remedy the low performance of single- and dual-channel noise reduction methods. This paper proposes an effective dual-channel noise reduction method based on a high-pass filter and front-end processing with the eigendecomposition method. We experimented with a real multi-channel car database and compared the results across microphone arrangements. From the analysis and results, we show that the enhanced eigendecomposition method combined with a high-pass filter significantly improves speech recognition performance in a dual-channel environment.
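The abstract does not give the design of the high-pass filter. As a rough illustration of the kind of pre-filtering it mentions, the sketch below implements a generic first-order high-pass filter in Python. The function name `high_pass`, the 200 Hz cutoff, and the 8 kHz sampling rate are assumptions chosen for illustration, not values from the paper, and the eigendecomposition stage is not shown.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """Generic first-order (RC-style) high-pass filter.

    Illustrative only: the paper's actual filter design is not stated
    in the abstract, so the structure and parameters here are assumptions.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)

    filtered = []
    for i, x in enumerate(samples):
        if i == 0:
            y = x  # start from the first input sample
        else:
            # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
            y = alpha * (filtered[-1] + x - samples[i - 1])
        filtered.append(y)
    return filtered


# A constant (0 Hz) input is driven toward zero, while a rapidly
# alternating input passes with little attenuation.
constant_out = high_pass([1.0] * 200, cutoff_hz=200.0, sample_rate_hz=8000.0)
alternating_in = [(-1.0) ** n for n in range(200)]
alternating_out = high_pass(alternating_in, cutoff_hz=200.0, sample_rate_hz=8000.0)
print(abs(constant_out[-1]), abs(alternating_out[-1]))
```

A filter of this kind would be applied to each microphone channel before any further front-end processing; in practice the cutoff would have to be tuned to the noise actually observed in the recordings.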

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition

신민화;박지훈;김홍국;이연우;이성로
- 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 2, No. 3 / pp. 141-148 / 2010
In this paper, a voice activity detection (VAD) method that employs spatial cues is proposed for dual-channel noisy speech recognition. In the proposed method, a probability model for speech presence/absence is constructed using spatial cues obtained from the dual-channel input signal, and speech activity intervals are detected through this probability model. In particular, the spatial cues are composed of interaural time differences and interaural level differences of the dual-channel speech signals, and the probability model for speech presence/absence is based on a Gaussian kernel density. To evaluate the performance of the proposed VAD method, speech recognition is performed on segments that include only the speech intervals detected by the proposed VAD method. The performance of the proposed method is compared with that of several methods: an SNR-based method, a direction-of-arrival (DOA) based method, and a phase-vector-based method. The speech recognition experiments show that the proposed method outperforms the conventional methods, providing relative word error rate reductions of 11.68%, 41.92%, and 10.15% compared with the SNR-based, DOA-based, and phase-vector-based methods, respectively.
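The relative reductions quoted in this abstract are presumably computed in the usual way, as the drop in word error rate (WER) divided by the baseline WER. The short Python sketch below shows that arithmetic. The function name and the WER values are made up for illustration, since the abstract reports only the relative figures.

```python
def relative_wer_reduction(baseline_wer, proposed_wer):
    """Relative WER reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical WERs (in percent), not taken from the paper.
print(relative_wer_reduction(baseline_wer=20.0, proposed_wer=17.0))
```

Note that a relative reduction says nothing about the absolute error rates: the same relative figure can correspond to very different baseline WERs.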

Speech Enhancement Based on Human Auditory System Characteristics

이상훈;정홍
- 대한전자공학회:학술대회논문집 (Proceedings of the IEEK Conference) / Proceedings of the 2007 IEEK Summer Conference / pp. 411-412 / 2007
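This paper proposes a speech intelligibility enhancement algorithm that exploits characteristics of the human auditory system. Previous studies have mainly addressed intelligibility enhancement in the single-channel setting, where speech and noise are already mixed. Intelligibility enhancement in the dual-channel setting, where the clean speech and the ambient noise are available separately before mixing, has rarely been studied. This paper proposes a method that enhances the speech before it is mixed with noise, so that intelligibility is improved once the noise is added. The speech is enhanced by making appropriate use of the masking effect of the human auditory system. Experimental results show that this method improves intelligibility more than simply raising the volume.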

CNN-Based Dual-Channel Sound Enhancement in the MAV Environment

김영진;김은경
- 한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering) / Vol. 23, No. 12 / pp. 1506-1513 / 2019
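As the industrial use of multirotor UAVs (unmanned aerial vehicles) such as drones has expanded rapidly in recent years, demand for collecting, processing, and analyzing data with UAVs has grown as well. However, acoustic data collected with a UAV is heavily corrupted by the UAV's motor noise and by wind noise, which makes the data difficult to process and analyze. This paper therefore studies a method for improving the quality of the target sound signal in the acoustic signals received through microphones attached to the UAV. It extends the densely connected dilated convolutional network, an existing single-channel sound enhancement technique, so that it can reflect the inter-channel characteristics of the acoustic signal. As a result, the method outperformed previous work on evaluation metrics such as SDR, PESQ, and STOI.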
https://doi.org/10.6109/jkiice.2019.23.12.1506

Design of Parallel Input Pattern and Synchronization Method for Multimodal Interaction

임미정;박범
- 대한인간공학회지 (Journal of the Ergonomics Society of Korea) / Vol. 25, No. 2 / pp. 135-146 / 2006
Multimodal interfaces are recognition-based technologies that interpret and encode hand gestures, eye gaze, movement patterns, speech, physical location, and other natural human behaviors. A modality is the type of communication channel used for interaction. The term also covers the way an idea is expressed or perceived, or the manner in which an action is performed. Multimodal interfaces are the technologies that make up the multimodal interaction processes that occur, consciously or unconsciously, during communication between human and computer. The input/output forms of multimodal interfaces therefore differ from those of existing interfaces. Moreover, different people show different cognitive styles, and individual preferences play a role in the selection of one input mode over another. To develop an effective design for multimodal user interfaces, the input/output structure therefore needs to be formulated through research on human cognition. This paper analyzes the characteristics of each human modality and suggests combination types of modalities and dual coding for formulating multimodal interaction. It then designs a multimodal language and an input synchronization method according to the granularity of input synchronization. To guide the development of next-generation multimodal interfaces effectively, substantial cognitive modeling will be needed to understand the temporal and semantic relations between different modalities, their joint functionality, and their overall potential for supporting computation in different forms. This paper is expected to show multimodal interface designers how to organize and integrate human input modalities for interaction with multimodal interfaces.
https://doi.org/10.5143/JESK.2006.25.2.135

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition

신민화;박지훈;김홍국
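- 한국방송∙미디어공학회:학술대회논문집 (Proceedings of the Korean Society of Broadcast Engineers Conference) / 2010 Summer Conference of the Korean Society of Broadcast Engineers / pp. 150-151 / 2010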
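This paper proposes a statistical model-based voice activity detection method for dual-channel speech recognition in noisy environments. In the proposed method, probability models of speech presence and absence are obtained from spatial cues derived from the multi-channel input signal, and voice activity detection is performed with these models. The spatial cues are the interchannel time difference and the interchannel level difference between the two channels, and the speech presence and absence probabilities are represented by a probability model based on a Gaussian kernel density. Speech intervals are detected by estimating, for each time frame, the ratio between the speech presence probability and the speech absence probability. To evaluate the proposed method, speech recognition performance is measured using only the detected intervals as input. In the experiments, the proposed statistical model-based method using spatial cues achieved relative error rate improvements of 15.6% and 15.4% over a statistical model-based method using frequency energy and a method based on frequency spectral density, respectively.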
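The frame-wise decision described in this abstract can be sketched as a likelihood-ratio test between two Gaussian kernel density estimates, one fitted to spatial cues from speech frames and one to cues from non-speech frames. The Python sketch below is a minimal illustration under several assumptions: the function names, the single isotropic bandwidth, the threshold of 1.0, and all cue values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_kde(train_points, bandwidth):
    """Return a density function over 2-D cues (time difference, level difference).

    Uses one isotropic Gaussian kernel per training point. A shared bandwidth
    for both cue dimensions is a simplifying assumption.
    """
    norm = 1.0 / (len(train_points) * 2.0 * math.pi * bandwidth ** 2)

    def density(point):
        total = 0.0
        for t in train_points:
            dist_sq = (point[0] - t[0]) ** 2 + (point[1] - t[1]) ** 2
            total += math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def detect_speech(frames, speech_density, noise_density, threshold=1.0):
    """Flag a frame as speech when p(cue | speech) / p(cue | non-speech) > threshold."""
    flags = []
    for cue in frames:
        ratio = speech_density(cue) / max(noise_density(cue), 1e-300)
        flags.append(ratio > threshold)
    return flags


# Hypothetical training cues: (time difference, level difference in dB).
speech_cues = [(0.0, 0.0), (0.02, 0.3), (-0.01, -0.2)]
noise_cues = [(0.5, 6.0), (0.4, 5.0), (0.6, 7.0)]

speech_model = gaussian_kde(speech_cues, bandwidth=1.0)
noise_model = gaussian_kde(noise_cues, bandwidth=1.0)
print(detect_speech([(0.0, 0.1), (0.5, 5.9)], speech_model, noise_model))
```

In a real system the two cue dimensions have different units and scales, so they would normally be normalized or given separate bandwidths before a density of this kind is fitted.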

An Acoustic Echo Canceller for Double-talk by Blind Signal Separation

이행우;윤현민
- 한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering) / Vol. 16, No. 2 / pp. 237-245 / 2012
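This paper presents an acoustic echo canceller that supports double-talk by using blind signal separation. A conventional acoustic echo canceller degrades in performance or diverges during double-talk periods. Blind signal separation is therefore used to estimate the near-end speaker signal and subtract it from the residual signal. The blind signal separation method uses two microphones and estimates the near-end speaker signal through iterative computation based on second-order statistics. However, because the mixing model for blind signal separation in a closed reverberant environment is multi-channel, the separation coefficients are not computed directly; instead, the coefficients of the echo canceller are copied and used as they are. The performance of the proposed acoustic echo canceller was verified through extensive simulations. The simulation results show that an echo canceller using this method operates stably regardless of whether double-talk is present, and that its ERLE is on average 20 dB higher than that of a conventional LMS algorithm.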
https://doi.org/10.6109/jkiice.2012.16.2.237
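For readers unfamiliar with the baseline and the metric named in this abstract, the Python sketch below shows a plain LMS echo canceller and an ERLE (echo return loss enhancement) calculation, taken here as 10·log10 of the ratio of microphone power to residual power. It covers only the single-talk case; the paper's blind-signal-separation stage for double-talk is not reproduced. The echo path, tap count, step size, and signal are assumptions chosen for illustration.

```python
import math
import random


def lms_echo_canceller(far_end, mic, num_taps=4, step_size=0.05):
    """Adapt an FIR estimate of the echo path with LMS and return the residual."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # LMS update: w <- w + mu * e * x
        weights = [w + step_size * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def erle_db(mic, residual):
    """ERLE in dB: 10 * log10(mean(mic^2) / mean(residual^2))."""
    mic_power = sum(d * d for d in mic) / len(mic)
    residual_power = sum(e * e for e in residual) / len(residual)
    return 10.0 * math.log10(mic_power / max(residual_power, 1e-300))


# Synthetic single-talk test: the microphone picks up only the echo of the
# far-end signal through a short, hypothetical echo path.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, -0.3, 0.1]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
residual = lms_echo_canceller(far_end, mic)
print(erle_db(mic[-500:], residual[-500:]))
```

With near-end speech added to `mic`, the error term no longer reflects only the echo mismatch, which is the double-talk problem the paper addresses.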

7 search results (processing time: 0.03 s)
