Search | Korea Science

Acoustic Event Detection in Multichannel Audio Using Gated Recurrent Neural Networks with High-Resolution Spectral Features

Kim, Hyoung-Gook;Kim, Jin Young
- ETRI Journal / v.39 no.6 / pp.832-840 / 2017
Recently, deep recurrent neural networks have achieved great success in various machine learning tasks and have also been applied to sound event detection. Detecting temporally overlapping sound events in realistic environments is much more challenging than monophonic detection. In this paper, we present an approach that improves the accuracy of polyphonic sound event detection in multichannel audio by combining gated recurrent neural networks with auditory spectral features. In the proposed method, spatial and spectral-domain noise-reduced harmonic features based on human hearing perception are extracted from multichannel audio and used as high-resolution spectral inputs to train gated recurrent neural networks. This yields faster and more stable convergence than long short-term memory recurrent neural networks. Our evaluation shows that the proposed method outperforms the conventional approaches.
https://doi.org/10.4218/etrij.17.0117.0157
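
As a rough illustration of the kind of network the abstract refers to, the following is a minimal sketch of one gated recurrent unit (GRU) time step followed by a per-class sigmoid output, which is a common way to allow several events to be active in the same frame. It is not the authors' model: the parameter names (`Wz`, `Uz`, and so on), the layer sizes, the 0.5 threshold, and the update convention `h = (1 - z) * h_prev + z * candidate` are all assumptions made for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gate(W, U, b, x, h, act):
    # act(W x + U h + b), computed element by element.
    return [act(a + c + d) for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

def gru_step(x, h, p):
    """One GRU step; p is a dict of hypothetical weight matrices and biases."""
    z = gate(p["Wz"], p["Uz"], p["bz"], x, h, sigmoid)   # update gate
    r = gate(p["Wr"], p["Ur"], p["br"], x, h, sigmoid)   # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]               # reset applied to old state
    cand = gate(p["Wh"], p["Uh"], p["bh"], x, rh, math.tanh)
    # Blend the old state and the candidate state with the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def active_events(h, Wo, bo, threshold=0.5):
    # One independent sigmoid per event class, so classes may overlap in time.
    probs = [sigmoid(a + b) for a, b in zip(matvec(Wo, h), bo)]
    return [prob >= threshold for prob in probs]
```

Running `gru_step` once per feature frame and passing each new state to `active_events` gives a frame-by-frame list of active classes.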

Polyphonic sound event detection using multi-channel audio features and gated recurrent neural networks (다채널 오디오 특징값 및 게이트형 순환 신경망을 사용한 다성 사운드 이벤트 검출)

Ko, Sang-Sun;Cho, Hye-Seung;Kim, Hyoung-Gook
- The Journal of the Acoustical Society of Korea / v.36 no.4 / pp.267-272 / 2017
In this paper, we propose an effective method of applying multichannel audio features to GRNNs (Gated Recurrent Neural Networks) for polyphonic sound event detection. Real-life sounds often overlap with each other, so it is difficult to distinguish them using mono-channel audio features. In the proposed method, we improve the performance of polyphonic sound event detection by using multichannel audio features. In addition, we apply a gated recurrent neural network, which is simpler than the LSTM (Long Short-Term Memory) network that shows the highest performance among current recurrent neural networks. The experimental results show that the proposed method achieves better sound event detection performance than other existing methods.
https://doi.org/10.7776/ASK.2017.36.4.267
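
One way to make "simpler than LSTM" concrete is to count trainable parameters in a single recurrent layer. The sketch below assumes the textbook formulations, in which a GRU has three weight blocks (update gate, reset gate, candidate state) and an LSTM has four (input, forget, and output gates plus the cell candidate), each with one input matrix, one recurrent matrix, and one bias vector. The layer sizes are hypothetical, and some libraries keep extra bias vectors, so their counts differ slightly.

```python
def gated_rnn_params(n_in, n_hidden, n_blocks):
    # Each block: input matrix + recurrent matrix + bias vector.
    per_block = n_hidden * n_in + n_hidden * n_hidden + n_hidden
    return n_blocks * per_block

def gru_params(n_in, n_hidden):
    return gated_rnn_params(n_in, n_hidden, 3)   # update, reset, candidate

def lstm_params(n_in, n_hidden):
    return gated_rnn_params(n_in, n_hidden, 4)   # input, forget, output, candidate

# Hypothetical sizes: 40 input features per frame, 64 hidden units.
print(gru_params(40, 64), lstm_params(40, 64))
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer of the same size.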

Salience of Envelope Interaural Time Difference of High Frequency as Spatial Feature (공간감 인자로서의 고주파 대역 포락선 양이 시간차의 유효성)

Seo, Jeong-Hun;Chon, Sang-Bae;Sung, Koeng-Mo
- The Journal of the Acoustical Society of Korea / v.29 no.6 / pp.381-387 / 2010
Both timbral features and spatial features are important in the assessment of multichannel audio coding systems. Choi et al. proposed a prediction model that extends ITU-R Rec. BS.1387-1 to multichannel audio coding systems by using spatial features such as ITDDist (Interaural Time Difference Distortion), ILDDist (Interaural Level Difference Distortion), and IACCDist (InterAural Cross-correlation Coefficient Distortion). In that model, ITDDists were computed only for low frequency bands (below 1500 Hz), and ILDDists were computed only for high frequency bands (over 2500 Hz), in accordance with classical duplex theory. However, in the high frequency range, information in the temporal envelope is also important in spatial perception, especially in sound localization. This paper introduces a new model that computes the ITD distortions of temporal envelopes in high frequency components in order to investigate quantitatively the role of such ITDs in spatial perception. The computed ITD distortions of temporal envelopes in high frequency components were highly correlated with the perceived sound quality of multichannel audio.
https://doi.org/10.7776/ASK.2010.29.6.381
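
The abstract does not describe how the envelope ITD is computed, so the following is only a generic sketch of the idea: extract a temporal envelope from each ear signal and take the lag that maximizes the cross-correlation of the two envelopes. The envelope extractor (rectification plus a moving average), the function names, and the test signal (a 4 kHz tone burst delayed by 8 samples at 16 kHz) are all assumptions made for illustration.

```python
import math

def envelope(x, half_width):
    # Crude temporal envelope: full-wave rectification, then a moving
    # average over 2 * half_width + 1 samples (zeros assumed past the ends).
    rect = [abs(v) for v in x]
    width = 2 * half_width + 1
    n = len(rect)
    return [sum(rect[max(0, i - half_width):min(n, i + half_width + 1)]) / width
            for i in range(n)]

def envelope_itd(left, right, fs, max_lag, half_width=8):
    """Lag in seconds by which the right envelope trails the left one."""
    el, er = envelope(left, half_width), envelope(right, half_width)
    n = len(el)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(el[i] * er[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs

# Hypothetical test signal: a Gaussian-windowed 4 kHz burst, delayed at the right ear.
fs, n, delay = 16000, 800, 8
left = [math.exp(-((i - 400) ** 2) / (2 * 50.0 ** 2)) * math.sin(2 * math.pi * 4000 * i / fs)
        for i in range(n)]
right = [0.0] * delay + left[:-delay]
itd = envelope_itd(left, right, fs, max_lag=16)
```

An ITD distortion in the sense of the abstract would then compare such an estimate for a reference signal with the estimate for the coded signal; that comparison is not shown here.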

MPEG-4 ALS - The Standard for Lossless Audio Coding

Liebchen, Tilman
- The Journal of the Acoustical Society of Korea / v.28 no.7 / pp.618-629 / 2009
The MPEG-4 Audio Lossless Coding (ALS) standard belongs to the family of MPEG-4 audio coding standards. In contrast to lossy codecs such as AAC, which merely strive to preserve the subjective audio quality, lossless coding preserves every single bit of the original audio data. The ALS core codec is based on forward-adaptive linear prediction, which combines remarkable compression with low complexity. Additional features include long-term prediction, multichannel coding, and compression of floating-point audio material. This paper describes the basic elements of the ALS codec, with a focus on prediction, entropy coding, and related tools, and points out the most important applications of this standardized lossless audio format.
https://doi.org/10.7776/ASK.2009.28.7.618
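
The principle behind prediction-based lossless coding can be sketched in a few lines: the encoder picks a predictor for each block, signals its choice, and stores only the integer prediction residual, from which the decoder rebuilds every sample exactly. The sketch below is a deliberate simplification and not the ALS codec. It swaps in the fixed polynomial predictors known from Shorten and FLAC in place of ALS's per-block estimated and quantized coefficients, and it omits entropy coding of the residual. All function names are hypothetical.

```python
# Fixed polynomial predictors (orders 0 to 3); index k weights sample n-1-k.
PREDICTORS = {0: [], 1: [1], 2: [2, -1], 3: [3, -3, 1]}

def predict(samples, n, coeffs):
    return sum(c * samples[n - 1 - k] for k, c in enumerate(coeffs))

def encode_block(samples):
    """Return (order, residual) for the predictor with the smallest residual."""
    best = None
    for order, coeffs in PREDICTORS.items():
        # The first `order` samples have no full history and are stored verbatim.
        residual = [s if n < order else s - predict(samples, n, coeffs)
                    for n, s in enumerate(samples)]
        cost = sum(abs(r) for r in residual)
        if best is None or cost < best[0]:
            best = (cost, order, residual)
    return best[1], best[2]

def decode_block(order, residual):
    coeffs = PREDICTORS[order]
    out = []
    for n, r in enumerate(residual):
        # Same prediction as the encoder, computed from already decoded samples.
        out.append(r if n < order else r + predict(out, n, coeffs))
    return out

block = [n * n for n in range(10)]        # integer PCM-like samples
order, residual = encode_block(block)
assert decode_block(order, residual) == block   # bit-exact reconstruction
```

Because the encoder makes the choice and transmits it, the decoder needs no analysis of its own, which is the sense in which such a scheme is forward-adaptive.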

Development of Integrated Mixer Controller for Digital Public Address (디지털전관방송을 위한 통합믹서컨트롤러 개발)

Cho, Juphil;Kim, Kwan-Woong;Kim, Daeik
- The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.1 / pp.19-24 / 2017
With the advancement of IT, innovative products that combine IT with public address (PA) systems are being developed. In this paper, we present a hybrid mixer controller for digital PA systems. We develop an integrated mixer controller that combines the digital mixer of an existing digital PA system with the functions of a digital integrated controller. The developed integrated mixer controller provides a multichannel mixer function with 16 audio input channels and 8 output channels. It also has an equalizer for processing digital audio signals, a matrix, and a limiter. In addition, the developed controller offers an internet connection for controlling the overall PA system and for remotely monitoring the mixer's processing status.
https://doi.org/10.7236/JIIBC.2017.17.1.19
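
The matrix and limiter stages of such a mixer can be pictured as a gain matrix applied to each frame of 16 input samples, followed by a ceiling on each of the 8 outputs. The sketch below is a generic illustration and not the controller described in the paper: the gain values, the hard-clipping limiter, and the function name are assumptions, and the equalizer stage is left out.

```python
N_INPUTS, N_OUTPUTS = 16, 8

def mix_frame(inputs, gains, ceiling=1.0):
    """Mix one frame: gains is an 8 x 16 matrix, one row per output channel."""
    if len(inputs) != N_INPUTS or len(gains) != N_OUTPUTS:
        raise ValueError("expected 16 inputs and 8 gain rows")
    outputs = []
    for row in gains:
        mixed = sum(g * x for g, x in zip(row, inputs))
        # Hard limiter: clamp the mixed sample to [-ceiling, +ceiling].
        outputs.append(max(-ceiling, min(ceiling, mixed)))
    return outputs

# Hypothetical routing: inputs 2k and 2k+1 feed output k at half gain.
gains = [[0.5 if i // 2 == k else 0.0 for i in range(N_INPUTS)]
         for k in range(N_OUTPUTS)]
```

A practical limiter would normally reduce gain smoothly instead of clipping, but the clamp is enough to show where the stage sits in the signal path.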

Sound event detection based on multi-channel multi-scale neural networks for home monitoring system used by the hard-of-hearing (청각 장애인용 홈 모니터링 시스템을 위한 다채널 다중 스케일 신경망 기반의 사운드 이벤트 검출)

Lee, Gi Yong;Kim, Hyoung-Gook
- The Journal of the Acoustical Society of Korea / v.39 no.6 / pp.600-605 / 2020
In this paper, we propose a sound event detection method that uses a multi-channel multi-scale neural network in a sound-sensing home monitoring system for the hearing impaired. In the proposed system, two channels with high signal quality are selected from several wireless microphone sensors in the home. Three features extracted from the sensor signals (time difference of arrival, pitch range, and the outputs obtained by applying a multi-scale convolutional neural network to the log mel spectrogram) are applied to a classifier based on a bidirectional gated recurrent neural network to further improve the performance of sound event detection. The detected sound event is converted into text, together with the sensor position of the selected channel, and provided to the hearing impaired. The experimental results show that the sound event detection method of the proposed system is superior to the existing method and can effectively deliver sound information to the hearing impaired.
https://doi.org/10.7776/ASK.2020.39.6.600
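
The first and last steps of this pipeline, choosing two sensor channels and reporting the result as text, can be sketched as follows. The abstract does not say how signal quality is measured, so RMS level is used here purely as a stand-in. The sensor positions, the event label, and the message format are likewise hypothetical, and the feature extraction and classifier in between are not shown.

```python
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def select_two_channels(channels):
    # channels maps a sensor position to its samples; rank by the stand-in
    # quality measure (RMS level) and keep the two best positions.
    ranked = sorted(channels, key=lambda pos: rms(channels[pos]), reverse=True)
    return ranked[:2]

def to_text(event_label, position):
    # Text shown to the user: what was detected and near which sensor.
    return f"{event_label} detected near {position}"

channels = {
    "kitchen": [0.5, -0.5, 0.5, -0.5],
    "hall": [0.1, -0.1, 0.1, -0.1],
    "bedroom": [0.3, -0.3, 0.3, -0.3],
}
selected = select_two_channels(channels)
message = to_text("doorbell", selected[0])
```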
