• Title/Summary/Keyword: Gated Linear Unit (GLU)

Search Result 2, Processing Time 0.019 seconds

Temporal attention based animal sound classification (시간 축 주의집중 기반 동물 울음소리 분류)

  • Kim, Jungmin;Lee, Younglo;Kim, Donghyeon;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.406-413
    • /
    • 2020
  • In this paper, to improve the classification accuracy of bird and amphibian acoustic sound, we utilize GLU (Gated Linear Unit) and Self-attention that encourages the network to extract important features from data and discriminate relevant important frames from all the input sequences for further performance improvement. To utilize acoustic data, we convert 1-D acoustic data to a log-Mel spectrogram. Subsequently, undesirable component such as background noise in the log-Mel spectrogram is reduced by GLU. Then, we employ the proposed temporal self-attention to improve classification accuracy. The data consist of 6-species of birds, 8-species of amphibians including endangered species in the natural environment. As a result, our proposed method is shown to achieve an accuracy of 91 % with bird data and 93 % with amphibian data. Overall, an improvement of about 6 % ~ 7 % accuracy in performance is achieved compared to the existing algorithms.

Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label (약한 레이블을 이용한 확장 합성곱 신경망과 게이트 선형 유닛 기반 음향 이벤트 검출 및 태깅 알고리즘)

  • Park, Chungho;Kim, Donghyun;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.414-423
    • /
    • 2020
  • In this paper, we propose a Dilated Convolution Gate Linear Unit (DCGLU) to mitigate the lack of sparsity and small receptive field problems caused by the segmentation map extraction process in sound event detection with weak labels. In the advent of deep learning framework, segmentation map extraction approaches have shown improved performance in noisy environments. However, these methods are forced to maintain the size of the feature map to extract the segmentation map as the model would be constructed without a pooling operation. As a result, the performance of these methods is deteriorated with a lack of sparsity and a small receptive field. To mitigate these problems, we utilize GLU to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to increase the receptive field without additional learning parameters. For the performance evaluation, we employ a URBAN-SED and self-organized bird sound dataset. The relevant experiments show that our proposed DCGLU model outperforms over other baselines. In particular, our method is shown to exhibit robustness against nature sound noises with three Signal to Noise Ratio (SNR) levels (20 dB, 10 dB and 0 dB).