Light weight architecture for acoustic scene classification (음향 장면 분류를 위한 경량화 모형 연구)

Lim, Soyoung;Kwak, Il-Youp
- The Korean Journal of Applied Statistics, v.34 no.6, pp.979-993, 2021
Acoustic scene classification (ASC) categorizes an audio file according to the environment in which it was recorded. The task has long been studied in Detection and Classification of Acoustic Scenes and Events (DCASE). In this study, we address a constraint that ASC faces in real-world applications: the model must have low complexity. We compare several models that apply lightweight techniques. First, we propose a base CNN model that uses log mel-spectrogram, delta, and delta-delta features. Second, we apply depthwise separable convolutions and linear bottleneck inverted residual blocks to the convolutional layers, and apply quantization to the models, to obtain a low-complexity model. The low-complexity model performed similarly to or slightly worse than the base model, but the model size was reduced substantially, from 503 KB to 42.76 KB.
https://doi.org/10.5351/KJAS.2021.34.6.979
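
To illustrate why the depthwise separable convolution mentioned in the abstract above reduces model size, the following minimal sketch compares weight counts for a standard convolution and its depthwise separable counterpart. The layer sizes (3×3 kernel, 64 input channels, 128 output channels) are hypothetical and are not taken from the paper; biases are ignored for simplicity.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2D convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (bias ignored).

    A depthwise step applies one kernel x kernel filter per input channel,
    then a 1x1 pointwise step mixes the channels.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 64, 128

standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
ratio = separable / standard  # equals 1 / c_out + 1 / k**2

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.4f}")
```

For this hypothetical layer the separable version needs roughly 12% of the weights of the standard one. The actual reduction reported in the paper also depends on the inverted residual blocks and quantization, which this sketch does not model.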

Towards Low Complexity Model for Audio Event Detection

Saleem, Muhammad;Shah, Syed Muhammad Shehram;Saba, Erum;Pirzada, Nasrullah;Ahmed, Masood
- International Journal of Computer Science & Network Security, v.22 no.9, pp.175-182, 2022
In daily life we encounter many types of information, for example multimedia and text. We rely on different types of information in our routines, such as watching or reading the news, listening to the radio, and watching videos. However, problems can arise when a specific type of information is required. For example, a listener who wants to hear jazz may find that every radio channel is playing pop music mixed with advertisements; the listener is stuck with pop music and gives up searching for jazz. An automatic audio classification system can solve this kind of problem. Deep learning (DL) models could make life easier through audio classification, but deploying them on edge devices such as the Nano BLE Sense or Raspberry Pi is expensive and difficult because they require substantial computational power, such as a graphics processing unit (GPU). To address this, we propose a low-complexity DL model for Audio Event Detection (AED). We extracted mel-spectrograms of dimension 128×431×1 from the audio signals and applied normalization. Three data augmentation methods were applied: frequency masking, time masking, and mixup. In addition, we designed a Convolutional Neural Network (CNN) with spatial dropout, batch normalization, and separable 2D convolutions, inspired by VGGnet [1]. We further reduced the model size by applying float16 quantization to the trained model. Experiments were conducted on the updated dataset provided by the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 challenge. Our model achieved a validation loss (val_loss) of 0.33 and an accuracy of 90.34% with a model size of 132.50 KB.
https://doi.org/10.22937/IJCSNS.2022.22.9.26
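
The three augmentation methods named in the abstract above (frequency masking, time masking, and mixup) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the maximum mask widths, the mixup `alpha`, the fill value of zero, the random input data, and the three-class one-hot labels are all assumptions made for the example. Only the 128×431×1 input shape comes from the abstract.

```python
import numpy as np


def frequency_mask(spec, max_width, rng):
    """Zero a random band of mel bins (axis 0). Fill value 0 is an assumption."""
    out = spec.copy()
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, spec.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out


def time_mask(spec, max_width, rng):
    """Zero a random span of time frames (axis 1). Fill value 0 is an assumption."""
    out = spec.copy()
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, spec.shape[1] - width + 1))
    out[:, start:start + width] = 0.0
    return out


def mixup(x1, y1, x2, y2, alpha, rng):
    """Blend two examples and their one-hot labels with a Beta-distributed weight."""
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y


rng = np.random.default_rng(0)

# Random stand-ins for two normalized mel-spectrograms of shape 128x431x1.
spec_a = rng.standard_normal((128, 431, 1)).astype(np.float32)
spec_b = rng.standard_normal((128, 431, 1)).astype(np.float32)

# Hypothetical one-hot labels for a three-class problem.
label_a = np.array([1.0, 0.0, 0.0])
label_b = np.array([0.0, 1.0, 0.0])

# Hypothetical mask widths and mixup alpha.
masked = time_mask(frequency_mask(spec_a, 16, rng), 40, rng)
mixed, mixed_label = mixup(masked, label_a, spec_b, label_b, 0.2, rng)

print(masked.shape, mixed.shape, mixed_label)
```

Because mixup blends the labels with the same weight as the inputs, the mixed label remains a valid probability vector that sums to one.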
