http://dx.doi.org/10.5351/KJAS.2021.34.6.979

Light weight architecture for acoustic scene classification  

Lim, Soyoung (Department of Applied Statistics, Chung-Ang University)
Kwak, Il-Youp (Department of Applied Statistics, Chung-Ang University)
Publication Information
The Korean Journal of Applied Statistics / v.34, no.6, 2021, pp. 979-993
Abstract
Acoustic scene classification (ASC) categorizes an audio file according to the environment in which it was recorded. ASC has long been studied within Detection and Classification of Acoustic Scenes and Events (DCASE). In this study, we address a constraint that ASC faces in real-world applications: the model must have low complexity. We compared several models that apply light-weight techniques. First, a base CNN model was proposed that uses log mel-spectrogram, delta, and delta-delta features. Second, depthwise separable convolutions and linear bottleneck inverted residual blocks were applied to the convolutional layers, and quantization was applied to the models to obtain a low-complexity model. The low-complexity models performed similarly to or slightly worse than the base model, while the model size was reduced substantially, from 503 KB to 42.76 KB.
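To see why the two techniques named in the abstract shrink a model, the following minimal sketch counts the weights in one standard convolution and in its depthwise separable counterpart, then converts the counts to storage size at different bit widths. The layer shape (64 to 128 channels, 3 x 3 kernel) and the 16-bit width are hypothetical values chosen for illustration; they are not taken from the paper, and the function names are ours.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def size_kb(n_params, bits_per_weight):
    """Storage in KB (1 KB = 1024 bytes) needed for n_params weights."""
    return n_params * bits_per_weight / 8 / 1024


# Hypothetical layer, not from the paper: 64 -> 128 channels, 3 x 3 kernel.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)

print("standard conv weights: ", standard)
print("separable conv weights:", separable)
print("standard, 32-bit (KB): ", size_kb(standard, 32))
print("separable, 32-bit (KB):", size_kb(separable, 32))
print("separable, 16-bit (KB):", size_kb(separable, 16))  # assumed bit width
```

In this toy layer the separable form needs roughly one eighth of the weights, and halving the bit width halves the storage again. The two savings multiply, which is consistent in spirit with the order-of-magnitude size reduction the abstract reports, although the paper's actual layer shapes and quantization scheme may differ.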
Keywords
acoustic scene classification; light-weight model; deep learning; convolutional neural network