Fig. 1. Block diagram of the DNN structure.
Fig. 2. Example of a DRC curve.
Fig. 3. Block diagram of the overall structure.
Fig. 4. Performance as a function of the time stretching and pitch shifting parameters. (a) Time stretching, (b) Pitch shifting.
Fig. 5. Performance by DRC curve and block mixing method. (a) Dynamic range compression, (b) Block mixing.
Table 1. Distribution of weakly labeled data for each class.
Table 2. Parameters of block mixing and dynamic range compression.
Table 3. Performance per data augmentation method and its parameters.