http://dx.doi.org/10.5573/ieek.2013.50.12.224

Detecting Prominent Content in Unstructured Audio using Intensity-based Attack/release Patterns  

Kim, Samuel (Given Zone LLC)
Publication Information
Journal of the Institute of Electronics and Information Engineers / v.50, no.12, 2013, pp. 224-231
Abstract
Defining prominent audio content as the most informative audio content, from the users' perspective, within a given unstructured audio segment, we propose simple but robust intensity-based attack/release pattern features for detecting such content. We also propose a web-based annotation procedure to capture users' subjective perception, and annotated 18 hours of video clips across various genres, such as cartoons, movies, and news. Experiments with a linear classification method, whose models are trained for speech, music, and sound effects, demonstrate promising results that vary across program genres (e.g., 86.7% weighted accuracy for speech-oriented talk shows versus 49.3% for action movies).
Keywords
audio classification; audio information retrieval; unstructured audio; genre classification;
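The abstract describes intensity-based attack/release pattern features fed to a linear classifier. The paper's exact feature definition is not given on this page, so the following is only an illustrative sketch of the general idea: compute a short-time intensity (log-energy) envelope, then summarize its rising (attack) and falling (release) slopes as a small feature vector. The frame sizes, the dB floor, and the specific slope statistics are assumptions, not the authors' method.

```python
import numpy as np

def intensity_envelope(signal, frame_len=1024, hop=512):
    """Short-time intensity (log-energy) envelope of a mono signal."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = np.array([np.sum(f.astype(float) ** 2) for f in frames])
    return 10.0 * np.log10(energy + 1e-10)  # dB-like scale with a small floor

def attack_release_features(envelope):
    """Summarize rising (attack) and falling (release) envelope slopes.

    Returns [attack mean, attack max, release mean, release max];
    this particular 4-dimensional summary is a hypothetical choice.
    """
    d = np.diff(envelope)          # frame-to-frame intensity change
    attack = d[d > 0]              # rising slopes
    release = -d[d < 0]            # falling slopes (made positive)

    def stats(x):
        return (x.mean(), x.max()) if x.size else (0.0, 0.0)

    a_mean, a_max = stats(attack)
    r_mean, r_max = stats(release)
    return np.array([a_mean, a_max, r_mean, r_max])
```

Such a fixed-length vector could then be passed to any linear classifier (the paper cites LIBLINEAR, reference 11) with one model per target class (speech, music, sound effect).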
Citations & Related Records
Times Cited By KSCI: 2
1 J. Pinquier and R. Andre-Obrecht, "Audio indexing: primary components retrieval," Multimedia Tools and Applications, vol. 30, no. 3, pp. 313-330, 2006.
2 K.-K. Lee, Y.-H. Cho, and K.-S. Park, "Implementation of an intelligent audio graphic equalizer system," Journal of The Institute of Electronics Engineers of Korea, vol. 43, no. 3, pp. 76-83, 2006.
3 V. Hamacher, J. Chalupper, J. Eggers, E. Fischer, U. Kornagel, H. Puder, and U. Rass, "Signal processing in high-end hearing aids: state of the art, challenges, and future trends," EURASIP J. Appl. Signal Process., vol. 2005, pp. 2915-2929, Jan. 2005. [Online]. Available: http://dx.doi.org/10.1155/ASP.2005.2915
4 J. M. Kates, "Classification of background noises for hearing-aid applications," The Journal of the Acoustical Society of America, vol. 97, no. 1, pp. 461-470, 1995. [Online]. Available: http://link.aip.org/link/?JAS/97/461/1
5 P. Nordqvist and A. Leijon, "An efficient robust sound classification algorithm for hearing aids," The Journal of the Acoustical Society of America, vol. 115, no. 6, pp. 3033-3041, 2004. [Online]. Available: http://link.aip.org/link/?JAS/115/3033/1
6 S. Kim, P. Georgiou, and S. Narayanan, "Latent acoustic topic models for unstructured audio classification," APSIPA Transactions on Signal and Information Processing, vol. 1, Nov. 2012.
7 J.-E. Kim and I.-S. Lee, "Speech/mixed content signal classification based on GMM using MFCC," Journal of The Institute of Electronics Engineers of Korea, vol. 50, no. 2, pp. 185-192, 2013.
8 S. Chu, S. Narayanan, and C.-C. J. Kuo, "A semi-supervised learning approach to online audio background detection," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009.
9 T. Houtgast and H. J. M. Steeneken, "A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria," The Journal of the Acoustical Society of America, vol. 77, no. 3, pp. 1069-1077, 1985. [Online]. Available: http://link.aip.org/link/?JAS/77/1069/1
10 E. D. Burnett and H. C. Schweitzer, "Attack and release times of automatic-gain-control hearing aids," The Journal of the Acoustical Society of America, vol. 62, no. 3, pp. 784-786, 1977. [Online]. Available: http://link.aip.org/link/?JAS/62/784/1
11 R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: A library for large linear classification," Journal of Machine Learning Research, vol. 9, pp. 1871-1874, 2008.
12 The BBC sound effects library-original series. [Online]. Available: http://www.sound-ideas.com
13 B. Schuller, S. Steidl, A. Batliner, F. Schiel, and J. Krajewski, "The INTERSPEECH 2011 speaker state challenge," in Proceedings of Interspeech, 2011.
14 S. Kim, P. Georgiou, and S. Narayanan, "On-line genre classification of TV programs using audio content," in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013.
15 K. J. Han, S. Kim, and S. Narayanan, "Strategies to improve the robustness of agglomerative hierarchical clustering under data source variation for speaker diarization," IEEE Transactions on Audio, Speech, and Language Processing, pp. 1590-1601, Nov. 2008.
16 H. Lachambre, R. Andre-Obrecht, and J. Pinquier, "Singing voice detection in monophonic and polyphonic contexts," 15th European Signal Processing Conference (EUSIPCO), pp. 1344-1348, 2009.