Detecting Prominent Content in Unstructured Audio using Intensity-based Attack/release Patterns

  • Received: 2013.10.24
  • Revised: 2013.11.27
  • Published: 2013.12.25

Abstract


Defining prominent audio content as the most informative audio content, from the user's perspective, within a given unstructured audio segment, we propose simple but robust intensity-based attack/release pattern features to detect it. We also propose a web-based annotation procedure to capture users' subjective perception, and used it to annotate 18 hours of video clips across various genres, such as cartoon, movie, and news. Experiments with a linear classification method, with models trained for speech, music, and sound effects, demonstrate promising results that vary across program genres (e.g., 86.7% weighted accuracy for speech-oriented talk shows and 49.3% for action movies).
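The abstract does not spell out how the attack/release pattern features are computed; the sketch below is only an illustration of the general idea, not the paper's actual method (the function names, frame size, and run-length statistics are all assumptions). It computes a short-time intensity (RMS) envelope and then summarizes the lengths of rising (attack) and falling (release) runs in that envelope.

```python
import numpy as np

def intensity_envelope(signal, sr, frame_ms=20):
    """Short-time intensity (RMS energy) envelope of a mono signal."""
    hop = int(sr * frame_ms / 1000)
    n_frames = len(signal) // hop
    frames = signal[:n_frames * hop].reshape(n_frames, hop)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def attack_release_features(env):
    """Hypothetical attack/release statistics from an intensity envelope.

    An "attack" is a maximal run of rising envelope frames, a "release"
    a maximal run of non-rising frames. Returns the mean attack length,
    mean release length, and the attack rate (events per frame).
    """
    rising = np.diff(env) > 0

    def run_lengths(mask):
        runs, count = [], 0
        for m in mask:
            if m:
                count += 1
            elif count:
                runs.append(count)
                count = 0
        if count:
            runs.append(count)
        return runs

    attacks = run_lengths(rising)
    releases = run_lengths(~rising)
    return (float(np.mean(attacks)) if attacks else 0.0,
            float(np.mean(releases)) if releases else 0.0,
            len(attacks) / max(len(env), 1))
```

Intuitively, speech tends to produce frequent short attack/release events while sustained music yields longer runs; run-length statistics of this kind could then feed a linear classifier such as the one the abstract describes.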
