A Study on the Signal Processing for Content-Based Audio Genre Classification

Journal of the Institute of Electronics Engineers of Korea SP (대한전자공학회논문지SP)

Volume 41, Issue 6 / Pages 271-278 / 2004 / pISSN 1229-6384

The Institute of Electronics and Information Engineers (대한전자공학회)

내용기반 오디오 장르 분류를 위한 신호 처리 연구

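윤원중 (Department of Computer Science and Statistics, Dankook University) ;
이강규 (Department of Computer Science and Statistics, Dankook University) ;
박규식 (Department of Computer Science and Statistics, Dankook University)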

Published : 2004.11.01

Abstract

In this paper, we propose a content-based audio genre classification algorithm that uses digital signal processing to automatically classify a query audio clip into one of five genres: Classic, Hiphop, Jazz, Rock, and Speech. A 20-second query audio file is segmented into 23 ms frames with a non-overlapping Hamming window, and a 54-dimensional feature vector, including Spectral Centroid, Rolloff, Flux, LPC, and MFCC, is extracted from each query. k-NN, Gaussian, and GMM classifiers are used for classification. To choose the optimum features from the 54-dimensional feature vector, the SFS (Sequential Forward Selection) method is applied to select a 10-dimensional optimum feature set, which is then used for genre classification. The experimental results show that the proposed method achieves a success rate of nearly 90%, an improvement of 10% to 20% over previous methods. To reflect an actual user environment, feature vectors were also extracted from random intervals of the query audio; this yields an overall success rate of about 80%, except in the extreme cases where the interval is taken from the beginning or the end of the audio file.
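The framing and STFT-style features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 22050 Hz sample rate (so that 512 samples is roughly 23 ms), the 0.85 rolloff fraction, the flux definition on normalised spectra, and all function names are assumptions chosen for illustration. LPC and MFCC are omitted.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_frames(signal, frame_len=512):
    """Cut the signal into non-overlapping Hamming-windowed frames and
    return one magnitude spectrum (bins 0..frame_len/2) per frame."""
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], win)]
        frames.append([abs(c) for c in fft(frame)[: frame_len // 2 + 1]])
    return frames


def spectral_centroid(mag, sample_rate, frame_len):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    total = sum(mag)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_len
    return sum(k * bin_hz * m for k, m in enumerate(mag)) / total


def spectral_rolloff(mag, sample_rate, frame_len, fraction=0.85):
    """Frequency (Hz) below which `fraction` of the total magnitude lies."""
    target = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= target:
            return k * sample_rate / frame_len
    return (len(mag) - 1) * sample_rate / frame_len


def spectral_flux(mag, prev_mag):
    """Squared difference between two sum-normalised magnitude spectra."""
    def norm(v):
        s = sum(v) or 1.0
        return [x / s for x in v]
    return sum((a - b) ** 2 for a, b in zip(norm(mag), norm(prev_mag)))


# Demo on a synthetic tone (assumed 22050 Hz sample rate, 512-sample frames).
SAMPLE_RATE, FRAME_LEN = 22050, 512
tone = [math.sin(2 * math.pi * 8 * i / FRAME_LEN) for i in range(4 * FRAME_LEN)]
spectra = magnitude_frames(tone, FRAME_LEN)
print(spectral_centroid(spectra[0], SAMPLE_RATE, FRAME_LEN))
print(spectral_rolloff(spectra[0], SAMPLE_RATE, FRAME_LEN))
print(spectral_flux(spectra[1], spectra[0]))
```

In a full pipeline, per-frame values like these would be summarised over the 20-second query (for example by mean and variance) to build the feature vector; how the paper aggregates them is not stated in the abstract.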

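This paper proposes a content-based audio genre classifier that uses digital signal processing to automatically classify audio into five genres: Classic, Hiphop, Jazz, Rock, and Speech. From a 20-second query audio clip, a sliding 23 ms Hamming window is used to compute STFT-based features such as Spectral Centroid, Rolloff, and Flux, together with MFCC and LPC coefficients, yielding a sequence of 54-dimensional feature vectors; k-NN, Gaussian, and GMM classifiers are used for classification. To select the optimal features, the SFS (Sequential Forward Selection) method, which finds the best-performing coefficients among the 54 features and reorders them sequentially, is applied, and only the resulting optimized 10-dimensional feature vector is used for genre classification. With SFS, the experiments show a classification success rate close to 90%, an improvement of roughly 10% to 20% over previous work. In addition, to simulate how real users would use an automatic audio genre classification system, experiments were run with query data extracted from arbitrary intervals; excluding worst-case queries such as the very beginning or very end of the audio file, the classification success rate was in the 80% range.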
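The feature selection step can be sketched as a greedy wrapper around a classifier. The abstract does not say which criterion the authors' SFS optimises, so the leave-one-out k-NN accuracy used here is an assumption, as are the function names, k = 3, and the toy data (three features instead of 54).

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, feats, k=3):
    """Majority vote among the k nearest training rows, using only the
    feature indices listed in `feats` (Euclidean distance)."""
    dists = sorted(
        (math.dist([row[f] for f in feats], [query[f] for f in feats]), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(x, y, feats, k=3):
    """Leave-one-out k-NN accuracy on the given feature subset."""
    hits = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_x, rest_y, x[i], feats, k) == y[i]
    return hits / len(x)


def sequential_forward_selection(x, y, n_select, k=3):
    """Greedy SFS: start from an empty set and repeatedly add the single
    feature that gives the highest accuracy together with those chosen."""
    selected = []
    remaining = list(range(len(x[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: loo_accuracy(x, y, selected + [f], k))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): only feature 1 separates the two classes.
toy_x = [
    [5, 0.00, 1], [1, 0.10, 9], [9, 0.20, 4], [3, 0.15, 7],
    [2, 1.00, 8], [8, 1.10, 2], [4, 0.90, 5], [6, 1.20, 3],
]
toy_y = ["Jazz"] * 4 + ["Rock"] * 4
print(sequential_forward_selection(toy_x, toy_y, n_select=2))
```

Applied to the paper's setting, `n_select` would be 10 and each row would hold the 54 extracted features. The returned list is ordered by selection step, which matches the description of the features being reordered sequentially by performance.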
