
Speech/Music Discrimination Using Spectrum Analysis and Neural Network  

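Keum, Ji-Soo (Department of Computer Engineering, Kyung Hee University)
Lim, Sung-Kil (Department of Computer Engineering, Kyung Hee University)
Lee, Hyon-Soo (Department of Computer Engineering, Kyung Hee University)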
Abstract
In this research, we propose an efficient speech/music discrimination method that uses spectrum analysis and a neural network. The proposed method analyzes the spectrum to extract a duration feature parameter (MSDF) from the spectral peak track, and combines it with the MFSC as the feature set for a speech/music discriminator. A neural network serves as the discriminator, and we performed various experiments to evaluate the proposed method with respect to training pattern selection, training pattern size, and neural network architecture. Compared with the previous method, the results show improved and more stable discrimination performance across training pattern selections and model compositions. When MSDF and MFSC were used together as feature parameters with more than 50 seconds of training patterns, the discrimination rate was 94.97% for speech and 92.38% for music. This is an improvement of 1.25% for speech and 1.69% for music compared with using MFSC alone.
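The abstract does not define how MSDF is computed, so the following is only a minimal sketch of the general idea of measuring duration along a spectral peak track. It assumes that a "track" is a run of consecutive frames whose strongest frequency bin stays within a small tolerance; the function names, frame length, hop size, and tolerance are all hypothetical choices for illustration, not the authors' settings. Sustained musical notes would be expected to give long runs, while rapidly changing spectra give short ones.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the one-sided DFT of a Hann-windowed frame (naive O(N^2))."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def peak_track_durations(signal, frame_len=64, hop=32, tol_bins=1):
    """Run lengths (in frames) over which the strongest bin moves by at most tol_bins.

    Assumption: this run-length definition is illustrative only and is not
    taken from the paper.
    """
    peaks = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spectrum = magnitude_spectrum(signal[start:start + frame_len])
        peaks.append(max(range(len(spectrum)), key=spectrum.__getitem__))

    durations, run = [], 1
    for prev, cur in zip(peaks, peaks[1:]):
        if abs(cur - prev) <= tol_bins:
            run += 1
        else:
            durations.append(run)
            run = 1
    if peaks:
        durations.append(run)
    return durations


def mean_duration(durations):
    """Average track length in frames; 0.0 when there are no frames."""
    return sum(durations) / len(durations) if durations else 0.0


# Synthetic demo signals (hypothetical 8 kHz sample rate).
SAMPLE_RATE = 8000
steady = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(1024)]
alternating = [
    math.sin(2 * math.pi * (1000 if (t // 128) % 2 == 0 else 3000) * t / SAMPLE_RATE)
    for t in range(1024)
]

steady_durations = peak_track_durations(steady)
alternating_durations = peak_track_durations(alternating)
```

In this toy setup the steady tone produces a single long track, while the tone that switches frequency every 128 samples produces many short tracks. A scalar such as `mean_duration(...)` could then be appended to a spectral feature vector before it is passed to a classifier, which is the role the abstract assigns to MSDF alongside MFSC.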
Keywords
Speech/Music discrimination; Spectrum analysis; Spectral peak track; Speaker indexing;