• Title/Summary/Keyword: Speech discrimination


Feature Parameter Extraction and Analysis in the Wavelet Domain for Discrimination of Music and Speech (음악과 음성 판별을 위한 웨이브렛 영역에서의 특징 파라미터)

  • Kim, Jung-Min; Bae, Keun-Sung
    • MALSORI, no. 61, pp. 63-74, 2007
  • Discrimination of music and speech in a multimedia signal is an important task in audio coding and broadcast monitoring systems. This paper deals with the problem of feature parameter extraction for discrimination of music and speech. The wavelet transform is a multi-resolution analysis method that is useful for analyzing the temporal and spectral properties of non-stationary signals such as speech and audio. We propose new feature parameters extracted from the wavelet-transformed signal for discrimination of music and speech. First, wavelet coefficients are obtained on a frame-by-frame basis, with the analysis frame size set to 20 ms. A parameter $E_{sum}$ is then defined by summing the difference of magnitude between adjacent wavelet coefficients in each scale. The maximum and minimum values of $E_{sum}$ over a period of 2 seconds, which corresponds to the discrimination duration, are used as feature parameters for discrimination of music and speech. To evaluate the proposed feature parameters, discrimination accuracy is measured for various types of music and speech signals. In the experiment each 2-second segment is classified as music or speech, and about 93% of music and speech segments are correctly detected. (A brief implementation sketch follows this entry.)

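A minimal sketch of the $E_{sum}$-style wavelet feature described above, assuming PyWavelets, 16 kHz mono input, a 'db4' wavelet, and 4 decomposition levels; the wavelet family, level count, and the exact form of the magnitude difference are illustrative assumptions, not the authors' published configuration.

```python
import numpy as np
import pywt

def esum(frame, wavelet="db4", level=4):
    """Sum over all scales of |difference of magnitudes of adjacent wavelet coefficients|."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)   # [cA4, cD4, cD3, cD2, cD1]
    return sum(np.abs(np.diff(np.abs(c))).sum() for c in coeffs)

def esum_features(signal, sr=16000, frame_ms=20, segment_s=2.0):
    """(max, min) of E_sum over each 2-second discrimination segment."""
    frame_len = int(sr * frame_ms / 1000)
    frames_per_seg = int(segment_s * 1000 / frame_ms)
    n_frames = len(signal) // frame_len
    e = np.array([esum(signal[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)])
    feats = []
    for s in range(0, n_frames - frames_per_seg + 1, frames_per_seg):
        seg = e[s:s + frames_per_seg]
        feats.append((seg.max(), seg.min()))              # features for one 2 s segment
    return np.array(feats)
```

Each (max, min) pair would then be fed to a simple threshold rule or classifier to label the corresponding 2-second segment as music or speech.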

Improving Speech/Music Discrimination Parameter Using Time-Averaged MFCC (MFCC의 단구간 시간 평균을 이용한 음성/음악 판별 파라미터 성능 향상)

  • Choi, Mu-Yeol; Kim, Hyung-Soon
    • MALSORI, no. 64, pp. 155-169, 2007
  • Discrimination between speech and music is important in many multimedia applications. In our previous work, focusing on the spectral change characteristics of speech and music, we presented a method using the mean of minimum cepstral distances (MMCD), and it showed very high discrimination performance. In this paper, to further improve the performance, we propose to employ time-averaged MFCCs in computing the MMCD. Our experimental results show that the proposed method enhances the discrimination between speech and music. Moreover, it overcomes a weakness of the conventional MMCD method, whose performance is relatively sensitive to the choice of the frame interval used to compute the MMCD. (A sketch of the time-averaging step follows this entry.)

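A minimal sketch of the time-averaging step, assuming librosa for MFCC extraction; the number of coefficients, the 3-frame averaging window, and the function name are illustrative assumptions. The smoothed frames would then feed an MMCD computation like the one sketched after the next entry.

```python
import numpy as np
import librosa

def time_averaged_mfcc(y, sr=16000, n_mfcc=13, avg_frames=3):
    """MFCC matrix smoothed along time by a short moving average (shape: n_mfcc x T)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    kernel = np.ones(avg_frames) / avg_frames
    # average each cepstral coefficient over a few neighbouring frames
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 1, mfcc)
```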

Performance Improvement of Speech/Music Discrimination Based on Cepstral Distance (켑스트럼 거리 기반의 음성/음악 판별 성능 향상)

  • Park, Seul-Han; Choi, Mu-Yeol; Kim, Hyung-Soon
    • MALSORI, no. 56, pp. 195-206, 2005
  • Discrimination between speech and music is important in many multimedia applications. In this paper, focusing on the spectral change characteristics of speech and music, we propose a new method of speech/music discrimination based on cepstral distance. Instead of using the cepstral distance between frames at a fixed interval, the minimum of the cepstral distances among neighboring frames is used to increase the discriminability between speech and fast-changing music. In addition, to prevent speech segments containing short pauses from being misclassified as music, short-pause segments are excluded from the cepstral distance computation. The experimental results show that the proposed method yields an error rate reduction of 68% in comparison with the conventional approach using cepstral distance. (A hedged sketch of the computation follows this entry.)

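A minimal sketch of a mean-of-minimum-cepstral-distances (MMCD) style measure with short-pause exclusion, in the spirit of the abstract; the candidate-frame range, the energy floor used to skip pauses, and the function names are illustrative assumptions.

```python
import numpy as np

def mmcd(cepstra, energies, neighbor_range=(2, 8), energy_floor=1e-3):
    """Mean of minimum cepstral distances over an utterance.

    cepstra  : (T, D) array of per-frame cepstral vectors
    energies : (T,)   per-frame energies, used to exclude short-pause frames
    """
    lo, hi = neighbor_range
    active = energies > energy_floor                     # skip short pauses
    minima = []
    for t in range(len(cepstra) - hi):
        if not active[t]:
            continue
        dists = [np.linalg.norm(cepstra[t + k] - cepstra[t])
                 for k in range(lo, hi + 1) if active[t + k]]
        if dists:
            minima.append(min(dists))                    # minimum distance among neighbours
    return float(np.mean(minima)) if minima else 0.0
```

A threshold on the resulting value, or a simple classifier over it, would then separate the two classes.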

A Novel Algorithm for Discrimination of Voiced Sounds (유성음 구간 검출 알고리즘에 관한 연구)

  • Jang, Gyu-Cheol; Woo, Soo-Young; Yoo, Chang-D.
    • Speech Sciences, vol. 9, no. 3, pp. 35-45, 2002
  • A simple algorithm for discriminating voiced sounds in speech is proposed. In addition to low-frequency energy and zero-crossing rate (ZCR), both of which have been widely used for identifying voiced sounds, the proposed algorithm incorporates pitch variation to improve the discrimination rate. On the TIMIT corpus, evaluation results show an improvement of 13% in the discrimination of voiced phonemes over the traditional algorithm using only energy and ZCR. (A sketch of these three cues follows this entry.)

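A minimal sketch of the three cues named in the abstract (low-frequency energy, ZCR, pitch variation), assuming librosa and SciPy; the 1 kHz low-pass cutoff, the frame sizes, and all thresholds are placeholders, since the paper tunes its decision rule on TIMIT.

```python
import numpy as np
import librosa
from scipy.signal import butter, lfilter

def voiced_frame_flags(y, sr=16000, hop=160):
    """Per-frame voiced/unvoiced flags from low-band energy, ZCR, and pitch variation."""
    # low-frequency energy: RMS of the signal low-passed at ~1 kHz
    b, a = butter(4, 1000 / (sr / 2), btype="low")
    low = lfilter(b, a, y)
    rms = librosa.feature.rms(y=low, frame_length=400, hop_length=hop)[0]
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=400, hop_length=hop)[0]
    # pitch track; voiced regions should show a present and slowly varying F0
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, frame_length=1024, hop_length=hop)
    pitch_var = np.abs(np.gradient(f0))
    n = min(len(rms), len(zcr), len(f0))
    # placeholder thresholds; a real system would tune these on labelled data
    return (rms[:n] > 0.02) & (zcr[:n] < 0.15) & (pitch_var[:n] < 10.0)
```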

Speech/Music Discrimination Using Spectral Peak Track Analysis (스펙트럴 피크 트랙 분석을 이용한 음성/음악 분류)

  • Keum, Ji-Soo; Lee, Hyon-Soo
    • Proceedings of the IEEK Conference, 2006.06a, pp. 243-244, 2006
  • In this study, we propose a speech/music discrimination method using spectral peak track analysis. The proposed method uses the duration of spectral peak tracks that stay in the same frequency channel as the feature parameter, and applies a duration threshold to discriminate speech from music. Experimental results show that the correct discrimination ratio varies with the threshold, but the method achieves performance comparable to other methods and is computationally efficient. (A rough sketch of the peak-track idea follows this entry.)

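A rough sketch of the peak-track idea, assuming SciPy's STFT and tracking only the single dominant peak per frame (the paper analyzes peak tracks per frequency channel, so this is a simplification); the FFT size, hop, and duration threshold are assumptions.

```python
import numpy as np
from scipy.signal import stft

def peak_track_durations(y, sr=16000, nfft=512, hop=160):
    """Lengths (in frames) of runs where the dominant spectral peak stays in one bin."""
    _, _, Z = stft(y, fs=sr, nperseg=nfft, noverlap=nfft - hop)
    peaks = np.abs(Z).argmax(axis=0)                     # dominant frequency bin per frame
    durations, run = [], 1
    for prev, cur in zip(peaks[:-1], peaks[1:]):
        if cur == prev:
            run += 1
        else:
            durations.append(run)
            run = 1
    durations.append(run)
    return np.array(durations)

def looks_like_music(y, sr=16000, duration_threshold=15):
    """Sustained notes keep a peak in the same bin far longer than speech typically does."""
    return peak_track_durations(y, sr).max() >= duration_threshold
```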

A Novel Speech/Music Discrimination Using Feature Dimensionality Reduction

  • Keum, Ji-Soo; Lee, Hyon-Soo; Hagiwara, Masafumi
    • International Journal of Fuzzy Logic and Intelligent Systems, vol. 10, no. 1, pp. 7-11, 2010
  • In this paper, we propose an improved speech/music discrimination method based on feature combination and dimensionality reduction. To improve discrimination ability, we use a feature based on spectral duration analysis and employ the hierarchical dimensionality reduction (HDR) method to reduce the effect of correlated features. Various experiments on speech and music show that the proposed method achieves higher discrimination performance than conventional methods. (A generic sketch of correlation-based feature grouping follows this entry.)
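
The abstract does not spell out the HDR procedure, so the following is only a generic sketch of reducing correlated features by hierarchically clustering the feature correlation matrix and averaging within each cluster, assuming SciPy; the correlation threshold and the use of average linkage are assumptions, not necessarily the HDR variant used in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def reduce_correlated_features(X, corr_threshold=0.9):
    """Merge highly correlated feature columns of X (n_samples x n_features) by averaging."""
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)        # correlation -> distance
    condensed = dist[np.triu_indices_from(dist, k=1)]    # condensed distance matrix
    labels = fcluster(linkage(condensed, method="average"),
                      t=1.0 - corr_threshold, criterion="distance")
    return np.column_stack([X[:, labels == g].mean(axis=1) for g in np.unique(labels)])
```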

Performance Comparison of Feature Parameters and Classifiers for Speech/Music Discrimination (음성/음악 판별을 위한 특징 파라미터와 분류기의 성능비교)

  • Kim, Hyung-Soon; Kim, Su-Mi
    • MALSORI, no. 46, pp. 37-50, 2003
  • In this paper, we evaluate and compare the performance of speech/music discrimination for various feature parameters and classifiers. As feature parameters, we consider the High Zero-Crossing Rate Ratio (HZCRR), Low Short-Time Energy Ratio (LSTER), Spectral Flux (SF), Line Spectral Pair (LSP) distance, entropy, and dynamism. We also examine three classifiers: k-Nearest Neighbor (k-NN), Gaussian Mixture Model (GMM), and Hidden Markov Model (HMM). According to our experiments, the LSP distance and the phoneme-recognizer-based feature set (entropy and dynamism) show good performance, while performance differences due to the choice of classifier are not significant. When all six feature parameters are employed, an average speech/music discrimination accuracy of up to 96.6% is achieved. (A sketch of two of the window-level features follows this entry.)

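As a small illustration, here is one common way to compute HZCRR and LSTER over an analysis window of short frames; the 1.5x and 0.5x reference factors follow the usual definitions of these ratios, but the exact window lengths and constants used in the paper are not given in the abstract.

```python
import numpy as np

def hzcrr_lster(frames):
    """HZCRR and LSTER for one analysis window.

    frames : (n_frames, frame_len) array of consecutive short-time frames.
    """
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)   # per-frame ZCR
    ste = np.mean(frames ** 2, axis=1)                                    # short-time energy
    hzcrr = np.mean(zcr > 1.5 * zcr.mean())   # fraction of frames with unusually high ZCR
    lster = np.mean(ste < 0.5 * ste.mean())   # fraction of frames with unusually low energy
    return hzcrr, lster
```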

Speech/Music Discrimination Using Multi-dimensional MMCD (다차원 MMCD를 이용한 음성/음악 판별)

  • Choi, Mu-Yeol; Song, Hwa-Jeon; Park, Seul-Han; Kim, Hyung-Soon
    • MALSORI, no. 60, pp. 191-201, 2006
  • Discrimination between speech and music is important in many multimedia applications. Previously we proposed a new parameter for speech/music discrimination, the mean of minimum cepstral distances (MMCD), and it outperformed conventional parameters. One weakness of the MMCD is that its performance depends on the range of candidate frames used to compute the minimum cepstral distance, so this range has to be selected experimentally. In this paper, to alleviate the problem, we propose a multi-dimensional MMCD parameter consisting of multiple MMCDs computed over different candidate frame ranges. Experimental results show that the multi-dimensional MMCD parameter yields an error rate reduction of 22.5% compared with the optimally chosen one-dimensional MMCD parameter. (A compact sketch follows this entry.)

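A compact, self-contained sketch of stacking MMCD values from several candidate-frame ranges into one feature vector; the three ranges below are illustrative rather than the combination chosen in the paper, and pause handling is omitted for brevity.

```python
import numpy as np

def multi_mmcd(cepstra, candidate_ranges=((1, 3), (2, 6), (4, 10))):
    """One MMCD value per candidate-frame range, stacked into a feature vector.

    cepstra : (T, D) array of per-frame cepstral vectors.
    """
    feats = []
    for lo, hi in candidate_ranges:
        minima = [min(np.linalg.norm(cepstra[t + k] - cepstra[t])
                      for k in range(lo, hi + 1))
                  for t in range(len(cepstra) - hi)]
        feats.append(np.mean(minima))        # mean of per-frame minimum distances
    return np.array(feats)
```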

Comparison & Analysis of Speech/Music Discrimination Features through Experiments (실험에 의한 음성·음악 분류 특징의 비교 분석)

  • Lee, Kyung-Rok; Ryu, Shi-Woo; Gwark, Jae-Young
    • Proceedings of the Korea Contents Association Conference, 2004.11a, pp. 308-313, 2004
  • In this paper, we compare and analyze speech/music discrimination performance for combinations of feature parameters. Audio signals are classified into three classes (speech, music, and speech & music). Three types of features are used in the experiments: Mel-cepstrum, energy, and zero-crossings. The feature combinations are then compared to find the one giving the best speech/music discrimination performance. The best result is achieved using Mel-cepstrum, energy, and zero-crossings in a single feature vector (speech: 95.1%, music: 61.9%, speech & music: 55.5%). (A sketch of such a combined feature vector follows this entry.)

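A minimal sketch of building one combined feature vector from Mel-cepstrum, energy, and zero-crossing streams, assuming librosa; summarizing each stream by its mean and standard deviation is an assumption for illustration, as the abstract does not state how the streams are combined.

```python
import numpy as np
import librosa

def combined_feature(y, sr=16000):
    """Concatenate Mel-cepstrum, energy, and zero-crossing statistics into one vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # (13, T)
    rms = librosa.feature.rms(y=y)                           # (1, T)
    zcr = librosa.feature.zero_crossing_rate(y)              # (1, T)
    # summarise each stream by its per-coefficient mean and standard deviation
    parts = [np.hstack([f.mean(axis=1), f.std(axis=1)]) for f in (mfcc, rms, zcr)]
    return np.concatenate(parts)
```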

Discrimination of Pathological Speech Using Hidden Markov Models

  • Wang, Jianglin; Jo, Cheol-Woo
    • Speech Sciences, vol. 13, no. 3, pp. 7-18, 2006
  • Diagnosis of pathological voice is one of the important issues in biomedical applications of speech technology. This study focuses on discriminating voice disorders using HMMs (Hidden Markov Models) for automatic detection of vocal fold disorder voices versus normal voices. The approach is non-intrusive, inexpensive, and fully automated, using only a speech sample from the subject. Speech data were collected from normal speakers and patients. Mel-frequency cepstral coefficients (MFCCs) were modeled by an HMM classifier. Left-to-right HMMs with different numbers of states (3, 5, and 7) and 3 mixtures were built. The method gives an accuracy of 93.8% on training data and 91.7% on test data in discriminating normal voices from vocal fold disorder voices for sustained /a/. (A sketch of such a classifier follows this entry.)

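A minimal sketch of a left-to-right GMM-HMM classifier over MFCCs in the spirit of the abstract, assuming librosa and hmmlearn; the 13 MFCCs, the diagonal covariances, and the equal self-/forward-transition split are assumptions, and the training data are of course not included.

```python
import numpy as np
import librosa
from hmmlearn.hmm import GMMHMM

def mfcc_frames(y, sr=16000, n_mfcc=13):
    """Per-frame MFCC vectors, shape (T, n_mfcc)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def left_to_right_hmm(n_states=5, n_mix=3):
    """GMM-HMM with a left-to-right topology (e.g. 3, 5, or 7 states, 3 mixtures)."""
    model = GMMHMM(n_components=n_states, n_mix=n_mix, covariance_type="diag",
                   init_params="mcw", params="stmcw")
    trans = np.zeros((n_states, n_states))
    for i in range(n_states):
        trans[i, i] = 0.5                                # stay in the current state
        trans[i, min(i + 1, n_states - 1)] += 0.5        # or move one state forward
    model.transmat_ = trans
    model.startprob_ = np.eye(n_states)[0]               # always start in the first state
    return model

# Usage sketch: one model per class, decision by higher log-likelihood.
#   normal.fit(np.vstack(normal_feats), lengths=[len(f) for f in normal_feats])
#   label = "disorder" if disorder.score(x) > normal.score(x) else "normal"
```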