• Title/Abstract/Keyword: Speech discrimination

Search results: 156

음악과 음성 판별을 위한 웨이브렛 영역에서의 특징 파라미터 (Feature Parameter Extraction and Analysis in the Wavelet Domain for Discrimination of Music and Speech)

  • 김정민;배건성
    • 대한음성학회지:말소리 / No. 61 / pp.63-74 / 2007
  • Discrimination of music and speech from multimedia signals is an important task in audio coding and broadcast monitoring systems. This paper deals with the problem of feature parameter extraction for discrimination of music and speech. The wavelet transform is a multi-resolution analysis method that is useful for analyzing the temporal and spectral properties of non-stationary signals such as speech and audio signals. We propose new feature parameters extracted from the wavelet-transformed signal for discrimination of music and speech. First, wavelet coefficients are obtained on a frame-by-frame basis, with the analysis frame size set to 20 ms. A parameter $E_{sum}$ is then defined by adding the differences in magnitude between adjacent wavelet coefficients in each scale. The maximum and minimum values of $E_{sum}$ over each 2-second period, which corresponds to the discrimination duration, are used as feature parameters for discrimination of music and speech. To evaluate the proposed feature parameters, discrimination accuracy is measured for various types of music and speech signals. In the experiment, every 2-second segment is classified as music or speech, and about 93% of the music and speech segments are correctly identified. (An illustrative sketch of this feature computation follows below.)

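The following minimal sketch (Python, not the authors' code) illustrates the kind of computation described above: a per-frame wavelet decomposition, an E_sum value accumulated from the magnitude differences between adjacent coefficients in each scale, and the maximum/minimum of E_sum over each 2-second decision window. The wavelet family ('db4'), decomposition level, and non-overlapping framing are assumptions.

```python
# Sketch of a wavelet-domain E_sum feature (assumed wavelet/level, not the paper's code).
import numpy as np
import pywt

def e_sum(frame, wavelet="db4", level=4):
    """Sum over all scales of |difference between adjacent wavelet coefficients|."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    return sum(np.sum(np.abs(np.diff(c))) for c in coeffs)

def window_features(signal, sr, frame_ms=20, window_s=2.0):
    """Return one (E_max, E_min) pair of E_sum per 2-second decision window."""
    frame_len = int(sr * frame_ms / 1000)
    frames_per_win = int(window_s * 1000 / frame_ms)
    e = [e_sum(signal[i:i + frame_len])
         for i in range(0, len(signal) - frame_len + 1, frame_len)]
    feats = []
    for w in range(0, len(e) - frames_per_win + 1, frames_per_win):
        seg = e[w:w + frames_per_win]
        feats.append((max(seg), min(seg)))
    return feats
```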

MFCC의 단구간 시간 평균을 이용한 음성/음악 판별 파라미터 성능 향상 (Improving Speech/Music Discrimination Parameter Using Time-Averaged MFCC)

  • 최무열;김형순
    • 대한음성학회지:말소리 / No. 64 / pp.155-169 / 2007
  • Discrimination between speech and music is important in many multimedia applications. In our previous work, focusing on the spectral change characteristics of speech and music, we presented a method using the mean of minimum cepstral distances (MMCD), which showed very high discrimination performance. In this paper, to further improve the performance, we propose to employ time-averaged MFCCs in computing the MMCD. Our experimental results show that the proposed method enhances the discrimination between speech and music. Moreover, it overcomes the weakness of the conventional MMCD method, whose performance is relatively sensitive to the choice of the frame interval used to compute the MMCD. (A sketch of the time-averaged MMCD computation follows below.)

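A rough sketch of the idea, assuming librosa MFCCs, Euclidean cepstral distance, and an illustrative candidate-frame range (`max_lag`); the averaging length `avg_frames` is likewise an assumption, not the paper's setting.

```python
# Sketch: MMCD computed on short-term time-averaged MFCC vectors (assumed parameters).
import numpy as np
import librosa

def mmcd_time_averaged(y, sr, n_mfcc=13, avg_frames=5, max_lag=20):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T          # (frames, n_mfcc)
    # short-term time average over avg_frames consecutive frames
    kernel = np.ones(avg_frames) / avg_frames
    smoothed = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, mfcc)
    # minimum cepstral distance to any of the next 1..max_lag frames, then mean
    mins = []
    for t in range(len(smoothed) - max_lag):
        d = np.linalg.norm(smoothed[t + 1:t + 1 + max_lag] - smoothed[t], axis=1)
        mins.append(d.min())
    return float(np.mean(mins))   # MMCD of the segment
```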

켑스트럼 거리 기반의 음성/음악 판별 성능 향상 (Performance Improvement of Speech/Music Discrimination Based on Cepstral Distance)

  • 박슬한;최무열;김형순
    • 대한음성학회지:말소리 / No. 56 / pp.195-206 / 2005
  • Discrimination between speech and music is important in many multimedia applications. In this paper, focusing on the spectral change characteristics of speech and music, we propose a new method of speech/music discrimination based on cepstral distance. Instead of using the cepstral distance between frames at a fixed interval, the minimum of the cepstral distances among neighboring frames is employed to increase the discriminability between fast-changing music and speech. In addition, to prevent speech segments containing short pauses from being misclassified as music, short-pause segments are excluded when computing the cepstral distance. The experimental results show that the proposed method yields an error rate reduction of 68% in comparison with the conventional approach using cepstral distance. (A sketch of this measure appears below.)

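A minimal sketch of the measure described above, under assumptions: librosa MFCCs stand in for the cepstral features, an RMS-energy threshold stands in for the paper's pause detection, and the neighbour range is illustrative.

```python
# Sketch: per-frame minimum cepstral distance among neighbouring frames,
# skipping low-energy (short-pause) frames. Thresholds are illustrative.
import numpy as np
import librosa

def min_cepstral_distances(y, sr, n_mfcc=13, neighbours=10, energy_db=-40.0):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
    rms = librosa.feature.rms(y=y)[0]
    voiced = librosa.amplitude_to_db(rms, ref=np.max(rms)) > energy_db   # drop pauses
    n = min(len(mfcc), len(voiced))
    cep = mfcc[:n][voiced[:n]]
    dists = []
    for t in range(len(cep) - neighbours):
        d = np.linalg.norm(cep[t + 1:t + 1 + neighbours] - cep[t], axis=1)
        dists.append(d.min())
    return np.array(dists)   # statistics of these distances feed the discriminator
```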

유성음 구간 검출 알고리즘에 관한 연구 (A Novel Algorithm for Discrimination of Voiced Sounds)

  • 장규철;우수영;유창동
    • 음성과학 / Vol. 9, No. 3 / pp.35-45 / 2002
  • A simple algorithm for discriminating voiced sounds in speech is proposed. In addition to low-frequency energy and zero-crossing rate (ZCR), both of which have been widely used in the past for identifying voiced sounds, the proposed algorithm incorporates pitch variation to improve the discrimination rate. Evaluation on the TIMIT corpus shows an improvement of 13% in the discrimination of voiced phonemes over the traditional algorithm using only energy and ZCR. (An illustrative frame-level sketch follows below.)

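An illustrative frame-level sketch, not the paper's algorithm: it combines energy, ZCR, and frame-to-frame pitch variation (via librosa's YIN tracker). Full-band RMS is used instead of the low-frequency energy mentioned above, and all thresholds are assumptions.

```python
# Sketch: voiced-frame decision from energy, ZCR and pitch stability (assumed thresholds).
import numpy as np
import librosa

def voiced_frames(y, sr, frame=400, hop=160,
                  energy_th=0.01, zcr_th=0.15, pitch_var_th=20.0):
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=frame, hop_length=hop)[0]
    rms = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)[0]
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, hop_length=hop)
    pitch_var = np.abs(np.diff(f0, prepend=f0[0]))       # frame-to-frame pitch change (Hz)
    n = min(len(zcr), len(rms), len(f0))
    # voiced: enough energy, few zero crossings, and a locally stable pitch
    return (rms[:n] > energy_th) & (zcr[:n] < zcr_th) & (pitch_var[:n] < pitch_var_th)
```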

스펙트럴 피크 트랙 분석을 이용한 음성/음악 분류 (Speech/Music Discrimination Using Spectral Peak Track Analysis)

  • 금지수;이현수
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2006년도 하계종합학술대회 / pp.243-244 / 2006
  • In this study, we propose a speech/music discrimination method using spectral peak track analysis. The proposed method uses the duration for which a spectral peak track remains in the same frequency channel as a feature parameter, and applies a duration threshold to discriminate speech from music. Experimental results show that the correct discrimination ratio varies with the threshold, but the method achieves performance comparable to other methods and is computationally efficient. (A rough sketch of the peak-track duration feature follows below.)

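A rough sketch of a peak-track duration feature in the spirit of the description above: the dominant STFT bin per frame is tracked, run lengths in the same channel are converted to durations, and a duration threshold makes the decision. The FFT size, hop, and threshold are assumptions.

```python
# Sketch: run length of the dominant frequency bin as a peak-track duration feature.
import numpy as np
import librosa

def peak_track_durations(y, sr, n_fft=1024, hop=256):
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    peak_bin = S.argmax(axis=0)                  # dominant frequency channel per frame
    durations, run = [], 1
    for prev, cur in zip(peak_bin[:-1], peak_bin[1:]):
        if cur == prev:
            run += 1
        else:
            durations.append(run)
            run = 1
    durations.append(run)
    return np.array(durations) * hop / sr        # track durations in seconds

def looks_like_music(y, sr, duration_th=0.12):
    """Illustrative decision: long mean peak-track duration suggests sustained tones."""
    return peak_track_durations(y, sr).mean() > duration_th
```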

A Novel Speech/Music Discrimination Using Feature Dimensionality Reduction

  • Keum, Ji-Soo;Lee, Hyon-Soo;Hagiwara, Masafumi
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 10, No. 1 / pp.7-11 / 2010
  • In this paper, we propose an improved speech/music discrimination method based on a feature combination and dimensionality reduction approach. To improve discrimination ability, we use a feature based on spectral duration analysis and employ the hierarchical dimensionality reduction (HDR) method to reduce the effect of correlated features. Experiments on various kinds of speech and music show that the proposed method achieves higher discrimination performance than conventional methods. (A loose sketch of the feature combination and reduction step follows below.)
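
A very loose sketch of the overall pipeline (feature combination followed by dimensionality reduction). PCA is used here purely as a generic stand-in for the paper's hierarchical dimensionality reduction (HDR), which is not reproduced, and the chosen features are assumptions.

```python
# Sketch: combine clip-level features and reduce correlated dimensions (PCA as a stand-in for HDR).
import numpy as np
import librosa
from sklearn.decomposition import PCA

def clip_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    zcr = librosa.feature.zero_crossing_rate(y)
    cent = librosa.feature.spectral_centroid(y=y, sr=sr)
    return np.hstack([mfcc.mean(axis=1), mfcc.std(axis=1),
                      [zcr.mean(), zcr.std(), cent.mean(), cent.std()]])

def reduce_features(X, n_components=5):
    """X: (n_clips, n_features). Returns decorrelated, lower-dimensional features."""
    return PCA(n_components=n_components).fit_transform(X)
```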

음성/음악 판별을 위한 특징 파라미터와 분류기의 성능비교 (Performance Comparison of Feature Parameters and Classifiers for Speech/Music Discrimination)

  • 김형순;김수미
    • 대한음성학회지:말소리 / No. 46 / pp.37-50 / 2003
  • In this paper, we evaluate and compare the performance of speech/music discrimination based on various feature parameters and classifiers. As feature parameters, we consider the High Zero-Crossing Rate Ratio (HZCRR), Low Short-Time Energy Ratio (LSTER), Spectral Flux (SF), Line Spectral Pair (LSP) distance, entropy, and dynamism. We also examine three classifiers: k-Nearest Neighbor (k-NN), Gaussian Mixture Model (GMM), and Hidden Markov Model (HMM). According to our experiments, the LSP distance and the phoneme-recognizer-based feature set (entropy and dynamism) show good performance, while the performance differences among classifiers are not significant. When all six feature parameters are employed, an average speech/music discrimination accuracy of up to 96.6% is achieved. (A sketch of the HZCRR and LSTER features follows below.)

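A minimal sketch of two of the features compared above, HZCRR and LSTER, using their commonly cited definitions (frames whose ZCR exceeds 1.5x the window mean, and frames whose short-time energy falls below 0.5x the window mean); the frame layout is an assumption.

```python
# Sketch: High Zero-Crossing Rate Ratio and Low Short-Time Energy Ratio per analysis window.
import numpy as np
import librosa

def hzcrr_lster(y, sr, frame_length=400, hop=160):
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=frame_length,
                                             hop_length=hop)[0]
    ste = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop)[0] ** 2
    hzcrr = np.mean(zcr > 1.5 * zcr.mean())   # fraction of high-ZCR frames
    lster = np.mean(ste < 0.5 * ste.mean())   # fraction of low-energy frames
    return hzcrr, lster
```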

다차원 MMCD를 이용한 음성/음악 판별 (Speech/Music Discrimination Using Multi-dimensional MMCD)

  • 최무열;송화전;박슬한;김형순
    • 대한음성학회지:말소리 / No. 60 / pp.191-201 / 2006
  • Discrimination between speech and music is important in many multimedia applications. We previously proposed a new parameter for speech/music discrimination, the mean of minimum cepstral distances (MMCD), which outperformed conventional parameters. One weakness of the MMCD is that its performance depends on the range of candidate frames used to compute the minimum cepstral distance, so the optimal range must be selected experimentally. In this paper, to alleviate this problem, we propose a multi-dimensional MMCD parameter consisting of multiple MMCDs computed over different candidate frame ranges. Experimental results show that the multi-dimensional MMCD parameter yields an error rate reduction of 22.5% compared with the optimally chosen one-dimensional MMCD parameter. (A sketch of the multi-dimensional variant follows below.)

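A sketch of the multi-dimensional variant under assumptions: one MMCD value per candidate-frame range, stacked into a feature vector. The specific ranges (5, 10, 20 frames) are illustrative, not the paper's.

```python
# Sketch: multi-dimensional MMCD from several candidate-frame ranges (assumed ranges).
import numpy as np
import librosa

def mmcd(cep, max_lag):
    """Mean of minimum cepstral distances for one candidate-frame range."""
    mins = [np.linalg.norm(cep[t + 1:t + 1 + max_lag] - cep[t], axis=1).min()
            for t in range(len(cep) - max_lag)]
    return float(np.mean(mins))

def multi_dimensional_mmcd(y, sr, lags=(5, 10, 20)):
    cep = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    return np.array([mmcd(cep, lag) for lag in lags])   # one MMCD per range
```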

실험에 의한 음성·음악 분류 특징의 비교 분석 (Comparison & Analysis of Speech/Music Discrimination Features through Experiments)

  • 이경록;류시우;곽재영
    • 한국콘텐츠학회:학술대회논문집 / 한국콘텐츠학회 2004년도 추계 종합학술대회 논문집 / pp.308-313 / 2004
  • In this paper, we compare and analyze the speech/music classification performance of combinations of feature parameters. Audio signals were classified into three categories (speech, music, speech+music). Three kinds of features were used for classification: mel-cepstrum, energy, and zero-crossing rate. We compared and analyzed the feature combinations that gave the best speech/music classification performance. The experimental results show that the combination of mel-cepstrum and zero-crossing rate gives the best results (speech: 95.1%, music: 61.9%, speech+music: 55.5%). (An illustrative comparison setup is sketched below.)

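An illustrative comparison setup (assumed, not the paper's): clip-level mel-cepstrum, energy, and ZCR features, with feature combinations scored by cross-validated k-NN on the three classes (speech, music, speech+music).

```python
# Sketch: compare feature combinations for three-class speech/music classification.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def clip_features(y, sr):
    """Clip-level features: mean mel-cepstrum, mean energy, mean ZCR."""
    return {
        "melcep": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1),
        "energy": np.array([librosa.feature.rms(y=y).mean()]),
        "zcr": np.array([librosa.feature.zero_crossing_rate(y).mean()]),
    }

def compare_combinations(clips, labels,
                         combos=(("melcep", "zcr"), ("melcep", "energy"),
                                 ("melcep", "energy", "zcr"))):
    """clips: list of (y, sr); labels: 0=speech, 1=music, 2=speech+music."""
    scores = {}
    for combo in combos:
        X = []
        for y, sr in clips:
            feats = clip_features(y, sr)
            X.append(np.concatenate([feats[k] for k in combo]))
        scores[combo] = cross_val_score(KNeighborsClassifier(),
                                        np.array(X), labels, cv=5).mean()
    return scores
```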

Discrimination of Pathological Speech Using Hidden Markov Models

  • Wang, Jianglin;Jo, Cheol-Woo
    • 음성과학 / Vol. 13, No. 3 / pp.7-18 / 2006
  • Diagnosis of pathological voice is one of the important issues in biomedical applications of speech technology. This study focuses on the discrimination of voice disorders using HMMs (Hidden Markov Models) for automatic detection of vocal fold disorder voices versus normal voices. The method is non-intrusive, inexpensive, and fully automated, using only a speech sample from the subject. Speech data were collected from normal speakers and from patients. Mel-frequency cepstral coefficients (MFCCs) were modeled by an HMM classifier. Left-to-right HMMs with different numbers of states (3, 5, and 7) and 3 mixtures were trained. The method gives an accuracy of 93.8% on training data and 91.7% on test data in discriminating normal voices from vocal fold disorder voices for the sustained vowel /a/. (A minimal training/classification sketch follows below.)

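A minimal training/classification sketch under assumptions: hmmlearn GMM-HMMs with 13 MFCCs, 5 states, 3 mixtures, and a left-to-right topology; one model per class, with classification by log-likelihood comparison. This is an illustrative setup, not the study's implementation.

```python
# Sketch: normal vs. disordered voice classification with left-to-right GMM-HMMs.
import numpy as np
import librosa
from hmmlearn.hmm import GMMHMM

def make_left_to_right_hmm(n_states=5, n_mix=3):
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=20, init_params="mcw")
    model.startprob_ = np.eye(n_states)[0]                 # always start in state 0
    # allow only self-transitions and transitions to the next state
    model.transmat_ = np.triu(np.tril(np.full((n_states, n_states), 0.5), 1))
    model.transmat_ /= model.transmat_.sum(axis=1, keepdims=True)
    return model

def train_class_hmm(clips, n_states=5):
    """clips: list of (y, sr) utterances (e.g. sustained /a/) from one class."""
    feats = [librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T for y, sr in clips]
    model = make_left_to_right_hmm(n_states)
    model.fit(np.vstack(feats), lengths=[len(f) for f in feats])
    return model

def classify(y, sr, normal_hmm, disorder_hmm):
    x = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    return "normal" if normal_hmm.score(x) > disorder_hmm.score(x) else "disorder"
```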