Search | Korea Science

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

Eng, Goh Kia;Ahmad, Abdul Manan
- Institute of Control, Robotics and Systems (제어로봇시스템학회): Conference Proceedings / 2004.08a / pp.1291-1295 / 2004
This paper proposes a new approach to endpoint detection for isolated speech that significantly improves detection performance. The proposed algorithm relies on the root-mean-square (RMS) energy, the zero-crossing rate, and the spectral characteristics of the speech signal, with a Euclidean distance measure over cepstral coefficients used to locate the endpoints of isolated speech accurately. The algorithm outperforms the traditional energy-based algorithm. The experimental vocabulary consists of the English digits from one to nine, and the experiments were conducted on 360 utterances from a male speaker. The results show that the accuracy of the algorithm is quite acceptable. Moreover, the computational overhead is low because the cepstral coefficients are reused later in the feature-extraction stage of the speech recognition procedure.
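The time-domain part of such a detector can be sketched as follows. This is a minimal illustration and not the paper's 3-level algorithm: the cepstral-distance stage is omitted, and the frame length, hop size, and both thresholds are assumed values chosen only for the synthetic example.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_endpoints(samples, frame_len=160, hop=80,
                     energy_thresh=0.05, zcr_thresh=0.3):
    """Return (first, last) indices of frames flagged as speech, or None.

    A frame is flagged when its RMS energy or its zero-crossing rate
    exceeds the corresponding threshold (both thresholds are assumptions).
    """
    frames = frame_signal(samples, frame_len, hop)
    flags = [rms_energy(f) > energy_thresh or zero_crossing_rate(f) > zcr_thresh
             for f in frames]
    if not any(flags):
        return None
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return first, last

# Synthetic test signal: silence, a 200 Hz tone sampled at 8 kHz, silence.
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
signal = silence + tone + silence
endpoints = detect_endpoints(signal)
print(endpoints)
```

The zero-crossing test is there to catch low-energy unvoiced sounds that an energy threshold alone would miss; in practice both thresholds would be estimated from the leading silence rather than fixed.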

Channel Compensation technique using silence cepstral mean subtraction (묵음 구간의 평균 켑스트럼 차감법을 이용한 채널 보상 기법)

Woo, Seung-Ok;Yun, Young-Sun
- Proceedings of the KSPS conference / 2005.04a / pp.49-52 / 2005
Cepstral mean subtraction (CMS) compensates effectively for channel distortion, but it has shortcomings, such as distortion of the feature parameters and the need to wait for the whole utterance. Assuming that the silence segments carry the channel characteristics, we consider channel normalization by subtracting cepstral means obtained only from the silence regions. If this technique compensates for the channel successfully, it can be used in real-time processing environments or time-critical applications. In our experiments, however, the method does not perform as well as conventional CMS. From an analysis of the results, we see potential in the proposed method and will try to find techniques that narrow the gap between CMS and our method.
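The idea of estimating the channel from silence frames only can be sketched as below. The function name, the per-frame silence mask, and the toy two-dimensional cepstral vectors are assumptions for illustration; the abstract does not say how silence frames were detected.

```python
def silence_cepstral_mean_subtraction(frames, is_silence):
    """Subtract the cepstral mean estimated from silence frames only.

    frames:     list of cepstral vectors (lists of floats), one per frame
    is_silence: list of booleans, True where the frame is judged to be silence
    """
    silence_frames = [f for f, s in zip(frames, is_silence) if s]
    if not silence_frames:
        raise ValueError("no silence frames available to estimate the channel")
    dim = len(frames[0])
    # Per-coefficient mean over the silence frames (the channel estimate).
    mean = [sum(f[d] for f in silence_frames) / len(silence_frames)
            for d in range(dim)]
    # Subtract the estimate from every frame, speech and silence alike.
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

frames = [[1.0, 2.0], [1.2, 1.8], [3.0, 5.0], [4.0, 6.0]]
is_silence = [True, True, False, False]
normalized = silence_cepstral_mean_subtraction(frames, is_silence)
print(normalized)
```

Because the mean only needs the silence that precedes the speech, normalization can start before the utterance ends, which is the real-time advantage the abstract points to.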

Speech/Music Discrimination Using Multi-dimensional MMCD (다차원 MMCD를 이용한 음성/음악 판별)

Choi, Mu-Yeol;Song, Hwa-Jeon;Park, Seul-Han;Kim, Hyung-Soon
- MALSORI / no.60 / pp.191-201 / 2006
Discrimination between speech and music is important in many multimedia applications. We previously proposed a new parameter for speech/music discrimination, the mean of minimum cepstral distances (MMCD), which outperformed the conventional parameters. One weakness of MMCD is that its performance depends on the range of candidate frames used to compute the minimum cepstral distance, so the optimal range must be selected experimentally. In this paper, to alleviate this problem, we propose a multi-dimensional MMCD parameter that consists of multiple MMCDs computed with different candidate frame ranges. Experimental results show that the multi-dimensional MMCD parameter yields an error rate reduction of 22.5% compared with the optimally chosen one-dimensional MMCD parameter.

A Cepstral Analysis of Breathy Voice with Vocal Fold Paralysis (성대마비로 인한 기식 음성에 대한 Cepstral 분석)

Kang, Young-Ae;Seong, Cheol-Jae
- Phonetics and Speech Sciences / v.4 no.2 / pp.89-94 / 2012
The aim of this study is to investigate the usefulness of CPP (cepstral peak prominence) and LTAS (long-term average spectrum) band energy for the analysis of breathy voice with vocal fold paralysis. Thirty-four female subjects with vocal fold paralysis after thyroidectomy participated in this study. Based on perceptual judgements by three speech pathologists and one phonetician, the subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 13). A maximum sustained phonation task was measured for acoustic analysis. CPP-related parameters (mean F0, mean CPP, and mean CPPs) and LTAS-related parameters (minimum, maximum, and mean) were used, and an independent-samples t-test was conducted. Regarding CPP, there are significant differences in mean CPP and mean CPPs between the groups: the values in the non-breathy voice group are higher than those in the breathy voice group. CPP can therefore be regarded as a useful parameter for breathy voice analysis in the clinic. Regarding LTAS, the energy from 0 to 2 kHz is significantly different between the groups. The minimum value of the non-breathy group is lower than that of the breathy group, whereas the maximum value of the non-breathy group is higher. The frequency band below 2 kHz seems to be related to breathy voice.
https://doi.org/10.13064/KSSS.2012.4.2.089

Design of a Quantization Algorithm of the Speech Feature Parameters for the Distributed Speech Recognition (분산 음성 인식 시스템을 위한 특징 계수 양자화 방식 설계)

Lee Joonseok;Yoon Byungsik;Kang Sangwon
- The Journal of the Acoustical Society of Korea / v.24 no.4 / pp.217-223 / 2005
In this paper, we propose a predictive block-constrained trellis coded quantization (BC-TCQ) to quantize cepstral coefficients for distributed speech recognition. A first-order auto-regressive (AR) predictor is used to predict the cepstral coefficients, and a BC-TCQ is used to quantize the prediction error signal effectively. The performance is compared with that of the split vector quantizers used in the ETSI standard, demonstrating reductions in cepstral distance and computational complexity.
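The predictive structure described here (first-order AR prediction followed by quantization of the prediction error) can be sketched as below. A plain uniform scalar quantizer stands in for BC-TCQ, which is not reproduced, and the predictor coefficient `rho`, the step size, and the sample coefficient track are assumed values.

```python
def predictive_quantize(track, rho=0.9, step=0.1):
    """Quantize one cepstral-coefficient track with first-order AR prediction.

    Each value is predicted as rho times the previous reconstructed value,
    and only the prediction error is quantized (uniformly, with the given step).
    Returns the quantizer indices and the encoder-side reconstruction.
    """
    indices, recon = [], []
    prev = 0.0
    for x in track:
        pred = rho * prev
        idx = round((x - pred) / step)   # quantizer index of the prediction error
        prev = pred + idx * step         # what the decoder will also reconstruct
        indices.append(idx)
        recon.append(prev)
    return indices, recon

def reconstruct(indices, rho=0.9, step=0.1):
    """Decoder: rebuild the track from the quantizer indices alone."""
    out, prev = [], 0.0
    for idx in indices:
        prev = rho * prev + idx * step
        out.append(prev)
    return out

track = [1.00, 0.95, 0.80, 0.85, 0.60]
indices, recon = predictive_quantize(track)
decoded = reconstruct(indices)
print(indices, decoded)
```

Predicting from the reconstructed value rather than the original keeps the encoder and decoder in step, so quantization errors do not accumulate across frames.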

Digital Isolated Word Recognition System based on MFCC and DTW Algorithm (MFCC와 DTW에 알고리즘을 기반으로 한 디지털 고립단어 인식 시스템)

Zang, Xian;Chong, Kil-To
- Proceedings of the KIEE Conference / 2008.10b / pp.290-291 / 2008
The most popular speech feature in speech recognition today is the Mel-Frequency Cepstral Coefficients (MFCC), which reflect the perceptual characteristics of the human ear more accurately than other parameters. This paper adopts MFCC together with its first-order difference, which reflects the dynamic character of the speech signal, as a combined parametric representation. Furthermore, we use the Dynamic Time Warping (DTW) algorithm to search for matching paths in the pattern recognition process. We use the software "GoldWave" to record English digits in a lab environment, and the simulation results indicate that the algorithm achieves higher recognition accuracy than others that use LPCC and similar feature parameters in the experiment for the Digital Isolated Word Recognition (DIWR) system.
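Template matching with DTW can be sketched as follows. This is a textbook DTW with a Euclidean local cost and no path constraints, not necessarily the variant used in the paper; the one-dimensional "feature" sequences and template labels are hypothetical stand-ins for MFCC frames.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def delta(frames):
    """First-order difference of a feature sequence (first frame's delta is zero)."""
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out

def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

templates = {"one": [[0.0], [1.0], [2.0]], "two": [[2.0], [1.0], [0.0]]}
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0]]   # a time-stretched "one"
label = recognize(utterance, templates)
print(label, dtw_distance(utterance, templates["one"]))
```

To use static and dynamic features together, each frame can be concatenated with its delta, for example `[f + d for f, d in zip(frames, delta(frames))]`, before the sequences are passed to `dtw_distance`.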

The Effect of the Telephone Channel to the Performance of the Speaker Verification System (전화선 채널이 화자확인 시스템의 성능에 미치는 영향)

조태현;김유진;이재영;정재호
- The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.12-20 / 1999
In this paper, we compare speaker verification performance on speech data collected in a clean environment and over a telephone channel. To improve verification performance on the channel data, we study feature parameters and preprocessing that remain effective in the channel environment. The speech database for the experiments consists of Korean digit pairs, with a text-prompted system in mind. The speech features analyzed include LPCC (Linear Predictive Cepstral Coefficients), MFCC (Mel-Frequency Cepstral Coefficients), PLP (Perceptual Linear Prediction), and LSP (Line Spectrum Pairs). Filtering as a preprocessing step to remove channel noise is also studied. To remove or compensate for the channel effect in the extracted features, cepstral weighting, CMS (Cepstral Mean Subtraction), and RASTA (RelAtive SpecTrAl) processing are applied. We also report speech recognition performance for each feature and processing method and compare it with speaker verification performance. HTK (HMM Tool Kit) 2.0 is used to evaluate the features and processing methods. Using a different threshold for male and female speakers, we compare the EER (Equal Error Rate) on the clean speech data and the channel data. Our simulation results show that the best speaker verification performance in terms of EER is achieved by applying a band-pass filter (150-3800 Hz) in preprocessing to remove low-band and high-band channel noise and then extracting MFCC from the filtered speech.
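The EER used as the evaluation measure can be estimated from verification scores as sketched below. The score lists are made-up values, and the convention that a higher score means "more likely the claimed speaker" is an assumption; the abstract does not describe the scoring.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping the threshold over all observed scores.

    genuine:  scores of true-speaker trials (higher means more likely accepted)
    impostor: scores of impostor trials
    Returns (eer, threshold) at the point where the false acceptance rate (FAR)
    and false rejection rate (FRR) are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)   # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # true speakers rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)
```

A gender-dependent threshold, as in the abstract, would amount to running this sweep separately on the male and female trials.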

Speech/Music Discrimination Using Multi-dimensional MMCD (다차원 MMCD를 이용한 음성/음악 판별)

Choi, Mu-Yeol;Song, Hwa-Jeon;Park, Seul-Han;Kim, Hyung-Soon
- Proceedings of the KSPS conference / 2006.11a / pp.142-145 / 2006
Discrimination between speech and music is important in many multimedia applications. We previously proposed a new parameter for speech/music discrimination, the mean of minimum cepstral distances (MMCD), which outperformed the conventional parameters. One weakness of MMCD is that its performance depends on the range of candidate frames used to compute the minimum cepstral distance, so the optimal range must be selected experimentally. In this paper, to alleviate this problem, we propose a multi-dimensional MMCD parameter that consists of multiple MMCDs with different ranges of candidate frames. Experimental results show that the multi-dimensional MMCD parameter yields an error rate reduction of 22.5% compared with the optimally chosen one-dimensional MMCD parameter.

A Study on Comfortableness Evaluation Technique of Chairs using Electroencephalogram (뇌파를 이용한 의자의 쾌적성 평가 기술에 관한 연구)

김동준
- The Transactions of the Korean Institute of Electrical Engineers D / v.52 no.12 / pp.702-707 / 2003
This study describes a new technique for human sensibility evaluation using the electroencephalogram (EEG). EEG production is assumed to be linear. The linear predictor coefficients and the linear cepstral coefficients of the EEG are used as feature parameters of sensibility, and their pattern classification performance is compared. Using the better parameter, a human sensibility evaluation algorithm is designed. The results are as follows. The linear predictor coefficients showed better pattern classification performance than the linear cepstral coefficients. Using the linear predictor coefficients as the feature parameter, a human sensibility evaluation algorithm was then developed based on a multi-layer neural network. This algorithm achieved 90% accuracy in comfortableness evaluation despite fluctuations in the statistics of the EEG signal.
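If the cepstral coefficients are derived from the linear predictor coefficients (an assumption; the abstract does not say how they were computed), the standard LPC-to-cepstrum recursion can be sketched as below. The sign convention assumed here is that the predictor estimates s[n] as the sum of a_k * s[n-k].

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert linear predictor coefficients to cepstral coefficients.

    a:      predictor coefficients a_1..a_p (a[k-1] holds a_k), for the
            all-pole model 1 / (1 - sum_k a_k z^-k)
    n_ceps: number of cepstral coefficients c_1..c_n_ceps to return
            (the gain term c_0 is omitted)
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as zero for n > p.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

# Single-pole check: for a_1 = 0.5 the exact cepstrum is c_n = 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], 3)
print(ceps)
```

The single-pole case is a convenient sanity check because the logarithm of 1 / (1 - 0.5 z^-1) has the closed-form series with coefficients 0.5**n / n.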

The Comparison of Speech Feature Parameters for Emotion Recognition (감정 인식을 위한 음성의 특징 파라메터 비교)

김원구
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.04a / pp.470-473 / 2004
This paper compares speech feature parameters for emotion recognition from the speech signal. For this purpose, a corpus of emotional speech, recorded and classified by emotion through subjective evaluation, was used to build statistical feature vectors such as the average, standard deviation, and maximum value of pitch and energy. MFCC parameters and their derivatives, with or without cepstral mean subtraction, were also used to evaluate the performance of conventional pattern matching algorithms. Pitch and energy parameters served as prosodic information, and MFCC parameters served as phonetic information. In the experiments, a vector quantization based emotion recognition system was used for speaker- and context-independent emotion recognition. The results showed that the vector quantization based recognizer using MFCC parameters performed better than the one using pitch and energy parameters. The vector quantization based emotion recognizer achieved a recognition rate of 73.3% for speaker- and context-independent classification.
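The decision rule of a vector quantization based recognizer can be sketched as below: each class has a codebook, and an utterance is assigned to the class whose codebook gives the lowest average distortion. Codebook training is omitted, and the emotion labels, codewords, and two-dimensional feature vectors are hypothetical values for illustration only.

```python
def distortion(vec, codebook):
    """Squared Euclidean distance from vec to its nearest codeword."""
    return min(sum((a - b) ** 2 for a, b in zip(vec, cw)) for cw in codebook)

def average_distortion(frames, codebook):
    """Mean nearest-codeword distortion over all frames of an utterance."""
    return sum(distortion(f, codebook) for f in frames) / len(frames)

def classify(frames, codebooks):
    """Return the label whose codebook quantizes the utterance with least distortion."""
    return min(codebooks, key=lambda label: average_distortion(frames, codebooks[label]))

# Hypothetical two-codeword codebooks, one per emotion class.
codebooks = {
    "neutral": [[0.0, 0.0], [1.0, 0.0]],
    "angry": [[5.0, 5.0], [6.0, 4.0]],
}
frames = [[5.2, 4.9], [5.8, 4.1]]
emotion = classify(frames, codebooks)
print(emotion)
```

In a full system the codebooks would be trained per emotion, for example with k-means, on the MFCC or pitch and energy vectors of the training corpus.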
