Search | Korea Science

Detection of Laryngeal Pathology in Speech Using Multilayer Perceptron Neural Networks (다층 퍼셉트론 신경회로망을 이용한 후두 질환 음성 식별)

Kang Hyun Min;Kim Yoo Shin;Kim Hyung Soon
- Proceedings of the KSPS conference / 2002.11a / pp.115-118 / 2002
Neural networks are known to have great discriminative power in pattern classification problems. In this paper, multilayer perceptron neural networks are employed to automatically detect laryngeal pathology in speech. New feature parameters that reflect the periodicity of speech and its perturbation are also introduced. These parameters and cepstral coefficients are used as inputs to the multilayer perceptron neural networks. In experiments on a Korean disordered speech database, combining the new parameters with cepstral coefficients outperforms using cepstral coefficients alone.

Snorer-Dependent Snore Recognition Using LPC Cepstral Coefficients (LPC 켑스트럼 계수를 이용한 특정인의 코골이 인식)

최호선;장원규;이경중
- The Transactions of the Korean Institute of Electrical Engineers D / v.52 no.9 / pp.554-559 / 2003
This paper suggests the possibility of snorer-dependent snore recognition using cepstral coefficients. Assuming that snore and speech sounds share some similarities, we used cepstral coefficients, which are widely used for speech recognition. Snoring data were acquired from 18 persons, including 5 patients diagnosed with snoring. To evaluate the performance of the proposed method, the distance ratio based on LPC cepstral coefficients was selected as an index for snorer-dependent snore recognition. A distance ratio of 3 was selected as the optimal value, giving the most efficient snorer-dependent snore recognition with an average accuracy of 95.05%. In conclusion, the proposed method shows potential for clinical application in snorer-dependent snore recognition.
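
The abstract above builds on LPC cepstral coefficients but does not define its distance ratio, so that index is not reproduced here. As a rough illustration only, the sketch below shows the standard recursion that converts linear prediction coefficients to cepstral coefficients, plus a plain Euclidean distance between two cepstral vectors. The function names `lpc_to_cepstrum` and `cepstral_distance` are hypothetical, and the all-pole convention H(z) = 1 / (1 - sum_k a_k z^-k) is an assumption, not something the paper states.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p (given as a[0..p-1]) to c_1..c_n.

    Assumed model: H(z) = 1 / (1 - sum_k a_k z^-k). The gain term c_0 is omitted.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_{n-k}, with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


if __name__ == "__main__":
    # First-order model with a_1 = 0.5: the cepstrum is a_1**n / n.
    print(lpc_to_cepstrum([0.5], 3))
    print(cepstral_distance([0.0, 0.0], [3.0, 4.0]))
```

A ratio of such distances (for example, distance to a target snorer's template versus distance to other templates) would be one way to build a decision index, but that reading is a guess and is not confirmed by the abstract.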

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition (화자인식을 위한 주파수 워핑 기반 특징 및 주파수-시간 특징 평가)

Choi, Young Ho;Ban, Sung Min;Kim, Kyung-Wha;Kim, Hyung Soon
- Phonetics and Speech Sciences / v.7 no.1 / pp.3-10 / 2015
In this paper, different frequency scales in cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to the speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
https://doi.org/10.13064/KSSS.2015.7.1.003
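
The bilinear warped frequency scale mentioned in this abstract is usually derived from a first-order all-pass transform. The sketch below shows that commonly used mapping for a normalized angular frequency. The function name `bilinear_warp`, the sign convention of `alpha`, and the example value 0.4 are assumptions for illustration; the abstract does not give the warping factors the authors used.

```python
import math


def bilinear_warp(omega, alpha):
    """Map angular frequency omega (radians, 0..pi) through a first-order all-pass warp.

    alpha is the warping factor with |alpha| < 1. Under the convention assumed here,
    alpha > 0 stretches low frequencies and alpha = 0 leaves the axis linear.
    """
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


if __name__ == "__main__":
    for alpha in (0.0, 0.4, -0.4):
        warped = [round(bilinear_warp(w, alpha), 3) for w in (0.5, 1.0, 2.0)]
        print(alpha, warped)
```

With this mapping the endpoints 0 and pi stay fixed, so filter-bank center frequencies can be placed uniformly on the warped axis and mapped back to the linear one.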

Filtering of Filter-Bank Energies for Robust Speech Recognition

Jung, Ho-Young
- ETRI Journal / v.26 no.3 / pp.273-276 / 2004
We propose a novel feature processing technique that provides a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims to equalize the variance of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log-spectral domain that corresponds to cepstral liftering. The proposed method performs high-pass filtering based on the decorrelation of filter-bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to the conventional feature.

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

Eng, Goh Kia;Ahmad, Abdul Manan
- 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2004.08a / pp.1291-1295 / 2004
This paper proposes a new approach to endpoint detection for isolated speech that significantly improves endpoint detection performance. The proposed algorithm relies on the root mean square (RMS) energy, the zero crossing rate, and the spectral characteristics of the speech signal, where a Euclidean distance measure on cepstral coefficients is used to accurately detect the endpoints of isolated speech. The algorithm offers better performance than the traditional energy-based algorithm. The vocabulary for the experiment consists of the English digits one to nine. The experiments used 360 utterances from a male speaker. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computational overhead of the algorithm is low, since the cepstral coefficients are reused later in the feature extraction stage of the speech recognition procedure.
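
The two time-domain features named in this abstract can be sketched as follows. This is a simplified illustration, not the paper's 3-level algorithm: it omits the cepstral Euclidean distance stage, and the function names, the non-overlapping framing, and the threshold values are all hypothetical.

```python
import math


def rms_energy(frame):
    """Root mean square energy of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def find_endpoints(signal, frame_len, energy_thresh, zcr_thresh):
    """Return (first, last) indices of frames flagged as speech, or None.

    A frame is flagged when either feature exceeds its threshold
    (assumed decision rule, non-overlapping frames).
    """
    active = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if rms_energy(frame) > energy_thresh or zero_crossing_rate(frame) > zcr_thresh:
            active.append(start // frame_len)
    return (active[0], active[-1]) if active else None


if __name__ == "__main__":
    # Two silent frames, two active frames, two silent frames.
    signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
    print(find_endpoints(signal, frame_len=4, energy_thresh=0.5, zcr_thresh=0.9))
```

In practice the thresholds would be estimated from leading background noise, and a spectral distance check would refine the boundaries that these two features propose.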

Sound Reinforcement Based on Context Awareness for Hearing Impaired (청각장애인을 위한 상황인지기반의 음향강화기술)

Choi, Jae-Hun;Chang, Joon-Hyuk
- Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.5 / pp.109-114 / 2011
In this paper, we apply context awareness based on a Gaussian mixture model (GMM) to sound reinforcement for the hearing impaired. In our approach, harmful sounds are amplified by the sound reinforcement algorithm according to context awareness based on a GMM, which is built from Mel-frequency cepstral coefficient (MFCC) feature vectors extracted from sound data. According to the experimental results, the proposed approach is effective in various acoustic environments.

Text-Independent Speaker Identification System Based On Vowel And Incremental Learning Neural Networks

Heo, Kwang-Seung;Lee, Dong-Wook;Sim, Kwee-Bo
- 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2003.10a / pp.1042-1045 / 2003
In this paper, we propose a speaker identification system that uses vowels, which carry the speaker's characteristics. The system is divided into a speech feature extraction part and a speaker identification part. The speech feature extraction part extracts the speaker's features. Voiced speech has characteristics that distinguish speakers. For vowel extraction, formants are obtained from voiced speech through frequency analysis. The vowel "a", which has distinct formants, is extracted from the text. Pitch, formants, intensity, log area ratio, LP coefficients, and cepstral coefficients are candidate methods for extracting characteristics. The cepstral coefficients, which show the best performance in speaker identification among these methods, are used. The speaker identification part distinguishes speakers using a neural network. 12th-order cepstral coefficients are used as the training input data. The neural network is a multilayer perceptron (MLP) and the learning algorithm is backpropagation (BP). Hidden nodes and output nodes are added incrementally. The nodes in the incremental learning neural network are interconnected via weighted links, and each node in a layer is generally connected to each node in the succeeding layer, with the output nodes providing the output of the network. Through vowel extraction and incremental learning, the proposed system needs little training data, reduces training time, and improves the identification rate.

Mel-Frequency Cepstral Coefficients Using Formants-Based Gaussian Distribution Filterbank (포만트 기반의 가우시안 분포를 가지는 필터뱅크를 이용한 멜-주파수 켑스트럴 계수)

Son, Young-Woo;Hong, Jae-Keun
- The Journal of the Acoustical Society of Korea / v.25 no.8 / pp.370-374 / 2006
Mel-frequency cepstral coefficients (MFCCs) are widely used as features for speech recognition. In the MFCC extraction process, the spectrum obtained by the Fourier transform of the input speech signal is divided into mel-frequency bands, and the energy of each band is extracted. The coefficients are then obtained by the discrete cosine transform of the band energies. In this paper, we calculate the output energy of each bandpass filter by applying a weighting function when the mel-frequency scaled bandpass filter is applied. The weighting function is a Gaussian function centered at the formant frequency. In the experiments, the proposed method performs comparably to standard MFCC in clean conditions and better in degraded conditions.
https://doi.org/10.7776/ASK.2006.25.8.370
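
The extraction steps described in this abstract (mel-scaled bands, per-band energy, discrete cosine transform) and the Gaussian weighting idea can be sketched as below. This is a minimal illustration under stated assumptions: the 2595 * log10(1 + f/700) mel formula is one common choice and may differ from the paper's, the unnormalized DCT-II is used, and the function names and the `sigma_hz` width parameter are hypothetical because the abstract does not specify the Gaussian's width.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using a commonly used formula (assumed, not from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def gaussian_weight(f_hz, center_hz, sigma_hz):
    """Gaussian weighting centered at center_hz (for example, a formant frequency)."""
    return math.exp(-0.5 * ((f_hz - center_hz) / sigma_hz) ** 2)


def band_energy(power_spectrum, bin_freqs_hz, center_hz, sigma_hz):
    """Weighted sum of power-spectrum bins using the Gaussian weighting."""
    return sum(
        p * gaussian_weight(f, center_hz, sigma_hz)
        for p, f in zip(power_spectrum, bin_freqs_hz)
    )


def dct2(log_energies, n_coeffs):
    """Unnormalized DCT-II of the log band energies, keeping n_coeffs terms."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n) for i, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]


if __name__ == "__main__":
    print(round(hz_to_mel(1000.0), 1))
    print(dct2([2.0, 2.0, 2.0, 2.0], 3))
```

A full front end would add framing, windowing, and an FFT before `band_energy`, and would take the logarithm of each band energy before calling `dct2`.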

A Study on Comfortableness Evaluation Technique of Chairs using Electroencephalogram (뇌파를 이용한 의자의 쾌적성 평가 기술에 관한 연구)

김동준
- The Transactions of the Korean Institute of Electrical Engineers D / v.52 no.12 / pp.702-707 / 2003
This study describes a new technique for human sensibility evaluation using the electroencephalogram (EEG). The production of EEG is assumed to be linear. The linear predictor coefficients and the linear cepstral coefficients of EEG are used as feature parameters of sensibility, and their pattern classification performances are compared. Using the better parameter, a human sensibility evaluation algorithm is designed. The results are as follows. The linear predictor coefficients showed better pattern classification performance than the linear cepstral coefficients. Using the linear predictor coefficients as the feature parameter, a human sensibility evaluation algorithm was then developed based on a multi-layer neural network. This algorithm achieved 90% accuracy in comfortableness evaluation despite fluctuations in the statistics of the EEG signal.

Speech/Music Discrimination Using Mel-Cepstrum Modulation Energy (멜 켑스트럼 모듈레이션 에너지를 이용한 음성/음악 판별)

Kim, Bong-Wan;Choi, Dea-Lim;Lee, Yong-Ju
- MALSORI / no.64 / pp.89-103 / 2007
In this paper, we introduce mel-cepstrum modulation energy (MCME) as a feature for discriminating speech from music. MCME is a mel-cepstrum domain extension of modulation energy (ME). MCME is extracted from the time trajectory of mel-frequency cepstral coefficients, while ME is based on the spectrum. As cepstral coefficients are mutually uncorrelated, we expect MCME to perform better than ME. To find the best modulation frequency for MCME, we perform experiments with modulation frequencies from 4 Hz to 20 Hz. To show the effectiveness of the proposed MCME feature, we compare its discrimination accuracy with the results obtained from ME and the cepstral flux.
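
The idea of measuring energy at a given modulation frequency along the time trajectory of one cepstral coefficient can be sketched with a single-frequency discrete Fourier sum. This is only an illustration of the general concept: the abstract does not give the exact MCME definition, and the function name `modulation_energy`, the normalization by trajectory length, and the 100 Hz frame rate in the example are assumptions.

```python
import math


def modulation_energy(trajectory, frame_rate_hz, mod_freq_hz):
    """Energy of a coefficient trajectory at one modulation frequency.

    trajectory holds one value per analysis frame; frame_rate_hz is frames per second.
    """
    re = 0.0
    im = 0.0
    for t, x in enumerate(trajectory):
        phase = 2.0 * math.pi * mod_freq_hz * t / frame_rate_hz
        re += x * math.cos(phase)
        im -= x * math.sin(phase)
    return (re * re + im * im) / len(trajectory)


if __name__ == "__main__":
    # One second of a coefficient oscillating at 4 Hz, sampled at 100 frames per second.
    traj = [math.cos(2.0 * math.pi * 4.0 * t / 100.0) for t in range(100)]
    for f in (4.0, 12.0, 20.0):
        print(f, round(modulation_energy(traj, 100.0, f), 6))
```

Summing such values over all cepstral coefficients at a chosen modulation frequency would give one scalar per analysis window, which is one plausible way to use the feature for speech/music discrimination.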

