• Title/Summary/Keyword: speech parameter

Search Result 373, Processing Time 0.031 seconds

Parts-based Feature Extraction of Speech Spectrum Using Non-Negative Matrix Factorization (Non-Negative Matrix Factorization을 이용한 음성 스펙트럼의 부분 특징 추출)

  • 박정원;김창근;허강인
    • Proceedings of the IEEK Conference
    • /
    • 2003.11a
    • /
    • pp.49-52
    • /
    • 2003
  • In this paper, we propose a new speech feature parameter based on NMF (Non-Negative Matrix Factorization). NMF represents multi-dimensional data through effective dimensionality reduction by matrix factorization under a non-negativity constraint, and the reduced data present parts-based features of the input data. We verify the usefulness of the NMF algorithm for speech feature extraction by applying NMF to the Mel-scaled filter bank output to obtain the feature parameter. In the recognition experiments, the proposed feature parameter outperformed the generally used MFCC (mel frequency cepstral coefficient) in recognition performance.
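
The factorization described in this abstract can be sketched as follows. This is a minimal illustration using the standard multiplicative update rules for the squared-error objective, not necessarily the update scheme used in the paper; the toy matrix `v` (4 filter-bank bands by 5 frames), the rank, and all function names are assumptions made for the example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (bands x frames) into w (bands x rank) and h (rank x frames)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def reconstruction_error(v, w, h):
    """Squared Frobenius distance between v and the product w h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb))

# Hypothetical filter-bank energies built from two spectral "parts".
v = [
    [1.0, 2.0, 0.0, 1.0, 3.0],
    [2.0, 4.0, 0.0, 2.0, 6.0],
    [0.0, 1.0, 3.0, 3.0, 1.0],
    [0.0, 2.0, 6.0, 6.0, 2.0],
]
w, h = nmf(v, rank=2)
```

In this sketch the columns of `w` play the role of spectral parts and each column of `h` holds the non-negative activations for one frame, which is one plausible choice of per-frame feature vector.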

New Parameter on Speech and EGG; Glottal Closure Delay Ratio (음성신호와 전기성문파를 이용하는 새로운 매개변수 ; 성대 폐쇄 지연비율(Glottal Closure Delay Ratio))

  • Choi, Jong-Min;Kwon, Tack-Kyun;Jung, Eun-Jung;Lee, Myung-Chul;Kim, Kwang-Hyun;Sung, Myung-Whun;Park, Kwang-Suk
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.18 no.1
    • /
    • pp.22-25
    • /
    • 2007
  • Background and Objectives: Biomedical signals such as speech, electroglottograph (EGG), and airflow have commonly been used to diagnose laryngeal function, but in most cases these signals were analysed separately. Here, we propose a new interchannel parameter, the Glottal Closure Delay Ratio (GCDR), which is estimated from speech and EGG measured simultaneously. Materials and Method: Speech and EGG signals were recorded simultaneously from 13 normal subjects and 39 patients. The patients' data included 16 cases of polyps and 23 cases of vocal fold palsy. The time difference between the glottal closing instant on the EGG and the first maximum peak of the speech signal within a pitch period was calculated. The glottal closing instant was defined as the maximum peak of the first derivative of the EGG signal (dEGG). Results: The standard deviation and jitter were calculated using 20-30 GCDRs extracted from each recording, and they were significantly different between the normal and vocal fold paralysis groups. Conclusion: The GCDR may be the first index reflecting both speech and EGG characteristics, and the perturbation of this parameter was significantly different between the normal and vocal fold paralysis groups.
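
The measurement described in this abstract might be sketched as below. Several points are assumptions, since the abstract does not spell them out: the delay is divided by the pitch period length to form the ratio, the EGG is taken to rise with increasing vocal fold contact, the "first maximum peak" is read as the first local maximum at or after the closing instant, and jitter is computed as the mean absolute difference of consecutive values relative to their mean. All names and sample values are hypothetical.

```python
from statistics import mean

def glottal_closure_index(egg):
    """Sample index just after the largest positive step of the EGG (peak of dEGG)."""
    degg = [egg[i + 1] - egg[i] for i in range(len(egg) - 1)]
    return max(range(len(degg)), key=degg.__getitem__) + 1

def first_peak_after(signal, start):
    """Index of the first local maximum of signal at or after start, else None."""
    for i in range(max(start, 1), len(signal) - 1):
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            return i
    return None

def gcdr(egg_period, speech_period):
    """Delay from glottal closure to the speech peak, as a fraction of the period."""
    closure = glottal_closure_index(egg_period)
    peak = first_peak_after(speech_period, closure)
    if peak is None:
        return None
    return (peak - closure) / len(egg_period)

def relative_jitter(values):
    """Mean absolute cycle-to-cycle change divided by the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)

# One hypothetical, time-aligned pitch period of each channel (10 samples).
egg = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0]
speech = [0.0, 0.0, 0.0, 0.0, 0.2, 1.0, 0.3, -0.2, 0.0, 0.0]
value = gcdr(egg, speech)  # closure at sample 3, speech peak at sample 5
```

Applying `gcdr` to 20-30 consecutive periods and passing the results to `statistics.pstdev` and `relative_jitter` would give perturbation measures of the kind the abstract compares between groups.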

On a Detection of the ZCR-Parameter for Higher Formants of Speech Signals (음성신호의 상위 포만트에 대한 ZCR-파라미터 검출에 관한 연구)

  • 유건수
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1992.06a
    • /
    • pp.49-53
    • /
    • 1992
  • In many applications such as speech analysis, speech coding, and speech recognition, the voiced-unvoiced decision must be made correctly for efficient processing. One of the parameters used for the voiced-unvoiced decision is the zero-crossing rate. However, the information in the higher formants of speech signals has not been represented by the zero-crossing rate.
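
For reference, the plain zero-crossing rate mentioned in this abstract can be computed as in the sketch below. It shows only the conventional frame-level ZCR, not the higher-formant variant the paper develops; the 8 kHz sample rate, the frame length, and the test tones are assumptions chosen for illustration.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def tone(freq_hz, sample_rate=8000, length=240, phase=0.3):
    """Synthetic sinusoid standing in for a speech frame."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(length)]

low = zero_crossing_rate(tone(100))    # low-frequency, voiced-like frame
high = zero_crossing_rate(tone(2000))  # high-frequency, unvoiced-like frame
```

A sinusoid of frequency f crosses zero about 2f times per second, so the low tone yields a rate near 0.025 and the high tone a rate near 0.5, which is the contrast a voiced-unvoiced decision relies on.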

On Detecting the Transition Segments of Speech Signal by Energy Approximation Degree of the Synchronized Pitch (피치 동기된 에너지 유사도에 의한 음성신호의 전이구간 검출)

  • 김종득;박형빈;김대호;배명진
    • Proceedings of the IEEK Conference
    • /
    • 1998.06a
    • /
    • pp.603-606
    • /
    • 1998
  • In a large-vocabulary continuous speech recognition system that uses the phoneme as the recognition unit, segmentation is a necessary processing step. In this paper, a new method based on a normalized AMDF is proposed. The suggested parameter represents the degree of sharpness at the valley point. This method can distinguish steady-state segments from transient regions in continuous speech without prior information about the speech signal.

On Improving the Effects of Varying the Window Length on Speech Energy Computation (음성 에너지계산에서 창함수-길이 변화영향의 개선에 관한 연구)

  • Bae, Myung-Jin;Ann, Sou-Guil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.9 no.2
    • /
    • pp.34-41
    • /
    • 1990
  • The energy parameter is widely used in the pre-processing of speech signals because it represents phoneme characteristics well. However, the energy parameter is affected by the window length used during extraction. In this paper, the effects of window length are studied in detail, and we propose a new energy extraction algorithm that reduces them. The energy contours obtained with this algorithm represent the characteristics of speech phonemes well. The algorithm requires only one subtraction, one addition, and two comparison operations per speech sample.
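
To make the per-sample cost concrete, here is a generic running-sum energy contour. It is not the algorithm proposed in the paper, whose details the abstract does not give; it is a common sliding-window magnitude sum, shown only to illustrate how a short-time energy measure can be updated with one addition and one subtraction per sample. The function name and the choice of magnitude rather than squared amplitude are assumptions.

```python
from collections import deque

def sliding_magnitude_energy(samples, window_length):
    """Running sum of |x| over the most recent window_length samples."""
    window = deque()
    total = 0.0
    contour = []
    for x in samples:
        mag = x if x >= 0 else -x     # magnitude via a comparison, no multiply
        total += mag                  # one addition for the newest sample
        window.append(mag)
        if len(window) > window_length:
            total -= window.popleft() # one subtraction for the oldest sample
        contour.append(total)
    return contour

contour = sliding_magnitude_energy([1, -2, 3, -4], window_length=2)
# contour is [1.0, 3.0, 5.0, 7.0]
```

With a contour of this kind, a window shorter than the pitch period tends to ripple within each period while a much longer window smears phoneme boundaries, which is the window-length dependence the abstract refers to.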

Implementation of Real-time Wheel Order Recognition System Based on the Predictive Parameters for Speaker's Intention

  • Moon, Serng-Bae;Jun, Seung-Hwan
    • Journal of Navigation and Port Research
    • /
    • v.35 no.7
    • /
    • pp.551-556
    • /
    • 2011
  • In this paper, a new enhanced post-process that predicts the speaker's intention is proposed in order to implement a real-time control module for a ship's autopilot using a speech recognition algorithm. A parameter was developed to predict the likeliest wheel order based on the previous order; it was expected to raise the recognition rate above that of the pre-recognition process, which depends on universal speech recognition algorithms. The values of the parameter were assessed by five certified deck officers experienced in conning vessels. The entire wheel order recognition process was programmed on a TMS320C5416 DSP so that the system could recognize the speaker's orders and control the autopilot in real time. We conducted experiments to verify the usefulness of the suggested module. As a result, we confirmed that the post-recognition process module achieves recognition accuracy good enough to realize an autopilot operated by a speech recognition system.
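
One way such a post-process could work is sketched below: the recognizer's score for each candidate order is multiplied by a weight that depends on the previous order, and the best combined score wins. The combination rule, the order names, and every number here are hypothetical; the abstract says only that the parameter predicts the likeliest order from the previous one and that its values were assessed by deck officers.

```python
def rerank(acoustic_scores, previous_order, transition_weight, default_weight=0.1):
    """Pick the order with the best score after weighting by the previous order."""
    weights = transition_weight.get(previous_order, {})
    combined = {
        order: score * weights.get(order, default_weight)
        for order, score in acoustic_scores.items()
    }
    return max(combined, key=combined.get)

# Hypothetical expert-assessed weights: after "port five", "port ten" is likelier.
transition_weight = {"port five": {"port ten": 0.6, "port twenty": 0.3}}

# Hypothetical recognizer output that slightly prefers the less plausible order.
acoustic_scores = {"port ten": 0.48, "port twenty": 0.52}

chosen = rerank(acoustic_scores, "port five", transition_weight)
```

Here the recognizer alone would pick "port twenty", while the weighted decision selects "port ten" because it is the more plausible follow-up to the previous order.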

On Detecting the Steady State Segments of Speech Waveform by using the Normalized AMDF (규준화된 AMDF 이용한 음성파형의안정상태 구간검출)

  • Bae, Myung-Jin;Kim, Ul-Je;Ahn, Sou-Guil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.10 no.3
    • /
    • pp.44-50
    • /
    • 1991
  • To recognize continuous speech, it is necessary to segment the connected acoustic signal into phonetic units. In this paper, we propose a new normalized AMDF as a parameter for detecting the transition regions in continuous speech. The suggested parameter represents the rate of change of the magnitude of the speech signal. By comparing this value with those of the adjacent frames, the state of each frame can be graded on a level between steady state and transient state.
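
As background for this abstract, the average magnitude difference function (AMDF) and one simple normalization can be sketched as follows. The paper's own normalization is not given in the abstract, so dividing by twice the mean magnitude of the frame is an assumption made here to remove the dependence on signal amplitude; the sinusoidal frame is a stand-in for a steady-state voiced segment.

```python
import math

def amdf(frame, lag):
    """Average of |x[n] - x[n + lag]| over the overlapping part of the frame."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def normalized_amdf(frame, lag, eps=1e-12):
    """AMDF divided by twice the mean magnitude, giving a value of roughly 0 to 1."""
    mean_mag = sum(abs(x) for x in frame) / len(frame)
    return amdf(frame, lag) / (2.0 * mean_mag + eps)

# Hypothetical steady-state frame: a sinusoid with a 40-sample period.
frame = [math.sin(2 * math.pi * n / 40) for n in range(200)]

at_period = normalized_amdf(frame, 40)       # deep valley at the pitch period
at_half_period = normalized_amdf(frame, 20)  # large value at half the period
```

In a steady-state frame the valley at the pitch period is deep, whereas in a transition the waveform changes from period to period and the valley becomes shallower, so comparing the value across adjacent frames gives a steadiness measure.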

Implementation of Speech Recognition Security System Using Speaker Dependent Algorithm (화자 종속 알고리즘을 이용한 음성 인식 보안 시스템 구현)

  • 김영현;문철홍
    • Proceedings of the IEEK Conference
    • /
    • 2003.11a
    • /
    • pp.65-68
    • /
    • 2003
  • In this paper, a speech recognition system using a speaker-dependent algorithm is implemented on a PC. Results are shown on an LDM display system that employs the Intel StrongARM SA-1110. This research was carried out to correct the shortcomings of such speech recognition systems: a former system is sometimes triggered by similar speech rather than the same utterance. To solve this defect, the input vocalization is processed twice. When references are created, variable start-points and end-points are used to make the references efficient. These references and the new references are converted into feature parameters, LPC and MFCC, and DTW is executed on the feature parameters. The security system grants the user permission only when the executions give the same result.
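
The DTW matching step mentioned in this abstract can be sketched as below. This is the textbook dynamic time warping recurrence with Euclidean local cost, not the paper's specific implementation; the one-dimensional "feature vectors" stand in for LPC or MFCC frames, and the reference names are hypothetical.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Accumulated Euclidean cost of the best monotonic alignment of two sequences."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]

def best_reference(utterance, references):
    """Name of the stored reference with the smallest DTW distance."""
    return min(references, key=lambda name: dtw_distance(utterance, references[name]))

# Hypothetical reference templates (one feature value per frame).
references = {
    "open": [(0.0,), (1.0,), (2.0,), (1.0,)],
    "close": [(2.0,), (2.0,), (0.0,), (0.0,)],
}

# A slower rendition of "open": same shape, stretched in time.
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
match = best_reference(utterance, references)
```

A two-pass check of the kind the abstract describes could call `best_reference` on two separate inputs and grant permission only if both return the same name.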

A Korean Speech Recognition Using Fuzzy Rule Base (Fuzzy Rule Base를 이용한 한국어 연속 음성인식)

  • Song, Jeong-Young
    • The Journal of Engineering Research
    • /
    • v.2 no.1
    • /
    • pp.13-21
    • /
    • 1997
  • This paper describes how to represent variations of feature parameters to improve recognition of continuous speech. In speech recognition, feature parameters such as formant frequencies, pitch, logarithmic energy, and zero-crossing rate are generally used. However, their values and variations depend on the speaker, for example on differences between men and women and on age, and it is difficult to decide a priori the width of the variation. Hence, we represent this variation by introducing fuzziness and recognize continuous speech by fuzzy inference using fuzzy production rules.
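
The idea of representing feature variation with fuzziness can be illustrated with the sketch below. Triangular membership functions and min as the fuzzy AND are common choices assumed here, not details taken from the paper, and the formant ranges in the rules are illustrative values only.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def rule_strength(features, rule):
    """Fire a rule using min as the fuzzy AND over its feature conditions."""
    return min(triangular(features[name], *params) for name, params in rule.items())

# Hypothetical production rules: tolerated formant ranges (Hz) per vowel.
rules = {
    "a": {"f1": (600.0, 800.0, 1000.0), "f2": (1000.0, 1300.0, 1600.0)},
    "i": {"f1": (200.0, 300.0, 450.0), "f2": (1900.0, 2300.0, 2700.0)},
}

# Hypothetical measured formants for one frame.
features = {"f1": 750.0, "f2": 1250.0}
strengths = {vowel: rule_strength(features, rule) for vowel, rule in rules.items()}
best = max(strengths, key=strengths.get)
```

Because each rule tolerates a range around its peak, a speaker whose formants sit off-center still fires the intended rule, only with a lower strength, instead of being rejected by a hard threshold.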

A Study on the Segmentation of Speech Signal into Phonemic Units (음성 신호의 음소 단위 구분화에 관한 연구)

  • Lee, Yeui-Cheon;Lee, Gang-Sung;Kim, Soon-Hyon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.10 no.4
    • /
    • pp.5-11
    • /
    • 1991
  • This paper suggests a method for segmenting speech signals into phonemic units. The suggested segmentation system is speaker-independent and works without any prior information about the speech signal. In the segmentation process, we first divide the input speech signal into pure voiced regions and non-pure-voiced regions. We then apply a second algorithm that segments each region into detailed phonemic units using the voiced detection parameters, i.e., the time variation of the 0th LPC cepstrum coefficient and the ZCR parameter. The speech used to demonstrate the validity of the proposed segmentation algorithm consists of vocabulary composed of isolated words and continuous words. In the experiments, the successful segmentation rate for the 507 phonemic units in the total vocabulary was 91.7%.
