• Title/Summary/Keyword: speech

Accurate Speech Detection based on Sub-band Selection for Robust Keyword Recognition (강인한 핵심어 인식을 위해 유용한 주파수 대역을 이용한 음성 검출기)

  • Ji Mikyong;Kim Hoirin
    • Proceedings of the KSPS conference / 2002.11a / pp.183-186 / 2002
  • Speech detection is one of the important problems in real-time speech recognition, and accurate detection of speech boundaries is crucial to the performance of a speech recognizer. In this paper, we propose a speech detector based on Mel-band selection through training. To demonstrate the merit of the proposed algorithm, we compare it with a conventional detector, EPD-VAA (EndPoint Detector based on Voice Activity Detection). The proposed speech detector is trained to extract keyword speech better than other speech. EPD-VAA usually works well at high SNR but performs poorly at low SNR. The proposed algorithm, in contrast, pre-selects useful bands through keyword training and decides the speech boundary according to the energy level of the previously selected sub-bands. The experimental results show that the proposed algorithm outperforms EPD-VAA.
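
The detector above decides speech boundaries from the energy of a few pre-selected sub-bands rather than the full band. The following is a minimal Python sketch of that decision rule under loudly stated assumptions: plain DFT bins stand in for Mel bands, the `selected` band indices are hypothetical (the paper obtains them by keyword training), the noise floor comes from leading frames assumed to be silent, and the 10 dB threshold is an illustrative choice, not the authors' setting.

```python
import cmath
import math


def band_energies(frame, bands):
    """Energy in each (lo_bin, hi_bin) DFT range of one frame.

    Uses a naive O(N^2) DFT, which is fine for the short frames in this sketch;
    plain DFT bins stand in for the paper's Mel bands (an assumption).
    """
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        x = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(x) ** 2)
    return [sum(power[lo:hi]) for lo, hi in bands]


def detect_speech(frames, bands, selected, threshold_db=10.0, noise_frames=5):
    """Return (first, last) speech frame indices, or None if no speech is found.

    Only the hypothetical `selected` band indices contribute to the decision.
    The noise floor is the mean selected-band energy of the first
    `noise_frames` frames, which are assumed to contain no speech.
    """
    energies = []
    for frame in frames:
        e = band_energies(frame, bands)
        energies.append(sum(e[i] for i in selected) + 1e-12)  # avoid log(0)
    head = energies[:noise_frames]
    floor = sum(head) / len(head)
    flags = [10.0 * math.log10(e / floor) > threshold_db for e in energies]
    if not any(flags):
        return None
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return first, last
```

Because only the selected bands are summed, a tone that falls outside every selected band never triggers the detector, however loud it is; that is the property that band selection is meant to exploit at low SNR.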

The Effects of Speaking Mode on Intelligibility of Dysarthric Speech (뇌성마비 성인의 발화유형에 따른 명료도)

  • Kim, Soo-Jin;Ko, Hyun-Ju
    • Phonetics and Speech Sciences / v.1 no.4 / pp.171-176 / 2009
  • Intelligibility measurement is one criterion for assessing the severity of speech disorders, especially in dysarthric speakers. Rate control, usually rate reduction, is used with many dysarthric speakers to improve their intelligibility. The purpose of this study is to compare how the intelligibility of speech produced by speakers with cerebral palsy changes across three speaking conditions. Speech samples were collected from 10 adults with cerebral palsy, who were asked to speak under three conditions: (1) naturally (control), (2) more slowly (rate control), and (3) louder and more accurately (clear speech). In a perception test, three judges listened to the speech samples and wrote down whatever they heard. The results showed that the subjects with cerebral palsy fell into two subgroups according to how their intelligibility varied across the three speaking conditions. In some subjects, speech intelligibility increased greatly when they were asked to speak 'louder and more accurately', while the others showed no difference in intelligibility across the speaking conditions. This study suggests that it would be clinically useful to find, for each speaker with cerebral palsy, the instruction that best improves his or her intelligibility.

A Validity Study on Measurement of Mental Fatigue Using Speech Technology (음성기술을 이용한 정신피로 측정에 관한 타당성 연구)

  • Song, Seungkyu;Kim, Jongyeol;Jang, Junsu;Kwon, Chulhong
    • Phonetics and Speech Sciences / v.5 no.1 / pp.3-10 / 2013
  • This study proposes a method to measure mental fatigue using speech technology, an approach not used in previous research and simpler than existing complex and difficult methods. It aims to establish a relationship between the human voice and mental fatigue through experiments that measure the influence of mental fatigue on the voice. Two monotonous simple-calculation tasks, such as finding the sum of three one-digit numbers, were used to measure the feeling of monotony, and two sets of subjective questionnaires were used to measure mental fatigue. Questionnaire responses and speech data were collected while thirty subjects performed the experiment. Speech features related to the speech source and the vocal tract filter were extracted from the speech data. According to the results, the speech parameters most closely related to mental fatigue are the mean and standard deviation of the fundamental frequency, jitter, and shimmer. This study shows that speech technology is a useful method for measuring mental fatigue.
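
The parameters singled out above have common textbook definitions. The sketch below computes them from a sequence of glottal cycle lengths (in seconds) and per-cycle peak amplitudes, assuming a pitch tracker has already extracted those; it uses the usual "local" definitions of jitter and shimmer (mean absolute difference between consecutive cycles divided by the mean), which may differ from the exact variants used in the study. The helper name `voice_measures` is hypothetical.

```python
from statistics import mean, pstdev


def voice_measures(periods, amplitudes):
    """F0 statistics, local jitter, and local shimmer from cycle-level data.

    periods: successive glottal cycle lengths in seconds (assumed pre-extracted)
    amplitudes: peak amplitude of each corresponding cycle
    """
    f0 = [1.0 / p for p in periods]
    # Local jitter: mean |T_i - T_(i+1)| relative to the mean period.
    jitter = mean(abs(a - b) for a, b in zip(periods, periods[1:])) / mean(periods)
    # Local shimmer: the same measure applied to cycle peak amplitudes.
    shimmer = mean(abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])) / mean(amplitudes)
    return {
        "f0_mean": mean(f0),
        "f0_std": pstdev(f0),
        "jitter": jitter,
        "shimmer": shimmer,
    }
```

A perfectly steady voice (identical periods and amplitudes) yields zero jitter and shimmer; cycle-to-cycle irregularity raises both.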

Intra-and Inter-frame Features for Automatic Speech Recognition

  • Lee, Sung Joo;Kang, Byung Ok;Chung, Hoon;Lee, Yunkeun
    • ETRI Journal / v.36 no.3 / pp.514-517 / 2014
  • In this paper, alternative dynamic features for speech recognition are proposed. The goal of this work is to improve speech recognition accuracy by deriving a representation of distinctive dynamic characteristics from the speech spectrum. This work was inspired by two temporal dynamics of a speech signal: the highly non-stationary nature of speech, and the inter-frame change of the speech spectrum. We use a sub-frame spectrum analyzer to capture very rapid spectral changes within a speech analysis frame. In addition, we attempt to measure spectral fluctuations in a more complex manner than traditional dynamic features such as delta or double-delta. To evaluate the proposed features, speech recognition tests were conducted in smartphone environments. The experimental results show that feature streams simply combined with the proposed features are effective in improving the recognition accuracy of a hidden Markov model-based speech recognizer.
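
For reference, the conventional delta feature that the proposed features are contrasted with is usually computed by linear regression over neighboring frames, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2). The sketch below implements that standard formula, not the authors' proposed intra- or inter-frame features; the window of 2 frames on each side and the repetition of edge frames are common defaults assumed here. Applying it twice gives double-delta.

```python
def delta(frames, width=2):
    """Regression-based delta features.

    frames: list of feature vectors (e.g. cepstra), one per analysis frame.
    width: frames on each side used in the regression (assumed default 2).
    Frames beyond either end are replaced by the nearest edge frame.
    """
    num_frames = len(frames)
    dims = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dims):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def double_delta(frames, width=2):
    """Double-delta (acceleration) features: the delta of the deltas."""
    return delta(delta(frames, width), width)
```

On a feature that rises linearly by one unit per frame, the interior deltas are exactly 1.0; this is the frame-to-frame slope that a sub-frame analyzer is meant to complement with faster, within-frame changes.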

Application of Shape Analysis Techniques for Improved CASA-Based Speech Separation (CASA 기반 음성분리 성능 향상을 위한 형태 분석 기술의 응용)

  • Lee, Yun-Kyung;Kwon, Oh-Wook
    • MALSORI / no.65 / pp.153-168 / 2008
  • We propose a new method that applies shape analysis techniques to a computational auditory scene analysis (CASA)-based speech separation system. A conventional CASA-based speech separation system extracts speech signals from a mixture of speech and noise signals. In the proposed method, we fill in the missing speech signals by applying shape analysis techniques such as labelling and a distance function. In the speech separation experiment, the proposed method improves the signal-to-noise ratio by 6.6 dB. When the proposed method is used as the front end of a speech recognizer, it improves recognition accuracy by 22% for the speech-shaped stationary noise condition and by 7.2% for the two-talker noise condition at the target-to-masker ratio than or equal to -3 dB.
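
Labelling, one of the shape analysis operations named above, groups the active cells of a binary time-frequency mask into connected regions so that each region can be handled as a unit. The abstract does not say how the labels and the distance function are then used to fill in missing speech, so the sketch below shows only a generic 4-connected labelling step; the mask layout (rows as frequency channels, columns as time frames) and the name `label_regions` are assumptions.

```python
from collections import deque


def label_regions(mask):
    """4-connected component labelling of a binary time-frequency mask.

    mask: list of rows of 0/1 values (assumed rows = channels, columns = frames).
    Returns (labels, count): labels has the same shape as mask, with 0 for
    inactive cells and 1..count for the connected region each cell belongs to.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1  # start a new region and flood-fill it
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Diagonal neighbors are deliberately not connected here; an 8-connected variant would add the four diagonal offsets.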

Performance Analysis of Noisy Speech Recognition Depending on Parameters for Noise and Signal Power Estimation in MMSE-STSA Based Speech Enhancement (MMSE-STSA 기반의 음성개선 기법에서 잡음 및 신호 전력 추정에 사용되는 파라미터 값의 변화에 따른 잡음음성의 인식성능 분석)

  • Park Chul-Ho;Bae Keun-Sung
    • MALSORI / no.57 / pp.153-164 / 2006
  • The MMSE-STSA-based speech enhancement algorithm is widely used as a preprocessing step for noise-robust speech recognition. It weights each spectral bin of the noisy speech by a gain computed from the estimated noise and signal power spectra. In this paper, we investigate how the parameters used to estimate the speech signal and noise power in MMSE-STSA influence the recognition performance on noisy speech. For the experiments, we use the Aurora2 DB, which contains noisy speech with subway, babble, car, and exhibition noises. An HTK-based continuous HMM system is built for the recognition experiments. The experimental results are presented and discussed along with our findings.
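
The sketch below shows where such estimation parameters enter: the Ephraim-Malah MMSE-STSA gain for one spectral bin, G = (sqrt(pi)/2)(sqrt(v)/gamma) exp(-v/2) [(1+v) I0(v/2) + v I1(v/2)] with v = xi gamma / (1 + xi), driven by the a posteriori SNR gamma = |Y|^2 / lambda_d and an a priori SNR xi. It assumes the common decision-directed estimator for xi; the smoothing factor `alpha` and floor `xi_min` (0.98 and 1e-3 here) are typical illustrative values rather than those studied in the paper, and the noise power lambda_d is assumed to be estimated elsewhere.

```python
import math


def _scaled_bessel(n, x):
    """exp(-x) * I_n(x) for n = 0 or 1, via the power series summed in log space.

    Working in log space keeps every term finite for large x, where I_n(x)
    itself would overflow a float.
    """
    if x == 0.0:
        return 1.0 if n == 0 else 0.0
    log_half = math.log(x / 2.0)
    terms = int(x + 10.0 * math.sqrt(x)) + 30  # series terms peak near k = x/2
    total = 0.0
    for k in range(terms):
        total += math.exp((2 * k + n) * log_half
                          - math.lgamma(k + 1) - math.lgamma(k + n + 1) - x)
    return total


def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE short-time spectral amplitude gain for one bin."""
    v = xi / (1.0 + xi) * gamma
    half = v / 2.0
    # exp(-v/2) * [(1 + v) I0(v/2) + v I1(v/2)], using the scaled Bessel terms
    bracket = (1.0 + v) * _scaled_bessel(0, half) + v * _scaled_bessel(1, half)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * bracket


def enhance_frame(noisy_power, noise_power, prev_clean_power, alpha=0.98, xi_min=1e-3):
    """Gains and clean-power estimates for one frame (decision-directed xi).

    noisy_power: |Y_k|^2 per bin; noise_power: estimated lambda_d per bin;
    prev_clean_power: |A_k|^2 estimated in the previous frame (zeros at start).
    """
    gains, clean_power = [], []
    for y2, d2, a2 in zip(noisy_power, noise_power, prev_clean_power):
        gamma = max(y2 / d2, 1e-10)  # a posteriori SNR
        xi = alpha * a2 / d2 + (1.0 - alpha) * max(gamma - 1.0, 0.0)  # a priori SNR
        xi = max(xi, xi_min)
        g = mmse_stsa_gain(xi, gamma)
        gains.append(g)
        clean_power.append((g * g) * y2)  # feeds the next frame's xi estimate
    return gains, clean_power
```

With `alpha` close to 1, the a priori SNR follows the previous frame's clean estimate closely, which smooths the gain over time and suppresses musical noise; lowering it makes the gain react faster to the current frame at the cost of more fluctuation.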

The influence of utterance length on speech rate in spontaneous speech (자연발화 음성 코퍼스에서 발화 속도에 대한 발화 길이의 영향)

  • Kim, Jungsun
    • Phonetics and Speech Sciences / v.9 no.1 / pp.9-17 / 2017
  • The current study examined speech rate and its variance in spontaneous Seoul Korean speech, focusing on factors affecting the variance of speech rate such as utterance length, individual speakers, and gender. The results revealed, first, that utterance length has a significant influence on speech rate: longer utterances were spoken at a faster rate. Second, regarding the effect of utterance length, individual speakers differed significantly in their speaking rate, and the variation between and within speakers tended to increase as utterance length increased. Third, there were gender differences, with male speakers producing a considerably faster speaking rate than female speakers. Additionally, the study suggests that non-linguistic factors in spontaneous speech can affect the variance of speakers' speaking rate.
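
As a minimal illustration of the relationship reported above, the sketch below computes a speech rate per utterance and its Pearson correlation with utterance length. It assumes rate is measured in syllables per second and length in seconds; the abstract does not state the units, and the study's actual statistics (which also account for speaker and gender effects) go well beyond a single correlation.

```python
import math


def speech_rates(syllable_counts, durations):
    """Syllables per second for each utterance (units are an assumption)."""
    return [s / d for s, d in zip(syllable_counts, durations)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A positive coefficient between length and rate corresponds to the finding that longer utterances were spoken faster.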

Harmonic Structure Features for Robust Speaker Diarization

  • Zhou, Yu;Suo, Hongbin;Li, Junfeng;Yan, Yonghong
    • ETRI Journal / v.34 no.4 / pp.583-590 / 2012
  • In this paper, we present a new approach to speaker diarization. First, we use the prosodic information calculated from the original speech to resynthesize new speech data using a spectrum modeling technique. The resynthesized data are modeled with sinusoids based on pitch, vibration amplitude, and phase bias. Then, we use the resynthesized speech data to extract cepstral features and integrate them with the cepstral features from the original speech for speaker diarization. Finally, we show how the two streams of cepstral features can be combined to improve the robustness of speaker diarization. Experiments carried out on standardized datasets (the US National Institute of Standards and Technology Rich Transcription 04-S multiple distant microphone conditions) show a significant improvement in diarization error rate compared with a system based only on the feature stream from the original speech.
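
The resynthesis step models speech as a sum of sinusoids at multiples of the pitch. The sketch below generates one frame of such a harmonic model, s[t] = sum_k A_k cos(2 pi k f0 t / fs + phi_k), assuming the fundamental frequency `f0` and the per-harmonic amplitudes and phase offsets have already been estimated. It omits frame overlap-add and the subsequent cepstral feature extraction, and `harmonic_frame` is a hypothetical name.

```python
import math


def harmonic_frame(f0, amplitudes, phases, fs, length):
    """Resynthesize one frame as a sum of harmonics of f0.

    amplitudes[k-1] and phases[k-1] belong to harmonic k (frequency k * f0).
    Harmonics at or above the Nyquist frequency fs / 2 are skipped.
    """
    frame = []
    for t in range(length):
        sample = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            if k * f0 >= fs / 2:
                break  # higher harmonics would alias
            sample += amp * math.cos(2 * math.pi * k * f0 * t / fs + phase)
        frame.append(sample)
    return frame
```

Because the resynthesized signal contains only harmonic energy, cepstra computed from it emphasize the pitch-related structure, which is what makes it a complementary stream to cepstra from the original speech.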

Speech Active Interval Detection Method in Noisy Speech (잡음음성에서의 음성 활성화 구간 검출 방법)

  • Lee, Kwang-Seok;Choo, Yeon-Gyu;Kim, Hyun-Deok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.779-782 / 2008
  • Detecting speech-active intervals in noisy speech is important for speech communication and speech recognition. In this research, we propose a characteristic parameter that incorporates spectral entropy for detecting speech-active intervals in noisy speech, and we compare its performance with that of energy-based detection. The results show that the proposed characteristic parameter performs better than the others in noisy environments.
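
Spectral entropy treats the normalized power spectrum of a frame as a probability distribution: speech, with energy concentrated in formants and harmonics, tends to give lower entropy than broadband noise. The sketch below computes the plain spectral entropy of one frame from its power spectrum. How the paper combines entropy with other quantities into its characteristic parameter is not specified in the abstract, so that step is omitted, and `spectral_entropy` is a hypothetical name.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in nats) of a frame's normalized power spectrum.

    Each bin's share of the total power is used as a probability; empty bins
    contribute nothing. Returns 0.0 for an all-zero spectrum.
    """
    total = sum(power_spectrum)
    if total <= 0.0:
        return 0.0
    entropy = 0.0
    for p in power_spectrum:
        if p > 0.0:
            q = p / total
            entropy -= q * math.log(q)
    return entropy
```

A flat spectrum over N bins reaches the maximum, ln N, while a single dominant bin gives 0, so a frame-wise threshold on this value separates peaky (speech-like) frames from flat (noise-like) ones independently of overall loudness.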

The Use of a Temporary Speech Aid Prosthesis to Treat Speech in Velopharyngeal Insufficiency (VPI) (비인강폐쇄부전 환자의 언어교정을 위해 발음 보조장치를 이용한 증례)

  • Kim, Eun-Ju;Ko, Seung-O;Shin, Hyo-Keun;Kim, Hyun-Gi
    • Speech Sciences / v.9 no.4 / pp.3-14 / 2002
  • VPI occurs when the velum and the lateral and posterior pharyngeal walls fail to separate the nasal cavity from the oral cavity during deglutition and speech. A number of congenital and acquired conditions result in VPI. Congenital conditions include cleft palate, submucous cleft palate, and congenital palatal insufficiency (CPI). Acquired conditions include carcinoma of the palate or pharynx and neurologic disorders. VPI speech is characterized by hypernasality, nasal air emission, decreased intraoral air pressure, increased nasal airflow, and decreased intelligibility. VPI can be treated with various methods, including speech therapy, surgical procedures to reduce the velopharyngeal gap, a speech aid prosthesis, or a combination of surgery and prosthesis. This article describes four cases of VPI treated with a speech aid prosthesis and speech therapy, with satisfactory results.