• Title/Summary/Keyword: Speech function

Acoustic Analysis of Reinke Edema (라인케부종환자의 음성분석)

  • 김상균;최홍식;공석철;홍원표
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.7 no.1 / pp.11-19 / 1996
  • Reinke's edema describes varying degrees of chronic swelling of the vocal folds. Acoustic analysis of Reinke's edema has not previously been reported in Korea. The purpose of this study is to clarify the acoustic and aerodynamic characteristics of Reinke's edema. Acoustic and aerodynamic evaluations were performed in 20 patients with Reinke's edema, and the data were compared with those of 20 normal controls. Videolaryngoscopy was also performed to grade severity. We used C-Speech, Doctor Speech Science, and a Phonatory Function Analyzer. With C-Speech, we compared jitter, shimmer, and signal-to-noise ratio (SNR) between normal subjects and patients with Reinke's edema. With Doctor Speech Science, we compared NNE (glottal noise energy), speech fundamental frequency, and voice quality between the two groups. With the Phonatory Function Analyzer, used for aerodynamic function testing, we compared speech intensity, airflow rate, and expiratory pressure between the two groups. In conclusion, patients with Reinke's edema showed lower voice pitch than normal controls; in addition, jitter, shimmer, SNR, NNE, airflow rate, and expiratory pressure may be meaningful parameters for diagnosis and for treatment prognosis.
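As a rough illustration of two of the perturbation measures named in this abstract, the sketch below computes jitter and shimmer as the mean absolute cycle-to-cycle difference relative to the mean value. This is one common textbook definition and is only assumed here; the abstract does not say which formula C-Speech uses. The function name and the sample measurements are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values,
    expressed as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical cycle-by-cycle measurements from a sustained vowel.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]             # glottal cycle lengths
peak_amplitudes = [0.50, 0.48, 0.52, 0.49, 0.51]   # per-cycle peak amplitudes

jitter_percent = relative_perturbation(periods_ms)        # period perturbation
shimmer_percent = relative_perturbation(peak_amplitudes)  # amplitude perturbation
```

A perfectly regular voice would give 0% for both measures; larger values indicate more cycle-to-cycle irregularity.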

Robust Histogram Equalization Using Compensated Probability Distribution

  • Kim, Sung-Tak;Kim, Hoi-Rin
    • MALSORI / v.55 / pp.131-142 / 2005
  • A mismatch between training and test conditions often causes a drastic decrease in the performance of speech recognition systems. In this paper, non-linear transformation techniques based on histogram equalization in the acoustic feature space are studied for reducing this mismatch. The purpose of histogram equalization (HEQ) is to convert the probability distribution of test speech into that of training speech. Conventional histogram equalization methods consider only the probability distribution of the test speech, but for noise-corrupted test speech this distribution is itself distorted. The transformation function obtained from the distorted distribution may mis-transform feature vectors, which degrades the performance of histogram equalization. Therefore, this paper proposes a new method that calculates a noise-removed probability distribution under the assumption that the CDF of noisy speech feature vectors consists of a speech component and a noise component; this compensated probability distribution is then used in the HEQ process. In the AURORA-2 framework, the proposed method reduced the error rate by over 44% in the clean training condition compared to the baseline system. In the multi-condition training case, the proposed method also outperforms the baseline system.
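The conventional HEQ step described in this abstract can be sketched as a CDF-matching transform: each test value is mapped to the training value that sits at the same cumulative probability. This is a minimal sketch of the conventional method on a single feature dimension, using empirical CDFs; it does not implement the compensated distribution the paper proposes, and the function name is an assumption.

```python
from bisect import bisect_right


def histogram_equalize(test_values, train_values):
    """Map each test value x to F_train^{-1}(F_test(x)) using empirical CDFs."""
    s_test = sorted(test_values)
    s_train = sorted(train_values)
    n_test, n_train = len(s_test), len(s_train)
    equalized = []
    for x in test_values:
        # Number of test values <= x, so F_test(x) = rank / n_test.
        rank = bisect_right(s_test, x)
        # Smallest training index whose empirical CDF reaches F_test(x):
        # ceil(rank * n_train / n_test) - 1, computed with integers only.
        idx = -(-rank * n_train // n_test) - 1
        equalized.append(s_train[idx])
    return equalized


# Hypothetical feature values: the test data is shifted and scaled
# relative to the training data, as a channel or noise mismatch might do.
train = [0.0, 1.0, 2.0, 3.0]
test = [10.0, 30.0, 20.0, 40.0]
mapped = histogram_equalize(test, train)
```

Because the mapping depends only on ranks, any monotonic distortion of the test features is undone. Additive noise distorts the test CDF itself, which is the weakness the abstract identifies.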

Word class information in perception of prosodic prominence by Korean learners of English

  • Im, Suyeon
    • Phonetics and Speech Sciences / v.11 no.4 / pp.1-8 / 2019
  • This study investigates how prosodic prominence in public speech is perceived in relation to word class information (parts of speech) by Korean learners of English compared with native English speakers. Two groups, Korean learners of English and native English speakers, were asked to mark the words they perceived as prominent while listening to a speech. Parts of speech and three acoustic cues (max F0, mean phone duration, and mean intensity) were analyzed for each word in the speech. The results showed that content words tended to be higher in pitch and longer in duration than function words. Both groups of listeners rated content words as prominent more frequently than function words. This tendency, however, was significantly greater for Korean learners of English than for native English speakers. Among the parts of speech of the content words, Korean learners of English were more likely than native English speakers to judge nouns and verbs as prominent. This study presents evidence that Korean learners of English consider most, if not all, content words as landing locations of prosodic prominence, in line with a previous study on the production of prominence.

The Effect of Membership Concentration in FVQ/HMM for Speaker-Independent Speech Recognition

  • Lee, Chang-Young;Nam, Ho-Soo;Jung, Hyun-Seok;Lee, Chai-Bong
    • Speech Sciences / v.12 no.4 / pp.7-16 / 2005
  • We investigate the effect of membership concentration on the performance of a speaker-independent recognition system based on FVQ/HMM. For the membership function, we adopt the result obtained from Bezdek's objective function approach. Membership concentration is controlled by varying the exponent in the membership function. The number of selected clusters is constrained to two to keep the computational cost low. Experimental results showed that the recognition rate reaches its maximum when the membership function is taken to be inversely proportional to the distance of the input vector from the cluster centroid. When the membership concentration was too weak or too strong, the performance was relatively poor, as expected. Except in these extreme cases, the membership concentration did not affect the recognition rate significantly. This agrees with the general observation that a fuzzy system is not very sensitive to the detailed shape of the membership function as long as it overlaps multiple classes.
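The membership computation described in this abstract can be sketched as follows: memberships are proportional to distance raised to a negative exponent, restricted to the two nearest clusters and normalized to sum to one. The parameter `exponent` is an assumed parameterization; in Bezdek's fuzzy c-means form with Euclidean distance it would correspond to 2/(m-1) for fuzzifier m. The function name and the sample distances are hypothetical.

```python
def fuzzy_memberships(distances, exponent=1.0, n_selected=2):
    """Return {cluster_index: membership} for the n_selected nearest clusters.

    Membership is proportional to distance ** (-exponent); a larger exponent
    concentrates the membership on the nearest cluster.
    """
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:n_selected]
    # An input lying exactly on a centroid belongs fully to that cluster.
    if distances[nearest[0]] == 0.0:
        return {nearest[0]: 1.0}
    weights = {i: distances[i] ** (-exponent) for i in nearest}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


# Hypothetical distances from one input vector to three cluster centroids.
distances = [1.0, 3.0, 10.0]
inverse_distance = fuzzy_memberships(distances, exponent=1.0)  # inversely proportional case
concentrated = fuzzy_memberships(distances, exponent=2.0)      # stronger concentration
```

With `exponent=1.0` the two nearest clusters receive memberships in the ratio 3:1; raising the exponent shifts more weight to the nearest cluster, which is the concentration effect the abstract varies.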

Speech Stimuli on the Diagnostic Evaluation of Speech with Cleft Lip and Palate : Clinical Use and Literature Review (구개열 환자 말 평가 시 검사어에 대한 고찰 : 임상현장의 말 평가 어음자료와 문헌적 고찰을 중심으로)

  • Choi, Seong-Hee;Choi, Jae-Nam;Nam, Do-Hyun;Choi, Hong-Shik
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.16 no.1 / pp.33-48 / 2005
  • Differential diagnosis of articulation and resonance problems in cleft lip and palate speech requires evaluating the various factors that contribute to speech problems, such as VPI, dental occlusion, palatal fistulae, and learning. However, the validity of speech stimuli remains an open issue in evaluating each problem in cleft speech accurately. This study investigated the speech stimuli used in clinical settings and reviewed literature published from 1990 to 2005 to help develop standardized speech samples. The resulting recommendations for properly evaluating velopharyngeal function in a diagnostic evaluation are as follows: 1) To identify hypernasality, speech stimuli should include low-pressure consonants to eliminate the effects of nasal emission and compensatory articulation. 2) Speech stimuli should consist of visible, front sounds to eliminate compensatory articulation and to make elicitation easier. 3) For early diagnosis and treatment, speech stimuli need to be developed for infants and preschoolers. 4) Stimulus length for nasalance scores should be at least 6 syllables. 5) Regarding phonetic context for nasalance scores, the vowel /i/ should be taken into consideration, excluding paragraphs. 6) Connected speech stimuli should be developed for evaluating intelligibility and VP function.

Consideration on the Fuzzy Chaos Dimension for Speech Recognition (음성인식을 위한 퍼지 카오스 차원의 고찰)

  • Yoo, B.W.;Kim, S.K.;Park, H.S.;Kim, C.S.
    • Speech Sciences / v.4 no.2 / pp.25-39 / 1998
  • This paper deals with a fuzzy correlation dimension suited to speech recognition. The proposed fuzzy correlation dimension absorbs the time variation of the strange attractor by applying a fuzzy membership function when calculating the correlation integral. When the two measures are applied to speech tasks, the fuzzy correlation dimension is superior for speech recognition, whereas the conventional correlation dimension is superior for speaker discrimination.

Optimally Weighted Cepstral Distance Measure for Speech Recognition (음성 인식을 위한 최적 가중 켑스트랄 거리 측정 방법)

  • 김원구
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.133-137 / 1994
  • In this paper, a method for designing an optimal weight function for the weighted cepstral distance measure is proposed. A conventional weight function, or cepstral lifter, is obtained experimentally depending on the spectral components to be emphasized. The proposed method minimizes the error between word reference patterns and the training data. To compare the proposed optimal weight function with conventional functions, speech recognition systems based on Dynamic Time Warping and Hidden Markov Models were constructed and used in speaker-independent isolated word recognition experiments. Results show that the proposed method gives better performance than conventional weight functions.
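For orientation, a weighted cepstral distance with a conventional lifter can be sketched as below. The raised-sine lifter is one widely used conventional weight function and is chosen here only as an example; the abstract does not say which conventional functions were compared, and the optimal weights the paper derives are not reproduced. Function names and sample coefficients are hypothetical.

```python
import math


def raised_sine_lifter(order):
    """Conventional raised-sine weights w_k = 1 + (L/2) * sin(pi * k / L), k = 1..L."""
    return [1.0 + (order / 2.0) * math.sin(math.pi * k / order)
            for k in range(1, order + 1)]


def weighted_cepstral_distance(c_test, c_ref, weights):
    """Sum over k of (w_k * (c_test[k] - c_ref[k])) ** 2.

    The coefficient lists are assumed to hold c_1..c_L (c_0 excluded).
    """
    if not (len(c_test) == len(c_ref) == len(weights)):
        raise ValueError("coefficient and weight vectors must have equal length")
    return sum((w * (a - b)) ** 2 for a, b, w in zip(c_test, c_ref, weights))


# Hypothetical 4th-order cepstral vectors for a test frame and a reference frame.
weights = raised_sine_lifter(4)
distance = weighted_cepstral_distance([0.9, -0.3, 0.2, 0.05],
                                      [1.0, -0.2, 0.1, 0.00],
                                      weights)
```

Replacing `weights` with a vector optimized on training data is where a data-driven design, such as the one the abstract proposes, would plug in.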

Speech Enhancement Using the Adaptive Noise Canceling Technique with a Recursive Time Delay Estimator (재귀적 지연추정기를 갖는 적응잡음제거 기법을 이용한 음성개선)

  • 강해동;배근성
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.7 / pp.33-41 / 1994
  • A single-channel adaptive noise canceling (ANC) technique with a recursive time delay estimator (RTDE) is presented for removing the effects of additive noise on the speech signal. While the conventional method builds the reference signal for the adaptive filter using the pitch estimated on a frame basis from the input speech, the proposed method builds the reference signal using the delay estimated recursively on a sample-by-sample basis. For the RTDE, recursion formulae for the autocorrelation function (ACF) and the average magnitude difference function (AMDF) are derived. The normalized least mean square (NLMS) and recursive least squares (RLS) algorithms are applied to adapt the filter coefficients. Experimental results with noisy speech demonstrate that the proposed method improves the perceived speech quality as well as the signal-to-noise ratio and cepstral distance compared with the conventional method.
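To make the delay-estimation idea concrete, the sketch below estimates a pitch delay by minimizing the AMDF over a range of candidate lags. It is a frame-based (batch) version, closer to the conventional method mentioned in this abstract; the sample-by-sample recursion formulae the paper derives are not reproduced. Function names, the lag range, and the test signal are assumptions.

```python
import math


def amdf(signal, lag):
    """Average magnitude difference between the signal and itself delayed by `lag`."""
    n = len(signal) - lag
    if n <= 0:
        raise ValueError("lag must be shorter than the signal")
    return sum(abs(signal[i + lag] - signal[i]) for i in range(n)) / n


def estimate_delay(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF value."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(signal, lag))


# Hypothetical voiced frame: a pure tone with a period of 40 samples.
frame = [math.sin(2.0 * math.pi * n / 40.0) for n in range(400)]
delay = estimate_delay(frame, min_lag=20, max_lag=60)
```

In an ANC scheme of this kind, the input delayed by the estimated lag would serve as the reference signal for the adaptive filter, since voiced speech is correlated with itself one pitch period earlier while broadband noise is largely not.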

Performance Estimation of a Window Shaker (유리창 도청방지 장치의 성능평가)

  • Kim, Seock-Hyun;Kim, Hee-Dong;Heo, Wook
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.05a / pp.649-654 / 2007
  • The eavesdropping prevention performance of a commercial window shaker, a device used to prevent eavesdropping through a glass window, is evaluated. The speech transmission index (STI) is introduced to estimate quantitatively the speech intelligibility of the sound detected on the glass window. An objective test following the IEC standard, using the modulation transfer function (MTF), is performed to determine the STI. Using a Maximum Length Sequence (MLS) signal as the sound source, the MTF is measured with accelerometers and a laser Doppler vibrometer. STI values under different levels of the disturbing wave are compared to confirm the disturbing effect on speech intelligibility.
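The core of the MTF-to-STI conversion mentioned in this abstract can be sketched as follows: each modulation transfer value m is converted to an apparent signal-to-noise ratio, clipped to ±15 dB, and rescaled to a transmission index between 0 and 1; the indices are averaged per octave band and combined with band weights. This is a simplified sketch only. It omits the auditory masking, reception threshold, and redundancy corrections of the full IEC procedure, and the band weights are supplied by the caller rather than taken from the standard. Function names and sample values are hypothetical.

```python
import math


def transmission_index(m, snr_limit_db=15.0):
    """Convert one modulation transfer value m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)            # keep the logarithm finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))    # apparent signal-to-noise ratio
    snr_db = min(max(snr_db, -snr_limit_db), snr_limit_db)
    return (snr_db + snr_limit_db) / (2.0 * snr_limit_db)


def simplified_sti(mtf_by_band, band_weights):
    """Weighted mean over octave bands of the per-band average transmission index."""
    if len(mtf_by_band) != len(band_weights):
        raise ValueError("one weight per octave band is required")
    total_weight = sum(band_weights)
    sti = 0.0
    for m_values, weight in zip(mtf_by_band, band_weights):
        band_index = sum(transmission_index(m) for m in m_values) / len(m_values)
        sti += (weight / total_weight) * band_index
    return sti


# Hypothetical MTF values for two octave bands at three modulation frequencies each.
mtf = [[0.9, 0.8, 0.7], [0.4, 0.3, 0.2]]
sti_value = simplified_sti(mtf, band_weights=[1.0, 1.0])
```

A disturbing signal that lowers the measured m values pushes the index toward 0, which is the effect a comparison across disturbance levels would look for.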

A Study on Recognition of Korean Postpositions and Suffixes in Continuous Speech (한국어 연속음성에서의 조사 및 어미 인식에 관한 연구)

  • Song, Min-Suck;Lee, Ki-Young
    • Speech Sciences / v.6 / pp.181-195 / 1999
  • This study proposes a method for recognizing postpositions and suffixes in spoken Korean using prosodic information. We first detect grammatical boundaries automatically using prosodic information of the accentual phrase, and then recognize grammatical function words by tracking backward from the boundaries. The experiment uses 300 spoken sentences from 10 male and 5 female speakers of standard Korean, containing 1080 accentual phrases and 11 postpositions and suffixes. The results give the recognition rate of postpositions in two cases: when only correctly detected boundaries are included, the recognition rate is 97.5%; when all detected boundaries are included, it is 74.8%.
