• Title/Summary/Keyword: phonetic data


Class-Based Histogram Equalization for Robust Speech Recognition

  • Suh, Young-Joo;Kim, Hoi-Rin
    • ETRI Journal / v.28 no.4 / pp.502-505 / 2006
  • A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only at compensating for the acoustic mismatch between training and test environments but also at reducing the discrepancy between the phonetic distributions of training and test speech data. The algorithm utilizes multiple class-specific reference and test cumulative distribution functions, classifies the noisy test features into their corresponding classes, and equalizes the features by using their corresponding class-specific reference and test distributions (a minimal sketch of this per-class mapping follows the entry below). Experiments on the Aurora 2 database demonstrated the effectiveness of the proposed method, which reduced relative errors by 18.74%, 17.52%, and 23.45% over the conventional histogram equalization method and by 59.43%, 66.00%, and 50.50% over mel-cepstral-based features for test sets A, B, and C, respectively.

  • PDF
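
The per-class equalization step described in this entry can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the authors' implementation: the class assignment of noisy frames, the Aurora 2 front end, and any smoothing details are omitted, and every function and variable name is illustrative.

```python
import numpy as np

def empirical_cdf(samples, n_points=101):
    """Quantile grid used as a piecewise-linear approximation of the CDF."""
    probs = np.linspace(0.0, 1.0, n_points)
    return probs, np.quantile(samples, probs)

def class_based_heq(test_feats, class_ids, ref_feats_by_class):
    """Equalize each test feature dimension with the CDF pair of its class.

    test_feats:         (T, D) noisy test features (e.g., cepstra)
    class_ids:          (T,) class assigned to each test frame
    ref_feats_by_class: dict mapping class -> (N_c, D) clean reference features
    """
    equalized = np.array(test_feats, dtype=float, copy=True)
    for cls, ref in ref_feats_by_class.items():
        idx = np.where(class_ids == cls)[0]
        if idx.size == 0:
            continue
        for d in range(test_feats.shape[1]):
            probs, test_q = empirical_cdf(test_feats[idx, d])
            _, ref_q = empirical_cdf(ref[:, d])
            # value -> probability under the test CDF -> reference quantile
            p = np.interp(test_feats[idx, d], test_q, probs)
            equalized[idx, d] = np.interp(p, probs, ref_q)
    return equalized
```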

An Analysis of Phonetic Parameters for Individual Speakers (개별화자 음성의 특징 파라미터 분석)

  • Ko, Do-Heung
    • Speech Sciences / v.7 no.2 / pp.177-189 / 2000
  • This paper investigates how individual speakers' speech can be distinguished using acoustic parameters such as amplitude, pitch, and formant frequencies. Word samples from fifteen male speakers in their 20s from three different regions were recorded in two different modes (casual and clear speech) in quiet settings and were analyzed with a Praat macro script. To determine individual speakers' acoustic values, the total duration of the voiced segments was measured at five different timepoints. Results showed a high correlation between $F_1$ and $F_2$ among the speakers, whereas there was little correlation between amplitude and pitch. Statistical grouping showed that regional dialect was not reflected in individual speakers' voices for either casual or clear speech. In addition, the difference between maximum and minimum amplitude was about 10 dB, a perceptually audible difference. These acoustic data can provide meaningful guidelines for implementing speaker identification and speaker verification algorithms (a small correlation example follows the entry below).

  • PDF
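
As a minimal illustration of the correlation analysis mentioned in this abstract, the snippet below computes Pearson coefficients between formant, amplitude, and pitch measurements with scipy. The numeric values are hypothetical placeholders; the study's actual measurements came from a Praat macro script.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-timepoint measurements for one speaker (five timepoints,
# as in the abstract); real values would be exported from Praat.
f1 = np.array([720.0, 705.0, 698.0, 710.0, 715.0])        # Hz
f2 = np.array([1250.0, 1230.0, 1215.0, 1240.0, 1245.0])   # Hz
amplitude = np.array([72.0, 70.5, 69.8, 71.2, 70.0])      # dB
pitch = np.array([118.0, 121.0, 119.5, 120.2, 117.8])     # Hz

r_f1_f2, p_f1_f2 = pearsonr(f1, f2)
r_amp_pitch, p_amp_pitch = pearsonr(amplitude, pitch)
print(f"F1-F2: r={r_f1_f2:.2f} (p={p_f1_f2:.3f})")
print(f"amplitude-pitch: r={r_amp_pitch:.2f} (p={p_amp_pitch:.3f})")
```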

The Comparison of Speech Feature Parameters for Emotion Recognition (감정 인식을 위한 음성의 특징 파라메터 비교)

  • 김원구
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.04a / pp.470-473 / 2004
  • In this paper, speech feature parameters for emotion recognition from the speech signal are compared. For this purpose, a corpus of emotional speech data, recorded and classified according to emotion by subjective evaluation, was used to build statistical feature vectors such as the average, standard deviation, and maximum value of pitch and energy. MFCC parameters and their derivatives, with or without cepstral mean subtraction, were also used to evaluate the performance of conventional pattern matching algorithms. Pitch and energy parameters served as prosodic information and MFCC parameters as phonetic information. In the experiments, a vector quantization based emotion recognition system was used for speaker- and context-independent emotion recognition (a sketch of such a recognizer follows the entry below). Experimental results showed that the vector quantization based emotion recognizer using MFCC parameters performed better than the one using pitch and energy parameters, achieving a recognition rate of 73.3% for speaker- and context-independent classification.

  • PDF
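
A vector-quantization emotion recognizer of the kind evaluated in this paper can be sketched as follows: one codebook per emotion is trained on MFCC frames, and a test utterance is assigned to the emotion whose codebook yields the lowest average distortion. This is a hedged sketch assuming MFCC frames are already extracted by an external front end; the function names and codebook size are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebooks(mfcc_by_emotion, codebook_size=16):
    """One VQ codebook per emotion, trained on pooled MFCC frames (N, D)."""
    return {emo: kmeans(frames.astype(float), codebook_size)[0]
            for emo, frames in mfcc_by_emotion.items()}

def classify_utterance(mfcc_frames, codebooks):
    """Pick the emotion whose codebook gives the lowest mean distortion."""
    distortions = {emo: vq(mfcc_frames.astype(float), cb)[1].mean()
                   for emo, cb in codebooks.items()}
    return min(distortions, key=distortions.get)
```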

'Hanmal' Korean Language Diphone Database for Speech Synthesis

  • Chung, Hyun-Song
    • Speech Sciences / v.12 no.1 / pp.55-63 / 2005
  • This paper introduces the 'Hanmal' Korean language diphone database for speech synthesis, which has been publicly available on the MBROLA web site since 1999 but has never been properly published in a journal. The diphone database is compatible with MBROLA, a programme for high-quality multilingual speech synthesis. The paper describes the usefulness of the database and its phonetic and phonological structure, showing the process of creating the text corpus. A machine-readable Korean SAMPA convention for the control data input to the MBROLA application is also suggested (an example of this input format follows the entry below). Diphone concatenation and prosody manipulation are performed using the MBR-PSOLA algorithm. A set of segment duration models can be applied to the diphone synthesis of Korean.

  • PDF
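
MBROLA synthesis with a diphone database such as 'Hanmal' is driven by a plain-text .pho file: one phone per line with its SAMPA symbol, duration in milliseconds, and optional (position %, F0 Hz) pitch targets. The sketch below writes such a file and invokes the mbrola command line; the phone symbols, durations, pitch values, and the voice file name are hypothetical placeholders and do not reproduce the paper's Korean SAMPA convention.

```python
import subprocess

# Hypothetical phone sequence: (SAMPA symbol, duration in ms,
# list of (position %, F0 Hz) pitch targets).
phones = [
    ("a", 120, [(0, 130), (100, 120)]),
    ("n",  60, []),
    ("j",  50, []),
    ("O", 140, [(50, 115)]),
    ("_", 200, []),   # silence
]

with open("hello.pho", "w") as f:
    for symbol, dur, targets in phones:
        pitch = " ".join(f"{pos} {hz}" for pos, hz in targets)
        f.write(f"{symbol} {dur} {pitch}".rstrip() + "\n")

# The voice file name is an assumption; substitute the Hanmal diphone
# database file distributed on the MBROLA site.
subprocess.run(["mbrola", "hanmal", "hello.pho", "hello.wav"], check=True)
```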

An acoustic study on the alaryngeal voice using the Multi-Speech (Multi-Speech를 통한 후두적출자의 발성에 대한 음향학적 분석)

  • Noh Dongwoo;Paik Euna;Kang Sookyoon
    • Proceedings of the KSPS conference / 2003.10a / pp.133-137 / 2003
  • The purpose of this study was to provide acoustic data on the voices of laryngectomized patients for more scientific and efficient voice rehabilitation. Prolonged /a/ phonations of 9 electronic artificial larynx (AL) users, 5 esophageal (EP) speech users, and 2 tracheo-esophageal (TEP) voice users were recorded and analyzed using Multi-Speech. Habitual f0, mean f0, f0 standard deviation, max f0, min f0, jitter, shimmer, and NHR were compared among the groups using t-tests (an outline of such a comparison follows the entry below). The EP and TEP groups exhibited higher f0 than the AL group, while the AL and TEP groups showed more stable f0 than the EP group. In addition, the quality of the TEP and EP voices was comparatively better in terms of jitter, shimmer, and NHR.

  • PDF
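
The group comparisons in this study (t-tests over f0 and perturbation measures) can be reproduced in outline with scipy. The values below are hypothetical stand-ins, not the study's Multi-Speech measurements; Welch's unequal-variance t-test is used here because the groups are small and of different sizes.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical per-speaker mean f0 values (Hz) for the two largest groups.
f0_al = np.array([78.0, 82.5, 75.3, 80.1, 79.4, 77.8, 81.0, 76.5, 83.2])  # 9 AL users
f0_ep = np.array([95.2, 102.4, 98.7, 91.5, 99.0])                          # 5 EP users

t_stat, p_value = ttest_ind(f0_ep, f0_al, equal_var=False)  # Welch's t-test
print(f"EP vs AL mean f0: t={t_stat:.2f}, p={p_value:.3f}")
```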

Performance Improvement of Speech Recognition Based on SPLICE in Noisy Environments (SPLICE 방법에 기반한 잡음 환경에서의 음성 인식 성능 향상)

  • Kim, Jong-Hyeon;Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
    • MALSORI / no.53 / pp.103-118 / 2005
  • The performance of a speech recognition system is degraded by mismatch between training and test environments. Recently, Stereo-based Piecewise Linear Compensation for Environments (SPLICE) was introduced to overcome environmental mismatch using stereo data. In this paper, we propose several methods to improve conventional SPLICE and evaluate them on the Aurora2 task (a sketch of the baseline correction step follows the entry below). We generalize SPLICE to compensate for the covariance matrix as well as the mean vector in the feature space, yielding an error rate reduction of 48.93%. We also employ a weighted sum of correction vectors using the posterior probabilities of all Gaussians, achieving an error rate reduction of 48.62%. With the combination of these two methods, the error rate is reduced by 49.61% relative to the Aurora2 baseline system.

  • PDF
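
The basic mean-only SPLICE correction that this paper extends has a compact form: estimate per-Gaussian correction vectors from stereo (clean, noisy) pairs and add their posterior-weighted sum to each noisy frame. The sketch below, using a scikit-learn mixture as the noisy-feature GMM, illustrates that baseline only; the covariance compensation proposed in the paper is not shown, and all names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_corrections(clean, noisy, gmm):
    """Per-Gaussian correction vectors r_k from stereo (clean, noisy) pairs:
    r_k = weighted average of (clean - noisy) under p(k | noisy)."""
    post = gmm.predict_proba(noisy)               # (T, K)
    weighted_diff = post.T @ (clean - noisy)      # (K, D)
    return weighted_diff / post.sum(axis=0)[:, None]

def splice_enhance(noisy, gmm, corrections):
    """Soft SPLICE: x_hat = y + sum_k p(k | y) * r_k."""
    post = gmm.predict_proba(noisy)               # (T, K)
    return noisy + post @ corrections             # (T, D)

# Usage sketch: fit the GMM on noisy training features, estimate corrections
# from the stereo pairs, then enhance noisy test features.
# gmm = GaussianMixture(n_components=128, covariance_type="diag").fit(noisy_train)
# r = estimate_corrections(clean_train, noisy_train, gmm)
# enhanced = splice_enhance(noisy_test, gmm, r)
```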

A Study on the Durational Characteristics of Korean Lombard Speech (한국어 롬바드 음성의 지속시간 연구)

  • Kim, Sun-Hee
    • Proceedings of the KSPS conference / 2005.04a / pp.21-24 / 2005
  • This paper presents the durational characteristics of Korean Lombard speech using data consisting of 500 Lombard utterances and 500 normal utterances from 10 speakers (5 males and 5 females). Each file was segmented and labeled manually, and the duration of each segment and each word was extracted. The durational changes due to the Lombard effect were analyzed statistically in comparison with normal speech (an outline of such a comparison follows the entry below). The results show that word duration increases under the Lombard effect compared with the normal style, and that the average unvoiced consonant duration is reduced while the average vocalic duration is increased. Female speakers show a stronger tendency towards lengthening in Lombard speech, but without statistical significance. Finally, this study also shows that speakers could be classified according to their duration change rates in Lombard speech.

  • PDF
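
The duration comparison described here reduces to extracting segment or word durations from the manual labels and testing the normal/Lombard difference statistically. The sketch below assumes label tuples of (start, end, word) and runs a paired t-test over hypothetical placeholder values; the study's own data and exact statistical procedure are not reproduced.

```python
import numpy as np
from scipy.stats import ttest_rel

def word_durations(labels):
    """labels: list of (start_s, end_s, word) tuples from a manually
    segmented utterance; returns one duration per word in seconds."""
    return np.array([end - start for start, end, _ in labels])

# Hypothetical paired durations (s) of the same 10 words produced in
# normal and in Lombard style by one speaker.
normal  = np.array([0.41, 0.38, 0.52, 0.47, 0.33, 0.45, 0.50, 0.36, 0.44, 0.39])
lombard = np.array([0.46, 0.44, 0.55, 0.53, 0.37, 0.49, 0.57, 0.40, 0.50, 0.42])

t_stat, p_value = ttest_rel(lombard, normal)
print(f"Lombard - normal word duration: t={t_stat:.2f}, p={p_value:.4f}")
```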

The Features of the Voice Range Profile of School-Age Children (학령기 아동의 음성범위프로필(Voice Range Profile) 특징)

  • Moon, Kyung-Ah;Han, Ji-Yeon
    • Proceedings of the KSPS conference / 2007.05a / pp.52-54 / 2007
  • This study investigated basic voice range profile (VRP) data for untrained boys and girls. The VRP comparison was carried out between 5 boys (10 to 11 years old) and girls (10 to 11 years old), with VRP measured using the Dr. Speech 4.0 (Tiger-electronics) phonetogram program. Comparing maximum and minimum intensity, the mean maximum was 93.68 dB (SD 7.90) for the boys and 93.12 dB (SD 5.11) for the girls, showing no difference, while the mean minimum was 68.08 dB (SD 3.59) for the boys and 71.10 dB (SD 3.06) for the girls.

  • PDF

Effects of Concurrent Linguistic or Cognitive Tasks on Speech Rate (언어 및 인지 과제 동시수행이 발화속도에 미치는 영향)

  • Han, Ji-Yeon;Kim, Hyo-Jeong;Kim, Moon-Jeong
    • Proceedings of the KSPS conference / 2007.05a / pp.102-105 / 2007
  • This study was designed to examine the effects of concurrent linguistic or cognitive tasks on speech rate. Eight normal speakers repeated sentences either alone or while simultaneously performing a linguistic task or a cognitive task. The linguistic task consisted of generating verbs from nouns, and the cognitive task of performing mental arithmetic. Speech rate was measured from the acoustic data. A one-way ANOVA was conducted to test for speech rate differences among the three task types (a sketch of this test follows the entry below). The results showed no significant difference between sentence repetition and the linguistic task, but significant differences between the cognitive task and each of the other two conditions.

  • PDF
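
The one-way ANOVA reported in this entry can be illustrated with scipy. The numbers below are hypothetical placeholders for the speech rates measured in the three conditions, and the repeated-measures structure of the actual design is ignored for brevity.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical speech rates (syllables per second) for eight speakers
# under the three task conditions.
repeat_only     = np.array([5.8, 6.1, 5.5, 6.0, 5.7, 5.9, 6.2, 5.6])
with_linguistic = np.array([5.6, 5.9, 5.4, 5.8, 5.5, 5.7, 6.0, 5.5])
with_cognitive  = np.array([4.9, 5.1, 4.7, 5.0, 4.8, 5.2, 5.3, 4.6])

f_stat, p_value = f_oneway(repeat_only, with_linguistic, with_cognitive)
print(f"One-way ANOVA across task types: F={f_stat:.2f}, p={p_value:.4f}")
```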

Word Frequency Effects on Duration and F0 in English Homophone Utterances

  • Kwon, Soon-Kyo;Jang, Tae-Yeoub
    • Proceedings of the KSPS conference / 2007.05a / pp.227-229 / 2007
  • We investigate whether word frequency effects occur in native speakers' homophone speech, such that less frequent words are produced with greater duration and higher F0 than more frequent words. Acoustic analyses of homophone data produced by four speakers reveal a tendency for vowels in less frequent words to be longer than those in more frequent words, and statistical tests verify that the differences are significant. On the other hand, no considerable correlation is found between F0 and word frequency.

  • PDF