• Title/Summary/Keyword: Speaker differences

An acoustical analysis method of numeric sounds by Praat (Praat를 이용한 숫자음의 음향적 분석법)

  • Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.127-137
    • /
    • 2000
  • This paper presents a macro script for analyzing numeric sounds with the speech analysis shareware Praat, and analyzes numeric sounds produced by three students who were born and raised in Pusan. Recording was done in a quiet office. To make a meaningful comparison, dynamic time points relative to the total duration of the voiced segments were determined for measuring acoustic values. Results showed strong correlations between repeated productions of the numeric sounds, both within and across speakers. Very high coefficients were observed even for the diphthongal numbers (0 and 6), which usually show wide formant variation. This indicates that each speaker produced the numbers quite consistently. Also, the frequency differences between the three subjects fell within a perceptually similar range. Identifying a speaker among others may therefore require finding subtle individual differences within this range. Perceptual experiments with synthesized numeric sounds may help resolve this issue.

  • PDF
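The measurement scheme the abstract describes — sampling acoustic values at time points proportional to the voiced duration, then correlating them across repetitions — can be sketched in Python. The formant tracks, relative time points, and noise levels below are hypothetical illustrations, not values from the paper:

```python
import numpy as np

# Hypothetical formant tracks (Hz) for two repetitions of the same numeric
# sound; in the paper these would come from Praat's formant analysis.
def sample_at_relative_points(track, points=(0.2, 0.35, 0.5, 0.65, 0.8)):
    """Sample a track at time points relative to the voiced duration."""
    n = len(track)
    return np.array([track[int(p * (n - 1))] for p in points])

rng = np.random.default_rng(0)
rep1 = np.linspace(700, 1100, 50) + rng.normal(0, 10, 50)   # repetition 1
rep2 = np.linspace(710, 1090, 48) + rng.normal(0, 10, 48)   # repetition 2

v1 = sample_at_relative_points(rep1)
v2 = sample_at_relative_points(rep2)

# Pearson correlation between the measurements of the two repetitions
r = np.corrcoef(v1, v2)[0, 1]
print(f"correlation between repetitions: {r:.3f}")
```

Sampling at relative rather than absolute times makes repetitions of different durations directly comparable, which is what allows the within- and across-speaker correlations reported above.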

An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition (음성인식에서 화자 내 정규화를 위한 진폭 변경 방법)

  • Kim Dong-Hyun;Hong Kwang-Seok
    • Journal of Internet Computing and Services
    • /
    • v.4 no.3
    • /
    • pp.9-14
    • /
    • 2003
  • Vocal tract normalization is a successful method for improving the accuracy of inter-speaker normalization. In this paper, we present an intra-speaker warping factor estimation based on pitch-altered utterances. The feature space distributions of untransformed speech from a speaker's pitch-altered utterances vary due to the acoustic differences of speech produced by the glottis and the vocal tract. Utterance variation is of two types: frequency variation and amplitude variation. Among inter-speaker normalization methods, vocal tract normalization is a frequency normalization; therefore, amplitude variation must also be considered, and the amplitude warping factor can be determined by calculating the inverse ratio of the input pitch to the reference pitch. In the recognition results, the error rate was reduced by 0.4% to 2.3% for digit and word decoding.

  • PDF
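The abstract's core idea — an amplitude warping factor computed as the inverse ratio of the input pitch to the reference pitch, then applied to the signal — can be sketched as follows. The pitch values, frame length, and sampling rate are illustrative assumptions, not values from the paper:

```python
import numpy as np

def amplitude_warping_factor(input_pitch_hz, reference_pitch_hz):
    """Warping factor as the inverse ratio of input to reference pitch,
    following the idea in the abstract (exact formulation is assumed)."""
    return reference_pitch_hz / input_pitch_hz

def warp_amplitude(frame, factor):
    """Scale a speech frame's amplitude by the warping factor."""
    return frame * factor

rng = np.random.default_rng(42)
frame = rng.normal(0.0, 0.1, 160)                 # one 10 ms frame at 16 kHz
factor = amplitude_warping_factor(140.0, 120.0)   # raised-pitch utterance
normalized = warp_amplitude(frame, factor)
print(f"factor = {factor:.3f}")
```

A pitch-raised utterance (input pitch above the reference) thus gets scaled down toward the reference amplitude distribution, complementing the frequency warping done by vocal tract normalization.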

Experimental study on the heat transfer characteristics of woofer speaker unit (우퍼 스피커 유닛의 열전달 특성에 대한 실험적 연구)

  • Kim, Hyung-Jin;Kim, Dae-Wan;Lee, Moo-Yeon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.5
    • /
    • pp.2623-2627
    • /
    • 2014
  • The objective of this study is to experimentally investigate the heat transfer characteristics of a 200 W woofer speaker unit with input voice signals of 500 Hz, 1000 Hz, 2000 Hz, and 3000 Hz. The temperature and heat transfer characteristics of the woofer speaker unit were evaluated for each input signal. As a result, the temperature of the voice coil of the woofer speaker unit increased as the input signal frequency decreased, and the temperature differences between parts of the tested speaker unit also increased with decreasing input signal frequency. In addition, the voice coil temperature for the 500 Hz input signal was 48.4% lower than that for 3000 Hz over 18,000 seconds.

The Implementation of Real-Time Speaker Localization Using Multi-Modality (멀티모달러티를 이용한 실시간 음원추적 시스템 구현)

  • Park, Jeong-Ok;Na, Seung-You;Kim, Jin-Young
    • Proceedings of the KIEE Conference
    • /
    • 2004.11c
    • /
    • pp.459-461
    • /
    • 2004
  • This paper presents an implementation of real-time speaker localization using audio-visual information. Four channels of microphone signals are processed to detect vertical as well as horizontal speaker positions. First, short-time average magnitude difference function (AMDF) signals are used to determine whether the microphone signals are human voices. Then, the orientation and distance of the sound sources are obtained from interaural time differences and interaural level differences. Finally, visual information from a camera provides finer tuning of the speaker orientation. Experimental results of the real-time localization system show that performance improves to 99.6%, compared to 88.8% when only audio information is used.

  • PDF
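The AMDF used in the voice-detection step above can be sketched as follows; the sampling rate, frame length, and pure-tone test signal are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def amdf(frame, max_lag):
    """Short-time average magnitude difference function: for each lag k,
    the mean absolute difference between the frame and its k-shifted copy.
    Voiced speech produces deep valleys at multiples of the pitch period."""
    n = len(frame)
    return np.array([np.mean(np.abs(frame[:n - k] - frame[k:]))
                     for k in range(1, max_lag + 1)])

fs = 8000                                   # assumed sampling rate
t = np.arange(0, 0.032, 1 / fs)             # one 32 ms analysis frame
voiced = np.sin(2 * np.pi * 200 * t)        # 200 Hz tone standing in for voice
d = amdf(voiced, max_lag=60)
pitch_lag = int(np.argmin(d)) + 1           # deepest valley = pitch period
print(f"pitch period: {pitch_lag} samples ({fs / pitch_lag:.0f} Hz)")
```

Unvoiced noise yields a flat AMDF with no pronounced valleys, which is what lets the system separate human voices from other microphone signals before localization.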

Acoustic Characteristics of Vowels in Korean Distant-Talking Speech (한국어 원거리 음성의 모음의 음향적 특성)

  • Lee Sook-hyang;Kim Sunhee
    • MALSORI
    • /
    • v.55
    • /
    • pp.61-76
    • /
    • 2005
  • This paper analyzes the acoustic effects on vowels produced in a distant-talking environment. The analysis was performed using a statistical method, and the influence of gender and speaker on the variation was also examined. The speech data used in this study consist of 500 distant-talking words and 500 normal words from 10 speakers (5 males and 5 females). The acoustic features selected for the analysis were duration, the formants (F1 and F2), the fundamental frequency, and the total energy. The results showed that duration, F0, F1, and total energy increased in distant-talking speech compared to normal speech; female speakers showed a larger increase in all features except total energy and fundamental frequency. In addition, speaker differences were observed.

  • PDF

Visual Presentation of Connected Speech Test (CST)

  • Jeong, Ok-Ran;Lee, Sang-Heun;Cho, Tae-Hwan
    • Speech Sciences
    • /
    • v.3
    • /
    • pp.26-37
    • /
    • 1998
  • The Connected Speech Test (CST) was developed to test hearing aid performance using realistic stimuli (connected speech) presented in a background of noise with a visible speaker. The CST had not been investigated as a measure of speech-reading ability using the visual portion of the CST only. Thirty subjects were administered the 48 test lists of the CST using the visual presentation mode only. Statistically significant differences were found between the 48 test lists and between the 12 passages of the CST (the 48 lists divided into 12 groups of 4 lists, which were averaged). No significant differences were found between male and female subjects; however, in all but one case, females scored better than males. No significant differences were found between students in communication disorders and students in other departments. Intra- and inter-subject variability across test lists and passages was high. Suggestions for further research include making the scoring of the CST more contextually based and changing the speaker for the CST.

  • PDF

The Comparative Study of the Modalities of '-keyss' and '-(u)l kes' in Korean (`-겠`과 `-을 것`의 양태 비교 연구)

  • Yeom Jae-Il
    • Language and Information
    • /
    • v.9 no.2
    • /
    • pp.1-22
    • /
    • 2005
  • In this paper I propose the semantics of two modality markers in Korean, keyss and (u)l kes, and compare them with respect to several properties. First, keyss is used to express logical necessity, while (u)l kes can also be used to express a simple prediction. Second, keyss expresses a logical conclusion from the speaker's own information state without claiming it is true; (u)l kes, on the other hand, expresses the claim that the speaker's prediction will be true. Third, the prediction of keyss is non-monotonic: it can be reversed without being inconsistent, whereas that of (u)l kes cannot. Fourth, (u)l kes can be used freely in epistemic conditionals, but keyss cannot. Finally, when keyss is used, the prediction cannot be repeated, while the prediction from the use of (u)l kes can be. To account for these differences, I propose that keyss is used when the speaker makes a purely logical presumption based on his/her own information state, and that (u)l kes is used to make a prediction which is asserted to be true. This proposal accounts for all the differences between the two modality markers.

  • PDF

A Study on the Duration of Korean medial fortis by Japanese Speakers (일본인 학습자의 국어 어중 경음 지속 시간 연구)

  • Noh, Seok-Eun
    • Proceedings of the KSPS conference
    • /
    • 2005.04a
    • /
    • pp.67-70
    • /
    • 2005
  • The purpose of this paper is to compare the duration of Korean medial fortis between native Korean speakers and Japanese learners of Korean. For this purpose, I selected words with medial fortis from the SITEC DB. In three-syllable words, the Korean medial fortis produced by Japanese speakers tends to have a longer closure/frication duration than that of native Korean speakers; there are no distinct differences in two-syllable words. This might be owing to the different timing units of Korean and Japanese.

  • PDF

Speaker-Adaptive Speech Synthesis based on Fuzzy Vector Quantizer Mapping and Neural Networks (퍼지 벡터 양자화기 사상화와 신경망에 의한 화자적응 음성합성)

  • Lee, Jin-Yi;Lee, Gwang-Hyeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.1
    • /
    • pp.149-160
    • /
    • 1997
  • This paper is concerned with a speaker-adaptive speech synthesis method using a mapped codebook designed by fuzzy mapping on FLVQ (fuzzy learning vector quantization). The FLVQ is used to design both the input and the reference speaker's codebooks. This algorithm incorporates a fuzzy membership function into LVQ (learning vector quantization) networks. Unlike the LVQ algorithm, it minimizes the network output errors, which are the differences between target and actual class membership values, and thereby minimizes the distances between training patterns and competing neurons. Speaker adaptation in speech synthesis is performed as follows: the input speaker's codebook is mapped to the reference speaker's codebook using fuzzy concepts. The fuzzy VQ mapping replaces a codevector while preserving its fuzzy membership function. The codevector correspondence histogram is obtained by accumulating the vector correspondences along the DTW optimal path. We use the fuzzy VQ mapping to design a mapped codebook, defined as a linear combination of the reference speaker's vectors using each fuzzy histogram as a weighting function with membership values. In the adaptive speech synthesis stage, input speech is fuzzy vector-quantized by the mapped codebook, and then FCM arithmetic is used to synthesize speech adapted to the input speaker. The speaker adaptation experiments are carried out using speech of males in their thirties as the input speaker's speech and of a female in her twenties as the reference speaker's speech. The sentences used in the experiments are /anyoung hasim nika/ and /good morning/. As a result of the experiments, we obtained synthesized speech adapted to the input speaker.

  • PDF
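The mapped-codebook construction — each mapped codevector as a linear combination of the reference speaker's codevectors, weighted by a correspondence histogram — can be illustrated with a toy sketch. The codebook sizes and histogram values below are fabricated; in the paper the histogram is accumulated along the DTW optimal path with fuzzy membership weighting:

```python
import numpy as np

rng = np.random.default_rng(0)
ref_codebook = rng.normal(size=(8, 12))     # 8 reference codevectors, 12-dim

# Hypothetical correspondence histogram for one input codevector: how often
# it aligned with each reference codevector (made-up numbers for illustration).
hist = np.array([0.0, 2.5, 0.5, 0.0, 1.0, 0.0, 0.0, 0.0])

weights = hist / hist.sum()                 # normalize the histogram
mapped_vector = weights @ ref_codebook      # weighted linear combination
print(mapped_vector.shape)
```

Repeating this for every input codevector yields the mapped codebook used at synthesis time, so each quantized input vector is replaced by a blend of the reference speaker's vectors rather than a single hard match.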

Fast Sequential Probability Ratio Test Method to Obtain Consistent Results in Speaker Verification (화자확인에서 일정한 결과를 얻기 위한 빠른 순시 확률비 테스트 방법)

  • Kim, Eun-Young;Seo, Chang-Woo;Jeon, Sung-Chae
    • Phonetics and Speech Sciences
    • /
    • v.2 no.2
    • /
    • pp.63-68
    • /
    • 2010
  • A new version of the sequential probability ratio test (SPRT), which has been investigated for utterance-length control, is proposed to obtain uniform response results in speaker verification (SV). Although SPRTs can obtain fast responses in SV tests, performance differences may occur depending on the composition of consonants and vowels in the sentences used. In this paper, a fast sequential probability ratio test (FSPRT) method is proposed that shows consistent performance regardless of the composition of the vocalized sentences used for SV. In generating frames, the FSPRT first conducts the SV test process with frames generated without any overlapping; if the results do not satisfy the discrimination criteria, it then sequentially uses frames with overlapping applied. As the process proceeds in this way, the test is not affected by the composition of the sentences used for SV, so fast responses and consistent performance can be obtained. Experimental results show that the FSPRT achieves better performance than the SPRT method while requiring less computation, at equal error rates (EER).

  • PDF
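The classical SPRT decision rule that the FSPRT builds on can be sketched with Wald's standard thresholds. The per-frame log-likelihood ratios and error-rate targets below are made up for illustration; the paper's frame-generation and overlapping scheme is not reproduced here:

```python
import math

def sprt_decision(llr_per_frame, threshold_accept, threshold_reject):
    """Wald's sequential probability ratio test: accumulate the per-frame
    log-likelihood ratio and stop as soon as it crosses either threshold.
    Returns ('accept' | 'reject' | 'undecided', frames consumed)."""
    total = 0.0
    for i, llr in enumerate(llr_per_frame, start=1):
        total += llr
        if total >= threshold_accept:
            return "accept", i
        if total <= threshold_reject:
            return "reject", i
    return "undecided", len(llr_per_frame)

# Thresholds from target error rates via the standard Wald approximations;
# alpha = false acceptance, beta = false rejection (assumed values).
alpha, beta = 0.01, 0.01
A = math.log((1 - beta) / alpha)    # accept when cumulative LLR >= A
B = math.log(beta / (1 - alpha))    # reject when cumulative LLR <= B

decision, n_frames = sprt_decision([0.8, 1.2, 0.9, 1.5, 1.1, 0.7], A, B)
print(decision, n_frames)
```

Because the test stops as soon as a threshold is crossed, the number of frames consumed varies with the evidence per frame; this is exactly the sentence-composition dependence the FSPRT's frame-generation scheme is designed to smooth out.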