• Title/Summary/Keyword: synchronous speech

Search Result 24, Processing Time 0.023 seconds

An acoustical analysis of synchronous English speech using automatic intonation contour extraction (영어 동시발화의 자동 억양궤적 추출을 통한 음향 분석)

  • Yi, So Pae
    • Phonetics and Speech Sciences
    • /
    • v.7 no.1
    • /
    • pp.97-105
    • /
    • 2015
  • This research mainly focuses on intonational characteristics of synchronous English speech. Intonation contours were extracted from 1,848 utterances produced in two different speaking modes (solo vs. synchronous) by 28 (12 women and 16 men) native speakers of English. Synchronous speech is found to be slower than solo speech. Women are found to speak slower than men. The effect size of speech rate caused by different speaking modes is greater than gender differences. However, there is no interaction between the two factors (speaking modes vs. gender differences) in terms of speech rate. Analysis of pitch point features has it that synchronous speech has smaller Pt (pitch point movement time), Pr (pitch point pitch range), Ps (pitch point slope) and Pd (pitch point distance) than solo speech. There is no interaction between the two factors (speaking modes vs. gender differences) in terms of pitch point features. Analysis of sentence level features reveals that synchronous speech has smaller Sr (sentence level pitch range), Ss (sentence slope), MaxNr (normalized maximum pitch) and MinNr (normalized minimum pitch) but greater Min (minimum pitch) and Sd (sentence duration) than solo speech. It is also shown that the higher the Mid (median pitch), the MaxNr and the MinNr in solo speaking mode, the more they are reduced in synchronous speaking mode. Max, Min and Mid show greater speaker discriminability than other features.

Speech Rate Variation in Synchronous Speech (동시발화에 나타나는 발화 속도 변이 분석)

  • Kim, Miran;Nam, Hosung
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.19-27
    • /
    • 2012
  • When two speakers read a text together, the produced speech has been shown to reduce a high degree of variability (e.g., pause duration and placement, and speech rate). This paper provides a quantitative analysis of speech rate variation exhibited in synchronous speech by examining the global and local patterns in two dialects of Mandarin Chinese (Taiwan and Shanghai). We analyzed the speech data in terms of mean speech rate and the reference of "Just Noticeable difference (JND)" within a subject and across subjects. Our findings show that speakers show lower and less variable speech rates when they read a text synchronously than when they read alone. This global pattern is observed consistently across speakers and dialects maintaining the unique local variation patterns of speech rate for each dialect. We conclude that paired speakers lower their speech rates and decrease the variability in order to ensure the synchrony of their speech.

Efficient Tracking of Speech Formant Using Closed Phase WRLS-VFF-VT Algorithm

  • Lee, Kyo-Sik;Park, Kyu-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.2E
    • /
    • pp.8-13
    • /
    • 2000
  • In this paper, we present an adaptive formant tracking algorithm for speech using closed phase WRLS-VFF-VT method. The pitch synchronous closed phase methods is known to give more accurate estimates of the vocal tract parameters than the pitch asynchronous method. However the use of a pitch-synchronous closed phase analysis method has been limited due to difficulties associated with the task of accurately isolating the closed phase region in successive periods of speech. Therefore we have implemented the pitch synchronous closed phase WRLS-VFF-VT algorithm for speech analysis, especially for formant tracking. The proposed algorithm with the variable threshold(VT) can provide a superior performance in the boundary of phone and voiced/unvoiced sound. The proposed method is experimentally compared with the other method such as two channel CPC method by using synthetic waveform and real speech data. From the experimental results, we found that the block data processing techniques, such as the two-channel CPC, gave reasonable estimates of the formant/antiformant. However, the data windows used by these methods included the effects of the periodic excitation pulses, which affected the accuracy of the estimated formants. On the other hand the proposed WRLS-VFF-VT method, which eliminated the influence of the pulse excitation by using an input estimation as part of the algorithm, gave very accurate formant/bandwidth estimates and good spectral matching.

  • PDF

Speech Signal Processing using Pitch Synchronous Multi-Spectra and DSP System Design in Cochlear Implant (피치동기 다중 스펙트럼을 이용한 청각보철장치의 음성신호처리 및 DSP 시스템 설계)

  • Shin, J. I.;Park, S. J.;Shin, D. K.;Lee, J. H.;Park, S. H.
    • Journal of Biomedical Engineering Research
    • /
    • v.20 no.4
    • /
    • pp.495-502
    • /
    • 1999
  • We propose efficient speech signal processing algorithms and a system for cochlear implant in this paper. The outer and the middle car which perform amplifying, lowpass filtering and AGC, are modeled by an analog system, and the inner ear acting as a time-delayed multi filter and the transducer is implemented by the DSP circuit which enables real-time processing. Especially, the basilar membrane characteristic of the inner ear is modeled by a nonlinear filter bank, and then tonotopy and periodicity of the auditory system is satisfied by using a pitch-synchronous multi-spectra(PSMS) method. Moreover, most of the speech processing is performed by S/W so the system can be easily modified. And as our program is written in C-language, it can be easily transplanted to the system using other processors.

  • PDF

Algorithm for Concatenating Multiple Phonemic Units for Small Size Korean TTS Using RE-PSOLA Method

  • Bak, Il-Suh;Jo, Cheol-Woo
    • Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.85-94
    • /
    • 2003
  • In this paper an algorithm to reduce the size of Text-to-Speech database is proposed. The algorithm is based on the characteristics of Korean phonemic units. From the initial database, a reduced phoneme unit set is induced by articulatory similarity of concatenating phonemes. Speech data is read by one female announcer for 1000 phonetically balanced sentences. All the recorded speech is then segmented by phoneticians. Total size of the original speech data is about 640 MB including laryngograph signal. To synthesize wave, RE-PSOLA (Residual-Excited Pitch Synchronous Overlap and Add Method) was used. The voice quality of synthesized speech was compared with original speech in terms of spectrographic informations and objective tests. The quality of the synthesized speech is not much degraded when the size of synthesis DB was reduced from 320 MB to 82 MB.

  • PDF

Detection of Glottal Closure Instant for Voiced Speech Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 성문폐쇄시점 검출)

  • Bae, Keun-Sung
    • Speech Sciences
    • /
    • v.7 no.3
    • /
    • pp.153-165
    • /
    • 2000
  • During the phonation of voiced sounds, instants exist where the glottis is opened or closed, due to the periodic vibration of the vocal cord. When closed, this is called the glottal closure instant(GCI) or epoch.. The correct detection of the GCI is one of the important problems in speech processing for pitch detection, pitch synchronous analysis, and so on. Recently, it has been shown that the local maxima points of the wavelet transformed speech signal correspond to the GCIs of speech signal. In this paper, we investigate the accuracy of Gels estimated from this wavelet transformed speech signal. For this purpose we compare them with the negative peak points of the differentiated EGG signal that represents the actual GCIs of speech signal.

  • PDF

A Study on the Channel Normalized Pitch Synchronous Cepstrum for Speaker Recognition (채널에 강인한 화자 인식을 위한 채널 정규화 피치 동기 켑스트럼에 관한 연구)

  • 김유진;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.1
    • /
    • pp.61-74
    • /
    • 2004
  • In this paper, a contort- and speaker-dependent cepstrum extraction method and a channel normalization method for minimizing the loss of speaker characteristics in the cepstrum were proposed for a robust speaker recognition system over the channel. The proposed extraction method creates a cepstrum based on the pitch synchronous analysis using the inherent pitch of the speaker. Therefore, the cepstrum called the 〃pitch synchronous cepstrum〃 (PSC) represents the impulse response of the vocal tract more accurately in voiced speech. And the PSC can compensate for channel distortion because the pitch is more robust in a channel environment than the spectrum of speech. And the proposed channel normalization method, the 〃formant-broadened pitch synchronous CMS〃 (FBPSCMS), applies the Formant-Broadened CMS to the PSC and improves the accuracy of the intraframe processing. We compared the text-independent closed-set speaker identification on 56 females and 112 males using TIMIT and NTIMIT database, respectively. The results show that pitch synchronous km improves the error reduction rate by up to 7.7% in comparison with conventional short-time cepstrum and the error rates of the FBPSCMS are more stable and lower than those of pole-filtered CMS.

Implementation of Formant Speech Analysis/Synthesis System (포만트 분석/합성 시스템 구현)

  • Lee, Joon-Woo;Son, Ill-Kwon;Bae, Keuo-Sung
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.295-314
    • /
    • 1997
  • In this study, we will implement a flexible formant analysis and synthesis system. In the analysis part, the two-channel (i.e., speech & EGG signals) approach is investigated for accurate estimation of formant information. The EGG signal is used for extracting exact pitch information that is needed for the pitch synchronous LPC analysis and closed phase LPC analysis. In the synthesis part, Klatt formant synthesizer is modified so that the user can change synthesis parameters arbitarily. Experimental results demonstrate the superiority of the two-channel analysis method over the one-channel(speech signal only) method in analysis as well as in synthesis. The implemented system is expected to be very helpful for studing the effects of synthesis parameters on the quality of synthetic speech and for the development of Korean text-to-speech(TTS) system with the formant synthesis method.

  • PDF

Development of an algorithm for the control of prosodic factors to synthesize unlimited isolated words in the time domain (시간 영역에서의 무제한 고립어 합성을 위한 운율 요소 제어용 알고리즘 개발)

  • 강찬희
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.7
    • /
    • pp.59-68
    • /
    • 1998
  • This paper is to develop an algorithm for the unlimited korean speech synthesis. We present the results controlled of prosodic factors with isolated words as aynthesis basis unit int he time domain. With a new pitch-synchronous and parametric speech synthesis mehtod in the time domain here we mainly present the results of controlled prosody factors such a spitch periods, energy envelops and durations and the evaluaton of synthetic speech qualities. In the case of synthesis, it is possible ot synthesize connected words by controlling of a continuous unified prosody that makes to improve the naturalities. In the results of experiment, it also has been to be improved uncontinuities of pitch and zeroing of energy in the junction parts of speech waveforms. Specially it has been to be possible to synthesize speeches with unlimitted durations and tones. So on it makes the noisiness and the clearness better by improving the degradation effects from the phase distortion due to the discontinuities in the waveform connection parts.

  • PDF

Recognition Time Reduction Technique for the Time-synchronous Viterbi Beam Search (시간 동기 비터비 빔 탐색을 위한 인식 시간 감축법)

  • 이강성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.6
    • /
    • pp.46-50
    • /
    • 2001
  • This paper proposes a new recognition time reduction algorithm Score-Cache technique, which is applicable to the HMM-base speech recognition system. Score-Cache is a very unique technique that has no other performance degradation and still reduces a lot of search time. Other search reduction techniques have trade-offs with the recognition rate. This technique can be applied to the continuous speech recognition system as well as the isolated word speech recognition system. W9 can get high degree of recognition time reduction by only replacing the score calculating function, not changing my architecture of the system. This technique also can be used with other recognition time reduction algorithms which give more time reduction. We could get 54% of time reduction at best.

  • PDF