• Title/Abstract/Keywords: Speech characteristics


Wideband Speech Reconstruction Using Modular Neural Networks

  • 우동헌;고참한;강현민;정진희;김유신;김형순
    • 대한음성학회지:말소리 / No. 48 / pp. 93-105 / 2003
  • Because the telephone channel is band-limited, speech transmitted over it is degraded in quality. In this paper, we propose a neural-network algorithm that reconstructs wideband speech from its narrowband version. Although a single neural network is a good tool for direct mapping, it is difficult to train on large and complicated data. To alleviate this problem, we modularize the neural networks based on an appropriate clustering of the acoustic space. We also introduce fuzzy computing to compensate for probable misclassification at the cluster boundaries. In our simulations, the proposed algorithm outperformed both the single neural network and the conventional codebook mapping method in objective and subjective evaluations.

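The idea of softening cluster boundaries with fuzzy weights can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which fuzzy scheme was used, so fuzzy c-means style memberships are assumed here, and `fuzzy_memberships`, `blended_output`, the centroids and the two stand-in "experts" are all hypothetical.

```python
import math

def fuzzy_memberships(x, centroids, m=2.0):
    """Fuzzy c-means style memberships of vector x to each centroid (they sum to 1)."""
    dists = [math.dist(x, c) for c in centroids]
    # A point sitting exactly on a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]

def blended_output(x, centroids, experts, m=2.0):
    """Weight each cluster's expert output by the fuzzy membership of x."""
    weights = fuzzy_memberships(x, centroids, m)
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    return [sum(w * out[k] for w, out in zip(weights, outputs)) for k in range(dim)]

# Two hypothetical per-cluster mappings standing in for trained networks.
centroids = [[0.0, 0.0], [2.0, 0.0]]
experts = [lambda x: [1.0], lambda x: [3.0]]

# A narrowband feature vector lying exactly on the cluster boundary.
boundary_point = [1.0, 0.0]
weights = fuzzy_memberships(boundary_point, centroids)
estimate = blended_output(boundary_point, centroids, experts)
```

A hard classifier would have to commit the boundary point to one module; the soft weights instead blend both modules' outputs, which is the kind of compensation the abstract describes.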

Sub-Nyquist Nonuniform Sampling and Perfect Reconstruction of Speech Signals

  • 이희영
    • 음성과학 / Vol. 12, No. 2 / pp. 153-170 / 2005
  • Sub-Nyquist nonuniform sampling (SNNS) and a perfect reconstruction (PR) formula are proposed as a systematic method for obtaining a minimal representation of a speech signal. In the proposed method, the instantaneous sampling frequency (ISF) varies with the least upper bound of the spectral support of the speech signal in the time-frequency domain (TFD). A definition is given for the instantaneous bandwidth (IB), which determines the ISF and is used to generate the set of samples that perfectly represents the continuous-time signal. The spectral characteristics of the sampled data generated by the sub-Nyquist nonuniform sampling method are also analyzed. Because it tracks the time-varying instantaneous bandwidth of the speech signal, the proposed method does not generate redundant samples.


The Voiceless Stop Distinction in the Alaryngeal Speech

  • Hong, Ki-Hwan;Kim, Hyun-Ki
    • 음성과학 / Vol. 7, No. 1 / pp. 53-64 / 2000
  • Theoretically, alaryngeal speakers have difficulty producing voiceless consonants. However, perceptual studies often reveal clear production of voiceless consonants, with good articulation scores, in skilled alaryngeal speakers. The purpose of the present study was to clarify how voiceless stops are produced, in terms of mode of articulation, by normal speakers and skilled alaryngeal speakers. The acoustic characteristics of alaryngeal speech were compared with those of normal speech, with special reference to the voiceless stop consonants. Surface electromyography from the neck was used to monitor pharyngeal activity during speech. The general result is that esophageal, shunt and neoglottal speakers realize the distinctions between the three types of [p] in a manner parallel to normal speakers, whereas those using an electric voice generator do not.


Recognition of Korean Connected Digit Telephone Speech Using the Training Data Based Temporal Filter

  • 정성윤;배건성
    • 대한음성학회지:말소리 / No. 53 / pp. 93-102 / 2005
  • The performance of a speech recognition system generally degrades in a telephone environment because of distortions caused by background noise and varying channel characteristics. In this paper, data-driven temporal filters are investigated to improve performance on a specific recognition task such as telephone speech. Three temporal filtering methods are presented, with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from cepstral-domain feature vectors using principal component analysis. In the experiments, the proposed temporal filtering method performed slightly better than the previous ones.

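Deriving temporal filter coefficients from feature trajectories with principal component analysis can be sketched roughly as below. The abstract does not detail its three filtering methods, so this shows only the generic idea, assuming the filter taps are the first principal component of sliding windows over one cepstral coefficient's trajectory. The synthetic trajectory, the window length and all function names are hypothetical.

```python
import math

def covariance(windows):
    """Sample covariance matrix of equal-length trajectory windows."""
    n, length = len(windows), len(windows[0])
    mean = [sum(w[i] for w in windows) / n for i in range(length)]
    centered = [[w[i] - mean[i] for i in range(length)] for w in windows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(length)]
            for i in range(length)]

def first_principal_component(matrix, iterations=200):
    """Dominant eigenvector by power iteration, normalised to unit length."""
    size = len(matrix)
    v = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def temporal_filter(trajectory, taps):
    """Apply the FIR taps along the time axis of one feature trajectory."""
    length = len(taps)
    return [sum(taps[k] * trajectory[t + k] for k in range(length))
            for t in range(len(trajectory) - length + 1)]

# Synthetic, slowly varying trajectory standing in for one cepstral coefficient.
trajectory = [math.sin(2.0 * math.pi * t / 50.0) for t in range(200)]
window_length = 5
windows = [trajectory[t:t + window_length]
           for t in range(len(trajectory) - window_length + 1)]

taps = first_principal_component(covariance(windows))
smoothed = temporal_filter(trajectory, taps)
```

For a slowly varying trajectory like this one, the leading component has all-positive taps and therefore acts as a smoothing (low-pass) filter along time.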

Variable Time-Scale Modification with Voiced/Unvoiced Decision

  • 손단영;김원구;윤대희;차일환
    • 전자공학회논문지B / Vol. 32B, No. 5 / pp. 788-797 / 1995
  • In this paper, a variable time-scale modification method using SOLA (Synchronized OverLap and Add) is proposed, which takes into account the different time-scaling characteristics of voiced and unvoiced speech. Generally, voiced speech is subject to larger variations in length during time-scale modification than unvoiced speech, but the conventional method performs time-scale modification at a uniform rate for all speech. To address this, the durations of voiced and unvoiced speech at various talking speeds were statistically analyzed. Sentences were spoken at rates of 0.7, 1.3, 1.5 and 1.8 times normal speed. A clipping autocorrelation function was applied to each analysis frame to classify it as voiced or unvoiced and to obtain the respective variation rates. The results were used to perform variable time-scale modification, producing sentences at rates of 0.7, 1.3, 1.5 and 1.8 times normal speed. To evaluate performance, a MOS test was conducted to compare the proposed voiced/unvoiced variable time-scale modification with the uniform SOLA method. The results indicate that the proposed method produces sentence quality superior to that of the conventional method.

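The per-frame voiced/unvoiced decision can be illustrated with a center-clipped autocorrelation test. This is a minimal sketch, assuming that "clipping autocorrelation" means center clipping followed by a normalized autocorrelation peak search. The clip ratio, decision threshold, pitch range and the name `is_voiced` are illustrative assumptions, not values from the paper.

```python
import math
import random

def center_clip(frame, ratio=0.3):
    """Zero out samples below ratio * peak and shift the rest toward zero."""
    level = ratio * max(abs(s) for s in frame)
    clipped = []
    for s in frame:
        if s > level:
            clipped.append(s - level)
        elif s < -level:
            clipped.append(s + level)
        else:
            clipped.append(0.0)
    return clipped

def is_voiced(frame, sample_rate, threshold=0.4, f_min=50.0, f_max=400.0):
    """Voiced if the normalized autocorrelation peaks within the pitch lag range."""
    x = center_clip(frame)
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return False
    shortest_lag = int(sample_rate / f_max)
    longest_lag = min(int(sample_rate / f_min), len(x) - 1)
    peak = max(
        sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / energy
        for lag in range(shortest_lag, longest_lag + 1)
    )
    return peak >= threshold

sample_rate = 8000
# A 100 Hz tone stands in for a voiced frame, seeded uniform noise for an unvoiced one.
tone = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate) for n in range(800)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(800)]

tone_voiced = is_voiced(tone, sample_rate)
noise_voiced = is_voiced(noise, sample_rate)
```

In a variable-rate scheme of the kind described, a decision like this would select a different scaling factor for each frame before the overlap-add step.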

SPATIAL EXPLANATIONS OF SPEECH PERCEPTION: A STUDY OF FRICATIVES

  • Choo, Won;Mark Huckvale
    • 대한음성학회:학술대회논문집 / 대한음성학회 October 1996 conference proceedings / pp. 399-403 / 1996
  • This paper addresses issues of perceptual constancy in speech perception through the use of a spatial metaphor for speech sound identity as opposed to a more conventional characterisation with multiple interacting acoustic cues. This spatial representation leads to a correlation between phonetic, acoustic and auditory analyses of speech sounds which can serve as the basis for a model of speech perception based on the general auditory characteristics of sounds. The correlations between the phonetic, perceptual and auditory spaces of the set of English voiceless fricatives /f θ s ʃ h/ are investigated. The results show that the perception of fricative segments may be explained in terms of 2-dimensional auditory space in which each segment occupies a region. The dimensions of the space were found to be the frequency of the main spectral peak and the 'peakiness' of spectra. These results support the view that perception of a segment is based on its occupancy of a multi-dimensional parameter space. In this way, final perceptual decisions on segments can be postponed until higher level constraints can also be met.


Improved Bimodal Speech Recognition Study Based on Product Hidden Markov Model

  • Xi, Su Mei;Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 13, No. 3 / pp. 164-170 / 2013
  • Recent years have seen growing demand for automatic speech recognition (ASR) systems that operate robustly in acoustically noisy environments. This paper proposes an improved product hidden Markov model (HMM) for bimodal speech recognition. A two-dimensional training model is built from dependently trained audio and visual HMMs, reflecting the asynchronous characteristics of the audio and video streams. A weight coefficient is introduced to adjust the weights of the video and audio streams automatically according to differences in the noise environment. Experimental results show that this approach achieves better recognition performance than other bimodal speech recognition approaches.
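
A common way to weight the two streams in a multi-stream HMM is to raise each stream's observation likelihood to an exponent. The abstract does not give the paper's own formula, so the expression below is an assumed, textbook-style illustration. The symbols $\lambda$, $b_i^{A}$ and $b_j^{V}$ are introduced here and are not taken from the paper.

```latex
b_{(i,j)}\left(o_t^{A}, o_t^{V}\right)
  = \left[ b_i^{A}\left(o_t^{A}\right) \right]^{\lambda}
    \left[ b_j^{V}\left(o_t^{V}\right) \right]^{1-\lambda},
\qquad 0 \le \lambda \le 1
```

Here $(i,j)$ is a composite state of the product HMM pairing audio state $i$ with visual state $j$, and $o_t^{A}$, $o_t^{V}$ are the audio and visual observations at time $t$. Under this assumption, a smaller $\lambda$ shifts reliance toward the visual stream as acoustic noise increases.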

An Improved Speaker Verification System Using Lipreading and Speech Recognition

  • 지승남;이종수
    • 제어로봇시스템학회:학술대회논문집 / 제어로봇시스템학회 2000 proceedings of the 15th conference / p. 274 / 2000
  • In the future, convenient speech command systems will become a widely used interface in automation systems. However, previous research in speech recognition has not given recognition results satisfactory for practical use in noisy environments. The purpose of this research is to develop a practical system that reliably recognizes the speech commands of registered users, complementing existing research by combining image information with the speech signal. For lip-reading feature extraction from an image, we used the DWT (Discrete Wavelet Transform), which reduces the size of the original image while retaining its useful characteristics. To enhance robustness to changes in the speakers' environment, we acquired the speech signal in stereo. We designed an economical stand-alone system based on a TMS320C31 DSP add-on board with a Bt829 and an AD1819B.

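How a DWT shrinks an image while keeping its coarse structure can be shown with one level of a 2-D Haar transform. The abstract does not name the wavelet, so the Haar basis is an assumption made for simplicity, and `haar_dwt2` and the toy 4x4 image are hypothetical.

```python
def haar_dwt2(image):
    """One level of the 2-D orthonormal Haar DWT on an even-sized grey image.

    Returns four half-size sub-bands: the approximation plus the
    horizontal, vertical and diagonal difference (detail) bands.
    """
    rows, cols = len(image), len(image[0])
    approx, horiz_diff, vert_diff, diag_diff = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2.0)   # local average (scaled)
            h_row.append((tl - tr + bl - br) / 2.0)   # left minus right
            v_row.append((tl + tr - bl - br) / 2.0)   # top minus bottom
            d_row.append((tl - tr - bl + br) / 2.0)   # diagonal difference
        approx.append(a_row)
        horiz_diff.append(h_row)
        vert_diff.append(v_row)
        diag_diff.append(d_row)
    return approx, horiz_diff, vert_diff, diag_diff

# Toy 4x4 "image" made of four flat 2x2 blocks.
image = [
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]
approx, horiz_diff, vert_diff, diag_diff = haar_dwt2(image)
```

The approximation band has a quarter of the original pixels, so using it (or repeating the transform on it) as the lip feature reduces the data passed to the recognizer.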

A Study on the Optimal Feature Extraction and Complex Adaptive Filter for Speech Recognition

  • 차태호;장승관;최웅세;최일홍;김창석
    • 음성과학 / Vol. 4, No. 2 / pp. 55-68 / 1998
  • In this paper, a novel speech noise reduction method based on a complex adaptive noise canceler and a method of optimal feature extraction are proposed. The complex adaptive noise canceler requires only noise detection and uses the LMS algorithm to calculate the adaptive filter coefficients. The optimal feature extraction method requires the variance of the noise. Experimental results show that the proposed method effectively reduces noise in noisy speech, and that the optimal feature extraction yields characteristics similar to those of noise-free speech.

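The LMS coefficient update at the core of an adaptive noise canceler can be sketched as follows. This is the textbook two-input canceler with a separate noise reference, which is an assumption: the paper's own structure, driven by noise detection, is not described in the abstract. The signals, the noise path (0.8, -0.3), the step size and the function name are all hypothetical.

```python
import math
import random

def lms_noise_canceller(primary, reference, taps=4, mu=0.005):
    """Subtract an adaptively filtered copy of `reference` from `primary`."""
    weights = [0.0] * taps
    history = [0.0] * taps            # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate    # the error is also the cleaned output
        # LMS update: step along the instantaneous gradient estimate.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights

rng = random.Random(0)
n = 8000
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
speech = [0.3 * math.sin(2.0 * math.pi * 0.01 * k) for k in range(n)]
# The noise reaches the primary input through a short hypothetical path.
primary = [speech[k] + 0.8 * noise[k] - 0.3 * (noise[k - 1] if k > 0 else 0.0)
           for k in range(n)]

cleaned, weights = lms_noise_canceller(primary, noise)

# Residual noise power after convergence versus the noise power at the input.
residual_power = sum((cleaned[k] - speech[k]) ** 2 for k in range(n - 1000, n)) / 1000
input_noise_power = sum((primary[k] - speech[k]) ** 2 for k in range(n - 1000, n)) / 1000
```

After convergence the first two weights approach the assumed path coefficients, so the residual noise power falls well below the noise power at the input.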

A new acoustical parameter for speech intelligibility with regard to early vertical reflections

  • 박종영;한명호;정대업;오양기
    • KIEAE Journal / Vol. 7, No. 3 / pp. 63-70 / 2007
  • Early reflections, together with their energy and delay times after the arrival of the direct sound, are known to be important factors for speech intelligibility. On this basis, acoustical parameters such as D50 and C80 have been proposed and are widely used for assessing the listening conditions of rooms. These parameters focus on the fraction of early energy relative to the total, regardless of the spatial characteristics of the early reflections. This implies that all early reflections arriving within a certain time boundary, whether from the front, behind, below or above, have the same impact on speech intelligibility. Questioning this simplification, this study examines the influence of the direction of early reflections on speech intelligibility. A computer-simulated speech intelligibility test using the paired-comparison method, conducted with 22 university students, showed an increased preference, with a preference degree of 0.746, for reflections from the vertical direction.
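
The direction-blind nature of D50 and C80 is easy to see from how they are computed. The sketch below uses the standard energy-ratio definitions and assumes the impulse response starts at the arrival of the direct sound. The function name and the toy two-spike impulse response are hypothetical.

```python
import math

def early_energy_metrics(impulse_response, sample_rate):
    """Return (D50, C80) for a room impulse response starting at the direct sound.

    D50: energy in the first 50 ms divided by the total energy.
    C80: ratio in dB of the energy before 80 ms to the energy after 80 ms.
    """
    def energy(samples):
        return sum(s * s for s in samples)

    n50 = int(round(0.050 * sample_rate))
    n80 = int(round(0.080 * sample_rate))
    d50 = energy(impulse_response[:n50]) / energy(impulse_response)
    c80 = 10.0 * math.log10(energy(impulse_response[:n80]) /
                            energy(impulse_response[n80:]))
    return d50, c80

# Toy response: direct sound at 0 ms and one equally strong reflection at 100 ms.
sample_rate = 1000
impulse_response = [0.0] * 200
impulse_response[0] = 1.0
impulse_response[100] = 1.0

d50, c80 = early_energy_metrics(impulse_response, sample_rate)
```

Only arrival time and energy enter these formulas, so a reflection from the ceiling and one from a side wall with the same delay and level give identical values. That is the simplification the study questions.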