• Title/Summary/Keyword: Speech signal processing

A Construction of Vowel String Dictionary for Unlimited Word Speech Recognition

  • 김동환;윤재선;홍광석
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2000.08a / pp.177-180 / 2000
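  • Unlike conventional limited-vocabulary word recognition, unlimited-vocabulary speech recognition refers to a very large set of word models, so the search time needed to compare the input pattern with the reference models becomes too long. The method proposed in this paper constructs a vowel string dictionary, which must be in place before an unlimited-vocabulary speech recognition system can be built. Instead of comparing the input pattern with every word in the reference model, the system first recognizes the vowel string of the input pattern and keeps only the reference-model words with that vowel string as recognition candidates, which improves efficiency in terms of time. The main goal of this approach is real-time processing in unlimited-vocabulary speech recognition.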
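The candidate-pruning idea can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: it assumes words are stored as Hangul text and derives each word's vowel-string key from the medial vowel (jungseong) of every precomposed syllable using the standard Unicode syllable arithmetic. The acoustic step that recognizes the vowel string from speech is out of scope, and all function names are hypothetical.

```python
from collections import defaultdict

HANGUL_BASE = 0xAC00        # first precomposed Hangul syllable (가)
HANGUL_LAST = 0xD7A3        # last precomposed Hangul syllable (힣)
FINALS_PER_MEDIAL = 28      # 27 final consonants + "no final"
MEDIAL_COUNT = 21           # number of medial vowels (jungseong)
COMPAT_VOWEL_BASE = 0x314F  # compatibility jamo ㅏ; U+314F..U+3163 follow the medial order


def vowel_string(word: str) -> str:
    """Return the medial vowels of the Hangul syllables in `word`, in order."""
    vowels = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            medial = ((code - HANGUL_BASE) // FINALS_PER_MEDIAL) % MEDIAL_COUNT
            vowels.append(chr(COMPAT_VOWEL_BASE + medial))
    return "".join(vowels)


def build_vowel_dictionary(words):
    """Index the vocabulary by vowel string (hypothetical helper)."""
    index = defaultdict(list)
    for w in words:
        index[vowel_string(w)].append(w)
    return index


def candidates(index, recognized_vowels: str):
    """Words sharing the recognized vowel string; empty list if none."""
    return index.get(recognized_vowels, [])
```

With `index = build_vowel_dictionary(["나라", "다마", "바다", "음성", "은행"])`, the call `candidates(index, vowel_string("가자"))` returns `["나라", "다마", "바다"]`; only those words would then be matched against the input pattern instead of the whole vocabulary.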

An Implementation of Unlimited Speech Recognition and Synthesis System using Transcription of Roman to Hangul

  • 양원렬;윤재선;홍광석
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2000.08a / pp.181-184 / 2000
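  • In this paper, we implemented a speech recognizer and synthesizer that use English-to-Korean (Roman-to-Hangul) transliteration. For speech recognition, we used CV (consonant-vowel), VCCV, VCV, VV, and VC units, and built an unlimited-vocabulary speech recognition system by combining models pre-built for each unit. With transliteration, even an English word can be recognized: it is first converted into its Hangul pronunciation, and the corresponding model is then generated. For the synthesizer, we built the Korean speech database needed for synthesis and generate synthetic speech by concatenating its units according to the input text. When English text is input, its pronunciation is first converted to Hangul by transliteration, so synthetic speech can be produced without a separate English synthesizer.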
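One way the listed units could be derived from a word's phone sequence and concatenated into a word model is sketched below. The segmentation rule, the romanized phone symbols, and the function names are assumptions for illustration; the abstract does not say exactly how words are split into CV, VCCV, VCV, VV, and VC units. An English word would first be transliterated into Hangul (not shown) and then segmented the same way.

```python
def segment_units(phones, vowels):
    """Split a phone sequence into CV / VV / VCV / VCCV / VC units.

    Hypothetical rule: the initial consonant joins the first vowel (CV),
    each pair of neighbouring vowels is joined together with the consonants
    between them (VV, VCV, VCCV), and trailing consonants join the last
    vowel (VC).
    """
    vowel_pos = [i for i, p in enumerate(phones) if p in vowels]
    if not vowel_pos:
        return []
    units = []
    first, last = vowel_pos[0], vowel_pos[-1]
    if first > 0:
        units.append(tuple(phones[:first + 1]))      # CV
    for a, b in zip(vowel_pos, vowel_pos[1:]):
        units.append(tuple(phones[a:b + 1]))         # VV, VCV or VCCV
    if last < len(phones) - 1:
        units.append(tuple(phones[last:]))           # VC
    return units


def unit_type(unit, vowels):
    """Consonant/vowel pattern of a unit, e.g. ('s', 'a') -> 'CV'."""
    return "".join("V" if p in vowels else "C" for p in unit)


def word_model(phones, vowels, unit_models):
    """Concatenate pre-built unit models (any objects keyed by unit) into a word model."""
    return [unit_models[u] for u in segment_units(phones, vowels)]
```

For example, 사람 written as `["s", "a", "r", "a", "m"]` yields units of type CV, VCV, VC, and 학교 as `["h", "a", "k", "k", "yo"]` yields CV, VCCV.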

A Study on Speech Recognition Using Recurrent Neural Predictive HMM

  • 박경훈;한학용;김수훈;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2000.08a / pp.153-156 / 2000
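  • In this paper, we constructed a recurrent neural predictive HMM, a hybrid network of a predictive recurrent neural network and an HMM. The recurrent neural predictive HMM defines a predictive recurrent neural network as the predictor for each HMM state. Instead of a fixed mean vector, it uses the network's prediction, which changes dynamically under the influence of past feature vectors, so the model behaves as a dynamic network whose training-pattern setup itself reflects time variation. It is therefore well suited to recognizing time-series patterns such as speech. Depending on the structure of the predictive recurrent network, we distinguished an Elman-network predictive HMM and a Jordan-network predictive HMM. In the experiments, we increased the number of states to 4, 5, and 6 and, for each number of states, examined recognition performance as the prediction order and the number of hidden units varied. On the evaluation data, the Elman-network predictive HMM with 6 states, prediction order 3, and 15 hidden units and the Jordan-network predictive HMM with 5 states, prediction order 3, and 10 hidden units each achieved an excellent result of 99.5%.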
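The core idea, replacing each state's fixed mean vector with a recurrent-network prediction and scoring states by prediction error, can be sketched as follows. This is an untrained, simplified illustration with hypothetical names: each state's Elman predictor runs over the whole utterance rather than only along the aligned path, the 12-dimensional features are assumed, and training is omitted. A Jordan-network variant would feed back the previous output instead of the previous hidden state.

```python
import math
import random


def matvec(W, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ElmanPredictor:
    """Elman-network predictor for one HMM state (random, untrained weights).

    Predicts frame x[t] from the previous `order` frames plus the previous
    hidden state, which plays the role of the Elman context layer.
    """

    def __init__(self, dim, order, hidden, rng):
        self.order = order
        self.hidden = hidden
        self.W_in = random_matrix(hidden, dim * order, rng)
        self.W_ctx = random_matrix(hidden, hidden, rng)
        self.W_out = random_matrix(dim, hidden, rng)

    def errors(self, X):
        """Squared prediction error for frames t = order .. len(X) - 1."""
        h = [0.0] * self.hidden
        errs = []
        for t in range(self.order, len(X)):
            past = [v for frame in X[t - self.order:t] for v in frame]
            pre = [a + b for a, b in zip(matvec(self.W_in, past), matvec(self.W_ctx, h))]
            h = [math.tanh(v) for v in pre]   # becomes the context for the next frame
            pred = matvec(self.W_out, h)      # prediction replaces a fixed mean vector
            errs.append(sum((x - p) ** 2 for x, p in zip(X[t], pred)))
        return errs


def viterbi_cost(X, predictors):
    """Lowest total prediction error over left-to-right paths (stay or advance one state)."""
    E = [p.errors(X) for p in predictors]     # E[s][t]: local cost of state s at frame t
    S, T = len(E), len(E[0])
    D = [[math.inf] * T for _ in range(S)]
    D[0][0] = E[0][0]
    for t in range(1, T):
        for s in range(S):
            advance = D[s - 1][t - 1] if s > 0 else math.inf
            D[s][t] = E[s][t] + min(D[s][t - 1], advance)
    return D[S - 1][T - 1]


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(20)]
    word = [ElmanPredictor(12, 3, 15, rng) for _ in range(6)]  # 6 states, order 3, 15 hidden
    print(viterbi_cost(X, word))
```

Recognition would pick the word model with the lowest `viterbi_cost`. The demo configuration (6 states, order 3, 15 hidden units) mirrors the Elman setting reported above.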

A Study on Multi-Pulse Speech Coding Method by Using V/S/TSIUVC

  • Lee See-Woo
    • Journal of Korea Multimedia Society / v.7 no.9 / pp.1233-1239 / 2004
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality can be distorted when voiced sounds and unvoiced consonants coexist in a frame. To limit this distortion, this paper presents a new multi-pulse coding method that uses V/S/TSIUVC switching, individual pitch pulses, and a TSIUVC approximation-synthesis method. The TSIUVC is extracted using the zero-crossing rate and individual pitch pulses; the TSIUVC extraction rate was 91% for female voices and 96.2% for male voices. Notably, frequency information below 0.347 kHz and above 2.813 kHz can be synthesized as a high-quality waveform within the TSIUVC. I evaluated the MPC using V/UV and the FBD-MPC using V/S/TSIUVC, and found that the FBD-MPC produced synthetic speech of better quality than the MPC.
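For reference, the multi-pulse excitation search behind conventional MPC can be sketched as a greedy loop: at each step, place the pulse whose filtered response most reduces the remaining error. This is a generic search in the style of Atal and Remde, not the paper's V/S/TSIUVC switching or TSIUVC approximation-synthesis, which the abstract does not specify; the names are hypothetical and the synthesis-filter impulse response `h` is assumed to be given.

```python
def multipulse_search(target, h, n_pulses):
    """Greedy multi-pulse excitation search (illustrative).

    target: samples the synthesis-filter output should match
    h:      (truncated) impulse response of the synthesis filter
    Returns [(position, amplitude), ...] and the remaining error signal.
    """
    residual = list(target)
    N = len(target)
    pulses = []
    for _ in range(n_pulses):
        best = None
        for pos in range(N):
            seg = h[:N - pos]                 # response of a pulse at `pos`, clipped to the frame
            energy = sum(v * v for v in seg)
            if energy == 0.0:
                continue
            corr = sum(residual[pos + k] * seg[k] for k in range(len(seg)))
            score = corr * corr / energy      # squared-error reduction from the optimal amplitude
            if best is None or score > best[0]:
                best = (score, pos, corr / energy)
        if best is None:
            break
        _, pos, amp = best
        pulses.append((pos, amp))
        for k in range(min(len(h), N - pos)):
            residual[pos + k] -= amp * h[k]   # remove this pulse's contribution
    return pulses, residual
```

Each chosen amplitude is the least-squares optimum for that position given the pulses already placed; pulses are not re-optimized jointly in this simple variant.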

A Study on the Improvement of DTW with Speech Silence Detection

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Speech Sciences / v.10 no.4 / pp.117-124 / 2003
  • Speaker recognition is the technology that confirms a speaker's identity using the characteristics of speech. It is classified into speaker identification and speaker verification: the former identifies the speaker among a preregistered group and recognizes the word, while the latter verifies the identity a speaker claims. As services over the telephone network have become widespread, extracting speaker information from speech to confirm individual identity has become one of the most efficient technologies of this kind. However, several problems must be solved for real applications. First, a reliable method is needed to reject impostors, because recognition is not performed only for preregistered customers. Second, speech characteristics change over time, which severely degrades the recognition rate and inconveniences users as the number of times they must utter the text increases. Third, characteristics shared among speakers cause misrecognition. In addition, silent segments contained within the speech lower the identification rate. To address this, we propose improving the identification rate by removing silent segments before running the identification algorithm. The start and end points of speech are detected using the zero-crossing rate and the signal energy, and the DTW algorithm is then applied using these two measures. As a result, the proposed method improved the recognition rate by about 3% compared with conventional methods.
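The preprocessing and matching pipeline described above can be sketched as: frame the signal, drop frames that look silent by energy and zero-crossing rate, then compare feature sequences with DTW. The thresholds, the toy (log-energy, ZCR) features, and all names are assumptions for illustration; the abstract gives neither the paper's thresholds nor its features. Unlike a pure start/end-point detector, this version also drops silent frames inside the utterance.

```python
import math


def frame_signal(x, size, hop):
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]


def short_time_energy(frame):
    return sum(v * v for v in frame) / len(frame)


def zero_crossing_rate(frame):
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def drop_silence(frames, energy_thresh, zcr_thresh):
    """Keep frames with high energy (voiced) or high ZCR (unvoiced); thresholds are hypothetical."""
    return [f for f in frames
            if short_time_energy(f) >= energy_thresh or zero_crossing_rate(f) >= zcr_thresh]


def features(frames):
    """Toy 2-D feature per frame: (log energy, ZCR). Real systems would use e.g. cepstra."""
    return [(math.log(short_time_energy(f) + 1e-10), zero_crossing_rate(f)) for f in frames]


def dtw(a, b):
    """Classic DTW distance with Euclidean local cost and (1,0)/(0,1)/(1,1) steps."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Silent frames left in the test utterance must be absorbed by the warping path at a nonzero cost, which is why removing them before DTW can help the comparison with a clean template.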

A Study on the Audio Compensation System

  • Jeoung, Byung-Chul;Won, Chung-Sang
    • The Journal of the Acoustical Society of Korea / v.32 no.6 / pp.509-517 / 2013
  • In this paper, we study a method for building a good acoustic speech system with digital signal processing, using a dynamic microphone as the transducer. A good acoustic speech system should convert the original sound into an electrical signal without distortion. We measure the frequency response of the microphone and obtain adjustment factors for each frequency band by comparing the measured data with the microphone's standard frequency response. Using digital signal processing, the adjustment factors derived from the frequency responses of the microphone and the speaker are then applied so that the final sound levels match the original sound levels. We also minimize the changes in frequency response and level caused by variations in the source-to-microphone distance, for which the frequency responses were measured as the distance changed.
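The per-band adjustment-factor step can be sketched as a dB difference between the standard and measured responses. The band centres, values, and function names are hypothetical, and the distance-dependent compensation is not shown; corrections derived separately for the microphone and the speaker would simply add in dB.

```python
def adjustment_factors_db(measured_db, standard_db):
    """Per-band correction (dB) that moves the measured response onto the standard one."""
    return {band: standard_db[band] - measured_db[band] for band in standard_db}


def apply_adjustment(band_levels_db, factors_db):
    """Add each band's correction to its level (dB); bands without a factor are unchanged."""
    return {band: level + factors_db.get(band, 0.0) for band, level in band_levels_db.items()}


def db_to_gain(db):
    """Linear amplitude gain for a dB correction, e.g. for a per-band filter."""
    return 10.0 ** (db / 20.0)
```

For instance, a band measured at -4 dB against a 0 dB standard gets a +4 dB factor, i.e. a linear gain of `db_to_gain(4.0)`, about 1.58.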

Robust Blind Source Separation to Noisy Environment For Speech Recognition in Car

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea Contents Association / v.6 no.12 / pp.89-95 / 2006
  • The performance of blind source separation (BSS) using independent component analysis (ICA) declines significantly in reverberant environments. The post-processing method proposed in this paper is designed to remove the residual cross-talk component precisely. It uses a modified NLMS (normalized least mean square) filter in the frequency domain to estimate the cross-talk path that produces residual cross-talk components. Because the residual cross-talk components in one channel correspond to the direct components in the other channel, the cross-talk path can be estimated by an adaptive filter that takes the other channel's signal as input. In the conventional NLMS filter the step size is normalized by the input signal power, whereas in the modified NLMS filter it is normalized by the sum of the input signal power and the error signal power, which prevents misadjustment of the filter weights. The estimated residual cross-talk components are then removed by non-stationary spectral subtraction. Computer simulations using speech signals show that the proposed method improves the noise reduction ratio (NRR) by approximately 3 dB over conventional frequency-domain ICA (FDICA).
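A single-bin sketch of the modified NLMS update described above follows. It assumes one complex tap per frequency bin, with the other channel's separated output as the reference `x` and this channel's output as the desired signal `d`; the step size `mu`, the regularizer `eps`, and the function name are hypothetical, and the spectral-subtraction stage is not shown.

```python
def modified_nlms_bin(x, d, mu=0.5, eps=1e-12):
    """Adapt one complex tap w so that w * x[k] tracks the cross-talk in d[k] for one bin.

    x: reference-channel STFT values in this bin (the other channel's output)
    d: this channel's STFT values in the same bin (contains residual cross-talk)
    The step is normalised by |x|^2 + |e|^2 rather than |x|^2 alone (plain NLMS).
    Returns the final tap and the per-frame error (cross-talk-reduced) values.
    """
    w = 0j
    errors = []
    for xk, dk in zip(x, d):
        e = dk - w * xk                                   # error for this frame
        w += mu * xk.conjugate() * e / (abs(xk) ** 2 + abs(e) ** 2 + eps)
        errors.append(e)
    return w, errors
```

Because the error power sits in the denominator, a large error (for example, at the onset of the target speaker) shrinks the effective step, which is how the modification limits misadjustment of the weights.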

Improvement of Overlapped Codebook Search in QCELP

  • 박광철;한승진;이정현
    • The KIPS Transactions:PartC / v.8C no.1 / pp.105-112 / 2001
  • In this paper, we present an improved QCELP codebook search that enhances speech quality, allowing the QCELP vocoder to be used in noise-robust systems. Conventional QCELP usually searches the stochastic codebook once; after trying two to five searches, we found that two searches are the most suitable for improving speech quality. Consequently, the improved QCELP vocoder represents the excitation signal in detail through two precise quantization passes and thus improves speech quality. In our experiment, we used speech collected in various environments (such as a lecture room, a house, a street, and a laboratory) as input data regardless of background noise, and measured speech quality using SNR and segmental SNR (segSNR). The results show that the improved QCELP raises SNR and segSNR by 38.35% and 65.51%, respectively, compared with conventional QCELP.
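The two quality measures used above can be computed as follows. This sketch uses the common definitions (overall SNR over the whole signal; segmental SNR as the mean of per-frame SNRs in dB); the frame length and function names are assumptions. Many segSNR implementations also clamp each frame's value to a fixed dB range, which the abstract does not mention and this sketch omits.

```python
import math


def snr_db(original, decoded):
    """Overall SNR in dB: total signal energy over total error energy."""
    signal = sum(s * s for s in original)
    noise = sum((s - y) ** 2 for s, y in zip(original, decoded))
    return 10.0 * math.log10(signal / noise)


def seg_snr_db(original, decoded, frame_len):
    """Segmental SNR: mean of per-frame SNRs; frames with zero signal or error are skipped."""
    values = []
    for i in range(0, len(original) - frame_len + 1, frame_len):
        s = original[i:i + frame_len]
        y = decoded[i:i + frame_len]
        signal = sum(v * v for v in s)
        noise = sum((a - b) ** 2 for a, b in zip(s, y))
        if signal > 0 and noise > 0:
            values.append(10.0 * math.log10(signal / noise))
    return sum(values) / len(values)
```

Because segSNR averages in the dB domain, quiet frames count as much as loud ones, so it is more sensitive than overall SNR to coding errors in low-energy segments.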

On a Pitch Change of the Waveform Coding by the Cepstrum Analysis of Speech Waveforms

  • Bae, Myung-Jin;Lee, Mi-Suk
    • The Journal of the Acoustical Society of Korea / v.11 no.4 / pp.14-21 / 1992
  • Waveform coding simply preserves the wave shape of the speech signal through a redundancy reduction process. In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. However, because the parameters of this coding are not separated into excitation parameters and vocal-tract parameters, it is difficult to apply waveform coding to synthesis by rule. In this paper, we propose a new pitch alteration method that changes the pitch periods in waveform coding by using cepstrum analysis. This makes it possible to use waveform coding for synthesis by rule in speech processing.
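The cepstral separation that makes pitch manipulation possible can be sketched with a naive DFT: low quefrencies carry the vocal-tract envelope, and the dominant high-quefrency peak gives the pitch period. This only illustrates cepstrum analysis; the paper's actual pitch-alteration procedure is not described in the abstract, and the function names, the liftering cutoff, and the test signal are assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; inverse=True gives the inverse transform with 1/N scaling."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out


def real_cepstrum(x):
    """c[n] = Re IDFT(log |DFT(x)|)[n]."""
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]


def split_cepstrum(c, cutoff):
    """Low-quefrency part (vocal-tract envelope) and high-quefrency part (excitation)."""
    N = len(c)
    low = [v if (n < cutoff or n > N - cutoff) else 0.0 for n, v in enumerate(c)]
    high = [v - lo for v, lo in zip(c, low)]
    return low, high


def cepstral_pitch(x, min_lag):
    """Pitch period in samples: quefrency of the largest cepstral peak in [min_lag, N/2)."""
    c = real_cepstrum(x)
    return max(range(min_lag, len(c) // 2), key=lambda n: c[n])
```

Since the excitation and the vocal tract are additive in the cepstral domain, one could in principle modify the high-quefrency part while keeping the low-quefrency envelope; how the paper does this is not stated in the abstract.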

Influence of Nasometer Structure on Nasalance for Speech Therapy

  • Woo, Seong Tak;Park, Y.B.;Kim, J.Y.;Oh, D.H.;Ha, J.W.;Na, S.D.;Kim, M.N.
    • Journal of Korea Multimedia Society / v.22 no.2 / pp.157-166 / 2019
  • With the development of medical technology, interest in rehabilitation devices is increasing, and various devices are being studied. In particular, devices for speech disorders such as those associated with hearing impairment and cleft palate are attracting attention. The nasometer is generally used for patients with flaccid dysarthria and velopharyngeal incompetence (VPI). However, the conventional separator-type nasometer suffers from acoustic feedback between the oral and nasal sounds. Recently, a mask-type nasometer that is insensitive to acoustic feedback has been developed, but it is not yet widely used. In this paper, the characteristics of the conventional separator-type and the mask-type nasometers are analyzed. We also obtained clinical acoustic data from 6 subjects and examined whether the structural difference between the separator-type and mask-type nasometers produces significant differences. The experiments confirmed that the mask-type nasometer measured about 3-15% higher than the conventional separator-type nasometer. We also considered the need for nasometer signal processing to reduce acoustic feedback and to optimize the nasalance calculation.
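Nasalance is conventionally the nasal share of the combined nasal and oral acoustic signal level, expressed in percent; a minimal sketch follows. Using RMS amplitude as the level measure is an assumption, the function names are hypothetical, and real devices may filter each channel before measuring, which is omitted here.

```python
import math


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def nasalance_percent(nasal, oral):
    """Nasalance = nasal / (nasal + oral) * 100, using the RMS level of each channel."""
    n, o = rms(nasal), rms(oral)
    return 100.0 * n / (n + o)
```

For example, `nasalance_percent([0.5, -0.5], [1.5, -1.5])` gives 25.0. If oral sound leaks into the nasal microphone (acoustic feedback), the nasal level rises and the computed nasalance is inflated, which is the structural effect the study compares between nasometer types.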