• Title/Summary/Keyword: Speech signal processing

Classification of Sasang Constitution Taeumin through Comparison of Speech Signal Analysis Values (음성 분석 정보값 비교를 통한 사상체질 태음인의 분류)

  • Kim, Bong-Hyun;Lee, Se-Hwan;Cho, Dong-Uk
    • The KIPS Transactions:PartB
    • /
    • v.15B no.1
    • /
    • pp.17-24
    • /
    • 2008
  • This paper proposes a method of classifying the Sasang constitution through the comparison of speech signal analysis values. To provide an objective index for Sasang constitution diagnosis, a method for classifying Taeumin from the output values of speech signal analysis is proposed; within the overall system it follows the first step, in which Soeumin are classified by skin diagnosis. Phonetic elements that clearly characterize each Sasang constitution group are first extracted from the voice, and Taeumin speakers are then classified on the basis of the differences and similarities between the constitution groups' measured values. Finally, the effectiveness of the method is verified through experiments.
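
The abstract above does not spell out which analysis values are compared, so the following is only a minimal sketch of the general idea: classify a speaker by comparing measured voice features against per-constitution reference statistics. The feature set (mean F0, intensity, spectral tilt), the reference numbers, and the distance rule are hypothetical placeholders, not the paper's actual measurements.

```python
import numpy as np

# Hypothetical per-group reference values for a few speech-analysis features,
# e.g. [mean F0 (Hz), mean intensity (dB), spectral tilt (dB/oct)].
# These numbers are illustrative placeholders, not values from the paper.
GROUP_MEANS = {
    "Taeumin":  np.array([110.0, 68.0, -9.0]),
    "Soeumin":  np.array([135.0, 62.0, -12.0]),
    "Soyangin": np.array([150.0, 70.0, -7.0]),
}
GROUP_SCALES = np.array([20.0, 5.0, 3.0])   # assumed feature scales for normalization

def classify_constitution(features: np.ndarray) -> str:
    """Return the constitution group whose reference values are closest
    to the speaker's measured speech-analysis values (scaled distance)."""
    distances = {
        name: np.linalg.norm((features - mean) / GROUP_SCALES)
        for name, mean in GROUP_MEANS.items()
    }
    return min(distances, key=distances.get)

# Example: a speaker whose measured values lie near the Taeumin reference.
print(classify_constitution(np.array([112.0, 67.0, -8.5])))   # -> "Taeumin"
```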

Automatic Synthesis Method Using Prosody-Rich Database (대용량 운율 음성데이타를 이용한 자동합성방식)

  • 김상훈
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.08a
    • /
    • pp.87-92
    • /
    • 1998
  • In general, synthesis unit databases have been constructed by recording isolated words. In that case, each word boundary carries a typical prosodic pattern, such as falling intonation or pre-boundary lengthening. To obtain natural synthetic speech from such a database, the original speech must be artificially distorted; however, this artificial processing instead produced unnatural, unintelligible synthetic speech because of the excessive prosodic modification of the speech signal. To overcome these problems, we gathered thousands of sentences for the synthesis database. To build phone-level synthesis units, we trained a speech recognizer on the recorded speech and then segmented the phone boundaries automatically; a laryngograph was also used for epoch detection. From the automatically generated synthesis database, we chose the best phone and concatenated it directly without any prosody processing. To select the best phone among multiple candidates, we used prosodic information such as the break strength of word boundaries, phonetic context, cepstrum, pitch, energy, and phone duration. A pilot test yielded some positive results.
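
As a rough illustration of the unit-selection scheme described above (choosing the best phone candidate from a large prosody-rich database and concatenating it without prosody modification), here is a minimal sketch. The feature names, weights, and greedy search are assumptions for illustration; the abstract lists the selection features but not the exact cost function or search procedure used.

```python
import numpy as np

# Each candidate phone unit carries features stored in the synthesis database.
# The feature names and weights below are illustrative assumptions.
FEATURE_KEYS = ["break_strength", "pitch", "energy", "duration"]
WEIGHTS = np.array([2.0, 1.0, 0.5, 0.5])

def target_cost(candidate: dict, target: dict) -> float:
    """Weighted distance between a candidate unit's prosodic features
    and the target context predicted for the sentence to synthesize."""
    diff = np.array([candidate[k] - target[k] for k in FEATURE_KEYS])
    return float(np.dot(WEIGHTS, np.abs(diff)))

def concat_cost(prev: dict, candidate: dict) -> float:
    """Spectral mismatch at the join: cepstral distance between the end of
    the previous unit and the start of the candidate."""
    return float(np.linalg.norm(np.array(prev["cep_end"]) - np.array(candidate["cep_start"])))

def select_units(targets: list, candidates_per_phone: list) -> list:
    """Greedy selection: for each phone, pick the candidate minimizing
    target cost plus join cost with the previously chosen unit.
    (A full system would run a Viterbi search over the candidate lattice.)"""
    chosen = []
    for target, candidates in zip(targets, candidates_per_phone):
        def total(c):
            cost = target_cost(c, target)
            if chosen:
                cost += concat_cost(chosen[-1], c)
            return cost
        chosen.append(min(candidates, key=total))
    return chosen
```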

Performance Improvement of Acoustic Echo Canceller Using Post-Processor (후처리기를 이용한 음향 반향 제거기의 성능향상)

  • 박장식;김현태;손경식
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.5
    • /
    • pp.35-43
    • /
    • 1999
  • In this paper, a new robust adaptive algorithm and a post-processing method are proposed to improve the performance of an acoustic echo canceller (AEC) without additional computational burden. The step size of the adaptive algorithm is normalized by the sum of the powers of the reference input signal and the desired signal, so that when near-end speech or noise enters the microphone the step size becomes small and the misalignment of the coefficients is reduced. To reduce the residual echo, a new post-processing method is proposed that works together with the noise-robust adaptive algorithm. The method is based on the correlation between the desired signal and the estimation error signal: the residual echo is attenuated in proportion to this correlation normalized by the power of the desired signal, so the normalized correlation acts as a Wiener filter for the residual echo. In the double-talk situation, the estimation error signal is dominated by the near-end speaker's speech and the normalized correlation approaches 1; therefore the near-end speech is transmitted without attenuation. When the desired signal consists only of acoustic echo, the residual echo is largely attenuated and cancelled by the proposed post-processor. The computational load of the AEC with the proposed post-processor is comparable to that of the NLMS algorithm.
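
The sketch below illustrates the two ideas in the abstract: an NLMS-style update whose step size is normalized by the sum of the powers of the reference input and the desired (microphone) signal, and a post-processor that scales the error signal by the correlation between the desired and error signals, normalized by the desired-signal power. The filter length, frame size, and step-size constant are assumed values, and this is not the authors' exact formulation.

```python
import numpy as np

def robust_nlms_with_postfilter(x, d, L=128, mu=0.5, frame=256, eps=1e-8):
    """Acoustic echo canceller sketch.

    x : far-end reference signal (1-D array)
    d : microphone (desired) signal of the same length, containing echo
        plus possible near-end speech and noise
    Returns the post-processed error signal (echo-suppressed output).
    """
    w = np.zeros(L)                  # adaptive filter coefficients
    e = np.zeros_like(d)             # estimation error (residual echo + near-end)
    for n in range(L, len(d)):
        x_vec = x[n - L:n][::-1]     # most recent reference samples
        e[n] = d[n] - w @ x_vec      # microphone minus echo estimate
        # Step size normalized by the sum of reference and desired powers,
        # so adaptation slows down during near-end speech or noise (double talk).
        norm = x_vec @ x_vec + d[n - L:n] @ d[n - L:n] + eps
        w += (mu / norm) * e[n] * x_vec

    # Post-processor: attenuate the error in proportion to the correlation
    # between desired and error signals, normalized by the desired power.
    # rho ~ 1 during double talk (near-end speech passes through), rho ~ 0
    # when the error is only residual echo (the residual echo is suppressed).
    out = np.zeros_like(e)
    for start in range(0, len(e) - frame, frame):
        d_f, e_f = d[start:start + frame], e[start:start + frame]
        rho = np.dot(d_f, e_f) / (np.dot(d_f, d_f) + eps)
        out[start:start + frame] = np.clip(rho, 0.0, 1.0) * e_f
    return out
```

The frame-wise gain is clipped to [0, 1] so it behaves like a suppression gain; the temporal smoothing a practical canceller would apply to that gain is omitted here.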

An Experimental Study of Korean Dialectal Speech (한국어 방언 음성의 실험적 연구)

  • Kim, Hyun-Gi;Choi, Young-Sook;Kim, Deok-Su
    • Speech Sciences
    • /
    • v.13 no.3
    • /
    • pp.49-65
    • /
    • 2006
  • Recently, several theories of digital speech signal processing have drastically expanded the communication boundary between human beings and machines. The aim of this study is to collect Korean dialectal speech on a large scale and to establish a digital speech database that can support further research on Korean dialects and the creation of value-added networks. 528 informants from across the country participated in this study. The acoustic characteristics of vowels and consonants were analyzed with the power spectrum and spectrogram functions of CSL. Test words were presented on picture cards and letter cards containing each vowel and each consonant in word-initial position. Formant plots were depicted on a vowel chart, and the transitions of diphthongs were compared across dialects. For stop consonants, spectral times, VOT, VD, and TD were measured on the spectrogram; for fricative consonants, the frication frequency, intensity, and lateral formants (LF1, LF2, LF3) were measured. Nasal formants (NF1, NF2, NF3) were analyzed for the different nasalities of nasal consonants. The acoustic characteristics of the dialectal speech showed that young-generation speakers did not distinguish between close-mid /e/ and open-mid /ɛ/. The diphthongs /we/ and /wj/ were realized as monophthongs or diphthongs depending on the dialect. The sibilant /s/ showed aspiration preceding the fricative noise. The lateral /l/ was realized as the variant /r/ in Kyungsang dialectal speech. The durations of nasal consonants in Chungchong dialectal speech were the longest among the dialects.
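
Formant measurements like those described above are commonly obtained from an LPC analysis of a vowel frame. The sketch below shows that standard procedure (pre-emphasis, windowing, LPC, root solving); it is not tied to the CSL measurements used in the paper, and the sampling rate, LPC order, and bandwidth threshold are typical assumed values.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=12):
    """Autocorrelation-method LPC via the Toeplitz normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))            # prediction polynomial A(z)

def estimate_formants(frame, fs=16000, order=12, max_bw=400.0):
    """Estimate vowel formant frequencies (Hz) from one speech frame."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    frame = frame * np.hamming(len(frame))
    a = lpc_coefficients(frame, order)
    roots = [z for z in np.roots(a) if np.imag(z) > 0]           # upper half-plane poles
    freqs = np.angle(roots) * fs / (2 * np.pi)                   # pole angle -> frequency
    bws = -(fs / np.pi) * np.log(np.abs(roots))                  # pole radius -> bandwidth
    formants = sorted(f for f, b in zip(freqs, bws) if f > 90 and b < max_bw)
    return formants[:3]                                          # F1, F2, F3
```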

On Implementing the Digital DTMF Receiver Using PARCOR Analysis Method (PARCOR 분석 방법에 의한 디지털 DTMF 수신기 구현에 관한 연구)

  • Ha, Pan Bong;ANN, Souguil
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.24 no.2
    • /
    • pp.196-200
    • /
    • 1987
  • Several methods have been proposed for implementing a digital dual-tone multi-frequency (DTMF) receiver: infinite impulse response (IIR) digital filters, a period-counting algorithm, the discrete Fourier transform (DFT), and the fast Fourier transform (FFT) [2]. In this paper, the PARCOR (partial correlation) analysis method, which has been widely used in speech signal processing, is applied to the detection of DTMF signals. This method is easy to implement digitally and is more robust against speech that imitates DTMF digits than the other methods proposed to date. Since a sampling rate of 4 kHz is used in the DTMF receiver to detect the input DTMF signal, which is originally sampled at 8 kHz, the multiplexing efficiency is doubled.
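
PARCOR (partial correlation, i.e. reflection) coefficients fall out of the Levinson-Durbin recursion over the autocorrelation sequence; the sketch below shows that computation, which is the analysis core the abstract refers to. How the resulting coefficients are mapped to the two DTMF frequencies and to the digit decision is specific to the paper and is not reproduced here.

```python
import numpy as np

def parcor_coefficients(frame, order=4):
    """Levinson-Durbin recursion over the autocorrelation sequence,
    returning the PARCOR (reflection) coefficients k_1..k_order."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    k = np.zeros(order)
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        km = -acc / err                    # m-th reflection (PARCOR) coefficient
        k[m - 1] = km
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + km * a_prev[m - i]
        a[m] = km
        err *= (1.0 - km * km)             # prediction-error update
    return k

# Example: PARCOR coefficients of a synthetic DTMF '1' tone (697 Hz + 1209 Hz),
# sampled at the 4 kHz rate mentioned in the abstract.
fs = 4000
t = np.arange(0, 0.02, 1.0 / fs)
tone = np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t)
print(parcor_coefficients(tone, order=4))
```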

Real-Time Implementation of Wideband Adaptive Multi-Rate (AMR-WB) Speech Codec Using TMS320C6201 (TMS320C6201을 이용한 적응 다중 전송율을 갖는 광대역 음성부호화기의 실시간 구현)

  • Lee, Seung-Won;Bae, Keun-Sung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.9C
    • /
    • pp.1337-1344
    • /
    • 2004
  • This paper deals with the analysis and real-time implementation of the wideband adaptive multi-rate speech codec (AMR-WB) on TI's fixed-point DSP, the TMS320C6201. In the AMR-WB codec, the input speech is divided into two frequency bands, a lower band and an upper band, which are processed independently: the lower band is encoded with the ACELP algorithm, and the upper band is modeled by random excitation driving a linear prediction synthesis filter. The implemented AMR-WB system uses 218 kbytes of program memory and 92 kbytes of data memory, and its correct operation was confirmed by comparing the decoded speech signal sample by sample with that of a PC-based simulation. A maximum processing time of 5.75 ms for a 20 ms speech frame validates real-time operation of the implemented system.
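
The split into a lower band (ACELP-coded in the real codec) and an upper band (modeled by random excitation through an LP synthesis filter) can be illustrated with a simple two-band filter split. The sketch below uses ordinary Butterworth filters and an assumed 6.4 kHz crossover purely for illustration; it is not the codec's actual filterbank or bit-exact processing.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def split_bands(speech, fs=16000, crossover=6400.0, order=8):
    """Split wideband speech into a lower band and an upper band.
    Plain zero-phase Butterworth filtering stands in for the codec's own
    decimation/filter structure, which is not reproduced here."""
    low_sos = butter(order, crossover, btype="low", fs=fs, output="sos")
    high_sos = butter(order, crossover, btype="high", fs=fs, output="sos")
    return sosfiltfilt(low_sos, speech), sosfiltfilt(high_sos, speech)

# Example with white noise standing in for one 20 ms frame at 16 kHz.
frame = np.random.randn(320)
low_band, high_band = split_bands(frame)
```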

H/W Implementation of Speech Processor for Cochlear Implant (청각보철장치용 어음발췌기의 하드웨어 구현)

  • Shin, J.I.;Park, S.H.
    • Proceedings of the KOSOMBE Conference
    • /
    • v.1998 no.11
    • /
    • pp.161-162
    • /
    • 1998
  • In this paper, a speech processor, which is the most important part of a cochlear implant, is developed to restore auditory ability in people with sensorineural hearing loss caused by inner-ear damage. The system consists of an analog part and a digital signal processing part, whose functions are pre-processing and main processing, respectively. The main processing is performed in software on a DSP (TMS320C31-40). Because the main processing is implemented in software, the system can easily be adapted to the individual status of each patient.

A Study on Recognition Units and Methods to Align Training Data for Korean Speech Recognition (한국어 인식을 위한 인식 단위와 학습 데이터 분류 방법에 대한 연구)

  • 황영수
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.4 no.2
    • /
    • pp.40-45
    • /
    • 2003
  • This is a study on recognition units and the segmentation of phonemes for Korean speech recognition. When building a large-vocabulary speech recognition system, it is better to use the segment as the recognition unit than the syllable or the word. In this paper, we investigate suitable recognition units and phoneme segmentation methods for Korean, using the OGI speech toolkit (USA) for the experiments. The results show that the recognition rate when a diphthong is treated as a single unit is superior to that when it is split into two units, i.e., a glide plus a vowel. A recognizer trained on manually aligned training data is slightly superior to one trained on automatically aligned data. Also, the recognition rate when biphones are used as the recognition unit is better than when monophones are used.
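
The choice between treating a diphthong as one unit versus a glide plus a vowel, and between biphones and monophones, comes down to how a phoneme sequence is mapped to recognition units. The sketch below illustrates that mapping with a made-up phoneme sequence; the symbol inventory is a placeholder, not the unit set used in the paper.

```python
# Diphthongs kept as single units (the better-performing choice reported above),
# versus splitting them into glide + vowel. Symbols are placeholders.
DIPHTHONG_SPLIT = {"ya": ("j", "a"), "yo": ("j", "o"), "wi": ("w", "i")}

def to_monophones(phones, split_diphthongs=False):
    """Map a phoneme sequence to monophone units, optionally splitting
    diphthongs into glide + vowel units."""
    units = []
    for p in phones:
        if split_diphthongs and p in DIPHTHONG_SPLIT:
            units.extend(DIPHTHONG_SPLIT[p])
        else:
            units.append(p)
    return units

def to_biphones(phones):
    """Map a phoneme sequence to biphone (phone-pair) units, which the
    study reports outperforming monophone units."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{a}+{b}" for a, b in zip(padded[:-1], padded[1:])]

phones = ["k", "ya", "n", "g"]          # placeholder sequence
print(to_monophones(phones))            # diphthong as one unit
print(to_monophones(phones, True))      # diphthong as glide + vowel
print(to_biphones(phones))              # biphone units
```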

Interference Suppression Using Principal Subspace Modification in Multichannel Wiener Filter and Its Application to Speech Recognition

  • Kim, Gi-Bak
    • ETRI Journal
    • /
    • v.32 no.6
    • /
    • pp.921-931
    • /
    • 2010
  • It has been shown that the principal subspace-based multichannel Wiener filter (MWF) provides better performance than the conventional MWF for suppressing interference in the case of a single target source. It can efficiently estimate the target speech component in the principal subspace, which estimates the acoustic transfer function up to a scaling factor. However, as the input signal-to-interference ratio (SIR) becomes lower, larger errors are incurred in the estimation of the acoustic transfer function by the principal subspace method, degrading the interference-suppression performance. To alleviate this problem, a principal subspace modification method was proposed in previous work; it reduces the estimation error of the acoustic transfer function vector at low SIRs. In this work, a frequency-band dependent interpolation technique is further employed for the principal subspace modification. A speech recognition test conducted with the Sphinx-4 system also demonstrates the practical usefulness of the proposed method as front-end processing for a speech recognizer in a distant-talking, interferer-present environment.
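
A minimal numerical sketch of the principal-subspace idea: estimate the speech correlation matrix by subtracting an interference-only correlation estimate from the noisy-input correlation, take its principal eigenvector as the acoustic transfer function up to a scaling factor, and build a rank-1 multichannel Wiener filter from it. The paper's actual contribution, the frequency-band dependent interpolation of the modified subspace, is not reproduced, and the matrix-estimation details here are assumptions.

```python
import numpy as np

def rank1_mwf(Y_noisy, Y_interf, ref_mic=0):
    """Rank-1 (principal-subspace) multichannel Wiener filter for one
    frequency bin.

    Y_noisy  : (channels, frames) STFT snapshots of target speech + interference
    Y_interf : (channels, frames) snapshots observed during interference-only periods
    Returns w such that np.vdot(w, y) estimates the target speech component
    at the reference microphone for a new snapshot y.
    """
    Ryy = Y_noisy @ Y_noisy.conj().T / Y_noisy.shape[1]
    Rvv = Y_interf @ Y_interf.conj().T / Y_interf.shape[1]
    Rxx = Ryy - Rvv                          # speech correlation estimate
    # The principal eigenvector of Rxx estimates the acoustic transfer
    # function up to a scaling factor (the "principal subspace").
    eigvals, eigvecs = np.linalg.eigh(Rxx)
    sigma2, h = eigvals[-1], eigvecs[:, -1]
    # MWF: w = Ryy^{-1} Rxx e_ref; with the rank-1 approximation
    # Rxx ~= sigma2 * h h^H, this becomes w = sigma2 * conj(h[ref]) * Ryy^{-1} h.
    return sigma2 * np.conj(h[ref_mic]) * np.linalg.solve(Ryy, h)
```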

Robust Speech Hash Function

  • Chen, Ning;Wan, Wanggen
    • ETRI Journal
    • /
    • v.32 no.2
    • /
    • pp.345-347
    • /
    • 2010
  • In this letter, we present a new speech hash function based on the non-negative matrix factorization (NMF) of linear prediction coefficients (LPCs). First, linear prediction analysis is applied to the speech to obtain its LPCs, which represent the frequency shaping attributes of the vocal tract. Then, the NMF is performed on the LPCs to capture the speech's local feature, which is then used for hash vector generation. Experimental results demonstrate the effectiveness of the proposed hash function in terms of discrimination and robustness against various types of content preserving signal processing manipulations.
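
A rough sketch of the pipeline described above: frame the speech, compute LPCs per frame, form a non-negative feature matrix, factor it with NMF, and binarize the result into a hash vector. Since LPCs can be negative while NMF requires non-negative input, the sketch feeds NMF the magnitudes of the LPC matrix; that step, the frame sizes, the NMF rank, and the binarization rule are all assumptions rather than the letter's exact procedure.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from sklearn.decomposition import NMF

def lpc(frame, order=10):
    """Autocorrelation-method linear prediction coefficients for one frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

def speech_hash(speech, frame_len=400, hop=200, order=10, rank=8):
    """Binary perceptual hash from NMF of frame-wise LPCs (illustrative sketch)."""
    frames = [speech[i:i + frame_len] * np.hamming(frame_len)
              for i in range(0, len(speech) - frame_len, hop)]
    lpcs = np.array([lpc(f, order) for f in frames])       # (frames, order)
    V = np.abs(lpcs).T                                      # non-negative (order, frames)
    W = NMF(n_components=rank, init="nndsvda", max_iter=500).fit_transform(V)
    # Binarize each basis column against its median to obtain the hash bits.
    return (W > np.median(W, axis=0)).astype(int).ravel()

# Example on synthetic input (plain noise is used only to make the sketch run).
rng = np.random.default_rng(0)
print(speech_hash(rng.standard_normal(16000)))
```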