• Title/Summary/Keyword: Speech signal

The Analysis and Recognition of Korean Speech Signal using the Phoneme (음소에 의한 한국어 음성의 분석과 인식)

  • Kim, Yeong-Il;Lee, Geon-Gi;Lee, Mun-Su
    • The Journal of the Acoustical Society of Korea / v.6 no.2 / pp.38-47 / 1987
  • Because Korean can be classified phonemically according to the characteristics and structure of its pronunciation, Korean syllables can be divided into phonemes such as consonants and vowels. The divided phonemes are analyzed using partial autocorrelation (PARCOR) coefficients of order 15. The analysis shows that the characteristics of the same consonants, vowels, and final consonants are similar across syllables. The experiments are carried out by dividing 675 syllables into consonants, vowels, and final consonants. The recognition rates for consonants, vowels, final consonants, and syllables are 85.0%, 90.7%, 85.5%, and 72.1%, respectively. In conclusion, it is shown that Korean syllables divided into phonemes can be analyzed and recognized with minimal data and short processing time. Furthermore, it is shown that Korean syllables, words, and sentences can be recognized in the same way.
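The abstract analyzes each phoneme with order-15 partial autocorrelation (PARCOR) coefficients. A minimal sketch of how such coefficients are commonly obtained for one frame, using the autocorrelation method and the Levinson-Durbin recursion, follows. Frame length, windowing, and the recognition stage are not given in the abstract and are left out; the function names are illustrative.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def parcor(frame, order=15):
    """Levinson-Durbin recursion.

    Returns (ks, a, err): PARCOR (reflection) coefficients k_1..k_p,
    predictor coefficients a_1..a_p for x[n] ~ sum_j a_j x[n-j],
    and the final prediction-error energy.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:                       # silent frame: nothing to model
        return [0.0] * order, [0.0] * order, 0.0
    a = [0.0] * (order + 1)               # a[0] is unused
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        if err <= 0.0:                    # perfectly predicted; higher orders add nothing
            ks.append(0.0)
            continue
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                     # i-th PARCOR coefficient
        ks.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                # error energy shrinks at each order
    return ks, a[1:], err
```

For a first-order signal such as $x[n]=0.9^n$, the first PARCOR coefficient comes out near 0.9 and the higher ones near zero, which is a quick sanity check of the recursion.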

A Study on the Spoken Korean-Digit Recognition Using the Neural Network (神經網을 利用한 韓國語 數字音 認識에 관한 硏究)

  • Park, Hyun-Hwa;Gahang, Hae Dong;Bae, Keun Sung
    • The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.5-13 / 1992
  • Taking advantage of the property that each Korean digit is a monosyllabic word, we propose a spoken Korean-digit recognition scheme using a multilayer perceptron. Each spoken digit is divided into three segments (initial sound, medial vowel, and final consonant) based on the voice start and end points and a peak point in the middle of the vowel. Feature vectors such as cepstrum, reflection coefficients, ${\Delta}$cepstrum, and ${\Delta}$energy are extracted from each segment. As input vectors to the neural network, cepstral coefficients give a higher recognition rate than reflection coefficients. The regression coefficients of the cepstrum affected the recognition rate less than we expected; we believe this is because the features were extracted from selected stationary segments of the input speech signal. With 150 cepstral coefficients obtained from each spoken digit, we achieved a correct recognition rate of 97.8%.
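The ${\Delta}$cepstrum features are regression coefficients computed across neighbouring frames. A sketch using the common formula $d_t = \sum_{n=1}^{N} n\,(c_{t+n}-c_{t-n}) / (2\sum_{n=1}^{N} n^2)$ is shown below. The half-width $N=2$ and the edge handling (repeating the first and last frame) are assumptions, since the abstract does not specify them.

```python
def delta(frames, N=2):
    """Regression (delta) coefficients of a sequence of feature vectors.

    frames: list of equal-length feature vectors (e.g. cepstra), one per frame.
    Frames beyond either end are replaced by the first/last frame.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = []
        for c in range(len(frames[0])):
            num = 0.0
            for n in range(1, N + 1):
                fwd = frames[min(t + n, T - 1)][c]   # clamp at the last frame
                bwd = frames[max(t - n, 0)][c]       # clamp at the first frame
                num += n * (fwd - bwd)
            d.append(num / denom)
        out.append(d)
    return out
```

On a coefficient that rises linearly by 2.0 per frame, the interior delta values equal 2.0, i.e. the local slope.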

Vocal separation method using weighted β-order minimum mean square error estimation based on kernel back-fitting (커널 백피팅 알고리즘 기반의 가중 β-지수승 최소평균제곱오차 추정방식을 적용한 보컬음 분리 기법)

  • Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.35 no.1 / pp.49-54 / 2016
  • In this paper, we propose a vocal separation method that applies weighted ${\beta}$-order minimum mean square error estimation (WbE) within a kernel back-fitting algorithm. In speech enhancement, WbE is well known to outperform existing Bayesian estimators, such as the minimum mean square error (MMSE) estimator of the short-time spectral amplitude (STSA) and the MMSE estimator of the logarithm of the STSA (LSA), in terms of both objective and subjective measures. In the proposed method, WbE is applied to a basic iterative kernel back-fitting algorithm to improve vocal separation from monaural music signals. Experimental results show that the proposed method achieves better separation performance than other existing methods.
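WbE builds on the $\beta$-order MMSE spectral amplitude estimator. As a sketch under that assumption, the code below computes only the unweighted $\beta$-order gain $G = \frac{\sqrt{v}}{\gamma}\left[\Gamma\left(\frac{\beta}{2}+1\right) M\left(-\frac{\beta}{2}, 1; -v\right)\right]^{1/\beta}$ with $v = \xi\gamma/(1+\xi)$, where $\xi$ and $\gamma$ are the a priori and a posteriori SNRs and $M$ is Kummer's confluent hypergeometric function. The weighting that turns this into WbE and the kernel back-fitting loop are not shown; the function names are hypothetical.

```python
import math


def kummer_m_positive(a, b, z, tol=1e-15, max_terms=10_000):
    """Confluent hypergeometric M(a, b, z) by its power series.

    Intended for a, b, z > 0, where every term is positive (no cancellation).
    """
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
        if term < tol * total:
            break
    return total


def beta_order_mmse_gain(xi, gamma, beta):
    """Spectral gain of the (unweighted) beta-order MMSE amplitude estimator.

    xi: a priori SNR, gamma: a posteriori SNR (linear, > 0), beta > 0.
    Numerically fine for moderate v (roughly v < 700).
    """
    v = xi / (1.0 + xi) * gamma
    # Kummer's transformation M(a, b; z) = e^z M(b - a, b; -z) turns the
    # alternating series for M(-beta/2, 1; -v) into one with positive terms.
    m = math.exp(-v) * kummer_m_positive(1.0 + beta / 2.0, 1.0, v)
    return math.sqrt(v) / gamma * (math.gamma(beta / 2.0 + 1.0) * m) ** (1.0 / beta)
```

With $\beta = 1$ the gain reduces to the MMSE-STSA estimator, and with $\beta = 2$ to $\sqrt{v(1+v)}/\gamma$, which makes both cases easy to check.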

The Study on Intraoral Pressure, Closure Duration and VOT During Phonation of Korean Bilabial Stop Consonants (한국어 양순 파열음 발음시 구강내압과 폐쇄기, VOT에 대한 연구)

  • 표화영;최홍식
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.7 no.1 / pp.50-55 / 1996
  • An acoustic analysis was performed on 20 normal subjects who spoke nonsense syllables composed of the Korean bilabial stops $/p/$, $/p^{\star}/$, $/p^{h}/$ with a preceding and/or following vowel /a/ (that is, $[pa, p^{\star}a, p^{h}a, apa, ap^{\star}a, ap^{h}a]$), with an ultraminiature pressure sensor placed in their mouths. Each item was produced twice, once with a moderate voice and once with a loud voice. The acoustic signal and intraoral pressure were recorded simultaneously on a computer. Through these procedures, we measured the intraoral pressure, closure duration, and VOT of the Korean bilabial stops and compared the values with one another according to phonation intensity and the position of the target consonant. Intraoral pressure was measured as the peak value of the intraoral pressure wave; closure duration as the interval between the onset of intraoral pressure build-up and the burst marking the release of closure; and voice onset time (VOT) as the interval between the burst and the onset of glottal vibration. The heavily aspirated stop $/p^{h}/$ showed the highest intraoral pressure, the unaspirated $/p^{\star}/$ the second highest, and the slightly aspirated $/p/$ the lowest. Syllable-initial bilabial stops showed higher intraoral pressure than word-initial stops, and loudly phonated consonants showed higher values than moderately phonated ones. Closure duration was longest for $/p^{\star}/$ and shortest for $/p/$, and it was longer in word-initial position and with a moderate voice. VOT was longest for $/p^{h}/$, followed by $/p/$ and then $/p^{\star}/$, and it was shorter when the consonant was in intervocalic position and when it was phonated with a loud voice.
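The three measurements above are defined by landmarks on the recorded signals. A minimal sketch of computing them, assuming the landmark sample indices (onset of pressure build-up, burst, onset of glottal vibration) have already been labelled by hand or by some detector not described here; the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StopMeasures:
    peak_pressure: float   # same unit as the pressure trace
    closure_ms: float      # pressure build-up onset -> burst
    vot_ms: float          # burst -> onset of glottal vibration


def measure_stop(pressure, fs, buildup_idx, burst_idx, voicing_idx):
    """Peak intraoral pressure, closure duration, and VOT for one stop token.

    pressure: intraoral pressure samples; fs: sampling rate in Hz;
    *_idx: landmark sample indices (assumed given).
    """
    if not (0 <= buildup_idx <= burst_idx <= voicing_idx):
        raise ValueError("expected build-up onset <= burst <= voicing onset")
    peak = max(pressure[buildup_idx:burst_idx + 1])   # peak during the closure
    closure_ms = (burst_idx - buildup_idx) * 1000.0 / fs
    vot_ms = (voicing_idx - burst_idx) * 1000.0 / fs
    return StopMeasures(peak, closure_ms, vot_ms)
```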

Implementation of a 4-Channel ADPCM CODEC Using a DSP (DSP를 사용한 4채널용 ADPCM CODEC의 실시간 구현에 관한 연구)

  • Lee, Ui-Taek;Lee, Gang-Seok;Lee, Sang-Uk
    • Journal of the Korean Institute of Telematics and Electronics / v.22 no.5 / pp.29-38 / 1985
  • In this paper, we design a simple, efficient, and flexible ADPCM codec and implement it in real time on a high-speed digital signal processor, the NEC 7720. The ADPCM system uses an instantaneously adaptive quantizer and a first-order fixed predictor. We developed the software for the NEC 7720 and found that, after program optimization, the NEC 7720 can perform the entire ADPCM algorithm for 4 channels in real time. Computer simulations were performed to investigate the computational accuracy of the NEC 7720 and to determine the parameters needed for the ADPCM codec. Real telephone speech, RC-shaped Gaussian noise, and a 1004 Hz tone were used in the simulations, and the parameters were optimized based on the computed SNR and informal listening tests. The developed software was tested in real-time operation using a hardware emulator for the NEC 7720. Encoding took a maximum of 23.25 $\mu$s per sample and 113.5 $\mu$s for 4 channels, including all necessary I/O operations. Decoding took 24.75 $\mu$s per sample and 119.5 $\mu$s for 4 channels.
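For illustration, here is a floating-point sketch of ADPCM with an instantaneously adaptive quantizer (a Jayant-style one-word-memory quantizer) and a first-order fixed predictor. The 3-bit code, step multipliers, predictor coefficient, and initial step size are hypothetical values chosen for the example, not the parameters the authors optimized, and the NEC 7720's fixed-point arithmetic is not modelled.

```python
import math

# Hypothetical parameters for illustration only (not taken from the paper).
PRED_COEF = 0.85                        # first-order fixed predictor: pred = a * previous reconstruction
MULTIPLIERS = (0.9, 0.9, 1.25, 1.75)    # step multiplier per magnitude code (sign + 2 magnitude bits)
STEP_MIN, STEP_MAX = 1e-4, 1.0


def _quantize(e, step):
    mag = min(int(abs(e) / step), len(MULTIPLIERS) - 1)
    return (-1 if e < 0 else 1), mag


def _dequantize(sign, mag, step):
    return sign * (mag + 0.5) * step    # mid-rise reconstruction level


def _adapt(step, mag):
    # Instantaneous adaptation: scale the step by the multiplier of the last code.
    return min(max(step * MULTIPLIERS[mag], STEP_MIN), STEP_MAX)


def adpcm_encode(samples, step=0.01):
    codes, prev = [], 0.0
    for x in samples:
        pred = PRED_COEF * prev
        sign, mag = _quantize(x - pred, step)
        codes.append((sign, mag))
        prev = pred + _dequantize(sign, mag, step)   # track the decoder's reconstruction
        step = _adapt(step, mag)
    return codes


def adpcm_decode(codes, step=0.01):
    out, prev = [], 0.0
    for sign, mag in codes:
        prev = PRED_COEF * prev + _dequantize(sign, mag, step)
        out.append(prev)
        step = _adapt(step, mag)
    return out


if __name__ == "__main__":
    fs = 8000
    tone = [0.5 * math.sin(2 * math.pi * 1004 * n / fs) for n in range(800)]  # 1004 Hz test tone
    rec = adpcm_decode(adpcm_encode(tone))
    noise = sum((x - y) ** 2 for x, y in zip(tone, rec))
    print("SNR (dB):", 10 * math.log10(sum(x * x for x in tone) / noise))
```

The encoder updates its predictor from the quantized residual rather than from the input, so encoder and decoder stay in step without any side information.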

Nonlinear Prediction of Nonstationary Signals using Neural Networks (신경망을 이용한 비정적 신호의 비선형 예측)

  • Choi, Han-Go;Lee, Ho-Sub;Kim, Sang-Hee
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.10 / pp.166-174 / 1998
  • Neural networks, which have highly nonlinear dynamics by virtue of their distributed nonlinearities and learning ability, have the potential for adaptive prediction of nonstationary signals. This paper describes the nonlinear prediction of such signals in two ways: using a nonlinear module, and using a cascade of nonlinear and linear modules. Fully connected recurrent neural networks (RNNs) and a conventional tapped-delay-line (TDL) filter are used as the nonlinear and linear modules, respectively. The dynamic behavior of the proposed predictors is demonstrated on chaotic time series and speech signals. For comparison, the proposed predictors are evaluated against a conventional ARMA linear prediction model. Experimental results show that the neural-network-based adaptive predictor significantly outperforms the traditional linear scheme. We also find that the cascade predictor is well suited to predicting time series that contain large variations in signal amplitude.
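The linear module above is a tapped-delay-line (TDL) filter. Its adaptation rule is not stated in the abstract, so the sketch below assumes a normalized LMS (NLMS) update for a one-step-ahead TDL predictor. The recurrent nonlinear module and the way the two modules are cascaded are not shown, and the order and step size are hypothetical.

```python
import math


def nlms_predict(signal, order=4, mu=0.5, eps=1e-8):
    """One-step-ahead TDL predictor adapted with NLMS.

    Returns (predictions, errors, final_weights); predictions start at
    index `order` because the delay line must be filled first.
    """
    w = [0.0] * order
    preds, errors = [], []
    for n in range(order, len(signal)):
        u = signal[n - order:n][::-1]                 # delay line, most recent sample first
        y = sum(wi * ui for wi, ui in zip(w, u))      # prediction of signal[n]
        e = signal[n] - y
        norm = eps + sum(ui * ui for ui in u)         # input power normalizes the step
        w = [wi + mu * e * ui / norm for wi, ui in zip(w, u)]
        preds.append(y)
        errors.append(e)
    return preds, errors, w


if __name__ == "__main__":
    x = [math.sin(0.3 * n) for n in range(2000)]
    _, err, _ = nlms_predict(x)
    print("final MSE:", sum(e * e for e in err[-200:]) / 200)
```

A pure sinusoid is exactly predictable by a linear filter, so the error should decay toward zero; chaotic series and speech are where the nonlinear module in the paper is meant to help.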

Walking Aid System for Visually Impaired People by Exploiting Touch-based Interface (촉각 인터페이스를 이용한 시각장애인 보행보조 시스템)

  • Lee, Ji-eun;Oh, Yoosoo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.522-525 / 2015
  • In this paper, we propose a walking aid system that guides visually impaired people along a route and helps them recognize unknown obstacles through tactile stimulation. The proposed system consists of a touch-based obstacle detection module, an obstacle height detection module, and route guidance algorithms. The touch-based obstacle detection module detects obstacles located to the left, right, and front of a visually impaired person and signals them by stimulating the user's thumb with the rotational force of a servomotor. The obstacle height detection module integrates data detected by a linear array of ultrasonic sensors to classify the height of an obstacle into three levels (high, medium, low). The proposed route guidance algorithm guides the user along an optimized path by updating the user's current position from the signal of the smartphone's built-in GPS receiver. In addition, the route guidance algorithm in the developed app delivers spoken information to the user via Bluetooth communication. The proposed system can generate a path that avoids obstacles by recognizing where they are placed while exploring an unfamiliar path.
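A minimal sketch of the three-level height classification, assuming one ultrasonic sensor per level (stacked from low to high) and a fixed detection range. The sensor layout, range, and names are hypothetical; the echo-to-distance conversion uses the usual round-trip time-of-flight relation.

```python
SPEED_OF_SOUND_M_S = 343.0                  # approx. speed of sound in air at 20 °C
HEIGHT_LEVELS = ("low", "medium", "high")   # hypothetical: one sensor per level, index 0 = lowest


def echo_to_distance(round_trip_s):
    """Distance to a reflector from an ultrasonic echo's round-trip time."""
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0


def classify_obstacle_height(distances_m, detect_range_m=1.0):
    """Return the highest level whose sensor sees an obstacle within range, or None.

    distances_m: one reading per vertically stacked sensor (None = no echo).
    """
    hit_levels = [i for i, d in enumerate(distances_m)
                  if d is not None and d <= detect_range_m]
    if not hit_levels:
        return None
    return HEIGHT_LEVELS[max(hit_levels)]
```

Taking the highest triggered sensor means an obstacle that blocks both the low and medium sensors is reported as "medium", which is the level relevant for stepping over or around it.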

A Lingual Sound Analysis based on Oriental Medicine Auscultation for Heart Diseases Diagnosis (심장(心臟) 질환(疾患) 진단(診斷)을 위한 한의학적 청진(聽診) 기반의 설음(舌音) 분석)

  • Kim, Bong-Hyun;Cho, Dong-Uk;Her, Sung-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.8B / pp.830-838 / 2009
  • Unlike Western medicine, which continues to advance through a variety of diagnostic devices, oriental medicine depends on the clinician's intuition and lacks quantitative diagnostic data that can be presented visually to patients. To objectify oriental diagnosis through visualization, this paper first examines the relationship between the voice signal and the heart, which is regarded as the central organ and the source of life and mind. Because the heart is associated with the tongue among the five organs, we design a method of identifying heart disease based on the observation that heart patients pronounce lingual sounds inexactly. To this end, we compare and analyze the statistical bandwidth and a morphological model of the second formant frequency of lingual sounds produced by a group of heart disease patients and a group of normal subjects. Finally, we analyze the correlation between the experimental results and the designed method.
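The abstract does not spell out how the second formant and its bandwidth are measured. One standard way is to read them off the poles of an LPC model: a pole $z = re^{j\theta}$ corresponds to a formant at $F = \theta f_s / (2\pi)$ with bandwidth $B = -(f_s/\pi)\ln r$. The sketch below assumes LPC coefficients are already available and uses NumPy; it is not the paper's statistical-bandwidth procedure.

```python
import numpy as np


def formants_from_lpc(a, fs):
    """Formant frequencies and bandwidths (Hz) from an all-pole model 1/A(z).

    a: LPC polynomial [1, a1, ..., ap]; fs: sampling rate in Hz.
    Returns (freqs, bws) sorted by frequency; F2 is freqs[1].
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]             # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bws[order]
```

Building a polynomial from two known resonances and feeding it back recovers the same frequencies and bandwidths, which checks the two conversion formulas.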

Implementation of Adaptive Multi Rate (AMR) Vocoder for the Asynchronous IMT-2000 Mobile ASIC (IMT-2000 비동기식 단말기용 ASIC을 위한 적응형 다중 비트율 (AMR) 보코더의 구현)

  • 변경진;최민석;한민수;김경수
    • The Journal of the Acoustical Society of Korea / v.20 no.1 / pp.56-61 / 2001
  • This paper presents the real-time implementation of an AMR (Adaptive Multi-Rate) vocoder included in the ASIC for asynchronous International Mobile Telecommunication (IMT)-2000 mobile terminals. The implemented AMR vocoder is a multi-rate coder with 8 modes operating at bit rates from 12.2 kbps down to 4.75 kbps. In addition to the encoder and decoder, which are the basic functions of the vocoder, voice activity detection (VAD), source controlled rate (SCR) operation, and frame-structuring blocks for the system interface are also implemented. The DSP used for the AMR vocoder is a 16-bit fixed-point DSP based on the TeakLite core; it consists of a memory block, a serial interface block, register files for the parallel interface with the CPU, and interrupt control logic. By managing the memory structure efficiently, we reduced the maximum computational load to 24 MIPS. The AMR vocoder was verified with all the test vectors provided by 3GPP, and its stable operation was confirmed on a real-time test board.
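The bit rates and the complexity figure translate into per-frame budgets with simple arithmetic. The sketch below lists the eight AMR narrowband modes (4.75 to 12.2 kbps, 20 ms speech frames), converts them to bits per frame, and converts the reported 24 MIPS peak load into an instruction budget per frame. The mode-selection helper is hypothetical; in AMR the mode is normally chosen by link adaptation, not by a fixed rule like this.

```python
# AMR narrowband codec modes in bit/s; speech frames are 20 ms long.
AMR_MODES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
FRAME_MS = 20


def bits_per_frame(bps, frame_ms=FRAME_MS):
    """Speech bits carried in one frame at a given bit rate (integer arithmetic)."""
    return bps * frame_ms // 1000


def instructions_per_frame(mips, frame_ms=FRAME_MS):
    """Instruction budget available per frame for a processor load of `mips`."""
    return int(mips * 1_000_000 * frame_ms // 1000)


def select_mode(max_bps):
    """Hypothetical rule: highest AMR mode that does not exceed max_bps."""
    candidates = [m for m in AMR_MODES_BPS if m <= max_bps]
    return max(candidates) if candidates else None
```

At 12.2 kbps a 20 ms frame carries 244 bits, and a 24 MIPS peak load corresponds to 480,000 instructions per frame.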

Pronunciation Influence Analysis of Carbonate Drink and Eucalyptus Fragrance by Applying Speech Signal Processing Techniques (음성신호 처리 기술을 적용한 탄산음료와 유칼립투스 발향이 발음에 미치는 영향 분석)

  • Kim, Bong-Hyun;Cho, Dong-Uk
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.5C / pp.420-428 / 2012
  • In the modern smart society, which emphasizes NQ, communication skill is one of the most important assets. In particular, since voice is said to account for 38% of the influence in personal relations, improving pronunciation accuracy is essential for expressing one's ideas accurately. To this end, this paper analyzes the influence of carbonated drinks and eucalyptus fragrance on the voice. For carbonated drinks, the effect of cumulative intake is verified by analyzing the voice as the amount consumed accumulates. The influence of eucalyptus fragrance on pronunciation accuracy is also reported. For these analyses, the jitter, shimmer, pitch, and intensity of the voice are measured. Finally, we carry out a quantified, objective, and visualized voice analysis of the effects of carbonated drinks and eucalyptus fragrance.
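Jitter and shimmer quantify cycle-to-cycle variation of the pitch period and of the amplitude. The abstract does not say which variants were used; the sketch below assumes the common "local" definitions (mean absolute difference between consecutive cycles divided by the mean), with per-cycle periods and peak amplitudes assumed to come from a separate pitch tracker.

```python
def _local_perturbation(values):
    """Mean absolute difference of consecutive values divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Relative period perturbation (fraction, multiply by 100 for %)."""
    return _local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Relative amplitude perturbation (fraction, multiply by 100 for %)."""
    return _local_perturbation(peak_amplitudes)


def mean_f0(periods_s):
    """Mean fundamental frequency in Hz, taken as 1 / mean period."""
    return len(periods_s) / sum(periods_s)
```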