• Title/Summary/Keyword: Cepstrum


Analysis and parameter extraction of motion blurred image (움직임 열화 현상이 발생한 영상의 분석과 파라메터 추출)

  • 최지웅;최병철;강문기
    • The Journal of Korean Institute of Communications and Information Sciences / v.24 no.10B / pp.1953-1962 / 1999
  • During image acquisition, shaking of the capture equipment or movement of the object seriously degrades image quality. This phenomenon, which reduces the clarity and resolution of the image, is called motion blur. In this paper, a newly defined function is introduced for finding the degree and the length of the motion blur. The domain of this function is defined as the Peak-trace domain. In the Peak-trace domain, a noise-dominant region for calculating the noise variance and a signal-dominant region for extracting the degree and the length of the motion blur are defined and analyzed. Using the Peak-trace information in the signal-dominant region, the direction of the motion can be found regardless of noise corruption. A weighted least mean square method helps extract the Peak-trace more precisely. Once the direction of the motion blur is known, its length can be found from the one-dimensional cepstrum. In the experiments, the degraded image was restored efficiently using the information obtained by the proposed algorithm.
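
The last step of the abstract above (blur length from a one-dimensional cepstrum) can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a uniform (box) blur applied circularly along a single image row, and the function names `real_cepstrum` and `estimate_blur_length` are hypothetical. A box blur of length L puts periodic zeros in the spectrum, which show up as a strong negative cepstral peak at quefrency L.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(x)
    return np.fft.irfft(np.log(np.abs(spectrum) + eps), n=len(x))

def estimate_blur_length(blurred_row, min_len=2):
    """Return the quefrency (in samples) of the deepest cepstral minimum."""
    c = real_cepstrum(blurred_row)
    half = len(c) // 2
    return min_len + int(np.argmin(c[min_len:half]))

# Synthetic data (assumption): a random row blurred circularly by a 9-sample box.
rng = np.random.default_rng(0)
n, true_len = 2048, 9
row = rng.standard_normal(n)
kernel = np.zeros(n)
kernel[:true_len] = 1.0 / true_len
blurred = np.fft.irfft(np.fft.rfft(row) * np.fft.rfft(kernel), n=n)

estimated_len = estimate_blur_length(blurred)
print(estimated_len)
```

In a real image the row would be taken along the estimated blur direction, and averaging the cepstra of several rows would likely make the minimum more robust to noise.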


A Novel Two-Level Pitch Detection Approach for Speaker Tracking in Robot Control

  • Hejazi, Mahmoud R.;Oh, Han;Kim, Hong-Kook;Ho, Yo-Sung
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.89-92 / 2005
  • Using natural speech commands to control a robot is an interesting topic in the field of robotics. In this paper, our main focus is on verifying the speaker who gives a command, in order to decide whether he or she is authorized to issue commands. Among the possible dynamic features of natural speech, pitch period is one of the most important for characterizing speech signals, and it usually differs from person to person. However, current pitch detection techniques have not yet reached the desired level of accuracy and robustness. When the signal is noisy or contains multiple pitch streams, the performance of most techniques degrades. In this paper, we propose a two-level approach to pitch detection which, compared with standard pitch detection algorithms, not only increases accuracy but also makes the performance more robust to noise. In the first level of the proposed approach, we discriminate voiced from unvoiced signals with a neural classifier that uses cepstrum sequences of speech as its input feature set. Voiced signals are then processed in the second level by a modified version of the standard AMDF-based pitch detection algorithm to determine their pitch periods precisely. The experimental results show that the proposed system is more accurate than conventional pitch detection algorithms for speech signals in both clean and noisy environments.
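
As a rough illustration of the second level, the sketch below implements the plain average magnitude difference function (AMDF), not the modified variant the paper describes. The function name `amdf_pitch_period`, the 8 kHz sampling rate, and the lag search range are assumptions chosen for the example.

```python
import numpy as np

def amdf_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) that minimizes the AMDF of the frame."""
    lags = np.arange(min_lag, max_lag + 1)
    amdf = np.array([np.mean(np.abs(frame[:-lag] - frame[lag:])) for lag in lags])
    return int(lags[np.argmin(amdf)])

# Synthetic voiced frame (assumption): a 100 Hz tone sampled at 8 kHz, 50 ms long.
fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 100 * t)

# Search lags of 20..150 samples, i.e. roughly 400 Hz down to 53 Hz.
period = amdf_pitch_period(frame, min_lag=20, max_lag=150)
f0 = fs / period
print(period, f0)
```

The AMDF dips toward zero when the lag equals the pitch period, so the minimum over the search range gives the period estimate; the range is kept below two periods here to avoid picking a multiple.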


Automatic Control of Horizontal-moving Stereoscopic Camera by Disparity Compensation (시차 보정에 의한 수평이동방식 입체카메라의 자동제어)

  • Kwon, Ki-Chul;Lee, Yong-Bum;Choi, Young-Soo;Huh, Kyung-Moo;Kim, Nam
    • Journal of the Institute of Electronics Engineers of Korea CI / v.38 no.5 / pp.77-85 / 2001
  • This study proposes a Vergence Information Extracting Algorithm (VIEA) that obtains vergence information quickly and accurately, in order to automate the vergence and focus control of a horizontal-moving stereoscopic camera. First, the geometric structure of the horizontal-moving stereoscopic camera device was analyzed, along with the linear relation between the vergence control and the focus control. A stereoscopic camera was then designed and built by applying the vergence-focus relation formula. Finally, the VIEA, which uses a cepstrum filter, was employed to implement an Automatic Vergence and Focus Controlling Stereoscopic Camera System (AVFCSCS). The VIEA showed a shorter vergence acquisition time and a lower error ratio than existing algorithms. The proposed system substantially reduced the control time and the error ratio, making it possible to obtain natural and clear images. It also simplified handling of the stereoscopic camera for the convenience of end users.


Voice Recognition Performance Improvement using the Convergence of Voice signal Feature and Silence Feature Normalization in Cepstrum Feature Distribution (음성 신호 특징과 셉스트럽 특징 분포에서 묵음 특징 정규화를 융합한 음성 인식 성능 향상)

  • Hwang, Jae-Cheon
    • Journal of the Korea Convergence Society / v.8 no.5 / pp.13-17 / 2017
  • Existing methods for extracting speech features from a speech signal suffer from recognition errors because the threshold that separates speech from non-speech is unclear. This article proposes a modeling method for improving speech recognition performance that combines speech feature extraction with normalization of the silence features in non-speech segments. The proposed method minimizes the effect of noise; the speech recognition model merges per-frame speech signal feature extraction with silence feature normalization. The method also forms the original speech signal with an energy spectrum similar to the entropy, so the speech is less affected by noise. Silence feature normalization improves the signal-to-noise ratio. The reference value for classifying speech and non-speech was fixed in the cepstrum. For the performance analysis, the results of the presented method were compared with CHMM and HMM: the recognition rate improved by 2.7%p in the speech-dependent case and by 0.7%p in the speech-independent case.

The suppression of noise-induced speech distortions for speech recognition (음성인식을 위한 잡음하의 음성왜곡제거)

  • Chi, Sang-Mun;Oh, Yung-Hwan
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.12 / pp.93-102 / 1998
  • In noisy environments, human speech production is influenced by the noise (the Lombard effect), and the speech signal is also contaminated. These distortions dramatically reduce the performance of speech recognition systems. This paper proposes a method of Lombard effect compensation and noise suppression to improve speech recognition performance in noisy environments. The Lombard effect is a nonlinear distortion that depends on the ambient noise level, the speaker, and the phonetic unit. To estimate its intensity, we formulate a measure of the Lombard effect level based on the acoustic speech signal, and this measure is used to compensate for the Lombard effect. The distortions of speech in noisy environments are cancelled as follows. First, spectral subtraction and band-pass filtering are used to cancel the noise. Second, energy normalization is proposed to cancel the variation in vocal intensity caused by the Lombard effect. Finally, the Lombard effect level controls the transform that converts the Lombard speech cepstrum into a clean speech cepstrum. The proposed method was validated on a 50-word Korean recognition task. Average recognition rates were 82.6%, 95.7%, and 97.6% with the proposed method, against 46.3%, 75.5%, and 87.4% without any compensation, at SNRs of 0, 10, and 20 dB, respectively.
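
The first cancellation step, spectral subtraction, can be sketched as follows. This is a generic magnitude-domain version with a spectral floor, not the paper's exact procedure; the function name `spectral_subtract`, the floor value, and the synthetic tone-plus-noise frame are assumptions for illustration.

```python
import numpy as np

def spectral_subtract(noisy_frame, noise_mag, floor=0.01):
    """Subtract an estimated noise magnitude spectrum, keeping the noisy phase."""
    spec = np.fft.rfft(noisy_frame)
    mag, phase = np.abs(spec), np.angle(spec)
    # Floor the result so magnitudes never go negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))

# Synthetic data (assumption): a bin-aligned tone in white noise.
rng = np.random.default_rng(0)
n, sigma = 512, 0.3
clean = np.sin(2 * np.pi * 16 * np.arange(n) / n)
noisy = clean + sigma * rng.standard_normal(n)

# Noise magnitude estimated by averaging 20 noise-only frames.
noise_frames = sigma * rng.standard_normal((20, n))
noise_mag = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)), axis=0)

enhanced = spectral_subtract(noisy, noise_mag)
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((enhanced - clean) ** 2)
print(err_before, err_after)
```

In practice the noise estimate would come from non-speech segments of the recording, and the enhanced frames would be overlap-added before cepstral features are computed.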


Phoneme Segmentation in Consideration of Speech feature in Korean Speech Recognition (한국어 음성인식에서 음성의 특성을 고려한 음소 경계 검출)

  • 서영완;송점동;이정현
    • Journal of Internet Computing and Services / v.2 no.1 / pp.31-38 / 2001
  • A speech database built from phonemes is important in the study of speech recognition, speech synthesis, and speech analysis. Phonemes consist of voiced and unvoiced sounds. Although voiced and unvoiced sounds differ in many features, traditional algorithms for detecting the boundary between phonemes do not take these differences into account; they determine the boundary by comparing the parameters of the current frame with those of the previous frame in the time domain. In this paper, we propose the assort algorithm, which is block-based and reflects the feature differences between voiced and unvoiced sounds for phoneme segmentation. The assort algorithm uses a distance measure based on MFCC (Mel-Frequency Cepstrum Coefficient) as the spectral comparison measure, and uses the energy, zero crossing rate, spectral energy ratio, and formant frequency to separate voiced sounds from unvoiced sounds. In our experiment, the proposed system showed a precision of about 79 percent on isolated words of 3 or 4 syllables, an improvement of about 8 percent over the existing phoneme segmentation system.
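
Two of the features named above, short-time energy and zero crossing rate, are enough for a toy voiced/unvoiced decision. The sketch below is only an illustration: the function names and both thresholds are hypothetical, and the assort algorithm itself also uses the spectral energy ratio, formant frequency, and an MFCC distance that are not shown here.

```python
import numpy as np

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return float(np.mean(frame ** 2))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1
    return float(np.mean(signs[1:] != signs[:-1]))

def is_voiced(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Hypothetical rule: voiced frames have high energy and a low crossing rate."""
    return short_time_energy(frame) > energy_thresh and zero_crossing_rate(frame) < zcr_thresh

# Synthetic 30 ms frames at 8 kHz (assumption): a 120 Hz tone and white noise.
fs, n = 8000, 240
voiced_like = np.sin(2 * np.pi * 120 * np.arange(n) / fs)
unvoiced_like = 0.3 * np.random.default_rng(0).standard_normal(n)

print(is_voiced(voiced_like), is_voiced(unvoiced_like))
```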


An Accuracy Improvement Method on Acoustic Source Localization Using Ground Reflection Effect (지면반사효과를 이용한 폭발 소음원의 위치 추정 정밀도 향상법)

  • Go, Yeong-Ju;Choi, Donghun;Lee, Jaehyung;Choi, Jong-Soo;Ha, Jae-Hyoun;Na, Taeheum
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.26 no.1 / pp.69-74 / 2016
  • A technique for improving estimation accuracy is introduced for locating the impact position of an artillery shell during weapon scoring tests. Localization of impacts by acoustic measurement has been studied previously, and the usability of a sensor array has been verified experimentally. When a blast occurs above the ground in a firing range, an acoustic sensor above the ground measures the directly propagated sound together with the ground-reflected sound. This study proposes a method for reducing the estimation error by using the reflected-signal measurements within the time difference of arrival method. Taking the reflected sound into account is equivalent to placing a virtual sensor mirrored symmetrically below the ground. This idea yields a virtual three-dimensional array configuration from a two-dimensional planar array above the ground. The time difference between the direct and the reflected propagation can be estimated using cepstrum analysis. A performance test was carried out as a simulation experiment over an area the size of a football field.
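
The cepstrum step mentioned above can be illustrated with a small echo-delay sketch. It rests on simplifying assumptions that are not from the paper: the direct sound is modeled as white noise, the reflection is an attenuated circular shift of it (`np.roll`), and the function name `echo_delay_samples` is hypothetical. An echo with gain a and delay d adds a ripple of period 1/d to the log spectrum, which appears as a positive cepstral peak at quefrency d.

```python
import numpy as np

def echo_delay_samples(signal, min_delay=1, eps=1e-12):
    """Return the quefrency (in samples) of the largest cepstral peak."""
    log_mag = np.log(np.abs(np.fft.rfft(signal)) + eps)
    cepstrum = np.fft.irfft(log_mag, n=len(signal))
    half = len(signal) // 2
    return min_delay + int(np.argmax(cepstrum[min_delay:half]))

# Synthetic data (assumption): direct sound plus a reflection delayed by 37 samples.
rng = np.random.default_rng(1)
n, true_delay, reflection_gain = 4096, 37, 0.6
direct = rng.standard_normal(n)
received = direct + reflection_gain * np.roll(direct, true_delay)

delay = echo_delay_samples(received)
print(delay)
```

Dividing the delay by the sampling rate gives the time difference between the direct and reflected paths, which is the quantity a virtual mirrored sensor would contribute to a time difference of arrival solver.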

A Comparative Study of Speech Parameters for Speech Recognition Neural Network (음성 인식 신경망을 위한 음성 파라키터들의 성능 비교)

  • Kim, Ki-Seok;Im, Eun-Jin;Hwang, Hee-Yung
    • The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.61-66 / 1992
  • Much research has used neural network models for automatic speech recognition, but the main trend has been to search for neural network models and learning rules suited to the task. However, the choice of input speech parameter is as important as the neural network model itself for improving the performance of a neural-network-based speech recognition system. In this paper, we select 6 speech parameters from a survey of speech recognition papers that use neural networks and analyze their performance on the same data with the same neural network model. We use 8 sets of 9 Korean plosives and 18 sets of 8 Korean vowels. We use a recurrent neural network and compare the performance of the 6 speech parameters while keeping the number of nodes constant. The delta cepstrum of the linear predictive coefficients gave the best result, with recognition rates of 95.1% for the vowels and 100.0% for the plosives.
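
For readers unfamiliar with the delta cepstrum, the sketch below computes it with the commonly used regression formula over neighbouring frames. Whether the paper used this exact formula or window width is not stated in the abstract; the function name `delta_features` and `width=2` are assumptions.

```python
import numpy as np

def delta_features(cepstra, width=2):
    """Regression-based delta of a (frames, coefficients) cepstrum matrix."""
    num_frames = cepstra.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours on each side.
    padded = np.pad(cepstra, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    delta = np.zeros_like(cepstra, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k: width + k + num_frames]
        behind = padded[width - k: width - k + num_frames]
        delta += k * (ahead - behind)
    return delta / denom

# Toy input (assumption): two coefficients changing linearly over 10 frames.
cepstra = np.outer(np.arange(10.0), [1.0, -0.5])
delta = delta_features(cepstra)
print(delta[2:-2])
```

For a linear trajectory the interior frames recover the slope exactly, which is a quick sanity check on the implementation.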


Comparison of Initial Therapeutic Effects of Voice Therapy and Injection Laryngoplasty for Unilateral Vocal Cord Paralysis Patients (일측 성대마비 환자에 대해 음성치료와 성대주입술의 초기 치료 효과 비교 연구)

  • Lee, Chang-Yoon;An, Soo-Youn;Chang, Hyun;Son, Hee Young
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.28 no.2 / pp.112-117 / 2017
  • Background and Objectives: The purpose of this study was to classify patients with unilateral vocal fold paralysis according to the fixed position of the paralyzed fold and to analyze the effects of two early treatment methods, voice therapy and injection laryngoplasty. Materials and Methods: Twenty patients, classified as full abduction or slight abduction according to the position of paralysis, were treated with injection laryngoplasty, and 23 patients were treated with voice therapy. Results were evaluated by acoustic analysis, electroglottography (EGG), and cepstrum analysis before and after therapy. Voice therapy aimed to improve laryngeal movement and glottal contact while relieving supraglottic hypertension and making use of breathing. Results: Significant improvement was found in the acoustic parameters, cepstrum parameters, and EGG after treatment in both groups. There was no significant difference between the two groups when the before-and-after changes were compared to assess the effects of injection laryngoplasty and voice therapy. Conclusion: The initial treatments for unilateral vocal cord paralysis are injection laryngoplasty and voice therapy; however, there is no precise standard for which method should be applied first. Therefore, in this study, we classified patients according to their paralysis position and then applied the two methods. The results suggest that voice therapy and injection laryngoplasty at the initial stage are both very useful for improving voice quality and laryngeal function in vocal fold paralysis.


Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.150-159 / 2005
  • This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's. In the proposed method, a person's voice is represented by the LPC cepstrum, pitch period, and speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and a conditional probability is used to model the relationship between the two speakers' LPC cepstra. A Maximum Likelihood (ML) estimation method is employed to obtain the parameters of each probabilistic model. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate are used as the parameters for prosody transformation, which is implemented using the ratio of their average values. The proposed method outperforms the previous VQ-based method in objective measures, including the average cepstrum distance reduction ratio and the likelihood increasing ratio. In the subjective test, we obtained almost the same correct identification ratio as the previous method, and we also confirmed that high-quality transformed speech is obtained, owing to the smoothly evolving spectral contours over time.
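
The LPC cepstrum used as the spectral representation above can be obtained directly from the predictor coefficients with a standard recursion. The sketch below is a minimal version under one stated assumption: the all-pole model is written as H(z) = 1 / (1 - sum_k a_k z^-k), so sign conventions from other toolkits may need flipping. The function name `lpc_to_cepstrum` is hypothetical, and the GMM and MMSE conversion steps are not shown.

```python
import numpy as np

def lpc_to_cepstrum(a, num_ceps):
    """Cepstral coefficients c_1..c_num_ceps from LPC coefficients a_1..a_p.

    Assumes H(z) = 1 / (1 - sum_k a[k-1] * z**-k); the gain term c_0 is omitted.
    """
    p = len(a)
    c = np.zeros(num_ceps + 1)  # index 0 is left unused
    for n in range(1, num_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # c_n = a_n + sum_{k} (k / n) * c_k * a_{n-k}, with 1 <= n - k <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Single-pole check: for a_1 = 0.5 the cepstrum is 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], num_ceps=4)
print(ceps)
```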