• Title/Summary/Keyword: voice parameter


Voice Activity Detection with Run-Ratio Parameter Derived from Runs Test Statistic

  • Oh, Kwang-Cheol
    • Speech Sciences / v.10 no.1 / pp.95-105 / 2003
  • This paper describes a new parameter for voice activity detection, which serves as a front end for automatic speech recognition systems. The new parameter, called the run-ratio, is derived from the runs test statistic used in statistical tests for the randomness of a given sequence. The run-ratio has the property that its value for a random sequence is about 1. To apply it to voice activity detection, the samples of the input audio signal are converted to a binary sequence of positive and negative values. Silence regions in the audio signal can then be regarded as random sequences, so their run-ratio values are about 1. The run-ratio is far lower than 1 for voiced regions and higher than 1 for fricative sounds. The parameter can therefore discriminate speech from background sounds. The proposed voice activity detector outperformed the conventional energy-based detector in terms of error mean and variance, deviation from true speech boundaries, and the chance of missing real utterances.

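
The behaviour described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the run-ratio is the observed number of sign runs divided by the number expected for a random binary sequence (the Wald-Wolfowitz expectation), and the function name `run_ratio` and the three synthetic signals are made up for the example.

```python
import math
import random

def run_ratio(samples):
    """Observed sign runs divided by the runs expected for a random sequence.

    Assumption: the expected count is the Wald-Wolfowitz value
    2*n_pos*n_neg/(n_pos + n_neg) + 1; the paper may define it differently.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    signs = [s >= 0 for s in samples]          # binary sequence: positive vs negative
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    # A new run starts wherever the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2.0 * n_pos * n_neg / (n_pos + n_neg) + 1.0
    return runs / expected

rng = random.Random(0)
n = 2000
# Synthetic stand-ins (hypothetical, not data from the paper):
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                        # silence-like random signal
voiced = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(n)]    # 100 Hz tone at 8 kHz
fricative_like = [(-1) ** t * (1.0 + 0.1 * rng.random()) for t in range(n)]  # rapid sign alternation

for name, signal in [("noise", noise), ("voiced", voiced), ("fricative-like", fricative_like)]:
    print(name, round(run_ratio(signal), 3))
```

Under these assumptions the noise signal gives a ratio near 1, the low-frequency tone a ratio well below 1, and the rapidly alternating signal a ratio well above 1, which matches the ordering the abstract reports.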

Robust Entropy Based Voice Activity Detection Using Parameter Reconstruction in Noisy Environment

  • Han, Hag-Yong;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Journal of information and communication convergence engineering / v.1 no.4 / pp.205-208 / 2003
  • Voice activity detection is an important problem in speech recognition and speech communication. This paper introduces a new feature parameter, reconstructed from the spectral entropy of information theory, for robust voice activity detection in noisy environments, and compares its performance with that of the energy method. In experiments, we confirmed that spectral entropy and its reconstructed parameter are superior to the energy method for robust voice activity detection in various noise environments.
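
For orientation, the plain spectral entropy that this abstract builds on can be sketched as below. The sketch assumes the common definition (Shannon entropy of the normalized power spectrum of a frame); the paper's "reconstructed" parameter is not specified in the abstract and is not implemented here. The function names and test frames are hypothetical.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Power at each non-negative DFT bin (naive O(n^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]

def spectral_entropy(frame):
    """Shannon entropy (in nats) of the power spectrum treated as a probability distribution."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # all-zero frame: no spectral distribution to measure
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

rng = random.Random(0)
frame_len = 128
# Synthetic frames (hypothetical): a pure tone on bin 8 and white noise.
tone = [math.sin(2 * math.pi * 8 * t / frame_len) for t in range(frame_len)]
noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]

print("tone entropy:", spectral_entropy(tone))
print("noise entropy:", spectral_entropy(noise))
```

A frame whose energy is concentrated in a few bins, as in harmonic speech, yields a low entropy, while a flat noise-like spectrum yields a high one. A detector would threshold this value per frame; the threshold itself would have to be tuned and is not given in the abstract.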

Voice Activity Detection Based on Signal Energy and Entropy-difference in Noisy Environments (엔트로피 차와 신호의 에너지에 기반한 잡음환경에서의 음성검출)

  • Ha, Dong-Gyung;Cho, Seok-Je;Jin, Gang-Gyoo;Shin, Ok-Keun
    • Journal of Advanced Marine Engineering and Technology / v.32 no.5 / pp.768-774 / 2008
  • In many areas of speech signal processing, such as automatic speech recognition and packet-based voice communication, VAD (voice activity detection) plays an important role in the performance of the overall system. In this paper, we present a new feature parameter for VAD which is the product of the energy of the signal and the difference of two types of entropy. To this end, we first define a Mel filter-bank based entropy and calculate its difference from the conventional entropy in the frequency domain. The difference is then multiplied by the spectral energy of the signal to yield the final feature parameter, which we call PEED (product of energy and entropy difference). Through experiments, we verified that the proposed VAD parameter is more efficient than the conventional spectral entropy based parameter across various SNRs and noisy environments.

A Cepstral Analysis of Breathy Voice with Vocal Fold Paralysis (성대마비로 인한 기식 음성에 대한 Cepstral 분석)

  • Kang, Young-Ae;Seong, Cheol-Jae
    • Phonetics and Speech Sciences / v.4 no.2 / pp.89-94 / 2012
  • The aim of this study is to investigate the usefulness of the parameters CPP (cepstral peak prominence) and LTAS (long-term average spectrum) band energy for the analysis of breathy voice with vocal fold paralysis. Thirty-four female subjects who had vocal fold paralysis after thyroidectomy participated in this study. According to the perceptual judgements of three speech pathologists and one phonetic scholar, subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 13). A maximum sustained phonation task was used for acoustic analysis. CPP-related (i.e. mean F0, mean CPP, and mean CPPs) and LTAS-related (i.e. minimum, maximum, and mean) parameters were used. An independent samples t-test was conducted. Regarding CPP, there were significant differences in mean CPP and mean CPPs between groups. The values of mean CPP and CPPs in the non-breathy voice group were higher than those in the breathy voice group. CPP could therefore be regarded as a useful parameter for breathy voice analysis in the clinic. As for LTAS, the energy from 0 to 2 kHz was significantly different between groups. The minimum value of the non-breathy group was lower than that of the breathy group, whereas the maximum value of the non-breathy group was higher. The frequency band below 2 kHz seems to be related to breathy voice.

Analysis of the Voice Quality in Emotional Speech Using Acoustical Parameters (음향 파라미터에 의한 정서적 음성의 음질 분석)

  • Jo, Cheol-Woo;Li, Tao
    • MALSORI / v.55 / pp.119-130 / 2005
  • The aim of this paper is to investigate some acoustical characteristics of voice quality features in an emotional speech database. Six different parameters are measured and compared for 6 different emotions (normal, happiness, sadness, fear, anger, boredom) and 6 different speakers. Inter-speaker variability and intra-speaker variability are measured. Some intra-speaker consistency in the parameter changes across emotions is observed, but inter-speaker consistency is not.


Reconstruction Effect of the Spectral Entropy for the Voice Activity Detection (음성 활동 구간 검출을 위한 스펙트랄 엔트로피의 재구성 효과)

  • Kwon, Ho-Min;Han, Hag-Yong;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.25-28 / 2002
  • Voice activity detection is an important problem in speech recognition and communication. This paper introduces a feature parameter, reconstructed from the spectral entropy of information theory, for robust voice activity detection in noisy environments, and compares its performance with that of the energy method. In experiments, we confirmed that the spectral entropy is a more effective feature parameter than the energy method for robust voice activity detection in various noise environments.


Classification of Pathological Voice Using Artificial Neural Network with Normalized Parameters

  • Li, Tao;Bak, Il-Suh;Jo, Cheol-Woo
    • Speech Sciences / v.11 no.1 / pp.21-29 / 2004
  • In this paper we examined the effect of normalization on classifying pathological voice into normal and abnormal classes using an artificial neural network. The average value of each parameter was used to normalize each set of parameter values. Artificial neural networks were used as classifiers. The effect of normalization was evaluated by comparing the discrimination results for the original and normalized parameter sets.

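
The normalization step mentioned in the abstract above might look like the following sketch. It assumes that "normalizing by the average" means dividing each parameter by its mean over all samples; the abstract does not say whether division or subtraction was used. The function name and the numbers are hypothetical and are not taken from the paper.

```python
def normalize_by_mean(rows):
    """Divide every parameter (column) by its mean across all samples (rows).

    Assumption: mean-ratio normalization; the paper may have used another form.
    """
    n_params = len(rows[0])
    means = [sum(row[j] for row in rows) / len(rows) for j in range(n_params)]
    return [[row[j] / means[j] for j in range(n_params)] for row in rows]

# Hypothetical parameter sets: three acoustic parameters on very different scales.
samples = [
    [0.5, 2.0, 120.0],
    [1.5, 4.0, 180.0],
]
normalized = normalize_by_mean(samples)
print(normalized)
```

After this step every parameter has mean 1, so parameters measured on very different scales contribute comparably to the classifier input.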

Korean isolated word recognizer using new time alignment method of speech signal (새로운 시간축 정규화 방법을 이용한 한국어 고립단어 인식기)

  • Nam, Myeong-U;Park, Gyu-Hong;No, Seung-Yong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.5 / pp.567-575 / 2001
  • This paper suggests a new method for obtaining fixed-size parameters from voice signals of different lengths. The efficiency of a speech recognizer is determined by how it compares the similarity (the distance between patterns) of the parameters extracted from voice signals. However, variation in the voice signal and differences in speaking rate make it difficult to extract fixed-size parameters from the voice signal. The method suggested in this paper normalizes the parameters to a fixed size by applying the 2-dimensional DCT (Discrete Cosine Transform) after representing them as a spectrogram. To validate the suggested method, parameters extracted from a 32-channel auditory filter-bank (which estimates auditory nerve firing probabilities) are processed by the 2-dimensional DCT and used as the input to a neural network. For comparison, we used one of the conventional methods for solving the time alignment problem. The results show more efficient performance and faster recognition than the conventional method in both speaker-dependent and speaker-independent isolated word recognition.

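
One way to read the fixed-size normalization described above is sketched below: apply a 2-D DCT to a (frames x channels) spectrogram and keep only a fixed block of low-order coefficients. The truncation to an 8 x 8 block, the 1/N scaling, and all names (`dct_1d`, `fixed_size_features`, `toy_spectrogram`) are assumptions made for illustration; the abstract does not state how the paper selects coefficients.

```python
import math

def dct_1d(values, n_keep):
    """First n_keep DCT-II coefficients of values, scaled by 1/len(values).

    The 1/N scaling (an assumption) keeps magnitudes comparable across lengths.
    """
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values)) / n
        for k in range(n_keep)
    ]

def fixed_size_features(spectrogram, keep_time=8, keep_freq=8):
    """Map a (frames x channels) spectrogram to a keep_time x keep_freq block."""
    # Separable 2-D DCT: first along the channel axis of each frame ...
    per_frame = [dct_1d(frame, keep_freq) for frame in spectrogram]
    # ... then along the time axis for each kept channel coefficient.
    per_coeff = [dct_1d([row[k] for row in per_frame], keep_time) for k in range(keep_freq)]
    # Transpose so the result is indexed [time coefficient][channel coefficient].
    return [[per_coeff[k][m] for k in range(keep_freq)] for m in range(keep_time)]

def toy_spectrogram(n_frames, n_channels=32):
    """Hypothetical stand-in for a 32-channel filter-bank output of n_frames frames."""
    return [
        [math.sin(math.pi * t / n_frames) * math.exp(-c / 8.0) for c in range(n_channels)]
        for t in range(n_frames)
    ]

short_feat = fixed_size_features(toy_spectrogram(20))   # fast utterance
long_feat = fixed_size_features(toy_spectrogram(35))    # same pattern, spoken slowly
print(len(short_feat), len(short_feat[0]), len(long_feat), len(long_feat[0]))
```

Both inputs produce an 8 x 8 block regardless of their duration, so the result can feed a network with a fixed input size. Because the toy pattern is the same shape stretched in time, the two blocks are also numerically close, which is the property a time-normalization method needs.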

Significance of Acoustic Parameter - RAP, PPQ, APQ- in Hoarseness (애성환자에서 음향지표인 RAP, PPQ 및 APQ의 유용성)

  • 안철민;이종혁;강현국;이용배
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.6 no.1 / pp.22-26 / 1995
  • Changes in voice, especially hoarseness, reflect irregular vibration of the vocal cords. Computerized acoustic analysis has therefore introduced many acoustic parameters for the objective evaluation of voice. We objectively investigated the vocal vibration of normal persons and hoarseness patients in Korea. The RAP (relative average perturbation), PPQ (pitch period perturbation quotient) and APQ (amplitude perturbation quotient) of normal persons were compared with those of hoarseness patients using the Multi-Dimensional Voice Program, to assess the possibility of distinguishing pathologic vocal vibration from normal. The authors found that RAP, PPQ and APQ showed interesting differences between the normal persons and the hoarseness patients in the multivariate statistical analysis. In conclusion, relative average perturbation, pitch period perturbation quotient and amplitude perturbation quotient might be meaningful screening parameters for distinguishing hoarseness patients from normal persons.

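
The three perturbation measures named in this abstract share one form, sketched below under the commonly cited definitions: the mean absolute deviation of each value from a local moving average, relative to the overall mean, in percent. The window sizes assumed here are 3 periods for RAP, 5 periods for PPQ, and 11 peak amplitudes for APQ. The function name and the period values are hypothetical, and this is not the Multi-Dimensional Voice Program's code.

```python
def perturbation_quotient(values, window):
    """Mean |local average - value| over the mean value, as a percentage.

    values: consecutive pitch periods (RAP, PPQ) or peak amplitudes (APQ).
    window: odd smoothing length (assumed 3 for RAP, 5 for PPQ, 11 for APQ).
    """
    if window % 2 == 0 or len(values) < window:
        raise ValueError("window must be odd and no longer than the input")
    half = window // 2
    deviations = [
        abs(sum(values[i - half:i + half + 1]) / window - values[i])
        for i in range(half, len(values) - half)
    ]
    mean_deviation = sum(deviations) / len(deviations)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_deviation / mean_value

# Hypothetical pitch periods in seconds (not data from the paper).
steady = [0.0100] * 20                 # perfectly regular vibration
jittery = [0.0100, 0.0102] * 10        # cycle-to-cycle alternation

rap_steady = perturbation_quotient(steady, 3)
rap_jittery = perturbation_quotient(jittery, 3)
ppq_jittery = perturbation_quotient(jittery, 5)
print(rap_steady, rap_jittery, ppq_jittery)
```

A perfectly regular sequence gives zero, and irregular vibration raises the value, which is why such measures are candidates for screening hoarseness. Any decision threshold would have to come from clinical data such as that reported in the study.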

Acoustic Parameter for an Objective Assessment of Breathiness : The Significance of Voice Turbulence Index(VTI) (기식성 애성 판정을 위한 객관적 음향지표 : VTI(Voice Turbulence Index)의 유용성)

  • 김형태;김민식;조승호
    • Proceedings of the KSLP Conference / 1996.11a / pp.78-78 / 1996
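  • Acoustic parameters for the objective assessment of breathy hoarseness have not yet been studied extensively, and assessment currently relies solely on auditory-perceptual evaluation. The authors sought to confirm the usefulness of the VTI (voice turbulence index) of the Multi-Dimensional Voice Program (model 4305, Kay Elemetrics Corp., USA), a computerized acoustic analysis measure that could serve as an objective acoustic parameter for breathy hoarseness, by comparing it between normal subjects and patients with vocal fold lesions. (abridged)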
