• Title/Abstract/Keywords: speech parameter

373 search results (processing time: 0.023 s)

잡음에 강인한 음성 인식을 위한 환경 파라미터 보상에 관한 연구 (A Study on Environment Parameter Compensation Method for Robust Speech Recognition)

  • 홍미정;이호웅
    • 한국ITS학회 논문지 / Vol. 5, No. 2 / pp.1-10 / 2006
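  • This paper evaluates, under given noise conditions, the recent VTS (Vector Taylor Series) algorithm proposed by Moreno at Carnegie Mellon University (1996), one of the model parameter transformation techniques for robust speech recognition. To assess the performance of the VTS algorithm, CMN (Cepstral Mean Normalization), an existing noise-handling method, was used for comparison, and recognition rates were compared with white noise and street noise applied as environmental noise at several decibel levels. The results were also compared and analyzed against the recognition results from Moreno's original experimental setup. A discrete HMM (Hidden Markov Model), which can be implemented in real time, was used as the recognition algorithm.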
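
The CMN baseline named in this abstract can be illustrated with a minimal sketch. It assumes one utterance stored as a list of cepstral frames; the function name and the toy data are hypothetical and are not taken from the paper. Subtracting the per-coefficient mean removes any offset that is constant across frames, such as a fixed channel effect.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean computed over all frames of one utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy cepstra (hypothetical values): the same constant offset is added to every frame.
clean = [[1.0, -2.0], [3.0, 0.0], [5.0, 2.0]]
offset = [0.5, -0.25]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_shifted = cepstral_mean_normalize(shifted)
print(normalized_clean)
print(normalized_shifted)
```

After normalization the two versions coincide, which is the property CMN relies on. It does not compensate for additive noise, which is the case VTS targets.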

음성 하모닉스 스펙트럼의 피크-피팅을 이용한 피치검출에 관한 연구 (A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting)

  • 김종국;조왕래;배명진
    • 음성과학 / Vol. 10, No. 2 / pp.85-95 / 2003
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. If the pitch of a speech signal is detected accurately, it can be used in analysis to obtain the vocal tract parameters properly. In speech synthesis, it can be used to change or maintain naturalness and intelligibility easily, and in speech recognition, to remove speaker individuality for speaker independence. In this paper, we propose a new pitch detection algorithm. First, positive center clipping is performed using the slope of the speech signal in order to emphasize the pitch period with a glottal component from which the vocal tract characteristics have been removed in the time domain. A rough formant envelope is then computed by peak-fitting the spectrum of the original speech signal in the frequency domain. From the rough formant envelope, a smoothed formant envelope is obtained by linear interpolation. The flattened harmonics waveform is obtained as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. An inverse fast Fourier transform (IFFT) is applied to the flattened harmonics. Finally, we obtain the residual signal, from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, the accuracy of pitch detection improved, and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.
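
The envelope-flattening step described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the input is taken to be a log-magnitude spectrum, the envelope is formed by linear interpolation between local spectral peaks (with the two end bins added as anchors), and the function name and toy values are hypothetical. Center clipping and the IFFT stage are omitted.

```python
from typing import List


def peak_envelope(spectrum: List[float]) -> List[float]:
    """Piecewise-linear envelope through the local maxima of a spectrum."""
    n = len(spectrum)
    if n < 2:
        return list(spectrum)
    # Local maxima; the end bins are added as anchors so the envelope spans all bins.
    inner = [i for i in range(1, n - 1)
             if spectrum[i - 1] < spectrum[i] >= spectrum[i + 1]]
    peaks = [0] + inner + [n - 1]
    envelope = [0.0] * n
    for left, right in zip(peaks, peaks[1:]):
        for i in range(left, right + 1):
            t = (i - left) / (right - left)
            envelope[i] = (1 - t) * spectrum[left] + t * spectrum[right]
    return envelope


# Toy log-magnitude spectrum (hypothetical): harmonic peaks of decaying height.
spectrum = [0.0, 4.0, 1.0, 3.0, 0.0, 2.0, 0.0]
envelope = peak_envelope(spectrum)
# Flattened harmonics: difference between the spectrum and the envelope.
flattened = [s - e for s, e in zip(spectrum, envelope)]
print(envelope)
print(flattened)
```

In the flattened result every harmonic peak sits at the same level, so the spectral tilt contributed by the envelope no longer dominates the harmonic structure.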

Dual MAC을 이용한 음성 부호화기용 피치 매개변수 검색 구조 설계 (Design of pitch parameter search architecture for a speech coder using dual MACs)

  • 박주현;심재술;김영민
    • 전자공학회논문지A / Vol. 33A, No. 5 / pp.172-179 / 1996
  • In this paper, QCELP (Qualcomm code excited linear predictive), the vocoder algorithm for CDMA (code division multiple access), was analyzed. A pitch parameter search architecture for a 16-bit programmable DSP (digital signal processor) for QCELP was then designed. Because the parameter search is sped up by a high-speed DSP using two MACs, the speech codec specification for digital cellular can be satisfied. We also implemented a FIFO (first-in first-out) memory using a register file to speed up data access. This DSP was designed with COMPASS, an ASIC design tool, using a top-down design methodology. It is therefore possible to cope with rapid changes in the mobile communication market.

SMV와 G.723.1 음성부호화기를 위한 파라미터 직접 변환 방식의 상호부호화 알고리듬 (Transcoding Algorithm for SMV and G.723.1 Vocoders via Direct Parameter Transformation)

  • 서성호;장달원;이선일;유창동
    • 대한전자공학회:학술대회논문집 / Proceedings of the 2003 Summer Conference of the Institute of Electronics Engineers of Korea, Vol. IV / pp.2228-2231 / 2003
  • In this paper, a transcoding algorithm for the Selectable Mode Vocoder (SMV) and the G.723.1 speech coder via direct parameter transformation is proposed. In contrast to the conventional tandem transcoding algorithm, the proposed algorithm converts the parameters of one coder to those of the other without going through the decoding and encoding process. The proposed algorithm is composed of four parts: the parameter decoding, line spectral pair (LSP) conversion, pitch period conversion, excitation conversion and rate selection. The evaluation results show that the proposed algorithm achieves speech quality equivalent to that of tandem transcoding with reduced computational complexity and delay.

망각소자를 갖는 t-분포 강인 연속 추정을 이용한 음성 신호 추정에 관한 연구 (Robust Sequential Estimation based on t-distribution with forgetting factor for time-varying speech)

  • 이주헌
    • 한국음향학회:학술대회논문집 / Proceedings of the 15th Workshop on Speech Communication and Signal Processing, Acoustical Society of Korea, 1998 (KSCSP 98, Vol. 15, No. 1) / pp.470-474 / 1998
  • In this paper, to estimate the time-varying parameters of a speech signal, we use a robust sequential estimator (RSE) based on the t-distribution and introduce a forgetting factor to handle the time-varying signal. By using the RSE based on the t-distribution with a small number of degrees of freedom, we can efficiently alleviate the effects of outliers and obtain better parameter estimation performance. Moreover, with the forgetting factor, the proposed algorithm can estimate the parameters accurately under rapid variation of the speech signal.
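
The combination of outlier down-weighting and a forgetting factor can be sketched with a scalar weighted recursive least-squares update. This is an illustrative stand-in, not the estimator from the paper: the weight formula (dof + 1) / (dof + err²/scale²) is the usual Student-t style weight, and the function name, the fixed `scale`, the initial covariance, and the toy data are all assumptions.

```python
from typing import Sequence


def robust_rls(xs: Sequence[float], ys: Sequence[float],
               forgetting: float = 0.98, dof: float = 3.0,
               scale: float = 1.0) -> float:
    """Estimate theta in y = theta * x with t-style weights and a forgetting factor."""
    theta, p = 0.0, 100.0  # initial estimate and (large) initial covariance
    for x, y in zip(xs, ys):
        err = y - theta * x
        # Large residuals receive small weights, so outliers barely move theta.
        w = (dof + 1.0) / (dof + (err / scale) ** 2)
        gain = p * x * w / (forgetting + w * x * x * p)
        theta += gain * err
        # Dividing by the forgetting factor discounts older samples.
        p = (p - gain * x * p) / forgetting
    return theta


# Toy data (hypothetical): y = 2x with a single gross outlier at index 60.
xs = [1.0, -1.0] * 50
ys = [2.0 * x for x in xs]
ys[60] += 50.0

robust_estimate = robust_rls(xs, ys, dof=3.0)
# A very large dof makes every weight close to 1, i.e. ordinary RLS.
plain_estimate = robust_rls(xs, ys, dof=1e9)
print(robust_estimate, plain_estimate)
```

With a small number of degrees of freedom the outlier is almost ignored, whereas the near-Gaussian setting is pulled away from the true value and recovers only as the forgetting factor discounts the outlier.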

조음 합성과 연결 합성 방식을 결합한 개선된 문서-음성 합성 시스템 (Improved Text-to-Speech Synthesis System Using Articulatory Synthesis and Concatenative Synthesis)

  • 이근희;김동주;홍광석
    • 대한전자공학회:학술대회논문집 / Proceedings of the 2002 Summer Conference of the Institute of Electronics Engineers of Korea (4) / pp.369-372 / 2002
  • In this paper, we present an improved TTS synthesis system using articulatory synthesis and concatenative synthesis. In concatenative synthesis, segments of speech are excised from spoken utterances and connected to form the desired speech signal. We adopt LPC as a parameter, VQ to reduce the memory capacity, and TD-PSOLA to solve the naturalness problem.

스펙트럴 피크 트랙 분석을 이용한 음성/음악 분류 (Speech/Music Discrimination Using Spectral Peak Track Analysis)

  • 금지수;이현수
    • 대한전자공학회:학술대회논문집 / 2006 Summer Conference of the Institute of Electronics Engineers of Korea / pp.243-244 / 2006
  • In this study, we propose a speech/music discrimination method using spectral peak track analysis. The proposed method uses the duration of a spectral peak track at the same frequency channel as the feature parameter and applies a duration threshold to discriminate speech from music. In the experiments, the correct discrimination ratio varied with the threshold, but the method achieved performance comparable to other methods and is computationally efficient.
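
The track-duration feature can be illustrated with a small sketch. It assumes each frame has already been reduced to the set of frequency channels that contain a spectral peak, and that long tracks indicate music (sustained tones) while short tracks indicate speech. That direction, the function names, and the threshold value are assumptions, since the abstract does not state them.

```python
from typing import Dict, List, Set


def longest_peak_track(frames: List[Set[int]]) -> int:
    """Longest run of consecutive frames in which one channel keeps a spectral peak."""
    running: Dict[int, int] = {}  # channel -> length of the track ending at the last frame
    longest = 0
    for peaks in frames:
        # A track continues only if the same channel had a peak in the previous frame.
        running = {ch: running.get(ch, 0) + 1 for ch in peaks}
        if running:
            longest = max(longest, max(running.values()))
    return longest


def classify(frames: List[Set[int]], duration_threshold: int = 4) -> str:
    """Label a segment by comparing its longest peak track with a duration threshold."""
    return "music" if longest_peak_track(frames) >= duration_threshold else "speech"


# Toy peak sets (hypothetical): steady peaks versus peaks that drift between channels.
music_like = [{10, 25}] * 6
speech_like = [{10}, {11}, {12, 30}, {13}, {30}, {14}]
print(classify(music_like), classify(speech_like))
```

Only integer comparisons and a dictionary update per frame are needed, which is consistent with the low computational cost the abstract reports.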

Noise Estimation based on Standard Deviation and Sigmoid Function Using a Posteriori Signal to Noise Ratio in Nonstationary Noisy Environments

  • Lee, Soo-Jeong;Kim, Soon-Hyob
    • International Journal of Control, Automation, and Systems / Vol. 6, No. 6 / pp.818-827 / 2008
  • In this paper, we propose a new noise estimation and reduction algorithm for stationary and nonstationary noisy environments. The approach classifies the speech and noise signal contributions in time-frequency bins. It relies on the ratio of the normalized standard deviation of the noisy power spectrum in time-frequency bins to its average. If the ratio is greater than an adaptive estimator, speech is considered to be present. The proposed method uses an auto control parameter so that the adaptive estimator works well in highly nonstationary noisy environments. The auto control parameter is controlled by a linear function of the a posteriori signal-to-noise ratio (SNR) according to the increase or decrease of the noise level. The estimated clean speech power spectrum is obtained from a modified gain function and the updated noisy power spectrum of the time-frequency bin. The new algorithm has the advantages of greater simplicity and a light computational load for estimating noise in stationary and nonstationary environments. The proposed algorithm is superior to conventional methods. To evaluate the algorithm's performance, we test it using the NOIZEUS database, with the segmental signal-to-noise ratio (SNR) and ITU-T P.835 as evaluation criteria.
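
The speech-presence decision described in this abstract can be sketched for a single frequency bin. This is a simplified illustration, not the published algorithm: it uses the plain ratio of standard deviation to mean over a short window of frames, a fixed threshold in place of the adaptive estimator and its auto control parameter, and first-order recursive smoothing for the noise update. The function name, `window`, `threshold`, `alpha`, and the toy data are all assumptions.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def track_noise(power: List[float], window: int = 4, threshold: float = 0.3,
                alpha: float = 0.8) -> Tuple[List[float], List[bool]]:
    """Track the noise power in one frequency bin, freezing it while speech is present."""
    noise = power[0]
    estimates: List[float] = []
    speech_flags: List[bool] = []
    for t in range(len(power)):
        recent = power[max(0, t - window + 1): t + 1]
        avg = mean(recent)
        # Ratio of standard deviation to mean over the recent frames.
        ratio = pstdev(recent) / avg if avg > 0 else 0.0
        speech = ratio > threshold
        if not speech:
            # Update the noise estimate only when speech is judged absent.
            noise = alpha * noise + (1 - alpha) * power[t]
        estimates.append(noise)
        speech_flags.append(speech)
    return estimates, speech_flags


# Toy noisy power sequence (hypothetical): noise near 1.0 with a burst of speech.
power = [1.0, 1.1, 0.9, 1.0, 9.0, 10.0, 8.0, 1.0, 1.0, 1.0, 1.0]
estimates, speech_flags = track_noise(power)
print(speech_flags)
print(estimates)
```

Because the decision looks back over a window, the speech flag stays raised for a few frames after the burst ends. A fixed threshold cannot follow changing noise levels, which is the limitation the adaptive estimator in the abstract is meant to address.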

음성의 피치 파라메터를 사용한 감정 인식 (Emotion Recognition using Pitch Parameters of Speech)

  • 이규현;김원구
    • 한국지능시스템학회논문지 / Vol. 25, No. 3 / pp.272-278 / 2015
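  • With the goal of developing an emotion recognition system that uses the pitch information of speech signals, this paper studies various methods for extracting parameters from pitch information. To this end, pitch parameters based on statistical information about the pitch and on numerical analysis techniques were generated using a Korean speech database containing various emotions. A GMM (Gaussian Mixture Model)-based emotion recognition system was implemented to compare the performance of each parameter. In addition, the sequential feature selection method was used to select the pitch parameters that give the best emotion recognition performance. In an experiment distinguishing four emotions, a combination of 15 of the 56 parameters achieved a recognition rate of 63.5%. In an experiment on detecting the presence of emotion, a combination of 14 parameters achieved a recognition rate of 80.3%.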

성대마비로 인한 기식 음성에 대한 Cepstral 분석 (A Cepstral Analysis of Breathy Voice with Vocal Fold Paralysis)

  • 강영애;성철재
    • 말소리와 음성과학 / Vol. 4, No. 2 / pp.89-94 / 2012
  • The aim of this study is to investigate the usefulness of the parameters CPP (cepstral peak prominence) and LTAS (long term average spectrum) band energy for the analysis of breathy voice with vocal fold paralysis. Thirty-four female subjects who had vocal fold paralysis after thyroidectomy participated in this study. According to the perceptual judgements of three speech pathologists and one phonetic scholar, the subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 13). A maximum sustained phonation task was used for acoustic analysis. CPP-related (i.e. mean F0, mean CPP, and mean CPPs) and LTAS-related (i.e. minimum, maximum, and mean) parameters were used. An independent-samples t-test was conducted. Regarding CPP, there are significant differences in mean CPP and mean CPPs between the groups. The values of mean CPP and CPPs in the non-breathy voice group are higher than those in the breathy voice group. CPP could be regarded as a useful parameter for breathy voice analysis in the clinic. As for LTAS, the energy from 0 to 2 kHz is significantly different between the groups. The minimum value of the non-breathy group is lower than that of the breathy group, whereas the maximum value of the non-breathy group is higher. The frequency band below 2 kHz seems to be related to breathy voice.