• Title/Summary/Keyword: speech distortion


A Study on the Fast Search Algorithm for Vector Quantization (벡터 양자화를 위한 고속 탐색 알고리듬에 관한 연구)

  • 지상현;김용석;이남일;강상원
    • The Journal of the Acoustical Society of Korea / v.22 no.4 / pp.293-298 / 2003
  • In this paper, we propose a fast search algorithm for nearest neighbor vector quantization (NNVQ). The proposed algorithm rejects codewords that cannot be the nearest codeword, which narrows the search range of the codebook. It therefore reduces the computation time and complexity of the encoding process while providing the same SD performance as the conventional full search algorithm. We apply the proposed algorithm to the adaptive multi-rate (AMR) speech coder and to a general vector quantizer designed by the LBG algorithm. Simulation results show the effectiveness of the proposed algorithm.
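The abstract does not say which rejection test the authors use. As a rough illustration of the general idea only, the sketch below uses the classical partial distance search, a swapped-in technique and not the paper's algorithm: a codeword is abandoned as soon as its accumulated squared error reaches the best distance found so far. The function names and the toy codebook are assumptions made for this example.

```python
def squared_distance(x, c):
    """Squared Euclidean distance, accumulated term by term."""
    d = 0.0
    for xj, cj in zip(x, c):
        d += (xj - cj) ** 2
    return d


def full_search(x, codebook):
    """Reference encoder: evaluate every codeword completely."""
    best_index, best_dist = -1, float("inf")
    for i, c in enumerate(codebook):
        d = squared_distance(x, c)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


def partial_distance_search(x, codebook):
    """Hypothetical fast encoder using partial distance elimination.

    The running sum only grows, so once it reaches the current best
    distance the codeword cannot be the nearest one and is rejected.
    """
    best_index, best_dist = -1, float("inf")
    for i, c in enumerate(codebook):
        d = 0.0
        for xj, cj in zip(x, c):
            d += (xj - cj) ** 2
            if d >= best_dist:
                break  # early rejection: skip the remaining dimensions
        else:
            # Loop finished without rejection, so d < best_dist.
            best_index, best_dist = i, d
    return best_index, best_dist


# Toy 2-D codebook (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [1.0, 0.0]]
x = [0.9, 0.2]
fast_result = partial_distance_search(x, codebook)
full_result = full_search(x, codebook)
```

Because rejection happens only when a codeword is provably no better than the current best, the fast search returns the same index as the full search, which is the property the abstract claims for its own method.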

Input-Output Gains of Linear Periodic Time-Varying Systems with Applications to Multirate Signal Processing (다중비 신호처리에 적용한 선형 주기적 시변 시스템의 입출력 이득)

  • 이상철;박계원
    • Journal of the Korea Institute of Information and Communication Engineering / v.4 no.5 / pp.963-969 / 2000
  • In this paper, we define two input-output gains of linear periodic time-varying systems. One is the worst-case $\ell_2$-norm of the output over all inputs with unit $\ell_2$-norm, denoted $G(\ell_2,\ell_2)$. The other is the worst-case RMS value of the output over all inputs with unit RMS value, denoted $G(\mathrm{RMS},\mathrm{RMS})$. It is known that these two gains are equivalent for linear time-invariant systems. In this paper, we prove that they are also equivalent for linear periodic time-varying systems. In addition, we derive the relationship between two methods of obtaining the generalized frequency responses of linear periodic time-varying systems. Finally, we apply the defined input-output gains to an M-channel filter bank, a multirate signal processing system used in speech coding. Filter banks generally exhibit aliasing distortion, magnitude distortion, and phase distortion. It is shown that these are kept small if the filter bank is designed by a method that optimizes the gain $G(\ell_2,\ell_2)$ of an error system.
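For orientation, the two gains described in words above can be written out as follows. This is a sketch using the standard definitions of the $\ell_2$-norm and the RMS value of a discrete-time signal; the symbols $u$ (input), $y$ (output), and $k$ (time index) are notation assumed here and may differ from the paper's.

$$
G(\ell_2,\ell_2) = \sup_{\|u\|_2 = 1} \|y\|_2,
\qquad
\|u\|_2 = \Bigl(\sum_{k=-\infty}^{\infty} |u(k)|^2\Bigr)^{1/2}
$$

$$
G(\mathrm{RMS},\mathrm{RMS}) = \sup_{\mathrm{RMS}(u) = 1} \mathrm{RMS}(y),
\qquad
\mathrm{RMS}(u) = \Bigl(\lim_{N\to\infty} \frac{1}{2N+1} \sum_{k=-N}^{N} |u(k)|^2\Bigr)^{1/2}
$$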


Noise-Biased Compensation of Minimum Statistics Method using a Nonlinear Function and A Priori Speech Absence Probability for Speech Enhancement (음질향상을 위해 비선형 함수와 사전 음성부재확률을 이용한 최소통계법의 잡음전력편의 보상방법)

  • Lee, Soo-Jeong;Lee, Gang-Seong;Kim, Sun-Hyob
    • The Journal of the Acoustical Society of Korea / v.28 no.1 / pp.77-83 / 2009
  • This paper proposes a new noise-bias compensation of the minimum statistics (MS) method using a nonlinear function and the a priori speech absence probability (SAP) for speech enhancement in non-stationary noisy environments. The MS method is a well-known technique for noise power estimation in non-stationary noisy environments, but it tends to bias the noise estimate below the true noise level. The proposed method combines an adaptive parameter based on a sigmoid function with the a priori SAP for bias compensation. Specifically, we apply the adaptive parameter according to the a posteriori SNR. In addition, when the a priori SAP equals unity, the adaptive bias compensation factor increases ${\delta}_{max}$ separately in each frequency bin, and vice versa. We evaluate the noise power estimation capability in highly non-stationary and various noise environments, as well as the improvement in the segmental signal-to-noise ratio (SNR) and the Itakura-Saito distortion measure (ISDM) when the method is integrated into spectral subtraction (SS). The results show that the proposed method is superior to the conventional MS approach.
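The abstract names the Itakura-Saito distortion measure without giving its form. A minimal sketch of the textbook definition on sampled power spectra follows; the paper may compute it differently (for example on LPC model spectra), and the function name and sample values are assumptions made for this example.

```python
import math


def itakura_saito_distortion(power_ref, power_est):
    """Mean Itakura-Saito distortion between two power spectra.

    Per bin: P/Q - ln(P/Q) - 1, with P the reference power and Q the
    estimated power. It is zero when the spectra match and positive
    otherwise. All power values must be strictly positive.
    """
    if len(power_ref) != len(power_est) or not power_ref:
        raise ValueError("spectra must be non-empty and of equal length")
    total = 0.0
    for p, q in zip(power_ref, power_est):
        ratio = p / q
        total += ratio - math.log(ratio) - 1.0
    return total / len(power_ref)


# Illustrative values only: a clean spectrum and a slightly off estimate.
clean = [1.0, 2.0, 4.0, 2.0]
enhanced = [1.1, 1.8, 4.2, 2.1]
isd = itakura_saito_distortion(clean, enhanced)
```

The measure is asymmetric in its two arguments, so the order of reference and estimate matters when comparing reported numbers.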

On a Pitch Alteration Method by Time-axis Scaling Compensated with the Spectrum for High Quality Speech Synthesis (고음질 합성용 스펙트럼 보상된 시간축조절 피치 변경법)

  • Bae, Myung-Jin;Lee, Won-Cheol;Im, Sung-Bin
    • The Journal of the Acoustical Society of Korea / v.14 no.4 / pp.89-95 / 1995
  • Waveform coding is concerned simply with preserving the waveform shape of the speech signal through a redundancy reduction process. In speech synthesis, waveform coding with high sound quality is mainly used for synthesis by analysis. However, since the parameters of this coding are not classified into excitation and vocal tract parameters, it is difficult to apply waveform coding to synthesis by rule. To do so, a pitch alteration technique is required for prosody control. In this paper, we propose a new pitch alteration method that changes the pitch period in waveform coding by scaling the time axis and compensating the spectrum. It is a time-frequency domain method in which the phase components of the waveform are preserved, with a small spectrum distortion of 2.5% or less for a 50% pitch change.


A Study on an Improvement of the Performance by Spectrum Analysis with Variable Window in CELP Vocoder (CELP 부호화기에서 가변 윈도우 스펙트럼 분석에 의한 성능 향상에 관한 연구)

  • Min So-Yeon;Kim Eun-Hwan;Bae Myung-Jin
    • Journal of the Korea Society of Computer and Information / v.10 no.6 s.38 / pp.233-238 / 2005
  • In general, CELP (Code Excited Linear Prediction) vocoders provide good speech quality at around 4.8 kbps. Among them, G.723.1, developed for Internet phone and video conferencing, includes two vocoders: 5.3 kbps ACELP (Algebraic CELP) and 6.3 kbps MP-MLQ (Multi-Pulse Maximum Likelihood Quantization). To improve the speech quality of the CELP vocoder, this paper proposes a new spectrum analysis algorithm with a variable window. In a CELP vocoder, the spectrum of the synthesized speech signal is distorted because a fixed-size window is used for spectrum analysis. We therefore measure the spectral leakage and adjust the window size to minimize it. Applying this method to G.723.1 ACELP, we obtain an SD (Spectral Distortion) reduction of 0.084 dB, a residual energy reduction of 6.3%, and an MOS (Mean Opinion Score) improvement of 0.1.
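The abstract reports spectral distortion in dB but does not give the formula. One common definition is the root-mean-square difference between two log power spectra, sketched below; whether the paper uses exactly this form is an assumption, and the function name and values are illustrative.

```python
import math


def spectral_distortion_db(power_ref, power_test):
    """RMS difference between two power spectra on a dB scale.

    Both inputs are sequences of strictly positive power values
    sampled at the same frequency bins.
    """
    if len(power_ref) != len(power_test) or not power_ref:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_diffs = [
        (10.0 * math.log10(p) - 10.0 * math.log10(q)) ** 2
        for p, q in zip(power_ref, power_test)
    ]
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


# Illustrative values: the test spectrum is 10 dB above the reference
# in every bin, so the distortion is 10 dB.
sd = spectral_distortion_db([1.0, 1.0], [10.0, 10.0])
```

In coder evaluations this value is typically averaged over many frames, so a change of a few hundredths of a dB, like the 0.084 dB quoted above, refers to the frame average.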


A Study on the Possibility of Drinking through speech Waveform Compensation in Wireless Communication Environments (무선통신 환경에서 음성파형 보상을 통한 음주가능성 여부에 관한 연구)

  • Lee, Won-Hee;Park, Hyungwoo;Bae, Seong-Geon;Bae, Myung-Jin
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.3 / pp.47-53 / 2017
  • Drunken operation of vessels is difficult to prevent because enforcing alcohol checks at sea is harder than on roads, owing to the environment of marine transportation. In a previous study, we proposed an algorithm that identifies changes in the voice caused by drinking. With this algorithm, it became possible to assess the possibility of drinking by ship operators and crew members at long distance. Because the method relies on voice, which can be transmitted over long distances, drinking can be assessed in real time regardless of distance. When voice is communicated using VTS wireless devices, clipping occurs when the environment is uneven, and the accuracy of judging the possibility of drinking may be lowered. Therefore, in this paper, we propose an enhanced method that compensates the signal in order to reduce the error rate of the drinking judgment caused by distortion of the speech signal.

Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing (다이폰 군집화와 개선된 스펙트럼 완만화에 의한 음성합성)

  • Jang, Hyo-Jong;Kim, Kwan-Jung;Kim, Gye-Young;Choi, Hyung-Il
    • The KIPS Transactions:PartB / v.10B no.6 / pp.665-672 / 2003
  • This paper describes a speech synthesis technique that concatenates unit phonemes. A major problem with this approach is the discontinuity that occurs at the junctions between unit phonemes, especially between unit phonemes recorded by different speakers. To solve this problem, the paper uses clustered diphones and proposes a spectral smoothing technique that not only uses the formant trajectory and the distribution characteristics of the spectrum but also reflects human auditory characteristics. That is, the proposed technique clusters unit phonemes using the distribution characteristics of the spectrum at the junctions between unit phonemes, decides the amount and scope of smoothing by considering human auditory characteristics at the junctions, and then performs spectral smoothing using weights calculated along the time axis at the border of two diphones. The proposed technique removes the discontinuity and minimizes the distortion that spectral smoothing can cause. For performance evaluation, we test on five hundred diphones extracted from twenty sentences recorded by five speakers and present the experimental results.

A study on the Method of the Keyword Spotting Recognition in the Continuous speech using Neural Network (신경 회로망을 이용한 연속 음성에서의 keyword spotting 인식 방식에 관한 연구)

  • Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.43-49 / 1996
  • This research proposes a speaker-independent Korean continuous speech recognition system for 247 DDD area names using a keyword spotting technique. The recognition algorithm is the Dynamic Programming Neural Network (DPNN), which integrates DP and a multi-layer perceptron as a model that handles time-axis distortion and spectral pattern variation in speech. To improve performance, we classify word models into keyword models and non-keyword models. We also experiment with a postprocessing procedure to evaluate system performance. The experimental results are as follows. The isolated-word recognition rate is 93.45% in the speaker-dependent case and 84.05% in the speaker-independent case. The recognition rate for simple dialogue sentences in the keyword spotting experiment is 77.34% in the speaker-dependent case and 70.63% in the speaker-independent case.


Target Speech Detection Using Gaussian Mixture Model of Frequency Bandwise Power Ratio for GSC-Based Beamforming (GSC 기반 빔포밍을 위한 주파수 밴드별 전력비 분포의 혼합 가우시안 모델을 이용한 목표 음성신호의 검출)

  • Chang, Hyungwook;Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.1 / pp.61-68 / 2015
  • Noise reduction is necessary to compensate for the degradation of recognition performance caused by various types of noise. Among the many noise reduction techniques that use a microphone array, the generalized sidelobe canceller (GSC) has been widely applied to reduce nonstationary noise. The performance of the GSC is directly affected by its adaptation mode controller (AMC). That is, accurate target speech detection is essential to guarantee sufficient noise reduction in pure noise intervals and low distortion in target speech intervals. Thus, this paper proposes an improved AMC design technique in which the power ratio of the fixed beamformer output to the blocking matrix output is calculated for each frequency band and probabilistically modeled by Gaussian mixtures for each class. Experimental results show that the proposed algorithm outperforms conventional AMCs in terms of receiver operating characteristic (ROC) curves and output SNRs.
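The feature described above, a per-band ratio of fixed beamformer output power to blocking matrix output power, can be sketched as follows. This covers only the feature extraction for one frame; the Gaussian mixture modeling and the detection rule are omitted. The function name, the band layout, the `eps` guard, and the sample values are all assumptions made for this example, not details from the paper.

```python
def bandwise_power_ratio(fbf_power, bm_power, band_edges, eps=1e-12):
    """Per-band ratio of fixed-beamformer power to blocking-matrix power.

    fbf_power, bm_power: per-bin power spectra of one frame.
    band_edges: increasing bin indices; band b covers
                bins band_edges[b] up to band_edges[b + 1] - 1.
    eps: small constant (an assumption) to avoid division by zero.
    """
    if len(fbf_power) != len(bm_power):
        raise ValueError("spectra must have the same number of bins")
    ratios = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        numerator = sum(fbf_power[lo:hi])
        denominator = sum(bm_power[lo:hi])
        ratios.append(numerator / (denominator + eps))
    return ratios


# Illustrative frame: the target dominates the lower band only.
fbf = [4.0, 4.0, 1.0, 1.0]
bm = [1.0, 1.0, 1.0, 1.0]
ratios = bandwise_power_ratio(fbf, bm, band_edges=[0, 2, 4])
```

A high ratio in a band suggests target speech, since the blocking matrix is meant to suppress the target direction; a ratio near one suggests noise only. A classifier such as the per-class Gaussian mixtures mentioned in the abstract would then operate on these ratios, usually after a log transform.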

A Study on Word Recognition Using Neural-Fuzzy Pattern Matching (뉴럴-퍼지패턴매칭에 의한 단어인식에 관한 연구)

  • 이기영;최갑석
    • Journal of the Korean Institute of Telematics and Electronics B / v.29B no.11 / pp.130-137 / 1992
  • This paper presents a word recognition method using neural-fuzzy pattern matching, in order to produce a proper speech pattern for a spectrum sequence and to improve the recognition rate. In this method, frequency variation is reduced by generating binary spectrum patterns through an associative memory implemented with a neural network, and time variation is reduced by measuring similarity with fuzzy pattern matching. Because the method uses binary spectrum patterns and logic algebraic operations to measure similarity, its memory capacity and computation requirements are far lower than those of DTW using a conventional distortion measure. To validate the recognition performance of this method, word recognition experiments are carried out using 28 DDD city names, and the results are compared with DTW and fuzzy pattern matching. The results show that the presented method achieves better recognition performance than the other methods.
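For context on the DTW baseline mentioned above, here is a minimal sketch of classical dynamic time warping. It assumes scalar sequences and absolute difference as the local distortion, whereas real recognizers compare frames of spectral features; it is not the paper's implementation, and the names and values are illustrative.

```python
def dtw_distance(a, b):
    """Classical DTW cost between two scalar sequences.

    cost[i][j] holds the minimum accumulated distortion for aligning
    the first i samples of a with the first j samples of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # local distortion
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The second sequence is a time-stretched copy of the first, so the
# warped distance is zero even though the lengths differ.
template = [1.0, 2.0, 3.0]
utterance = [1.0, 2.0, 2.0, 3.0]
distance = dtw_distance(template, utterance)
```

The table has one entry per pair of frames, and each entry needs a real-valued distortion computation. That cost is what the abstract contrasts with binary patterns and logic operations.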
