Search | Korea Science

Frame Reliability Weighting for Robust Speech Recognition (프레임 신뢰도 가중에 의한 강인한 음성인식)

조훈영;김락용;오영환
- The Journal of the Acoustical Society of Korea, v.21 no.3, pp.323-329, 2002
This paper proposes a frame reliability weighting method to compensate for time-selective noise, which occurs at random positions in a speech signal and contaminates only certain parts of it. Speech frames have different degrees of reliability, and the reliability is proportional to the SNR (signal-to-noise ratio). While it is feasible to estimate the frame SNR from the noise information in non-speech intervals under stationary noise, it is difficult to obtain the noise spectrum for time-selective noise. Therefore, we used statistical models of clean speech to estimate the frame reliability. The proposed MFR (model-based frame reliability) approximates frame SNR values using filterbank energy vectors obtained by the inverse transformation of input MFCC (mel-frequency cepstral coefficient) vectors and of the mean vectors of a reference model. Experiments on various burst noises showed that the proposed method represents the frame reliability effectively. Using MFR values as weighting factors at the likelihood calculation step improved recognition performance.
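The idea of using reliability values as weighting factors in the likelihood calculation can be illustrated with a minimal sketch. It assumes a single diagonal-covariance Gaussian as the acoustic model and takes the per-frame weights as given; the function names, the example data, and the weight values are hypothetical, and the sketch does not reproduce how the paper derives MFR values.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def weighted_loglik(frames, weights, mean, var):
    """Sum of per-frame log-likelihoods, each scaled by its reliability weight."""
    return sum(w * diag_gauss_loglik(x, mean, var) for x, w in zip(frames, weights))

# Hypothetical data: the middle frame mimics a short noise burst.
frames = [[0.1, -0.2], [5.0, 4.0], [0.0, 0.1]]
mean, var = [0.0, 0.0], [1.0, 1.0]

uniform = weighted_loglik(frames, [1.0, 1.0, 1.0], mean, var)
reliable = weighted_loglik(frames, [1.0, 0.1, 1.0], mean, var)  # burst frame down-weighted
```

Down-weighting the corrupted frame keeps a single burst from dominating the utterance score, which is the effect the abstract attributes to the weighting.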

Speech Spectrum Enhancement Combined with Frequency-weighted Spectrum Shaping Filter and Wiener Filter (주파수가중 스펙트럼성형필터와 위너필터를 결합한 음성 스펙트럼 강조)

Choi, Jae-Seung
- Journal of the Korea Institute of Information and Communication Engineering, v.20 no.10, pp.1867-1872, 2016
In digital signal processing, it is necessary to improve the quality of a speech signal by removing the background noise present in various real environments. An important consideration when removing background noise is that the human auditory mechanism depends mainly on the amplitude spectrum of the speech signal. This paper introduces the characteristics of a frequency-weighted spectrum shaping filter whose primary purpose is to extract the amplitude spectrum of the speech signal. It then proposes an algorithm that, after extracting the amplitude spectral information from the noisy speech signal, applies a Wiener filter and the frequency-weighted spectrum shaping filter according to the acoustic model. In experiments, the spectral distortion (SD) output of the proposed algorithm improved by more than 5.28 dB compared to a conventional method.
https://doi.org/10.6109/jkiice.2016.20.10.1867
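As a rough illustration of the Wiener filter stage only, the sketch below applies the textbook per-bin Wiener gain to an amplitude spectrum. It assumes a known, strictly positive noise power per bin and a power-subtraction SNR estimate; the function names and numbers are hypothetical, and the frequency-weighted spectrum shaping filter from the abstract is not modeled.

```python
def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener gain G = xi / (1 + xi), where xi is the estimated SNR."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        xi = max(p_y - p_n, 0.0) / p_n   # power-subtraction SNR estimate (p_n > 0)
        gains.append(max(xi / (1.0 + xi), floor))
    return gains

def enhance_amplitude(noisy_amp, noise_power):
    """Scale each amplitude bin by its Wiener gain."""
    noisy_power = [a * a for a in noisy_amp]
    return [g * a for g, a in zip(wiener_gain(noisy_power, noise_power), noisy_amp)]

# Hypothetical three-bin spectrum with unit noise power in every bin.
noisy_amp = [3.0, 1.0, 0.5]
noise_power = [1.0, 1.0, 1.0]
enhanced = enhance_amplitude(noisy_amp, noise_power)
```

Bins whose power does not exceed the noise estimate are driven to the floor, while strong bins pass nearly unchanged.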

The Magnitude Distribution method of U/V decision (음성신호의 전폭분포를 이용한 유/무성음 검출에 대한 연구)

배성근
- Proceedings of the Acoustical Society of Korea Conference, 1993.06a, pp.249-252, 1993
In speech signal processing, accurate voiced/unvoiced (U/V) detection is important for robust word recognition and analysis. The proposed algorithm is based on the magnitude distribution (MD) within each frame of the speech signal and requires no statistical information about either the signal or the background noise to make the voiced/unvoiced decision. This paper presents a method for estimating the characteristics of the magnitude distribution from noisy speech and for estimating the optimal threshold for the voiced/unvoiced decision based on the MD. The performance of the detector is evaluated and compared with results reported in other papers.

Speech signal processing in the auditory system (청각 계통에서의 음성신호처리)

이재혁;심재성;백승화;박상희
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집), 1987.10b, pp.680-683, 1987
Speech signal processing in the auditory system can be analyzed using two representations: average discharge rate and temporal discharge pattern. However, the average discharge rate representation is restricted by a narrow dynamic range caused by rate saturation and two-tone suppression, and the temporal discharge pattern representation requires sophisticated frequency analysis and a synchrony measure. This paper proposes a simple representation: using a model of the interaction between cochlear fluid and basilar membrane (BM) movement together with a hair cell model, features of speech signals (formant frequencies and pitch of vowels) are easily estimated from the average synchronized rate.

A 4800 BPS LPC Vocoder with Improved Excitation (개선된 여기신호의 4800BPS LPC 보코우터)

은종관;성원용
- The Journal of the Acoustical Society of Korea, v.1 no.1, pp.54-59, 1982
We present an improved 4800 bps LPC vocoder system that virtually eliminates the buzzy effect from synthetic speech. The excitation signal in the new system is formed by adding high-pass filtered pitch pulses or random noise to a baseband residual signal that has been coded by pitch predictive PCM. Since the baseband residual is used as part of the excitation, the system is also robust to V/UV and pitch errors. According to our informal listening tests, the synthetic speech of the new system does not have the buzzy effect. As a result, the vocoder speech quality is more natural than that of a conventional LPC vocoder.

Speaker and Context Independent Emotion Recognition using Speech Signal (음성을 이용한 화자 및 문장독립 감정인식)

강면구;김원구
- Proceedings of the IEEK Conference, 2002.06d, pp.377-380, 2002
This paper studies speaker- and context-independent emotion recognition using speech signals. For this purpose, a corpus of emotional speech, recorded and classified by emotion through subjective evaluation, was used to build statistical feature vectors (such as the average, standard deviation, and maximum of pitch and energy) and to evaluate the performance of conventional pattern matching algorithms. A vector quantization based emotion recognition system is proposed for speaker- and context-independent emotion recognition. Experimental results showed that the vector quantization based emotion recognizer using MFCC parameters performed better than the one using pitch and energy parameters.
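A vector quantization based recognizer of the kind described above can be sketched as follows, assuming one already-trained codebook per emotion and a decision by minimum average quantization distortion. The codebooks, labels, two-dimensional features, and function names are hypothetical; codebook training and the paper's actual MFCC features are omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def avg_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def classify(frames, codebooks):
    """Return the label whose codebook quantizes the frames with least distortion."""
    return min(codebooks, key=lambda label: avg_distortion(frames, codebooks[label]))

# Hypothetical two-codeword codebooks in a toy 2-D feature space.
codebooks = {
    "neutral": [[0.0, 0.0], [1.0, 0.0]],
    "angry": [[4.0, 4.0], [5.0, 3.0]],
}
utterance = [[4.2, 3.9], [4.8, 3.1], [3.9, 4.1]]
label = classify(utterance, codebooks)
```

Because the decision depends only on how well each codebook covers the feature frames, no alignment to a specific sentence is needed, which fits the context-independent setting.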

Implementation of Quad Variable Rates ADPCM Speech CODEC on C6000 DSP considering the Environmental Noise (배경잡음을 고려한 4배 가변 압축률을 갖는 ADPCM의 C6000 DSP 실시간 구현)

Kim Dae-Sung;Han Kyong-ho
- Proceedings of the KIPE Conference, 2002.07a, pp.727-729, 2002
In this paper, we propose a quad variable rate ADPCM coding method and its implementation on a C6000 DSP. The method modifies the standard ITU G.726 ADPCM to improve speech quality in the presence of environmental noise. Four coding rates, 16 kbps, 24 kbps, 32 kbps, and 40 kbps, are used for the speech window samples, and the rate decision threshold is set by the environmental noise level. The objective of the proposed method is to reduce the coding rate while retaining speech quality: under environmental noise, the speech quality is close to that of a 40 kbps single-rate coder while the coding rate is close to that of a 16 kbps single-rate coder. The environmental noise level affects the coding rate and is calculated for every window of speech samples. At high noise levels, more samples are coded at higher rates to enhance quality; at low noise levels, only large speech signals are coded at higher rates, and more speech samples are coded at lower rates to reduce the coding rate. The influence of noise on the speech signal is considerably higher for small signals, and small signals have a higher ZCR (zero crossing rate). The method is simulated on a PC and is to be implemented on a C6000 floating-point DSP board for real-time operation.
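The per-window quantities mentioned above can be sketched as follows. The zero crossing rate follows its usual definition; the rate picker is only an assumed stand-in that compares the window's mean magnitude with caller-supplied thresholds, since the abstract does not state the actual decision rule or how the thresholds follow from the noise level. All names and threshold values are hypothetical.

```python
RATES_KBPS = (16, 24, 32, 40)  # the four coding rates named in the abstract

def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(window) - 1)

def mean_magnitude(window):
    """Average absolute sample value of the window."""
    return sum(abs(s) for s in window) / len(window)

def pick_rate(window, thresholds):
    """Count how many ascending thresholds the window level exceeds (assumed rule)."""
    level = mean_magnitude(window)
    index = sum(1 for t in thresholds if level > t)
    return RATES_KBPS[index]

loud = [100, 120, 90, -80, -110, -95, 105, 115]
quiet = [2, -1, 3, -2, 1, -3, 2, -1]
thresholds = (5.0, 20.0, 60.0)  # hypothetical; three thresholds separate four rates
```

In a scheme like the one described, the thresholds would be recomputed from the measured noise level for each window rather than fixed as they are here.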

A Study on Objective Speech Quality Measure under CDMA Telephone Networks Environment (CDMA 통신망에서의 객관적 음질 평가 척도에 관한 연구)

김광수;김민정;석수영;정호열;정현열
- Journal of the Institute of Convergence Signal Processing, v.2 no.4, pp.53-58, 2001
To develop an objective speech quality measure for CDMA telephone network environments, this paper first investigates recently developed measures. These measures, however, perform poorly in CDMA telephone networks. To solve this problem, a new objective speech quality measure that adopts a noise masking threshold is proposed and studied. To achieve better performance, the scaled noise masking threshold is calculated for speech signals instead of conventional tone signals. To verify the effectiveness of the proposed method, performance comparison experiments were carried out on CDMA telephone network speech databases; the proposed method showed improved performance compared to existing measures.

Noise Suppression Algorithm using Neural Network based Amplitude and Phase Spectrum (진폭 및 위상스펙트럼이 도입된 신경회로망에 의한 잡음억제 알고리즘)

Choi, Jae-Seung
- Journal of the Korea Institute of Information and Communication Engineering, v.13 no.4, pp.652-657, 2009
This paper proposes an adaptive noise suppression system based on a human auditory model to enhance speech signals degraded by various background noises. The proposed system detects voiced, unvoiced, and silence sections for each frame, applies an adaptive auditory process, and then reduces the noise in the speech signal using a neural network that incorporates amplitude and phase components. Experiments based on measured signal-to-noise ratios confirm that the proposed system is effective for speech signals degraded by various noises.
https://doi.org/10.6109/JKIICE.2009.13.4.652
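An evaluation by signal-to-noise ratio can be illustrated with the usual waveform SNR in decibels. This is a generic sketch that assumes a clean reference signal is available; the abstract does not say which SNR variant the paper uses, and the sample values are hypothetical.

```python
import math

def snr_db(clean, processed):
    """10*log10 of clean-signal energy over the energy of the remaining error."""
    signal = sum(s * s for s in clean)
    noise = sum((s - p) ** 2 for s, p in zip(clean, processed))
    return 10.0 * math.log10(signal / noise)

clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.5, -0.5, 0.5, -1.5]       # hypothetical degraded input
enhanced = [1.1, -0.9, 0.9, -1.1]    # hypothetical output of a suppression system

improvement_db = snr_db(clean, enhanced) - snr_db(clean, noisy)
```

A positive `improvement_db` indicates that the processed signal is closer to the clean reference than the noisy input was.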

Error Analysis of the Exponential RLS Algorithms Applied to Speech Signal Processing

Yoo, Kyung-Yul
- The Journal of the Acoustical Society of Korea, v.15 no.3E, pp.78-85, 1996
The set of admissible time variations in the input signal can be separated into two categories: slow parameter changes and large parameter changes that occur infrequently. A common approach to tracking slowly time-varying parameters is the exponential recursive least-squares (RLS) algorithm. There has been a variety of research on the error analysis of the exponential RLS algorithm for slowly time-varying parameters. This paper focuses on the error analysis of exponential RLS algorithms for input data with abrupt property changes. The voiced speech signal is chosen as the principal application. To analyze the error performance of the exponential RLS algorithm, its deterministic properties are first analyzed for the case of abrupt parameter changes. The analysis shows that the impulsive input (or error variance) synchronous with the abrupt change of the parameter vectors actually enhances the convergence of the exponential RLS algorithm. The analysis has also been verified through simulations on synthetic speech signals.
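For reference, the standard exponential RLS recursion with forgetting factor lambda can be sketched as below and run on a toy system whose parameters change abruptly. The two-tap system, the forgetting factor of 0.95, the initial matrix, and all names are assumptions made for illustration; the sketch does not reproduce the paper's error analysis.

```python
import random

def rls_update(w, P, x, d, lam=0.98):
    """One exponential-RLS step; returns updated (w, P) and the a priori error."""
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    k = [v / denom for v in Px]                    # gain vector
    err = d - sum(w[i] * x[i] for i in range(n))   # a priori error
    w = [w[i] + k[i] * err for i in range(n)]
    # P <- (P - k x^T P) / lam; x^T P equals Px transposed because P is symmetric
    P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return w, P, err

rng = random.Random(0)
w, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
prev_u = 0.0
for t in range(300):
    u = rng.uniform(-1.0, 1.0)
    true_w = [2.0, -1.0] if t < 100 else [-1.0, 0.5]  # abrupt change at t = 100
    x = [u, prev_u]
    d = true_w[0] * x[0] + true_w[1] * x[1]           # noiseless desired signal
    w, P, _ = rls_update(w, P, x, d, lam=0.95)
    prev_u = u
```

With a forgetting factor below one, samples from before the change are discounted geometrically, so the estimate `w` moves toward the new parameter vector after the jump.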
