Search | Korea Science

Intelligibility Improvement of Low Bit-Rate Speech Coder Using Stochastic Spectral Equalizer (통계적 스펙트럼 이퀄라이저를 이용한 저 비트율 음성부호화기의 명료도 향상)

Lee, Jeong Hun;Yun, Deokgyu;Choi, Seung Ho
- The Journal of Korean Institute of Communications and Information Sciences, v.41 no.10, pp.1183-1185, 2016
Low bit-rate speech coders in digital speech communications synthesize speech using vocal tract model parameters. Because few bits are allocated to these parameters, the spectra of the synthesized speech can be heavily distorted, which degrades speech intelligibility. In this paper, we propose a speech intelligibility improvement method using a stochastic spectral equalizer. This method stochastically obtains a weight vector for each speech coder from the spectral ratios between original and synthesized speech, then applies this weight vector to the synthesized speech. Objective speech intelligibility tests show that the proposed method performs better than the conventional method.
https://doi.org/10.7840/kics.2016.41.10.1183
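
The weighting step described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the weight vector is "stochastically" obtained, so this sketch assumes the simplest reading: the per-bin mean of the magnitude ratio between original and synthesized spectra over a set of training frames. The function names `estimate_weights` and `equalize` and the toy spectra are hypothetical and are not taken from the paper.

```python
from statistics import fmean

def estimate_weights(original_frames, synthesized_frames, eps=1e-12):
    """Assumed estimator: per-bin mean of |X_orig| / |X_synth| over frames."""
    num_bins = len(original_frames[0])
    weights = []
    for k in range(num_bins):
        # Spectral ratio for bin k in every training frame (eps avoids division by zero).
        ratios = [orig[k] / max(synth[k], eps)
                  for orig, synth in zip(original_frames, synthesized_frames)]
        weights.append(fmean(ratios))
    return weights

def equalize(frame, weights):
    """Apply the weight vector to one synthesized magnitude spectrum."""
    return [w * m for w, m in zip(weights, frame)]

# Toy magnitude spectra (two frames, three frequency bins), for illustration only.
original = [[1.0, 2.0, 4.0], [2.0, 2.0, 2.0]]
synthesized = [[0.5, 2.0, 8.0], [1.0, 2.0, 4.0]]

weights = estimate_weights(original, synthesized)
equalized = equalize(synthesized[0], weights)
print(weights, equalized)
```

In this toy case the synthesized spectra are off by a fixed factor per bin, so the averaged ratios undo the distortion exactly; with real coder output the weights would only correct the average spectral tilt.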

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

Jung, Ho-Young
- ETRI Journal, v.29 no.3, pp.300-304, 2007
This paper proposes a new data-driven method for high-pass approaches, which suppress slowly varying noise components. Conventional high-pass approaches are based on the idea of decorrelating the feature vector sequence and attempt to adapt to various conditions. The proposed method is based on temporal local decorrelation using information-maximization theory. It is performed on an utterance-by-utterance basis, which provides an adaptive channel normalization filter for each condition. The performance of the proposed method is evaluated by isolated-word recognition experiments with channel distortion. Experimental results show that the proposed method yields outstanding improvement for channel-distorted speech recognition.

A Study on a Improvement of the Speech Quality by Spectrum Analysis with Variable Window in CELP Vocoder (가변 윈도우 스펙트럼 분석을 이용한 CELP 부호화기의 음질 향상에 관한 연구)

나덕수;민소연;배명진
- Proceedings of the IEEK Conference, 2000.06d, pp.106-109, 2000
Two types of low bit-rate vocoder have been proposed to date: the MBE type, which uses spectrum modeling, and the CELP type, which uses a hybrid coding method. Of the two, the CELP type has been studied more. In particular, much attention has been focused on the CELP vocoder owing to the emergence of Internet phone and PCS services in the domestic market. To improve the speech quality of the CELP vocoder, this paper proposes a new spectrum analysis algorithm with a variable window. In the CELP vocoder, the spectrum of the synthesized speech signal is distorted because a fixed-size window is used for spectrum analysis. We therefore measured the spectral leakage and adjusted the window size to minimize it. Applying this method to G.723.1 ACELP, we obtained an SD (Spectral Distortion) reduction of 0.084 dB, a residual energy reduction of 6.3%, and an MOS (Mean Opinion Score) improvement of 0.1.

A Robust Speaker Identification Using Optimized Confidence and Modified HMM Decoder (최적화된 관측 신뢰도와 변형된 HMM 디코더를 이용한 잡음에 강인한 화자식별 시스템)

Tariquzzaman, Md.;Kim, Jin-Young;Na, Seung-Yu
- MALSORI, no.64, pp.121-135, 2007
Speech signals are distorted by channel characteristics or additive noise, which severely degrades the performance of speaker and speech recognition. To cope with the noise problem, we propose a modified HMM decoder algorithm using SNR-based observation confidence, which was previously applied successfully to GMMs in a speaker identification task. The modification weights observation probabilities with reliability values obtained from the SNR. We also apply the PSO (particle swarm optimization) method to the confidence function to maximize speaker identification performance. To evaluate the proposed method, we used the ETRI database for speaker recognition. The experimental results showed that the modified HMM decoder algorithm clearly improved performance.
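
As a rough illustration of weighting observation probabilities by SNR-derived reliability, the sketch below scales each frame's log-likelihood by a confidence value before summing. It is a simplified frame-level (GMM-style) scoring example, not the paper's modified HMM decoder. The sigmoid form of the confidence function, its parameters `slope` and `midpoint` (the kind of values an optimizer such as PSO might tune), and all numbers are assumptions made for illustration.

```python
import math

def confidence(snr_db, slope=0.5, midpoint=5.0):
    """Hypothetical confidence function: sigmoid of the frame SNR in dB."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint)))

def weighted_score(frame_loglik, frame_snr_db):
    """Sum of per-frame log-likelihoods, each scaled by its confidence.

    Scaling a log-likelihood by c is equivalent to raising the
    observation probability to the power c.
    """
    return sum(confidence(snr) * ll
               for ll, snr in zip(frame_loglik, frame_snr_db))

# Toy data: two clean frames favor speaker A, two noisy frames favor speaker B.
snr_db = [20.0, 18.0, -5.0, -8.0]
loglik = {
    "A": [-1.0, -1.0, -6.0, -6.0],
    "B": [-2.0, -2.0, -1.0, -1.0],
}

unweighted_best = max(loglik, key=lambda spk: sum(loglik[spk]))
weighted_best = max(loglik, key=lambda spk: weighted_score(loglik[spk], snr_db))
print(unweighted_best, weighted_best)
```

With these toy numbers the unweighted sum is dominated by the noisy frames, while the weighted score largely ignores them, so the two decisions differ.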

A Study on a Improvement of the Speech Quality with Variable Window in CELP Vocoder (가변 윈도우를 이용한 CELP 부호화기의 음질 향상에 관한 연구)

Ju, Sang-Gyu
- Proceedings of the KAIS Fall Conference, 2010.05a, pp.265-268, 2010
Two types of low bit-rate vocoder have been proposed to date: the MBE type, which uses spectrum modeling, and the CELP type, which uses a hybrid coding method. Of the two, the CELP type has been studied more. In particular, much attention has been focused on the CELP vocoder owing to the emergence of Internet phone and PCS services in the domestic market. To improve the speech quality of the CELP vocoder, this paper proposes a new spectrum analysis algorithm with a variable window. In the CELP vocoder, the spectrum of the synthesized speech signal is distorted because a fixed-size window is used for spectrum analysis. We therefore measured the spectral leakage and adjusted the window size to minimize it. Applying this method to G.723.1 ACELP, we obtained an SD (Spectral Distortion) reduction of 0.084 dB, a residual energy reduction of 6.3%, and an MOS (Mean Opinion Score) improvement of 0.1.

A Single Channel Adaptive Noise Cancellation for Speech Signals (음성신호의 단일입력 적응잡음제거)

Gahng, Hae-Dong;Bae, Keun-Sung
- The Journal of the Acoustical Society of Korea, v.13 no.3, pp.16-24, 1994
A single-channel adaptive noise canceling (ANC) technique is presented for removing the effects of additive noise on the speech signal. The conventional method obtains a reference signal using the pitch estimated on a frame basis from the input speech. The proposed method, in contrast, obtains the reference signal using a delay estimated recursively on a sample-by-sample basis. To estimate the delay, we derive recursion formulas for the autocorrelation function and the average magnitude difference function. The performance of the proposed method is evaluated for speech signals distorted by additive white Gaussian noise. Experimental results with the normalized least mean square (NLMS) adaptive algorithm demonstrate that the proposed method improves the perceived speech quality as well as the signal-to-noise ratio.
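
The idea of building the reference signal from a delayed copy of the input and adapting with NLMS can be sketched as follows. This is a minimal illustration under strong assumptions: the delay is fixed and known (the paper estimates it recursively per sample), the "speech" is a pure sinusoid, and the filter length `taps` and step size `mu` are arbitrary choices rather than values from the paper.

```python
import math
import random

def nlms_enhance(noisy, delay, taps=8, mu=0.05, eps=1e-8):
    """Estimate the periodic part of `noisy` from a delayed copy of itself."""
    weights = [0.0] * taps
    output = []
    for n in range(len(noisy)):
        # Reference vector: input delayed by `delay` samples (zero before the start).
        ref = [noisy[n - delay - i] if n - delay - i >= 0 else 0.0
               for i in range(taps)]
        estimate = sum(w * r for w, r in zip(weights, ref))
        error = noisy[n] - estimate
        # NLMS update: the step is normalized by the reference energy.
        energy = eps + sum(r * r for r in ref)
        weights = [w + mu * error * r / energy for w, r in zip(weights, ref)]
        output.append(estimate)
    return output

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
period = 40  # assumed known "pitch period" in samples
clean = [math.sin(2 * math.pi * n / period) for n in range(4000)]
noisy = [s + random.gauss(0.0, 0.5) for s in clean]
enhanced = nlms_enhance(noisy, delay=period)

# Compare against the clean signal after the filter has had time to converge.
mse_in = mse(noisy[-1000:], clean[-1000:])
mse_out = mse(enhanced[-1000:], clean[-1000:])
print(f"MSE before: {mse_in:.3f}, after: {mse_out:.3f}")
```

Because white noise is uncorrelated across a one-period delay while the periodic component is not, the filter output keeps mostly the periodic part. Real speech is only quasi-periodic, which is why the delay estimate has to track the pitch over time.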

Speech Recognition with Image Information (영상정보 보완에 의한 음성인식)

이천우;이상원;양근모;박인정
- Proceedings of the IEEK Conference, 1999.06a, pp.511-515, 1999
The main factor that lowers the speech recognition rate is surrounding noise. To reduce the effect of noise, a filter bank is generally used at the preprocessing stage. In this paper, however, we tried to recognize the ten digits using 2-D LPC to extract image features. First, we obtained speech-only recognition results using 13th-order LPC coefficients; then, for the digits with distorted speech recognition results, ‘0’, ‘4’, ‘5’, ‘6’ and ‘9’, we added image parameters in the form of 12th-order 2-D LPC coefficients. We extracted the 2-D LPC coefficients at each frame and simulated a recognizer with two parameter sets, speech and image. As a result, better results were obtained for digits such as ‘4’ and ‘9’.

A Study on Real-time Implementing of Time-Scale Modification (음성 신호 시간축 변환의 실시간 구현에 관한 연구)

Han, Dong-Chul;Lee, Ki-Seung;Cha, Il-Hawan;Youn, Dae-Hee
- The Journal of the Acoustical Society of Korea, v.14 no.2, pp.50-61, 1995
A time-scale modification method that yields rate-modified speech while conserving the characteristics of speech was implemented in real time using a general-purpose digital signal processor. Time-scale modification changes only the speaking rate, producing a time difference between the input signal and the modified signal, which makes real-time implementation impossible. In this thesis, a system was implemented to remove the time difference between the input and modified signals. Speech signals slowed down or sped up by a physical time-scale modification method, such as adjusting the motor speed of a cassette tape recorder, were used as the input signal. Physical modification that controlled only the internal speed of the cassette tape player distorted the pitch period of the original speech. In this study, a real-time system was implemented so that the pitch-distorted speech was restored to the original pitch by fractional-sampling pitch shifting using an FIR filter, and this signal was then time-scale modified to match the cassette tape recorder motor speed using SOLA time-scale modification. In experiments using speech signals modified by the proposed method, results obtained with a 16-bit ADSP2101 processor and with floating-point computer simulations showed about the same average frame signal-to-noise ratio of about 20 dB.

A Phase-related Feature Extraction Method for Robust Speaker Verification (열악한 환경에 강인한 화자인증을 위한 위상 기반 특징 추출 기법)

Kwon, Chul-Hong
- Journal of the Korea Institute of Information and Communication Engineering, v.14 no.3, pp.613-620, 2010
Additive noise and channel distortion strongly degrade the performance of speaker verification systems, as they distort the features of speech. This distortion causes a mismatch between the training and recognition conditions, so that acoustic models trained with clean speech do not accurately model noisy and channel-distorted speech. This paper presents a phase-related feature extraction method to improve the robustness of speaker verification systems. The instantaneous frequency is computed from the phase of speech signals, and features are obtained from the histogram of the instantaneous frequency. Experimental results show that the proposed technique offers significant improvements over the standard techniques in both clean and adverse testing environments.
https://doi.org/10.6109/jkiice.2010.14.3.613

Perceptual Characteristics of Korean Consonants Distorted by the Frequency Band Limitation (주파수 대역 제한에 의한 한국어 자음의 지각 특성 분석)

Kim, YeonWhoa;Choi, DaeLim;Lee, Sook-Hyang;Lee, YongJu
- Phonetics and Speech Sciences, v.6 no.1, pp.95-101, 2014
This paper investigated the effects of frequency band limitation on the perceptual characteristics of Korean consonants. Monosyllabic speech (144 syllables of CV type, 56 syllables of VC type, 8 syllables of V type) produced by two announcers was low- and high-pass filtered with cutoff frequencies ranging from 300 to 5000 Hz. Six listeners with normal hearing performed a perception test for each filter type and cutoff frequency. We report phoneme recognition rates and types of perception error for band-limited Korean consonants to examine how frequency distortion during speech transmission affects listeners' perception. The results showed that recognition rates varied with the following factors: position in a syllable, manner of articulation, place of articulation, and phonation type. Consonants in the final position were more robust to frequency band limitation than those in the initial position. Fricatives and affricates were more robust than stops. Fortis consonants were less robust than their lenis or aspirated counterparts. Types of perception error also varied with factors such as the consonant's place of articulation: bilabial stops were perceived as alveolar stops, while alveolar and velar stops showed changes in phonation type without any change in the place of articulation.
https://doi.org/10.13064/KSSS.2014.6.1.095
