Search | Korea Science

Performance Analysis of A Variable Bit Rate Speech Coder (가변 비트율 음성 부호화기의 성능분석)

Iem, Byeong-Gwan
- The Transactions of The Korean Institute of Electrical Engineers
- /
- v.62 no.12
- /
- pp.1750-1754
- /
- 2013
A variable bit rate speech coder is presented. The coder is based on the observation that, over a short time period, a speech signal can be viewed as a combination of piecewise linear segments. The encoder detects the sample points where the slope of the signal changes, which are called inflection points in this paper. The coder transmits the location and value of each detected inflection sample, but only the location information for the noninflection samples. In the decoder, the noninflection samples are estimated by interpolating the received information. Several factors affecting the performance of the coder were tested through simulation. Simulation results show that linear interpolation yields a 1 to 5 dB improvement over cubic spline interpolation, and that μ-law companding does not provide any benefit when it is applied before the inflection detection. With low threshold values in the inflection point detection, the coder shows a better MOS and more than 16 dB improvement in SNR compared to continuously variable slope delta modulation (CVSDM).
https://doi.org/10.5370/KIEE.2013.62.12.1750
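
As a rough illustration of the inflection-point idea in the abstract above, the following Python sketch keeps only the samples where the slope changes by more than a threshold and rebuilds the rest by linear interpolation. The function names, the exact threshold rule, and the choice to always keep the first and last samples are assumptions made for this example; the abstract does not specify the paper's detector.

```python
def encode(samples, threshold=0.0):
    """Return (index, value) pairs for samples where the slope changes."""
    n = len(samples)
    if n <= 2:
        return list(enumerate(samples))
    kept = [(0, samples[0])]  # assumption: endpoints are always transmitted
    for i in range(1, n - 1):
        left = samples[i] - samples[i - 1]    # slope entering sample i
        right = samples[i + 1] - samples[i]   # slope leaving sample i
        if abs(right - left) > threshold:     # slope change marks an inflection
            kept.append((i, samples[i]))
    kept.append((n - 1, samples[n - 1]))
    return kept


def decode(kept, length):
    """Rebuild the signal by linear interpolation between kept samples."""
    out = [0.0] * length
    if len(kept) == 1:
        out[kept[0][0]] = kept[0][1]
    for (i0, v0), (i1, v1) in zip(kept, kept[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i] = v0 + t * (v1 - v0)
    return out
```

A higher threshold keeps fewer samples and lowers the bit rate at the cost of reconstruction error, which is the trade-off the abstract evaluates.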

Two cases of Combination Therapy of Acupuncture, Herbal medication and Speech Therapy for Aphasic Stroke Patients (중풍 후유증으로 인한 실어증 환자에 한방치료와 언어치료를 병행한 경험2례)

양태규;박정미
- The Journal of Korean Medicine
- /
- v.23 no.4
- /
- pp.196-202
- /
- 2002
Aphasia is frequent in stroke patients. Most patients with aphasia exhibit spontaneous, progressive improvement in language abilities over time, but few recover completely. Neurological variables, especially the initial severity of aphasia and the time post-onset, appear to influence improvement. The effects of speech therapy and pharmacotherapy have been studied, and some drugs, such as amphetamine, have been reported to benefit recovery from aphasia following stroke. However, there has been little evidence that acupuncture or herbal medication facilitates recovery from aphasia. We therefore report two cases of aphasic stroke patients who were treated with a combination of acupuncture, herbal medication (Cheongsinhaeo-tang), and speech therapy over 6 months and whose language abilities improved. Further clinical studies are needed to explore the effects of acupuncture and herbal medication on aphasia. Researchers should examine the long-term effects of these treatments and whether they are more effective than speech therapy and Western pharmacotherapy.

Emotion Recognition based on Multiple Modalities

Kim, Dong-Ju;Lee, Hyeon-Gu;Hong, Kwang-Seok
- Journal of the Institute of Convergence Signal Processing
- /
- v.12 no.4
- /
- pp.228-236
- /
- 2011
Emotion recognition plays an important role in human-computer interaction research, as it allows more natural, human-like communication between humans and computers. Most previous work on emotion recognition has focused on extracting emotions from face, speech, or EEG information separately. This paper therefore presents a novel approach that combines face, speech, and EEG to recognize human emotion. The individual matching scores obtained from face, speech, and EEG are combined using a weighted-summation operation, and the fused score is used to classify the emotion. In the experimental results, the proposed approach gives an improvement of more than 18.64% over the most successful unimodal approach and also performs better than approaches that integrate only two of the modalities. These results confirm that the proposed approach achieves a significant performance improvement.
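
The weighted-summation fusion described above can be sketched in a few lines of Python. The modality names, emotion labels, scores, and weights below are made-up values for illustration only; the abstract does not give the actual weights or how they were chosen.

```python
def fuse_scores(scores, weights):
    """Weighted sum of per-modality matching scores for each emotion label.

    scores:  {modality: {label: matching score}}
    weights: {modality: weight}  (hypothetical values, not from the paper)
    """
    labels = next(iter(scores.values())).keys()
    return {
        label: sum(weights[m] * scores[m][label] for m in scores)
        for label in labels
    }


def classify(scores, weights):
    """Pick the emotion label with the highest fused score."""
    fused = fuse_scores(scores, weights)
    return max(fused, key=fused.get)


example_scores = {
    "face":   {"happy": 0.6, "sad": 0.4},
    "speech": {"happy": 0.3, "sad": 0.7},
    "eeg":    {"happy": 0.8, "sad": 0.2},
}
example_weights = {"face": 0.5, "speech": 0.2, "eeg": 0.3}
```

With these example values, the speech modality alone would favor "sad", but the fused score favors "happy" because the other two modalities carry more weight.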

Improvement of an Automatic Segmentation for TTS Using Voiced/Unvoiced/Silence Information (유/무성/묵음 정보를 이용한 TTS용 자동음소분할기 성능향상)

Kim Min-Je;Lee Jung-Chul;Kim Jong-Jin
- MALSORI
- /
- no.58
- /
- pp.67-81
- /
- 2006
For a large corpus of time-aligned data, HMM-based approaches are the most widely used for automatic segmentation because they provide a consistent and accurate phone labeling scheme. There are two methods for training the HMMs. The flat-start method minimizes human intervention but has low accuracy. The bootstrap method has high accuracy but requires manual segmentation. In this paper, a new algorithm is proposed to minimize manual work and to improve the performance of automatic segmentation. In the first phase, each speech frame is classified as voiced, unvoiced, or silence. In the second phase, the phoneme sequence is dynamically aligned to the voiced/unvoiced/silence sequence according to acoustic-phonetic rules. Finally, using the resulting segmented speech data as a bootstrap, the HMM-based phoneme model parameters are trained. For the performance test, the hand-labeled ETRI speech DB was used. The experimental results show that the algorithm achieved a 10% improvement in segmentation accuracy within a 20 ms error tolerance. For unvoiced consonants in particular, it showed a 30% improvement.
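
The abstract does not say which features drive the voiced/unvoiced/silence decision in the first phase. The sketch below swaps in a common textbook heuristic (short-time energy plus zero-crossing rate) purely to show what a frame-level classifier of this kind can look like; the function name and both thresholds are assumptions and would need tuning on real data.

```python
def classify_frame(frame, energy_floor=1e-4, zcr_split=0.25):
    """Label one frame as "silence", "voiced", or "unvoiced".

    Heuristic (not the paper's method): very low energy means silence;
    otherwise a high zero-crossing rate suggests noise-like unvoiced speech
    and a low one suggests periodic voiced speech.
    Expects at least two samples per frame.
    """
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (n - 1)
    if energy < energy_floor:
        return "silence"
    return "unvoiced" if zcr > zcr_split else "voiced"
```

The per-frame labels form the voiced/unvoiced/silence sequence to which the phoneme sequence is aligned in the second phase.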

Quality Improvement of Bandwidth Extended Speech Using Mixed Excitation Model (혼합여기모델을 이용한 대역 확장된 음성신호의 음질 개선)

Choi Mu Yeol;Kim Hyung Soon
- MALSORI
- /
- no.52
- /
- pp.133-144
- /
- 2004
The quality of narrowband speech can be enhanced by bandwidth extension technology. This paper proposes a mixed excitation and an energy compensation method based on the Gaussian mixture model (GMM). First, we employ a mixed excitation model that has both periodic and aperiodic characteristics in the frequency domain. We use a filter bank to extract periodicity features from the filtered signals and model them with a GMM to estimate the mixed excitation. Second, we separate the acoustic space into the voiced and unvoiced parts of speech to compensate more accurately for the energy difference between the narrowband speech and the reconstructed highband or lowband speech. Objective and subjective evaluations show that the quality of wideband speech reconstructed by the proposed method is superior to that of the conventional bandwidth extension method.

SPEECH ENHANCEMENT BY FREQUENCY-WEIGHTED BLOCK LMS ALGORITHM

Cho, D.H.
- Proceedings of the Acoustical Society of Korea Conference
- /
- 1985.10a
- /
- pp.87-94
- /
- 1985
In this paper, enhancement of speech corrupted by additive white or colored noise is studied. The unconstrained frequency-domain block least-mean-square (UFBLMS) adaptation algorithm and its frequency-weighted version are newly applied to speech enhancement. For speech degraded by white noise, the UFBLMS algorithm outperforms the spectral subtraction method or the Wiener filtering technique by more than 3 dB in segmented frequency-weighted signal-to-noise ratio (FWSNRSEG) when the SNR of the speech is in the range of 0 to 10 dB. For noisy speech corrupted by colored noise, the UFBLMS algorithm outperforms the spectral subtraction method by about 3 to 5 dB in FWSNRSEG. It also performs about 2 dB better in FWSNR and FWSNRSEG than the time-domain least-mean-square (TLMS) adaptive prediction filter (APF). In view of computational complexity and the improvement in speech quality and intelligibility, the frequency-weighted UFBLMS algorithm appears to yield the best performance among the various algorithms for enhancing noisy speech corrupted by white or colored noise.
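
For orientation, here is a minimal sketch of the time-domain LMS adaptive predictor that the abstract uses as a baseline (the TLMS APF), not of the UFBLMS algorithm itself, which works on blocks in the frequency domain. The function name, filter order, and step size are assumptions chosen for the example.

```python
def lms_predict(signal, order=4, mu=0.05):
    """One-step-ahead LMS predictor; returns (predictions, errors).

    The prediction tracks the predictable (periodic) part of the input,
    while unpredictable noise stays in the error signal.
    """
    weights = [0.0] * order
    predictions, errors = [], []
    for n in range(order, len(signal)):
        past = signal[n - order:n][::-1]  # most recent sample first
        y = sum(w * x for w, x in zip(weights, past))
        e = signal[n] - y
        # LMS update: move the weights along the instantaneous gradient
        weights = [w + mu * e * x for w, x in zip(weights, past)]
        predictions.append(y)
        errors.append(e)
    return predictions, errors
```

On a noise-free periodic input the prediction error shrinks as the weights adapt; a block algorithm such as UFBLMS applies the same kind of update once per block of samples instead of once per sample.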

Performance Improvement of Speech/Music Discrimination Based on Cepstral Distance (켑스트럼 거리 기반의 음성/음악 판별 성능 향상)

Park Seul-Han;Choi Mu Yeol;Kim Hyung Soon
- MALSORI
- /
- no.56
- /
- pp.195-206
- /
- 2005
Discrimination between speech and music is important in many multimedia applications. In this paper, focusing on the spectral change characteristics of speech and music, we propose a new method of speech/music discrimination based on cepstral distance. Instead of using the cepstral distance between frames at a fixed interval, the minimum of the cepstral distances among neighboring frames is employed to increase discriminability between fast-changing music and speech. In addition, to prevent speech segments that include short pauses from being misclassified as music, short pause segments are excluded when computing the cepstral distance. The experimental results show that the proposed method reduces the error rate by 68% in comparison with the conventional approach using cepstral distance.
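
One possible reading of the "minimum of cepstral distances among neighboring frames" is sketched below. It assumes the cepstral vectors have already been computed, uses a Euclidean distance, and looks ahead over a hypothetical window of `max_offset` frames; the abstract specifies none of these details, and the short-pause exclusion step is not shown.

```python
import math


def euclidean(c1, c2):
    """Euclidean distance between two cepstral vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def min_neighbor_distance(cepstra, max_offset=2):
    """For each frame, the smallest distance to any of the next
    `max_offset` frames. The last frame has no following neighbor
    and is skipped."""
    result = []
    for t in range(len(cepstra) - 1):
        last = min(t + max_offset, len(cepstra) - 1)
        result.append(
            min(euclidean(cepstra[t], cepstra[k]) for k in range(t + 1, last + 1))
        )
    return result
```

Taking the minimum over several neighbors keeps the measure small for signals whose spectrum returns to a similar shape within a few frames, even when it changes quickly from one frame to the next.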

A Query-by-Speech Scheme for Photo Albuming (음성 질의 기반 디지털 사진 검색 기법)

Kim Tae-Sung;Suh Young-Joo;Lee Yong-Ju;Kim Hoi-Rin
- MALSORI
- /
- no.57
- /
- pp.99-112
- /
- 2006
In this paper, we introduce two retrieval methods for photos with speech documents. We compare the pattern of a speech query with those of the speech documents recorded in digital cameras, measure the similarities, and retrieve the photos corresponding to the speech documents with high similarity scores. In the first approach, a phoneme recognition scheme is used as the pre-processor for the pattern matching. In the second, vector quantization (VQ) and dynamic time warping (DTW) are applied to match the speech query with the documents in the signal domain itself. Experimental results show that the performance of the first approach is highly dependent on that of phoneme recognition, while its processing time is short. The second method provides a great improvement in performance. Its processing time is longer than that of the first method because of DTW, but it can be reduced by using approximate methods.
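
The DTW matching used by the second approach can be illustrated with the classic dynamic-programming recurrence. This sketch compares two sequences of scalars with an absolute-difference local cost; in the paper's setting the elements would be feature vectors or VQ codewords, and the local distance and any path constraints are not given in the abstract, so those parts are assumptions.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

The table has `len(a) * len(b)` cells, which is why DTW is slower than the recognition-based approach; restricting the search to a band around the diagonal is one common way to approximate it.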

Two-Microphone Generalized Sidelobe Canceller with Post-Filter Based Speech Enhancement in Composite Noise

Park, Jinsoo;Kim, Wooil;Han, David K.;Ko, Hanseok
- ETRI Journal
- /
- v.38 no.2
- /
- pp.366-375
- /
- 2016
This paper describes an algorithm to suppress composite noise in a two-microphone speech enhancement system for robust hands-free speech communication. The proposed algorithm has four stages. The first stage estimates the power spectral density of the residual stationary noise, which is based on the detection of nonstationary signal-dominant time-frequency bins (TFBs) at the generalized sidelobe canceller output. Second, speech-dominant TFBs are identified among the previously detected nonstationary signal-dominant TFBs, and power spectral densities of speech and residual nonstationary noise are estimated. In the final stage, the bin-wise output signal-to-noise ratio is obtained with these power estimates and a Wiener post-filter is constructed to attenuate the residual noise. Compared to the conventional beamforming and post-filter algorithms, the proposed speech enhancement algorithm shows significant performance improvement in terms of perceptual evaluation of speech quality.
https://doi.org/10.4218/etrij.16.0115.0472
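
The final stage described above, a Wiener post-filter built from bin-wise power estimates, can be sketched as follows. The sketch assumes the speech and the two residual noise power spectral densities are already available for each time-frequency bin, and it uses the textbook Wiener gain SNR / (1 + SNR); the function name, argument names, and the optional gain floor are assumptions, not details from the paper.

```python
def wiener_gains(speech_psd, stationary_noise_psd, nonstationary_noise_psd,
                 floor=0.0):
    """Per-bin Wiener gain from power estimates.

    With SNR = S / N and N the sum of both residual noise powers, the gain
    SNR / (1 + SNR) equals S / (S + N), which is what is computed here.
    """
    gains = []
    for s, n_stat, n_nonstat in zip(
        speech_psd, stationary_noise_psd, nonstationary_noise_psd
    ):
        total = s + n_stat + n_nonstat
        gain = s / total if total > 0 else 0.0
        gains.append(max(gain, floor))  # hypothetical floor limits attenuation
    return gains
```

Multiplying each bin of the beamformer output spectrum by its gain attenuates bins dominated by residual noise and leaves speech-dominant bins nearly unchanged.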

CASA-based Front-end Using Two-channel Speech for the Performance Improvement of Speech Recognition in Noisy Environments (잡음환경에서의 음성인식 성능 향상을 위한 이중채널 음성의 CASA 기반 전처리 방법)

Park, Ji-Hun;Yoon, Jae-Sam;Kim, Hong-Kook
- Proceedings of the IEEK Conference
- /
- 2007.07a
- /
- pp.289-290
- /
- 2007
In order to improve the performance of a speech recognition system in the presence of noise, we propose a noise-robust front-end that uses two-channel speech signals and separates speech from noise based on computational auditory scene analysis (CASA). The main cues for the separation are the interaural time difference (ITD) and the interaural level difference (ILD) between the two channel signals. As a result, 39 cepstral coefficients are extracted from the separated speech components. Speech recognition experiments show that the proposed front-end outperforms the ETSI front-end with single-channel speech.
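
The two separation cues named above can be estimated in a simple way: ITD as the lag that maximizes the cross-correlation between the channels, and ILD as the energy ratio in decibels. The sketch below works on one whole frame, whereas CASA systems typically compute these cues per time-frequency unit; the function names, the lag search range, and the small `eps` guard are assumptions for illustration.

```python
import math


def estimate_itd(left, right, max_lag, sample_rate):
    """ITD in seconds; positive when the right channel lags the left."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = 0.0
        for i, l in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                corr += l * right[j]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag / sample_rate


def estimate_ild(left, right, eps=1e-12):
    """ILD in dB; positive when the left channel is louder."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))
```

A source directly in front gives an ITD and ILD near zero, so units whose cues deviate strongly from the target direction can be treated as noise-dominated and masked out.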

