Search | Korea Science

Endpoint Detection of Speech Signal Using Lyapunov Exponent (리아프노프 지수를 이용한 음성신호 종점 탐색 방법)

Zang, Xian;Kim, Jeong-Yeon;Chong, Kil-To
- Journal of the Institute of Electronics Engineers of Korea SC / v.46 no.1 / pp.28-33 / 2009
In speech recognition research, locating the beginning and end of a speech utterance against background noise is of great importance. Conventional methods for speech endpoint detection are based on two simple time-domain measurements, short-time energy and short-time zero-crossing rate, which cannot guarantee precise results in low signal-to-noise ratio environments. This paper proposes a novel approach that computes the Lyapunov exponent of the time-domain waveform. The proposed method does not need frequency-domain parameters for endpoint detection, e.g. Mel-scale features, which have been introduced in other papers. Accordingly, the algorithm has low complexity and is suitable for digital isolated-word recognition systems.
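The two conventional time-domain measurements named in the abstract can be sketched as follows. This is a minimal illustration of the baseline (short-time energy and zero-crossing rate), not the Lyapunov-exponent method the paper proposes; the function names, frame length, hop size, and energy threshold are all assumptions chosen for the example.

```python
def frame_signal(x, frame_len, hop):
    """Split a sequence into overlapping frames (a trailing partial frame is dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(x, frame_len=80, hop=40, energy_thresh=1.0):
    """Return (start_sample, end_sample) spanning all frames above the
    energy threshold, or None if no frame exceeds it.

    The threshold is a hypothetical fixed value; practical detectors
    usually derive it from a noise estimate.
    """
    frames = frame_signal(x, frame_len, hop)
    active = [i for i, f in enumerate(frames) if short_time_energy(f) > energy_thresh]
    if not active:
        return None
    return active[0] * hop, active[-1] * hop + frame_len
```

With a fixed threshold like this, the detected endpoints are only accurate to within about one frame, and additive noise raises the energy of silent frames, which is the low-SNR weakness the abstract refers to.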

The Error Pattern Analysis of the HMM-Based Automatic Phoneme Segmentation (HMM기반 자동음소분할기의 음소분할 오류 유형 분석)

Kim Min-Je;Lee Jung-Chul;Kim Jong-Jin
- The Journal of the Acoustical Society of Korea / v.25 no.5 / pp.213-221 / 2006
Phone segmentation of the speech waveform is especially important for concatenative text-to-speech synthesis, which uses segmented corpora to construct synthesis units, because the quality of synthesized speech depends critically on the accuracy of the segmentation. In the beginning, phone segmentation was performed manually, but this requires huge effort and causes long delays. HMM-based approaches adopted from automatic speech recognition are most widely used for automatic segmentation in speech synthesis, providing a consistent and accurate phone labeling scheme. Even though the HMM-based approach has been successful, it may locate a phone boundary at a different position than expected. In this paper, we categorized adjacent phoneme pairs and analyzed the mismatches between hand-labeled transcriptions and HMM-based labels. We then described the dominant error patterns that must be improved for speech synthesis. For the experiment, the hand-labeled standard Korean speech DB from ETRI was used as the reference DB. A time difference larger than 20 ms between a hand-labeled phoneme boundary and the auto-aligned boundary is treated as an automatic segmentation error. Our experimental results for the female speaker revealed that plosive-vowel, affricate-vowel and vowel-liquid pairs showed high accuracies of 99%, 99.5% and 99%, respectively, whereas stop-nasal, stop-liquid and nasal-liquid pairs showed very low accuracies of 45%, 50% and 55%. Results for the male speaker revealed a similar tendency.
https://doi.org/10.7776/ASK.2006.25.5.213
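The 20 ms error criterion in the abstract above can be sketched as a small scoring routine. This is an illustrative sketch only: the function names, the pair-label strings, and the assumption that reference and automatic boundaries are already aligned one-to-one are not taken from the paper.

```python
from collections import defaultdict


def boundary_hits(reference_ms, automatic_ms, tolerance_ms=20.0):
    """Mark each boundary as correct when it lies within the tolerance.

    A difference larger than tolerance_ms counts as a segmentation error,
    following the criterion stated in the abstract.
    """
    if len(reference_ms) != len(automatic_ms):
        raise ValueError("boundary lists must be aligned one-to-one")
    return [abs(r - a) <= tolerance_ms for r, a in zip(reference_ms, automatic_ms)]


def boundary_accuracy(reference_ms, automatic_ms, tolerance_ms=20.0):
    """Overall fraction of boundaries within the tolerance."""
    hits = boundary_hits(reference_ms, automatic_ms, tolerance_ms)
    if not hits:
        raise ValueError("no boundaries to score")
    return sum(hits) / len(hits)


def accuracy_by_pair(pair_labels, reference_ms, automatic_ms, tolerance_ms=20.0):
    """Accuracy per adjacent-phoneme-pair class (labels are hypothetical strings)."""
    hits = boundary_hits(reference_ms, automatic_ms, tolerance_ms)
    if len(pair_labels) != len(hits):
        raise ValueError("one pair label is required per boundary")
    totals = defaultdict(lambda: [0, 0])  # label -> [hits, count]
    for label, hit in zip(pair_labels, hits):
        totals[label][0] += int(hit)
        totals[label][1] += 1
    return {label: h / n for label, (h, n) in totals.items()}
```

Grouping by pair class is what lets a per-class breakdown such as plosive-vowel versus stop-nasal be reported separately from the overall figure.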

On Realizing the Voice Response and Recoding System for a Home Visitor - A Predictor for the waveform Coding of Speech Signals by using the Dual First-Order Difference Values- (음성응답과 기록을 통한 가정 방문객 관리 시스템의 구현 -쌍 1차 차분을 통한 음성 파형부호화용 예측기-)

Bae, Myung-Jin;Lee, Mi-Suk;Lim, Un-Chun
- The Journal of the Acoustical Society of Korea / v.11 no.1 / pp.60-66 / 1992
The autocorrelation of speech samples shows that the correlation between a sample and its immediately adjacent past and next samples is larger than its correlation with samples delayed by several orders. Predicting the present sample from the adjacent past and next samples is therefore more effective than using only several time-delayed past samples. Thus, in this paper, we propose a new predictor for waveform coding that predicts the present sample from one past and one next sample. The proposed predictor achieves a prediction gain up to 9 dB higher than that of CCITT ADPCM.
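The idea of predicting a sample from its two neighbors can be illustrated with a toy comparison of prediction gain. This is a sketch under stated assumptions, not the paper's predictor: the equal weights of 0.5, the function names, and the sinusoidal test signal are all hypothetical, and quantization is ignored entirely.

```python
import math


def prediction_gain_db(target, predicted):
    """Prediction gain: 10*log10(signal power / residual power).

    Assumes the residual is not identically zero.
    """
    signal_power = sum(s * s for s in target)
    residual_power = sum((s - p) ** 2 for s, p in zip(target, predicted))
    return 10.0 * math.log10(signal_power / residual_power)


def past_only_prediction(x):
    """Predict x[n] from x[n-1] alone, for n = 1 .. len(x)-2."""
    return [x[n - 1] for n in range(1, len(x) - 1)]


def two_sided_prediction(x, weight=0.5):
    """Predict x[n] from one past and one next sample (assumed equal weights)."""
    return [weight * (x[n - 1] + x[n + 1]) for n in range(1, len(x) - 1)]


# Slowly varying test signal: a sinusoid with a period of 50 samples.
x = [math.sin(2 * math.pi * n / 50) for n in range(200)]
target = x[1:-1]  # the samples both predictors try to estimate
gain_past = prediction_gain_db(target, past_only_prediction(x))
gain_two_sided = prediction_gain_db(target, two_sided_prediction(x))
```

On this smooth signal the two-sided estimate leaves a much smaller residual than the past-only one, which matches the abstract's argument in spirit; the figures it produces say nothing about the gains reported for real speech.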

A Fast Normalized Cross-Correlation Computation for WSOLA-based Speech Time-Scale Modification (WSOLA 기반의 음성 시간축 변환을 위한 고속의 정규상호상관도 계산)

Lim, Sangjun;Kim, Hyung Soon
- The Journal of the Acoustical Society of Korea / v.31 no.7 / pp.427-434 / 2012
The waveform similarity based overlap-add (WSOLA) method is known to be an efficient, high-quality algorithm for time scaling of speech signals. The computational load of WSOLA is concentrated in the repeated normalized cross-correlation (NCC) calculation used to evaluate the similarity between two signal waveforms. To reduce the computational complexity of WSOLA, this paper proposes a fast NCC computation method in which NCC is obtained through pre-calculated sum tables, eliminating the redundancy of repeated NCC calculations in adjacent regions. While the denominator of NCC has much redundancy irrespective of the time-scale factor, the numerator has less redundancy, and the amount of redundancy depends on both the time-scale factor and the optimal shift value, so a more sophisticated algorithm is required for fast computation. The simulation results show that the proposed method reduces WSOLA execution time by about 40%, 47% and 52% for time-scale compression and for 2x and 3x time-scale expansion, respectively, while maintaining exactly the same speech quality as conventional WSOLA.
https://doi.org/10.7776/ASK.2012.31.7.427
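The sum-table idea for the NCC denominator can be sketched as below. This is a minimal sketch, not the paper's algorithm: it only shows how a prefix-sum table of squared samples gives each window energy in constant time, and it computes the numerator directly, whereas the paper describes a more sophisticated scheme for that part. All names are hypothetical.

```python
import math


def squared_prefix_sums(x):
    """table[i] holds the sum of x[0..i-1] squared, so any window energy
    is table[end] - table[start]."""
    table = [0.0]
    for s in x:
        table.append(table[-1] + s * s)
    return table


def best_shift_ncc(search, template):
    """Return (shift, ncc) maximizing the normalized cross-correlation
    between `template` and each same-length window of `search`."""
    n = len(template)
    table = squared_prefix_sums(search)  # computed once, reused for every shift
    template_energy = sum(t * t for t in template)
    best_k, best_ncc = 0, -math.inf
    for k in range(len(search) - n + 1):
        window_energy = table[k + n] - table[k]  # O(1) instead of O(n)
        denom = math.sqrt(window_energy * template_energy)
        if denom == 0.0:
            continue  # silent window: NCC is undefined, skip it
        # Numerator computed directly here (no table reuse in this sketch).
        numerator = sum(search[k + i] * template[i] for i in range(n))
        ncc = numerator / denom
        if ncc > best_ncc:
            best_k, best_ncc = k, ncc
    return best_k, best_ncc
```

The table is built once per search region, so the denominator cost no longer grows with the window length for each candidate shift. With floating-point audio, subtracting large prefix sums can lose precision, so a practical version might rebuild the table per region or use higher-precision accumulation.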

Low Rate Speech Coding Using the Harmonic Coding Combined with CELP Coding (하모닉 코딩과 CELP방법을 이용한 저 전송률 음성 부호화 방법)

김종학;이인성
- The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.26-34 / 2000
In this paper, we propose a 4 kbps speech coder that combines harmonic vector excitation coding with time-separated transition coding. Harmonic vector excitation coding uses harmonic excitation coding in voiced frames and vector excitation coding with an analysis-by-synthesis structure in unvoiced frames. However, this two-mode coding method is not effective for transition frames, in which voiced and unvoiced signals are mixed, so a method beyond unvoiced/voiced mode coding is needed. We therefore designed a time-separated transition coding method for transition frames, in which a voiced/unvoiced decision algorithm separates the unvoiced and voiced durations within a frame, and harmonic-harmonic excitation coding or vector-harmonic excitation coding is selected depending on the U/V decision of the previous frame. In the decoder, the voiced excitation signals are generated efficiently through the inverse FFT of harmonic magnitudes, and the unvoiced excitation signals are produced by inverse vector quantization. The reconstructed speech signal is synthesized by the overlap/add method.

Correlation of acoustic features and electrophysiological outcomes of stimuli at the level of auditory brainstem (자극음의 음향적 특성과 청각 뇌간에서의 전기생리학적 반응의 상관성)

Chun, Hyungi;Han, Woojae
- The Journal of the Acoustical Society of Korea / v.35 no.1 / pp.63-73 / 2016
It is widely acknowledged that the human auditory system is organized tonotopically and that people generally perceive sounds as a function of their frequency distribution through the auditory system. However, it is still unclear how the acoustic features of speech sounds are represented in the human brain in terms of speech perception. Thus, the purpose of this study is to investigate whether two sounds with similar high-frequency characteristics in acoustic analysis show similar results at the level of the auditory brainstem. Thirty-three young adults with normal hearing participated in the study. As stimuli, two Korean monosyllables (i.e., /ja/ and /cha/) and four toneburst frequencies (i.e., 500, 1000, 2000, and 4000 Hz) were used to elicit the auditory brainstem response (ABR). Measures for the monosyllables and tonebursts were highly replicable, and wave V of the waveform was detectable in all subjects. In the Pearson correlation analysis, the /ja/ syllable had a high correlation with the 4000 Hz toneburst, which means that its acoustic characteristics (i.e., 3671~5384 Hz) showed the same results in the brainstem. However, the /cha/ syllable had a high correlation with the 1000 and 2000 Hz tonebursts, although its acoustic distribution is 3362~5412 Hz. We concluded that there was disagreement between acoustic features and physiological outcomes at the auditory brainstem level. This finding suggests that an acoustical-perceptual mapping study is needed to scrutinize human speech perception.
https://doi.org/10.7776/ASK.2016.35.1.063

Gender Analysis in Elderly Speech Signal Processing (노인음성신호처리에서의 젠더 분석)

Lee, JiYeoun
- Journal of Digital Convergence / v.16 no.10 / pp.351-356 / 2018
Changes in the vocal cords due to aging can change the frequency of speech, and the speech signals of the elderly can be automatically distinguished from normal speech signals through various analyses. The purpose of this study is to provide a tool that is easily accessible to elderly and disabled people, who can be excluded from a rapidly changing technological society, and to improve voice recognition performance. In the study, the gender of the subjects was reported as part of the sex analysis, and equal numbers of female and male voice samples were used. In addition, gender analysis was applied by restricting the data to the voices of the elderly rather than using voices of all ages. Finally, we applied a review methodology of standards and reference models to reduce gender differences. Ten Korean women and ten Korean men aged 70 to 80 years were used in this study. Comparing the F0 value extracted directly from the waveform with the F0 extracted by the TF32 and WaveSurfer speech analysis programs, WaveSurfer analyzed the F0 of elderly voices better than TF32. However, there is still a need for a voice analysis program designed for elderly people. In conclusion, analyzing the voices of the elderly will improve the speech recognition and synthesis capabilities of existing smart medical systems.
https://doi.org/10.14400/JDC.2018.16.10.351

Improvement of Packet Loss Concealment Algorithm by Utilizing Next Good Frame Info. (손실이후 프레임 정보에 의한 패킷손실은닉 알고리즘 개선)

Kim Jae-Hyun;Hahn Min-Soo
- MALSORI / no.43 / pp.101-112 / 2002
In real-time packetized voice applications, missing packets are a major source of voice quality degradation. Thus packet loss concealment (PLC) algorithms are needed to guarantee the QoS of VoIP. In this paper, we describe a packet loss concealment scheme that utilizes the next good frame following the lost packets. When this scheme is combined with other PLC algorithms, such as the G.711 pitch waveform replication recommended by ITU-T or an LP-based PLC algorithm, additional voice quality improvement is obtained for consecutive packet losses longer than 60 msec.

Voice Source Modeling Using Weighted Sum-of-Basis-Functions Model (기저함수의 가중합을 이용한 음원의 모델링)

강상기
- Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.171-174 / 1998
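This paper examines the problem of voice source modeling in speech synthesis and coding systems. To overcome several problems of existing voice source modeling systems, we propose a new technique that models the voice source as a weighted sum of basis functions. In the proposed method, the voice source waveform is represented as a weighted sum of basis functions based on a filter bank. To obtain voice source parameters that effectively represent diverse source characteristics, we investigate a structure based on EM (estimate maximize). Experiments were carried out on a variety of voiced sounds using the proposed method. The results show that the proposed estimation and modeling methods estimate the voice source waveform more accurately than existing methods and can represent diverse voice source characteristics. They are also expected to improve voice quality in speech synthesis and coding.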

On Realizing the Predictor for the Waveform Coding of Speech Signals by using the Dual First Order Autocorrelation (쌍 1차 자기상관관계를 이용한 음성 파형부호화용 예측기의 구현 -쌍 1차 차분값과 시그마-델타 기법을 적용 -)

이미숙;배명진;이주헌
- The Journal of the Acoustical Society of Korea / v.11 no.1E / pp.23-29 / 1992
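Speech waveforms show high correlation between neighboring sample values. One way to increase the correlation of a speech signal is simply to integrate the input signal before coding. The effect of this integration can be removed at the receiver by an ordinary differentiator. Doing so emphasizes the low frequencies of the speech signal and increases the autocorrelation of adjacent samples. This process is called the sigma-delta technique. This paper proposes a new predictor that uses these sigma-delta characteristics: the input signal is integrated before coding, and the integrated current sample is predicted from the two adjacent samples, one past and one future. The proposed predictor achieved an average prediction gain 8.65 dB higher than that of the CCITT-recommended ADPCM.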