• Title/Summary/Keyword: Distant-talking speech


A Study on the Durational Characteristics of Korean Distant-Talking Speech (한국어 원거리 음성의 지속시간 연구)

  • Kim, Sun-Hee. MALSORI, no.54, pp.1-14, 2005
  • This paper presents the durational characteristics of Korean distant-talking speech, based on speech data consisting of 500 distant-talking utterances and 500 normal utterances from 10 speakers (5 males and 5 females). Each file was segmented and labeled manually, and the duration of each segment and each word was extracted. The durational changes of distant-talking speech relative to normal speech were analyzed statistically. The results show that word duration increases in distant-talking speech compared with the normal style, and that average unvoiced-consonant duration decreases while average vowel duration increases. Female speakers show a stronger tendency to lengthen duration in distant-talking speech. Finally, the study shows that speakers of distant-talking speech can be classified according to their duration ratios.

An Analysis of Acoustic Features Caused by Articulatory Changes for Korean Distant-Talking Speech

  • Kim Sunhee; Park Soyoung; Yoo Chang D. The Journal of the Acoustical Society of Korea, v.24 no.2E, pp.71-76, 2005
  • Compared to normal speech, distant-talking speech is characterized by acoustic effects due to interfering sounds and echoes, as well as by articulatory changes resulting from the speaker's effort to be more intelligible. In this paper, the acoustic features of distant-talking speech due to articulatory changes are analyzed and compared with those of the Lombard effect. To examine the effect of different distances and articulatory changes, speech recognition experiments were conducted using HTK for normal speech as well as distant-talking speech at different distances. The speech data consist of 4500 distant-talking utterances and 4500 normal utterances from 90 speakers (56 males and 34 females). The acoustic features selected for analysis were duration, formants (F1 and F2), fundamental frequency, total energy, and energy distribution. The results show that the acoustic-phonetic features of distant-talking speech correspond mostly to those of Lombard speech: the main acoustic changes from normal to distant-talking speech are an increase in vowel duration, shifts in the first and second formants, an increase in fundamental frequency, an increase in total energy, and a shift in energy from the low frequency band to the middle and high bands.

Acoustic Characteristics of Vowels in Korean Distant-Talking Speech (한국어 원거리 음성의 모음의 음향적 특성)

  • Lee Sook-hyang; Kim Sunhee. MALSORI, v.55, pp.61-76, 2005
  • This paper analyzes the acoustic characteristics of vowels produced in a distant-talking environment, using a statistical method; the influence of gender and speaker on the variation was also examined. The speech data consist of 500 distant-talking words and 500 normal words from 10 speakers (5 males and 5 females). The acoustic features selected for analysis were duration, formants (F1 and F2), fundamental frequency, and total energy. The results showed that duration, F0, F1, and total energy increased in distant-talking speech compared to normal speech; female speakers showed a greater increase in all features except total energy and fundamental frequency. In addition, speaker differences were observed.

MLLR-Based Environment Adaptation for Distant-Talking Speech Recognition (원거리 음성인식을 위한 MLLR적응기법 적용)

  • Kwon, Suk-Bong; Ji, Mi-Kyong; Kim, Hoi-Rin; Lee, Yong-Ju. MALSORI, no.53, pp.119-127, 2005
  • Speech recognition is one of the user interface technologies for commanding and controlling terminals such as TVs, PCs, and cellular phones in a ubiquitous environment. In controlling a terminal, a mismatch between training and testing conditions causes rapid performance degradation: it decreases not only the performance of the recognition system but also its reliability. The performance degradation caused by environmental changes must therefore be compensated. Whenever the environment changes, environment adaptation is performed using the user's speech and the background noise of the changed environment, and performance is improved by employing models appropriately transformed to the new environment. Research on environment compensation has been active, but a compensation method for the effect of distant-talking speech has not yet been developed. In this paper, we apply MLLR-based environment adaptation to compensate for the effect of distant-talking speech and show improved performance.
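The core of MLLR adaptation referenced in this abstract is an affine update of each Gaussian mean in the acoustic model. A minimal sketch of that mean update, with hypothetical toy values (the regression matrix here is illustrative, not from the paper):

```python
def mllr_adapt_mean(W, mu):
    """Apply an MLLR regression matrix W (d x (d+1)) to a Gaussian mean mu.
    W = [b | A] acts on the extended mean vector [1, mu_1, ..., mu_d],
    so the adapted mean is A @ mu + b."""
    xi = [1.0] + list(mu)  # extended mean vector
    return [sum(w * x for w, x in zip(row, xi)) for row in W]

# Toy 2-D example: bias b = (0.5, -0.5), A = identity.
W = [[0.5, 1.0, 0.0],
     [-0.5, 0.0, 1.0]]
print(mllr_adapt_mean(W, [1.0, 2.0]))  # -> [1.5, 1.5]
```

In practice W is estimated by maximizing the likelihood of the adaptation data (the user's speech in the changed environment), typically sharing one transform across a regression class of Gaussians.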

Prosodic Characteristics of Korean Distance Speech (한국어 원거리 음성의 운율적 특성)

  • Lee, Sook-hyang; Kim, Sun-Hee; Kim, Jong-Jin. Proceedings of the KSPS conference, 2005.11a, pp.87-90, 2005
  • The aim of this paper is to investigate the prosodic characteristics of Korean distant speech. 36 two-syllable words produced by 4 speakers (2 males and 2 females) in both distant-talking and normal environments were used. The results showed that the ratios of the second syllable to the first syllable in vowel duration and vowel energy were significantly larger in the distant-talking environment than in the normal environment, and the f0 range was also larger in the distant-talking environment. In addition, an 'HL%' contour boundary tone in the second syllable and/or an 'L+H' contour tone in the first syllable were used in the distant-talking environment.

Prosodic Characteristics of Korean Distant Speech (한국어 원거리 음성의 운율적 특성)

  • Kim Sun-Hee; Kim Jong-Jin; Lee Sook-Hyang. The Journal of the Acoustical Society of Korea, v.25 no.3, pp.137-143, 2006
  • The aim of this paper is to investigate the prosodic characteristics of Korean distant speech. Four speakers (2 males and 2 females) produced 36 two-syllable words in both distant-talking and normal environments, totaling 288 spoken words. The results showed that the ratios of the second syllable to the first syllable in vowel duration and vowel energy were significantly larger in the distant-talking environment than in the normal environment, and the f0 range was also larger in the distant-talking environment. In addition, an 'HL%' contour boundary tone in the second syllable and/or an 'L+H' contour tone in the first syllable were used in the distant-talking environment.

ARMA Filtering of Speech Features Using Energy Based Weights (에너지 기반 가중치를 이용한 음성 특징의 자동회귀 이동평균 필터링)

  • Ban, Sung-Min; Kim, Hyung-Soon. The Journal of the Acoustical Society of Korea, v.31 no.2, pp.87-92, 2012
  • In this paper, a robust feature compensation method to deal with environmental mismatch is proposed. The proposed method applies energy-based weights, reflecting the degree of speech presence, to Mean subtraction, Variance normalization, and ARMA filtering (MVA) processing. The weights are further smoothed by moving-average and maximum filters. The proposed feature compensation algorithm is evaluated on the AURORA 2 task and on a distant-talking experiment using a robot platform; compared with plain MVA processing, it yields error rate reductions of 14.4 % and 44.9 %, respectively.
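The MVA baseline this paper extends is a simple three-step normalization of each cepstral feature trajectory. A minimal sketch of plain MVA (without the paper's energy-based weights), assuming a 1-D feature trajectory and an ARMA smoother of order M:

```python
def arma_filter(x, M=2):
    """ARMA smoothing as used in MVA processing:
    y[t] = (sum of previous M outputs + sum of current and next M inputs) / (2M + 1).
    Boundary frames are left unfiltered."""
    n = len(x)
    y = list(x)
    for t in range(M, n - M):
        y[t] = (sum(y[t - M:t]) + sum(x[t:t + M + 1])) / (2 * M + 1)
    return y

def mva(frames, M=2):
    """Mean subtraction, variance normalization, then ARMA filtering
    of a 1-D feature trajectory."""
    n = len(frames)
    mean = sum(frames) / n
    std = (sum((v - mean) ** 2 for v in frames) / n) ** 0.5 or 1.0
    z = [(v - mean) / std for v in frames]  # zero-mean, unit-variance
    return arma_filter(z, M)
```

The proposed method would scale each step per frame by a speech-presence weight derived from frame energy; that weighting is omitted here since the abstract does not give its exact form.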

Distant-talking of Speech Interface for Humanoid Robots (휴머노이드 로봇을 위한 원거리 음성 인터페이스 기술 연구)

  • Lee, Hyub-Woo; Yook, Dong-Suk. Proceedings of the KSPS conference, 2007.05a, pp.39-40, 2007
  • For efficient interaction between humans and robots, the speech interface is a core problem, especially in noisy and reverberant conditions. This paper analyzes the main issues of spoken language interfaces for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.

Recognition Performance Improvement of Unsupervised Limabeam Algorithm using Post Filtering Technique

  • Nguyen, Dinh Cuong; Choi, Suk-Nam; Chung, Hyun-Yeol. IEMEK Journal of Embedded Systems and Applications, v.8 no.4, pp.185-194, 2013
  • In distant-talking environments, speech recognition performance degrades significantly due to noise and reverberation. Recent work by Michael L. Seltzer shows that in microphone array speech recognition, the word error rate can be significantly reduced by adapting the beamformer weights to generate a sequence of features that maximizes the likelihood of the correct hypothesis. In this approach, called the Likelihood Maximizing Beamforming (Limabeam) algorithm, one implementation is Unsupervised Limabeam (USL), which can improve recognition performance in any environment. Our investigation of USL shows that because the optimization depends strongly on the transcription output of the first recognition step, the output can become unstable, leading to lower performance. To improve the recognition performance of USL, post-filtering techniques can be employed to obtain a more accurate transcription from the first step. In this work, as a post-filtering technique for the first recognition step of USL, we propose adding a Wiener filter combined with a feature-weighted Mahalanobis distance. We also suggest an alternative way to implement the Limabeam algorithm for a Hidden Markov Network (HM-Net) speech recognizer for efficient implementation. Speech recognition experiments performed in a real distant-talking environment confirm the efficacy of the Limabeam algorithm in the HM-Net speech recognition system and the improved performance of the proposed method.
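The feature-weighted Mahalanobis distance mentioned in this abstract, for the common diagonal-covariance case, reduces to a per-dimension weighted squared distance. A minimal sketch (the weighting scheme here is generic, not the paper's specific one):

```python
def weighted_mahalanobis(x, mu, var, w):
    """Diagonal feature-weighted Mahalanobis distance:
    d(x, mu) = sum_i w_i * (x_i - mu_i)^2 / var_i.
    With all w_i = 1 this is the ordinary diagonal Mahalanobis distance;
    the weights let reliable feature dimensions dominate the score."""
    return sum(wi * (xi - mi) ** 2 / vi
               for xi, mi, vi, wi in zip(x, mu, var, w))

# Toy 2-D example: the second dimension has variance 4, so its
# contribution is down-weighted relative to Euclidean distance.
print(weighted_mahalanobis([1.0, 2.0], [0.0, 0.0], [1.0, 4.0], [1.0, 1.0]))  # -> 2.0
```

In a recognizer front end, such a distance would score candidate acoustic states during the first-pass decoding whose transcription seeds the unsupervised Limabeam optimization.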

Interference Suppression Using Principal Subspace Modification in Multichannel Wiener Filter and Its Application to Speech Recognition

  • Kim, Gi-Bak. ETRI Journal, v.32 no.6, pp.921-931, 2010
  • It has been shown that the principal subspace-based multichannel Wiener filter (MWF) suppresses interference better than the conventional MWF in the case of a single target source. It efficiently estimates the target speech component in the principal subspace, which estimates the acoustic transfer function up to a scaling factor. However, as the input signal-to-interference ratio (SIR) decreases, larger errors are incurred in the estimation of the acoustic transfer function by the principal subspace method, degrading interference suppression. To alleviate this problem, a principal subspace modification method was proposed in previous work, reducing the estimation error of the acoustic transfer function vector at low SIRs. In this work, a frequency-band dependent interpolation technique is further employed for the principal subspace modification. A speech recognition test conducted with the Sphinx-4 system demonstrates the practical usefulness of the proposed method as front-end processing for a speech recognizer in a distant-talking, interferer-present environment.
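The principal subspace used by this family of MWF methods is spanned by the dominant eigenvector of the (speech-plus-interference) spatial covariance matrix, which for a single target source points along the acoustic transfer function up to scale. A minimal sketch of estimating that vector by power iteration, on a hypothetical 2-microphone real-valued covariance (real covariance matrices would be complex per frequency bin in practice):

```python
def principal_eigvec(R, iters=100):
    """Estimate the principal (dominant) eigenvector of a symmetric
    matrix R, e.g. a spatial covariance, by power iteration."""
    n = len(R)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

# Toy covariance: dominant direction along the first microphone axis.
R = [[2.0, 0.0],
     [0.0, 1.0]]
v = principal_eigvec(R)  # converges toward [1.0, 0.0]
```

At low SIR this eigenvector drifts toward the interferer's direction; the paper's principal subspace modification (with frequency-band dependent interpolation) corrects that estimate before building the Wiener filter.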