Search | Korea Science

An Analysis of Acoustic Features Caused by Articulatory Changes for Korean Distant-Talking Speech

Kim Sunhee;Park Soyoung;Yoo Chang D.
- The Journal of the Acoustical Society of Korea / v.24 no.2E / pp.71-76 / 2005
Compared to normal speech, distant-talking speech is characterized by acoustic effects due to interfering sound and echoes, as well as by articulatory changes resulting from the speaker's effort to be more intelligible. In this paper, the acoustic features of distant-talking speech that arise from these articulatory changes are analyzed and compared with those of the Lombard effect. To examine the effect of different distances and articulatory changes, speech recognition experiments were conducted with HTK on normal speech and on distant-talking speech at different distances. The speech data used in this study consist of 4500 distant-talking utterances and 4500 normal utterances from 90 speakers (56 males and 34 females). The acoustic features selected for the analysis were duration, formants (F1 and F2), fundamental frequency, total energy, and energy distribution. The results show that the acoustic-phonetic features of distant-talking speech correspond mostly to those of Lombard speech: the main acoustic changes between normal and distant-talking speech are an increase in vowel duration, shifts in the first and second formants, an increase in fundamental frequency, an increase in total energy, and a shift in energy from the low frequency band to the middle or high bands.

Gender difference in speech intelligibility using speech intelligibility tests and acoustic analyses

Kwon, Ho-Beom
- The Journal of Advanced Prosthodontics / v.2 no.3 / pp.71-76 / 2010
PURPOSE. The purpose of this study was to compare men and women in terms of speech intelligibility, to investigate the validity of objective acoustic parameters related to speech intelligibility, and to establish reference data for future studies in various fields of prosthodontics. MATERIALS AND METHODS. Twenty men and women served as subjects in the present study. After sample sounds were recorded, speech intelligibility tests by three speech pathologists and acoustic analyses were performed. The speech intelligibility test scores and acoustic parameters such as fundamental frequency, fundamental frequency range, formant frequency, formant ranges, vowel working space area, and vowel dispersion were compared between men and women. In addition, the correlations between the speech intelligibility values and the acoustic variables were analyzed. RESULTS. Women showed significantly higher speech intelligibility scores than men, and there were significant differences between men and women in most of the acoustic parameters used in the present study. However, the correlations between the speech intelligibility scores and the acoustic parameters were low. CONCLUSION. The speech intelligibility test and acoustic parameters used in the present study were effective in differentiating male voices from female voices, and their values might be used in future studies related to patients involved with maxillofacial prosthodontics. However, further studies are needed on the correlation between speech intelligibility tests and objective acoustic parameters.
https://doi.org/10.4047/jap.2010.2.3.71
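
The abstract above names vowel working space area and vowel dispersion without defining them. As a minimal sketch, assuming one common definition (area of the polygon spanned by the corner vowels in the F1/F2 plane, and mean distance of the vowels from their centroid), the two measures could be computed as follows. The function names and the example formant values are hypothetical placeholders, not the study's method or data.

```python
import math

def vowel_space_area(corners):
    """Polygon area (shoelace formula) for (F1, F2) vertices given in order."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]  # wrap around to close the polygon
        total += f1_a * f2_b - f1_b * f2_a
    return abs(total) / 2.0

def vowel_dispersion(vowels):
    """Mean Euclidean distance of each (F1, F2) point from the centroid."""
    c1 = sum(f1 for f1, _ in vowels) / len(vowels)
    c2 = sum(f2 for _, f2 in vowels) / len(vowels)
    return sum(math.hypot(f1 - c1, f2 - c2) for f1, f2 in vowels) / len(vowels)

# Placeholder (F1, F2) values in Hz for three corner vowels; not measurements.
example = [(300.0, 2300.0), (800.0, 1300.0), (350.0, 800.0)]
print(vowel_space_area(example), vowel_dispersion(example))
```

With formants in Hz, the area is in Hz squared and the dispersion in Hz; other studies may use a different vowel set or a perceptual frequency scale.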

A Research on Time-Dependent Fundamental Frequency Variations after Waking up in the Morning (기상 후 시간에 따른 음도 변화에 대한 연구)

Ahn, Jong-Bok;Nam, Hyun-Wook;Jeong, Ok-Ran
- Speech Sciences / v.15 no.2 / pp.169-176 / 2008
This study analyzed differences in vocal fold movement between the time of waking and several hours later in the morning. Vocal fold movement was compared in terms of fundamental frequency and the range of fundamental frequency from maximum to minimum. The participants were 30 female adults between 20 and 29 years old. Voice samples were collected while they read a sentence (Jeong, 1993). The first sample was collected within 5 minutes of waking and the second 1 hour after the first. The third voice sample was collected 6 hours after the second. The results were as follows. First, the participants' fundamental frequency varied significantly with time (F=7.843). A post-hoc multiple comparison (LSD) was conducted to determine where the differences occurred. It showed significant differences between waking and 6 hours later (p< .001) and between 1 hour later and 6 hours later (p< .05). Second, the participants' range of fundamental frequency also varied significantly with time (F=3.130). According to the LSD analysis, significant differences in the range of fundamental frequency were found between waking and 1 hour later and between waking and 6 hours later (p< .05). These results indicate that vocal fold movement upon waking differs from that of several hours later.

In Search of Models in Speech Communication Research

Fujisaki, Hiroya
- Phonetics and Speech Sciences / v.1 no.1 / pp.9-22 / 2009
This paper first presents the author's personal view on the importance of modeling in scientific research in general, and then describes two of his works toward modeling certain aspects of human speech communication. The first work is concerned with the physiological and physical mechanisms of controlling the voice fundamental frequency of speech, which is an important parameter for expressing information on tone, accent, and intonation. The second work is concerned with the cognitive processes involved in a discrimination test of speech stimuli, which gives rise to the phenomenon of so-called categorical perception. They are meant to illustrate the power of models based on deep understanding and precise formulation of the functions of the mechanisms/processes that underlie observed phenomena. Finally, it also presents the author's view on some models that are yet to be developed.

The Role of Prosody in Dialect Synthesis and Authentication

Yoon, Kyu-Chul
- Phonetics and Speech Sciences / v.1 no.1 / pp.25-31 / 2009
The purpose of this paper is to examine the viability of synthesizing the Masan dialect from the Seoul dialect and to examine the role of prosody in the authentication of the synthesized Masan dialect. The synthesis was performed by transferring one or more of the prosodic features of the Masan utterance onto the Seoul utterance. The hypothesis is that, given an utterance composed of phonemes shared by both dialects, the more prosodic features of the Masan utterance are transferred onto the Seoul utterance, the more the Seoul utterance will be identified as an authentic Masan utterance. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The synthesized Masan utterances were evaluated by thirteen native speakers of the Masan dialect. The results showed that the fundamental frequency contour and the segmental durations had main effects on the perceptual shift from the Seoul to the Masan dialect.

A Study of Peak Finding Algorithms for the Autocorrelation Function of Speech Signal

So, Shin-Ae;Lee, Kang-Hee;You, Kwang-Bock;Lim, Ha-Young;Park, Ji Su
- Journal of the Korea Society of Computer and Information / v.21 no.12 / pp.131-137 / 2016
This paper proposes peak-finding algorithms for the autocorrelation function (ACF), which is widely used to detect the pitch of voiced signals. Estimating the fundamental frequency (F0) of a speech signal is widely recognized as both an important and a difficult task. The proposed algorithms exploit several characteristics of the ACF, such as its monotonically increasing behavior. They may therefore be able to estimate the fundamental frequency reliably and correctly, as long as the autocorrelation function of the speech signal is accurate. Because the proposed algorithms may reduce computational complexity, they can be applied to real-time processing. Speech data consisting of Korean emotion-expressing words were used to evaluate their performance, and the measured pitches were used to compare the proposed algorithms.
https://doi.org/10.9708/jksci.2016.21.12.131
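
For readers unfamiliar with ACF-based pitch detection, the following is a minimal sketch of a generic baseline: compute the autocorrelation of a frame, pick the largest local maximum within a plausible lag range, and convert that lag to F0. It is not the peak-finding algorithm proposed in the paper above; the function names, the 60 to 400 Hz search range, and the 0.3 voicing threshold are assumptions chosen for illustration.

```python
import math

def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of a list of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def estimate_f0_acf(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return an F0 estimate in Hz, or None if no clear ACF peak is found."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    # One extra lag so that r[lag + 1] exists at the upper end of the search.
    r = autocorrelation(frame, max_lag + 1)
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        is_peak = r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]
        if is_peak and r[lag] > best_val:
            best_lag, best_val = lag, r[lag]
    # Assumed voicing check: the peak must be a fair fraction of the energy r[0].
    if best_lag is None or best_val < 0.3 * r[0]:
        return None
    return sample_rate / best_lag

# Synthetic check: a pure 200 Hz tone sampled at 8 kHz (not real speech).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]
print(estimate_f0_acf(tone, sample_rate))
```

The frame must be longer than the largest lag searched (here 134 samples at 8 kHz). Because the estimate is quantized to integer lags, practical systems usually interpolate around the peak.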

Vocal acoustic characteristics of speakers with depression (우울증 화자 음성의 음향음성학적 특성)

Baek, Yeon-Sook;Kim, Se-Joo;Kim, Eun-Yeon;Choi, Yae-Lin
- Phonetics and Speech Sciences / v.4 no.1 / pp.91-98 / 2012
The purpose of this paper is to compare the voice characteristics of speakers with and without depression, and to propose an objective method, based on those characteristics, for diagnosing depression and measuring therapeutic effects. Voice samples from 11 female speakers with depression, aged 20 to 40 and diagnosed with major depressive disorder by a psychiatrist, were compared with those from 12 normal controls matched for sex, age, height, weight, education, smoking, and drinking. The voice samples were recorded with a portable digital recorder (TASCAM DR-07, Japan) and analyzed using the MDVP (Multi-Dimensional Voice Program) software module of the CSL (Computerized Speech Lab, Kay Elemetrics, model 4100). The results were as follows. First, the average speaking fundamental frequency and loudness range of the depression group were statistically significantly lower than those of the control group. The pitch range of the control group was somewhat higher than that of the depression group, but without statistical significance. Overall speech rate showed no statistical difference between the two groups. Second, the average speaking fundamental frequency and loudness range had statistically significant negative correlations with the Beck Depression Inventory, i.e., more severe depression was associated with lower average speaking fundamental frequency and loudness range. Other vocal parameters, such as pitch range and overall speech rate, had no statistically meaningful correlations with the Beck Depression Inventory.
https://doi.org/10.13064/KSSS.2012.4.1.091

Separation of Voiced Sounds and Unvoiced Sounds for Corpus-based Korean Text-To-Speech (한국어 음성합성기의 성능 향상을 위한 합성 단위의 유무성음 분리)

Hong, Mun-Ki;Shin, Ji-Young;Kang, Sun-Mee
- Speech Sciences / v.10 no.2 / pp.7-25 / 2003
Predicting the right prosodic elements is a key factor in improving the quality of synthesized speech. Prosodic elements include break, pitch, duration, and loudness. Pitch, which is realized by the fundamental frequency (F0), is the most important element for the quality of synthesized speech. However, the previous method for predicting F0 has some problems. If voiced and unvoiced sounds are not correctly classified, the result is an incorrect pitch prediction, an incorrect triphone unit when synthesizing the voiced and unvoiced sounds, and click or vibration noise. This typically occurs at transitions from a voiced sound to an unvoiced sound or from an unvoiced sound to a voiced sound. Such problems are not resolved by grammar-based methods, and they strongly affect the synthesized sound. Therefore, to obtain correct pitch values consistently, this paper proposes a new model for predicting and classifying voiced and unvoiced sounds using the CART tool.

Acoustic Characteristics of Vowels in Korean Distant-Talking Speech (한국어 원거리 음성의 모음의 음향적 특성)

Lee Sook-hyang;Kim Sunhee
- MALSORI / v.55 / pp.61-76 / 2005
This paper aims to analyze the acoustic effects on vowels produced in a distant-talking environment. The analysis was performed using a statistical method. The influence of gender and speaker on the variation was also examined. The speech data used in this study consist of 500 distant-talking words and 500 normal words from 10 speakers (5 males and 5 females). The acoustic features selected for the analysis were the duration, the formants (F1 and F2), the fundamental frequency, and the total energy. The results showed that the duration, F0, F1, and the total energy increased in distant-talking speech compared to normal speech; female speakers showed a greater increase in all features except the total energy and the fundamental frequency. In addition, speaker differences were observed.

On the Frequency Domain Pitch Detection of Noise Corrupted Speech Signals -Minimizing the Effects of the F1 by the Spectral AMDF- (배경잡음하에서 주파수영역 피치검출에 관한 연구 -스펙트럼 AMDF에 의한 제 1포먼트 영향 제거법-)

Bae, Myung-Jin;Park, Chan-Sou;Ann, Sou-Guil
- The Journal of the Acoustical Society of Korea / v.10 no.4 / pp.12-18 / 1991
Detecting the fundamental frequency (F0) of the speech signal is a problem in many speech applications. In frequency-domain pitch detection, errors are caused by the first formant and by background noise. In this paper, we propose a frequency-domain pitch detection algorithm that reduces the effects of the first formant and the background noise by means of the spectral AMDF (average magnitude difference function). Several computer simulation results showed that the proposed algorithm was very effective for fundamental frequency detection.
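
To illustrate the general idea of a spectral AMDF, the sketch below applies the average magnitude difference function to a magnitude spectrum rather than to the waveform: harmonics of a voiced sound are spaced F0 apart, so the difference function dips when the spectral shift equals that spacing. This is a generic illustration under assumed parameters, not the algorithm of the paper above, and it does not reproduce its first-formant handling. All function names and the search range are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame, n_bins):
    """Magnitudes of the first n_bins DFT bins (direct O(N^2) DFT, for clarity)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n_bins)]

def spectral_amdf(spectrum, shift):
    """Mean absolute difference between the spectrum and itself shifted by `shift` bins."""
    count = len(spectrum) - shift
    return sum(abs(spectrum[m] - spectrum[m + shift]) for m in range(count)) / count

def estimate_f0_spectral_amdf(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return the harmonic spacing (Hz) that minimizes the spectral AMDF."""
    n = len(frame)
    spectrum = magnitude_spectrum(frame, n // 2)
    k_min = max(1, math.ceil(f0_min * n / sample_rate))
    k_max = int(f0_max * n / sample_rate)
    best_shift = min(range(k_min, k_max + 1),
                     key=lambda k: spectral_amdf(spectrum, k))
    return best_shift * sample_rate / n

# Synthetic check: four harmonics of 200 Hz sampled at 8 kHz (not real speech).
sample_rate = 8000
amplitudes = [1.0, 0.8, 0.6, 0.4]
frame = [sum(a * math.sin(2 * math.pi * 200 * (h + 1) * i / sample_rate)
             for h, a in enumerate(amplitudes))
         for i in range(400)]
f0 = estimate_f0_spectral_amdf(frame, sample_rate)
print(f0)
```

The frequency resolution is the bin width, `sample_rate / len(frame)` (20 Hz in this example), so a longer frame or zero-padding would be needed for finer estimates.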
