• Title/Abstract/Keywords: Speech rate

Search Result 1,242, Processing Time 0.027 seconds

A study on the change of prosodic units by speech rate and frequency of turn-taking (발화 속도와 말차례 교체 빈도에 따른 운율 단위 변화에 관한 연구)

  • Won, Yugwon
    • Phonetics and Speech Sciences / v.14 no.2 / pp.29-38 / 2022
  • This study analyzed speech from the National Institute of Korean Language's Daily Conversation Speech Corpus (2020) to determine how speech rate and turn-taking frequency affect prosodic units. Intonation phrase frequency, word phrase frequency, and speaking duration correlated positively with speech rate; however, the correlations were low, and the regression model for speech rate had a fit of only 3%-11%, indicating weak explanatory power. Mean speech rate differed significantly with turn-taking frequency, decreasing as turn-taking became more frequent. In addition, as turn-taking frequency increased, the frequency of intonation phrases, the frequency of word phrases, and the speaking duration all decreased, with a high negative correlation. The fit of the regression model for turn-taking frequency was 27%-32%. Turn-taking frequency thus acts as a factor that changes both speech rate and prosodic units. This effect is presumed to be influenced by disfluency in the dialogue, the characteristics of turn-taking, and active interaction between the speakers.

A Low Bit Rate Speech Coder Based on the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.4 / pp.300-304 / 2015
  • A low bit rate speech coder based on a non-uniform sampling technique is proposed. The non-uniform sampling relies on the detection of inflection points (IPs). A speech block is processed by the IP detector, and the detected IP pattern is compared with the entries of an IP database. The address of the closest database entry is transmitted together with the energy of the speech block. At the receiver, the decoder reconstructs the speech block from the received address and the energy of the block. As a result, the coder has a fixed data rate, unlike existing speech coders based on non-uniform sampling. Computer simulation demonstrates the usefulness of the proposed technique: the SNR of the proposed method is approximately 5.27 dB at a data rate of 1.5 kbps.
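
The encode/decode flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the abstract does not say how IPs are detected, how "closest" is measured, or how the decoder rebuilds the block, so the sign change of the second difference, the Hamming distance, and the energy-scaled stored template used here are all assumptions, as are the function names and the database layout.

```python
import math

def inflection_pattern(block):
    """Mark samples where the discrete second difference changes sign (assumed IP test)."""
    d2 = [block[i - 1] - 2.0 * block[i] + block[i + 1]
          for i in range(1, len(block) - 1)]
    pattern = [0] * len(block)
    for k in range(1, len(d2)):
        if d2[k - 1] * d2[k] < 0:      # curvature flips between samples k and k+1
            pattern[k + 1] = 1
    return pattern

def hamming(a, b):
    """Number of positions where two equal-length patterns differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))

def build_database(blocks):
    """Hypothetical IP database: each entry stores an IP pattern and a waveform template."""
    return [{"pattern": inflection_pattern(b), "template": list(b)} for b in blocks]

def encode_block(block, database):
    """Return (address, energy): index of the closest IP pattern plus the block energy."""
    pattern = inflection_pattern(block)
    address = min(range(len(database)),
                  key=lambda i: hamming(pattern, database[i]["pattern"]))
    energy = sum(s * s for s in block)
    return address, energy

def decode_block(address, energy, database):
    """Scale the stored template so that its energy matches the transmitted energy."""
    template = database[address]["template"]
    t_energy = sum(s * s for s in template)
    gain = math.sqrt(energy / t_energy) if t_energy > 0 else 0.0
    return [gain * s for s in template]
```

Because every block is sent as one address plus one energy value, the number of bits per block is constant, which is consistent with the fixed data rate the abstract reports.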

A Study on the Relation among English Speech Rate, Pitch and Stress by Korean Speakers (한국인 화자의 영어 발화 속도와 피치, 강세 간의 관계 연구)

  • Kim, Ji-Eun
    • Phonetics and Speech Sciences / v.6 no.3 / pp.101-108 / 2014
  • This study investigates the relation among pitch range differences, speech rate, and the realization of stress. To identify the realization of stress, vowel formants and durational differences between stressed and unstressed vowels were measured. Korean learners were asked to read a textbook passage of nine sentences. The major results indicate that: (1) the Korean speakers' pitch range is less than 50% of the native speakers'; (2) there is a significant negative relation between high-low pitch range and speech rate; (3) the vowel qualities and durations of stressed and unstressed vowels are related to speech rate but not to high-low pitch range.

Effects of gender, age, and individual speakers on articulation rate in Seoul Korean spontaneous speech

  • Kim, Jungsun
    • Phonetics and Speech Sciences / v.10 no.4 / pp.19-29 / 2018
  • The present study investigated whether there are differences in articulation rate by gender, age, and individual speakers in a spontaneous speech corpus produced by 40 Seoul Korean speakers. This study measured their articulation rates using a second-per-syllable metric and a syllable-per-second metric. The findings are as follows. First, in spontaneous Seoul Korean speech, there was a gender difference in articulation rates only in age group 10-19, among whom men tended to speak faster than women. Second, individual speakers showed variability in their rates of articulation. The tendency for some speakers to speak faster than others was variable. Finally, there were metric differences in articulation rate. That is, regarding the coefficients of variation, the values of the second-per-syllable metric were much higher than those for the syllable-per-second metric. The articulation rate for the syllable-per-second metric tended to be more distinct among individual speakers. The present results imply that data gathered in a corpus of Seoul Korean spontaneous speech may reflect speaker-specific differences in articulatory movements.
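
The two metrics named in this abstract are reciprocals of each other, and the coefficient of variation is the standard deviation divided by the mean. A minimal sketch of that arithmetic is given below; the function names are illustrative, the sample standard deviation is an assumption, and the duration is assumed to be pause-free articulation time, since the abstract does not give the measurement procedure.

```python
import statistics

def articulation_metrics(n_syllables, duration_s):
    """Return (syllables per second, seconds per syllable) for one stretch of speech.

    duration_s is assumed to be articulation time with pauses already excluded.
    """
    return n_syllables / duration_s, duration_s / n_syllables

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition of CV)."""
    return statistics.stdev(values) / statistics.mean(values)
```

Computing `coefficient_of_variation` separately over the per-speaker syllable-per-second values and the second-per-syllable values gives the two quantities the abstract compares; because one metric is the reciprocal of the other, the two coefficients generally differ.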

Effects of Concurrent Linguistic or Cognitive Tasks on Speech Rate (언어 및 인지 과제 동시수행이 발화속도에 미치는 영향)

  • Han, Ji-Yeon;Kim, Hyo-Jeong;Kim, Moon-Jeong
    • Proceedings of the KSPS conference / 2007.05a / pp.102-105 / 2007
  • This study was designed to examine the effects of concurrent linguistic or cognitive tasks on speech rate. Eight normal speakers repeated sentences either alone or while simultaneously performing a linguistic task or a cognitive task. The linguistic task consisted of generating verbs from nouns, and the cognitive task consisted of performing mental arithmetic. Speech rate was measured from acoustic data. A one-way ANOVA was conducted to test for differences in speech rate among the three task types. The results showed no significant difference between the sentence repetition and linguistic tasks. However, significant differences were reported for the following pairs: sentence repetition and linguistic task, and linguistic and cognitive task.


Comparison of Speech Rate and Long-Term Average Speech Spectrum between Korean Clear Speech and Conversational Speech

  • Yoo, Jeeun;Oh, Hongyeop;Jeong, Seungyeop;Jin, In-Ki
    • Korean Journal of Audiology / v.23 no.4 / pp.187-192 / 2019
  • Background and Objectives: Clear speech is an effective communication strategy used in difficult listening situations that draws on techniques such as accurate articulation, a slow speech rate, and the inclusion of pauses. Although excessively slow speech and improperly amplified spectral information can degrade overall speech intelligibility, certain amplitude increments in the mid-frequency bands (1 to 3 dB) and speech rates around 50% slower than those of conversational speech have been reported as factors that improve speech intelligibility. The purpose of this study was to identify whether amplitude increments in the mid-frequency bands and slower speech rates are evident in Korean clear speech as they are in English clear speech. Subjects and Methods: To compare the acoustic characteristics of the two methods of speech production, the voices of 60 participants were recorded during conversational speech and then again during clear speech using standardized sentence material. Results: The speech rate and long-term average speech spectrum (LTASS) were analyzed and compared. Speech rates for clear speech were slower than those for conversational speech. Increased amplitudes in the mid-frequency bands were evident in the LTASS of clear speech. Conclusions: The observed differences in acoustic characteristics between the two types of speech production suggest that Korean clear speech can be an effective communication strategy for improving speech intelligibility.

Adaptive Multi-Rate (AMR) Speech Coding Algorithm (Adaptive Multi-Rate(AMR) 음성부호화 알고리즘)

  • 서정욱;배건성
    • Proceedings of the IEEK Conference / 2000.06d / pp.92-97 / 2000
  • The AMR (Adaptive Multi-Rate) speech coding algorithm has been adopted as a standard speech codec for IMT-2000. It is based on algebraic CELP and consists of eight speech coding modes with bit rates from 4.75 kbit/s to 12.2 kbit/s. It also includes a VAD (Voice Activity Detector), SCR (Source Controlled Rate) operation, and an error concealment scheme for robustness over a radio channel. The AMR bit rate changes on a frame basis depending on the channel condition. In this paper, we introduce the AMR speech coding algorithm and describe its real-time implementation on the TMS320C6201, a Texas Instruments fixed-point DSP. Starting from the ANSI C source code released by ETSI and 3GPP, we converted and optimized the program to run in real time using the C compiler and assembly language. For the test sequences, the decoded output of the speech codec implemented on the DSP was verified to be identical to the PC simulation result obtained with the ANSI C code. In addition, an actual sound input/output test using a microphone and speaker demonstrated proper real-time operation without distortion or delay.
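
To make the per-frame rate switching concrete, the sketch below lists the eight mode bit rates and the resulting bits per frame. The abstract gives only the endpoints (4.75 and 12.2 kbit/s); the six intermediate rates and the 20 ms frame length are taken from the 3GPP AMR narrowband specification, not from this paper. The `select_mode` function and its normalized `channel_quality` input are purely hypothetical: real mode adaptation is driven by network signalling and is not described in the abstract.

```python
# AMR narrowband mode bit rates in kbit/s (3GPP specification, not this abstract).
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)
FRAME_MS = 20  # AMR frame length in milliseconds

def bits_per_frame(rate_kbps):
    """Speech bits carried by one frame: kbit/s times ms equals bits."""
    return round(rate_kbps * FRAME_MS)

def select_mode(channel_quality):
    """Hypothetical mapping from a channel quality in [0, 1] to a mode bit rate.

    Poor channels get a low-rate mode, leaving more room for channel coding.
    """
    q = max(0.0, min(1.0, channel_quality))
    index = min(int(q * len(AMR_MODES_KBPS)), len(AMR_MODES_KBPS) - 1)
    return AMR_MODES_KBPS[index]
```

Calling `select_mode` once per frame with an updated quality estimate mimics, in a very simplified way, the frame-by-frame bit rate changes the abstract mentions.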


Digit Recognition using Speech and Image Information (음성과 영상 정보를 이용한 우리말 숫자음 인식)

  • 이종혁;최재원
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.1 / pp.83-88 / 2002
  • Most speech recognition methods attempt recognition using only speech information. To raise the recognition rate, we propose a system that recognizes digits using both speech and image information. In an experiment, this paper compares the recognition rate of an existing speech recognition method with that of a method that also uses image information. Adding image information to the speech information increased the recognition rate by about 6%. This shows that, for digit recognition, combining image and speech information is more effective than using speech information alone.

Comparison of HMM models and various cepstral coefficients for Korean whispered speech recognition (은닉 마코프 모델과 켑스트럴 계수들에 따른 한국어 속삭임의 인식 비교)

  • Park, Chan-Eung
    • 전자공학회논문지 IE / v.43 no.2 / pp.22-29 / 2006
  • The use of whispered speech has recently increased with the spread of mobile phones, and so has the need for whispered speech recognition. To find a suitable recognition system for whispered speech, various feature vectors commonly used in speech recognition are applied to three kinds of HMMs: normal speech models, whispered speech models, and integrated models trained on both normal and whispered speech. The recognition experiments show that the recognition rate for whispered speech with normal speech models is too low for practical use, whereas separate whispered speech models recognize whispered speech with the highest rates, at least 85%. The integrated models also achieve an acceptable recognition rate, but further study is needed to improve it. MFCC and PLCC feature vectors score higher recognition rates when applied to separate whispered speech models, but PLCC performs best when applied to the integrated models.

A Half Rate Speech Coder using Trellis Excitation (Trellis excitation을 이용한 half rate 음성부호화기)

  • 강상원;이형수;김영수;정진욱
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.2 / pp.88-94 / 1996
  • In this paper, we present a half rate speech coder using trellis excitation. The coder combines a code-excited linear prediction (CELP) system with a trellis quantization method using codebook expansion, and it produces higher speech quality than a typical CELP coder at the same transmission rate. A subjective comparison with 3- to 8-bit $\mu$-law PCM indicates that the half rate coder provides speech quality between that of 5-bit and 6-bit $\mu$-law PCM.
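
For readers unfamiliar with the reference condition, an n-bit $\mu$-law PCM signal of the kind used in such comparisons could be produced along the following lines. This is a sketch under assumptions not stated in the abstract: $\mu = 255$, input samples normalized to [-1, 1], and a uniform mid-rise quantizer applied in the compressed domain. The function names are illustrative.

```python
import math

MU = 255.0  # assumed companding constant

def compress(x):
    """Mu-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def mu_law_quantize(x, bits):
    """Quantize one sample with 2**bits uniform levels in the compressed domain."""
    levels = 2 ** bits
    y = compress(max(-1.0, min(1.0, x)))
    step = 2.0 / levels
    index = min(int((y + 1.0) / step), levels - 1)
    y_hat = -1.0 + (index + 0.5) * step   # mid-rise reconstruction level
    return expand(y_hat)

def snr_db(signal, bits):
    """Signal-to-quantization-noise ratio in dB for an n-bit mu-law version of signal."""
    noise = sum((s - mu_law_quantize(s, bits)) ** 2 for s in signal)
    power = sum(s * s for s in signal)
    return 10.0 * math.log10(power / noise)
```

Sweeping `bits` from 3 to 8 yields the family of reference signals against which a coder can be ranked; the ranking reported in the abstract itself was subjective and is not reproduced by this objective SNR measure.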
