• Title/Summary/Keyword: Speech Perception


Variable Time-Scale Modification of Speech Using Transient Information based on LPC Cepstral Distance (LPC 켑스트럼 거리 기반의 천이구간 정보를 이용한 음성의 가변적인 시간축 변환)

  • Lee, Sung-Joo;Kim, Hee-Dong;Kim, Hyung-Soon
    • Speech Sciences / v.3 / pp.167-176 / 1998
  • Conventional time-scale modification methods share a problem: as the modification rate increases, the modified speech becomes less intelligible, because these methods ignore the effect of articulation rate on speech characteristics. Research on speech perception shows that the timing information in the transient portions of a speech signal plays an important role in discriminating among speech sounds. Inspired by this finding, we propose a novel scheme for modifying the time scale of speech. In the proposed scheme, the timing information of the transient portions is preserved, while the steady portions are compressed or expanded more aggressively to maintain the overall time-scale change. To identify the transient and steady portions of a speech signal, we employ a simple method based on the LPC cepstral distance between neighboring frames. A subjective preference test indicates that the proposed method outperforms the conventional SOLA method, especially for very fast playback.

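The idea of locating transient portions from the LPC cepstral distance between neighboring frames can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, LPC order, number of cepstral coefficients, the Euclidean distance, and the threshold are all assumptions, and the function names are hypothetical.

```python
import math

def lpc(frame, order):
    """LPC coefficients [1, a1..ap] via the autocorrelation method and
    Levinson-Durbin, with A(z) = 1 + sum_k a_k z^-k (assumed convention)."""
    n = len(frame)
    # Hamming window applied before computing the autocorrelation
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    r = [sum(w[i] * w[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0] + 1e-12  # small floor avoids division by zero on silent frames
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err = max(err * (1.0 - k * k), 1e-12)
    return a

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c1..c_n of the all-pole model 1/A(z), by the standard recursion."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = (-a[n] if n <= p else 0.0) - acc / n
    return c[1:]

def cepstral_distances(signal, frame_len=240, hop=80, order=10, n_ceps=12):
    """Euclidean LPC cepstral distance between each pair of neighboring frames."""
    ceps = [lpc_to_cepstrum(lpc(signal[s:s + frame_len], order), n_ceps)
            for s in range(0, len(signal) - frame_len + 1, hop)]
    return [math.sqrt(sum((x - y) ** 2 for x, y in zip(c0, c1)))
            for c0, c1 in zip(ceps, ceps[1:])]

def transient_frame_pairs(distances, threshold):
    """Indices of frame pairs whose distance exceeds a (hypothetical) threshold."""
    return [i for i, d in enumerate(distances) if d > threshold]
```

Frame pairs flagged by `transient_frame_pairs` would be treated as transient and left at their original timing, while the remaining steady frames absorb the time-scale change. The threshold would need tuning on real speech.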

Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

  • Miran Kim
    • Phonetics and Speech Sciences / v.15 no.2 / pp.13-20 / 2023
  • This study explores the potential of automated speech recognition (ASR) in assessing English learners' pronunciation. We employed ASR technology, acknowledged for its impartiality and consistent results, to analyze speech audio files, including synthesized speech, both native-like English and Korean-accented English, and speech recordings from a native English speaker. Through this analysis, we establish baseline values for the word error rate (WER). These were then compared with those obtained for human raters in perception experiments that assessed the speech productions of 30 first-year college students before and after taking a pronunciation course. Our sub-group analyses revealed positive training effects for Whisper, an ASR tool, and human raters, and identified distinct human rater strategies in different assessment aspects, such as proficiency, intelligibility, accuracy, and comprehensibility, that were not observed in ASR. Despite such challenges as recognizing accented speech traits, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advancements in ASR technology, its potential as not only an assessment aid but also a self-directed learning tool for pronunciation feedback merits further exploration.
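For readers unfamiliar with the word error rate (WER) mentioned above, the following sketch computes it as the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The lowercasing and whitespace tokenization are assumptions made for illustration. The abstract does not describe the study's text normalization or scoring tool, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()  # assumed normalization: lowercase, split on whitespace
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only one row of the table at a time
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                             # deletion
                           cur[j - 1] + 1,                          # insertion
                           prev[j - 1] + (ref_word != hyp_word)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. Note that WER can exceed 1.0 when the hypothesis contains many insertions.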

Production and perception of Korean word-initial stops from a sound change perspective (음 변화 관점에서 바라본 한국어 어두 폐쇄음의 발화 및 지각)

  • Kim, Jin-Woo
    • Phonetics and Speech Sciences / v.13 no.3 / pp.39-51 / 2021
  • Based on spontaneous speech data collected in 2020, this study examined the production and perception of Korean lenis, aspirated, and fortis stops. Unlike in the controlled experiments of previous studies, the lenis and aspirated stops of males in their 30s were not distinguished by voice onset time (VOT) in spontaneous speech. Perceptual experiments were conducted with young females, the leaders of language change. F0 was found to serve as the primary cue for the perception of lenis stops, and VOT then distinguished the aspirated and fortis stops. The fact that the sounds were always perceived as lenis stops when F0 was low, irrespective of whether VOT was short or long, showed that F0 plays an absolute role in the perception of lenis stops. However, in some cases the aspirated and lenis stops were distinguished only by VOT, which does not happen in production. In terms of sound change, disagreement between the production and perception systems occurs when a sound change is in progress. In particular, when production change precedes perception change, it indicates that the sound change is in its later stages. Young females still maintain the previous system in perception because the distinction between lenis and aspirated stops by VOT was valid in their parents' generation. In other words, VOT is still used in perception to communicate with other groups.

The Study on Asymmetry between Acoustics and Perception of the Temporal Cues of English Plosives (영어파열음 시구간신호의 음향과 지각 비대칭성 연구)

  • Kang Seok-Han
    • MALSORI / v.55 / pp.15-31 / 2005
  • This study tests the hypothesis that the voiced-voiceless distinction is influenced by the relationship between acoustics and perception. Production and perception tests were conducted with temporal cues in different environments (CV, VCV, VC). The results showed that acoustic cues that differ significantly between voiceless and voiced plosives do not behave the same way in perception, indicating an asymmetry between acoustics and perception.


The Effects of Speaking Mode on Intelligibility of Dysarthric Speech (뇌성마비 성인의 발화유형에 따른 명료도)

  • Kim, Soo-Jin;Ko, Hyun-Ju
    • Phonetics and Speech Sciences / v.1 no.4 / pp.171-176 / 2009
  • Intelligibility measurement is one criterion for assessing the severity of speech disorders, especially in speakers with dysarthria. Rate control, usually rate reduction, is used with many dysarthric speakers to improve their intelligibility. The purpose of this study is to compare how the intelligibility of speech produced by speakers with cerebral palsy changes across three speaking conditions. Speech samples were collected from 10 adults with cerebral palsy, who were asked to speak under three conditions: (1) naturally (control), (2) more slowly (rate control), and (3) louder and more accurately (clear speech). In a perception test, a group of three judges listened to the speech samples and wrote down whatever they heard. The results showed that the subjects fell into two subgroups according to how their intelligibility varied across the three speaking conditions. For some subjects, speech intelligibility increased greatly when they were asked to speak 'louder and more accurately', while the others showed no difference in intelligibility across the speaking conditions. These findings suggest that it would be clinically useful to identify the instruction that best improves intelligibility for each speaker with cerebral palsy.


Specifics of Speech Development of Children with Cerebral Palsy

  • Zavitrenko, Dolores;Rizhniak, Renat;Snisarenko, Iryna;Pasichnyk, Natalia;Babenko, Tetyana;Berezenko, Natalia
    • International Journal of Computer Science & Network Security / v.22 no.11 / pp.157-162 / 2022
  • Cerebral palsy is one of the most serious disorders of children's psychophysical development. It manifests itself in disturbances of motor function, which are often combined with speech disorders, other complications in the formation of higher mental functions, and frequently a decrease in intelligence. This article discusses speech disorders in children with cerebral palsy. Emphasis is placed on some important aspects that should be borne in mind when investigating the specifics of speech development in these children. In particular, speech disorders in cerebral palsy stem not only from damage to certain structures of the brain, but also from the delayed formation or underdevelopment of those parts of the cerebral cortex that are of major importance in linguistic and mental activity. These are ontogenetically young regions of the cerebral cortex that develop most rapidly after birth (premotor, frontal, parieto-temporal). It is important to take into account that children with cerebral palsy have disturbances of phonemic perception. Often, these children cannot distinguish sounds by ear, repeat sequences of components, or isolate sounds in words. In dysarthria, there are impairments in the pronunciation of vowel and consonant sounds, the tempo of speech, voice modulation, breathing, and phonation, as well as asynchronous breathing, alignment, and articulation. As a result, we identified the main features and specifics of the speech development of children with cerebral palsy and described the conditions necessary for the full development of language. Language disturbances in children with cerebral palsy depend on the localization and severity of brain damage. Pathology that limits the ability to move and to explore the world is of great importance in the mechanism of speech disorders.

Sensitive Period of Auditory Perception and Linguistic Discrimination

  • Cha, Kyung-Whan;Jo, Hannah
    • Phonetics and Speech Sciences / v.6 no.1 / pp.59-67 / 2014
  • The purpose of this study is to scientifically examine Kuhl's (2011) critical period graph, originally from Johnson and Newport (1989), from the perspective of auditory perception and linguistic discrimination. This study uses two types of experiments (auditory perception and linguistic phoneme discrimination) with five different age groups (5 years, 6-8 years, 9-13 years, 15-17 years, and 20-26 years) of Korean learners of English. Auditory perception is examined via ultrasonic sounds that are commonly used in the medical field. In addition, each group is measured in terms of its ability to discriminate minimal pairs in Chinese. Since almost all Korean students already have some amount of English exposure, the researchers selected phonemes in Chinese, a foreign language to which none of the subject groups had been exposed. The results are almost completely in accordance with Kuhl's critical period graph for auditory perception and linguistic discrimination; a sensitive age is found at 8. The results show that the auditory capability of kindergarten children is significantly better than that of other students, as measured by their ability to perceive ultrasonic sounds and to distinguish ten minimal pairs in Chinese. This finding strongly implies that human auditory ability is a key factor in the sensitive period of language acquisition.

Sound change of /o/ in modern Seoul Korean: Focused on relations with acoustic characteristics and perception

  • Igeta, Takako;Sonu, Mee;Arai, Takayuki
    • Phonetics and Speech Sciences / v.6 no.3 / pp.109-119 / 2014
  • This article represents a first step in a larger study aimed at elucidating the relationship between production and perception in the sound change of /o/ in (Seoul) Korean. In this paper we present the results of a production study and a perception experiment. For the production study we examined vowel production data from 20 young adult speakers, measuring the first and second formants, and then conducted a discriminant analysis based on those values. In terms of their F1-F2 values, the distributions of /o/ and /u/ were close, and even overlapping in some circumstances, which is consistent with the literature. This tendency was more apparent among the female speakers than among the males. Moreover, in the females' distributions, /o/ was frequently categorized as /u/, suggesting that the sound change is indeed proceeding from /o/ toward /u/. Next, to investigate the effects of this proximity on perception, we used the production data of five randomly selected speakers from the production study as stimuli for a perception experiment in which 21 young adult native speakers of (Seoul) Korean performed a vowel identification task and provided a Goodness rating on a 5-point scale. We found that while rates of correctness were high, when these correctness scores were weighted by the Goodness rating, the resulting "weighted correctness" scores were lower in some cases, indicating a degree of confusion in distinguishing between the two vowels.
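The abstract does not give the formula behind its "weighted correctness" score. One plausible reading, shown here purely as an assumption, scales each correct identification by its Goodness rating divided by the scale maximum and averages over trials. The function name, the trial representation, and the normalization by 5 are all hypothetical.

```python
def weighted_correctness(trials, max_rating=5):
    """Mean of correctness (0 or 1) scaled by goodness / max_rating.

    trials: list of (is_correct, goodness) pairs, goodness in 1..max_rating.
    This weighting is an assumed formula, not one stated in the abstract.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    total = 0.0
    for is_correct, goodness in trials:
        if not 1 <= goodness <= max_rating:
            raise ValueError("goodness rating out of range")
        total += (1.0 if is_correct else 0.0) * goodness / max_rating
    return total / len(trials)

def plain_correctness(trials):
    """Unweighted proportion of correct identifications, for comparison."""
    return sum(1 for is_correct, _ in trials if is_correct) / len(trials)
```

Under this reading, a listener who identifies a vowel correctly but rates it as a poor exemplar contributes less than a full point, so the weighted score can fall well below the plain identification rate.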

Simulation of speech processing and coding strategy for cochlear implants (인공 청각 장치의 음성신호 처리와 자극방법의 시뮬레이션)

  • Kim, Young-Hoon;Park, Kwang-Suk
    • Proceedings of the KOSOMBE Conference / v.1991 no.11 / pp.30-33 / 1991
  • The purpose of a speech processor for cochlear implants is to deliver speech information to the central nervous system. In this study we present a method that simulates speech processing and coding strategies for cochlear implants, and we tested two different simulated processing methods on 12 adults with normal hearing. Formant sinusoidal coding performed better than formant pulse coding in the consonant perception test and in learning effects (p < 0.05).


Speech processing strategy and executive function: Korean children's stop perception

  • Kong, Eun Jong;Yoo, Jeewon
    • Phonetics and Speech Sciences / v.9 no.3 / pp.57-65 / 2017
  • The current study explored how Korean-speaking children process the multiple acoustic cues (VOT and f0) for the stop laryngeal contrast (/t'/, /t/, and /tʰ/) and examined whether individual perceptual strategies are related to the general cognitive ability to perform executive functions (EF). Fifteen children (aged 7 to 8) participated in a speech perception task in which they identified the three Korean laryngeal stops (3AFC) after listening to auditory stimuli of C-/a/ with synthetically varied VOT and f0. They also completed a series of EF tasks measuring working memory, inhibition, and cognitive shifting ability. The findings showed that the children used the two cues in a highly correlated manner. While the children used VOT consistently for the three laryngeal categories, their use of f0 was either reduced or enhanced depending on the phonetic category. Importantly, the children's strategy of suppressing f0 for the tense-aspirated contrast was meaningfully associated with better cognitive abilities such as working memory, inhibition, and attentional shifting. As a preliminary experimental investigation, the current research demonstrated that listeners with inefficient processing strategies performed poorly on the EF tasks, suggesting that cognitive skills might be responsible for developmental variation in the processing of sub-phonemic information for the linguistic contrast.