• Title/Summary/Keyword: speech dimensions


Perceptual Dimensions of Korean Vowel: A Link between Perception and Production (한국어 모음의 지각적 차원 -지각과 산출간의 연동-)

  • Choi, Yang-Gyu
    • Speech Sciences / v.8 no.2 / pp.181-191 / 2001
  • The acoustic quality of a vowel is known to be determined mostly by the frequencies of the first formant (F1) and the second formant (F2). This study examined the perceptual (or psychological) dimensions of vowel perception, and the relationships among the perceptual dimensions, the acoustic dimensions (F1 & F2), and the articulatory gestures of vowels were discussed. Using the multi-dimensional scaling (MDS) technique, an experiment was performed to identify the perceptual dimensions underlying the perception of Korean vowels. In the experiment, 8 standard Seoul speakers performed a similarity rating task on 10 synthesized Korean vowels. A two-dimensional MDS solution based on the similarity rating scores was obtained. The results showed that the two perceptual dimensions, D1 and D2, were strongly correlated with F2 and F1 (r = -.895 and .878, respectively) and were accordingly interpreted as 'vowel advancement' and 'vowel height'. The relationship between the perceptual dimensions of vowels and the articulatory positions of the tongue suggested that perception may be directly linked to production. Further research problems are discussed in the final section.

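A minimal sketch of the analysis pipeline this abstract describes, assuming a precomputed 10×10 dissimilarity matrix (1 − averaged similarity rating) and illustrative F1/F2 values; the data below are random placeholders, not the study's measurements.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.manifold import MDS

# Placeholder inputs: a symmetric 10x10 dissimilarity matrix averaged over
# listeners, plus assumed F1/F2 values (Hz) for the same 10 vowels.
rng = np.random.default_rng(0)
dissim = rng.random((10, 10))
dissim = (dissim + dissim.T) / 2      # make symmetric
np.fill_diagonal(dissim, 0.0)         # zero self-dissimilarity
f1 = rng.uniform(300, 800, 10)
f2 = rng.uniform(800, 2500, 10)

# Two-dimensional MDS solution from the precomputed dissimilarities.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)    # shape (10, 2): D1 and D2

# Correlate each perceptual dimension with the acoustic dimensions.
for name, dim in (("D1", coords[:, 0]), ("D2", coords[:, 1])):
    r_f2, _ = pearsonr(dim, f2)
    r_f1, _ = pearsonr(dim, f1)
    print(f"{name}: r(F2) = {r_f2:+.3f}, r(F1) = {r_f1:+.3f}")
```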

The Evaluation of the Fuzzy-Chaos Dimension and the Fuzzy-Lyapunov Dimension (화자인식을 위한 퍼지-상관차원과 퍼지-리아프노프차원의 평가)

  • Yoo, Byong-Wook;Park, Hyun-Sook;Kim, Chang-Seok
    • Speech Sciences / v.7 no.3 / pp.167-183 / 2000
  • In this paper, we propose two kinds of chaos dimensions, the fuzzy correlation dimension and the fuzzy Lyapunov dimension, for speaker recognition. The proposal is based on the point that chaos analysis makes it possible to exploit the non-linear information contained in an individual's speech signal and thereby obtain superior discrimination capability. We confirm that the proposed fuzzy chaos dimensions play an important role in improving the speaker recognition rate by absorbing the variations of the reference and test pattern attractors. To evaluate the proposed fuzzy chaos dimensions, we perform speaker recognition using them; in other words, we investigate the validity of the proposed parameters by estimating the recognition error according to how well an individual speaker is discriminated from the reference pattern.

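The abstract builds on chaos-theoretic dimensions of the speech attractor. The fuzzy variants themselves are not specified here, so the sketch below shows only the standard Grassberger-Procaccia correlation dimension on a delay-embedded signal, with illustrative embedding parameters and a synthetic test frame.

```python
import numpy as np
from scipy.spatial.distance import pdist

def delay_embed(x, dim, tau):
    """Phase-space reconstruction of a 1-D signal by time-delay embedding."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

def correlation_dimension(x, dim=5, tau=10):
    """Grassberger-Procaccia estimate: slope of log C(r) versus log r."""
    pts = delay_embed(np.asarray(x, dtype=float), dim, tau)
    d = pdist(pts)                                   # pairwise distances
    radii = np.geomspace(np.percentile(d, 5), np.percentile(d, 60), 12)
    corr_sum = np.array([np.mean(d < r) for r in radii])   # C(r)
    slope, _ = np.polyfit(np.log(radii), np.log(corr_sum), 1)
    return slope

# Illustrative call on a synthetic speech-like frame.
rng = np.random.default_rng(0)
frame = np.sin(np.linspace(0, 40 * np.pi, 1500)) + 0.05 * rng.standard_normal(1500)
print(f"estimated correlation dimension: {correlation_dimension(frame):.2f}")
```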

Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences / v.13 no.2 / pp.57-66 / 2021
  • This study focuses on the issue of automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by the features of multiple speech dimensions. However, most previous studies are restricted to using features from a single speech dimension. To effectively capture the characteristics of the speech disorder, we extracted features of multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonics-to-Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/median/25th quartile/75th quartile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralized Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, with an 80.15 F1-score, compared to using features from just one or two speech dimensions. The results imply that voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.
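A sketch of the feature-combination experiment described above, assuming the three feature groups have already been extracted per speaker; the arrays, group sizes, and four severity classes are placeholders, and a plain SVM with macro F1 stands in for whatever classifier and scoring setup the study actually used.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

# Hypothetical per-speaker feature matrices for the three speech dimensions
# and severity labels; shapes and values are placeholders, not the study's data.
rng = np.random.default_rng(0)
n = 120
groups = {
    "voice_quality": rng.standard_normal((n, 5)),    # jitter, shimmer, HNR, ...
    "prosody":       rng.standard_normal((n, 20)),   # rate, pitch, rhythm metrics
    "pronunciation": rng.standard_normal((n, 8)),    # PCC/PCV, vowel-space metrics
}
y = rng.integers(0, 4, n)                            # 4 severity classes (illustrative)

# Try every combination of feature groups and report cross-validated macro F1.
for k in range(1, len(groups) + 1):
    for combo in combinations(groups, k):
        X = np.hstack([groups[g] for g in combo])
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        pred = cross_val_predict(clf, X, y, cv=5)
        print(f"{'+'.join(combo):40s} F1 = {f1_score(y, pred, average='macro'):.3f}")
```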

Information Dimensions of Speech Phonemes

  • Lee, Chang-Young
    • Speech Sciences / v.3 / pp.148-155 / 1998
  • As an application of dimensional analysis in the theory of chaos and fractals, we estimated the information dimension for various phonemes. By constructing phase-space vectors from the time-series speech signals, we calculated the natural measure and the Shannon information from the trajectories. The information dimension was then obtained as the slope of the plot of the information versus the space-division order. The information dimension proved to be highly sensitive to the waveform and the time delay. By averaging over frames for various phonemes, we found that the information dimension ranges from 1.2 to 1.4.

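A minimal numpy sketch of the procedure the abstract describes: delay-embed the frame, partition the reconstructed space at increasing division orders, compute the Shannon information from the cell-occupancy probabilities (the natural measure), and take the slope. The embedding parameters and the test signal are illustrative assumptions.

```python
import numpy as np

def delay_embed(x, dim=3, tau=8):
    """Reconstruct phase-space vectors from a 1-D time series."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

def information_dimension(x, dim=3, tau=8, orders=range(2, 8)):
    """Slope of Shannon information I(m) versus space-division order m."""
    pts = delay_embed(np.asarray(x, dtype=float), dim, tau)
    # Normalize the trajectory into the unit hypercube.
    pts = (pts - pts.min(0)) / (np.ptp(pts, axis=0) + 1e-12)
    info = []
    for m in orders:
        nbins = 2 ** m                        # divide each axis into 2^m cells
        idx = np.minimum((pts * nbins).astype(int), nbins - 1)
        _, counts = np.unique(idx, axis=0, return_counts=True)
        p = counts / counts.sum()             # natural measure of occupied cells
        info.append(-np.sum(p * np.log2(p)))  # Shannon information I(m)
    slope, _ = np.polyfit(list(orders), info, 1)
    return slope                              # information dimension estimate

frame = np.sin(np.linspace(0, 60 * np.pi, 4000)) + 0.05 * np.random.randn(4000)
print(f"information dimension estimate: {information_dimension(frame):.2f}")
```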

An Auditory-perceptual Rating Scale of Dysarthric Speech of Patients with Parkinsonism (파킨슨증으로 인한 마비말장애에 대한 청지각적 평가척도)

  • Kim, Hyang-Hee;Lee, Mi-Sook;Kim, Sun-Woo;Choi, Sung-Hee;Lee, Won-Yong
    • Speech Sciences / v.11 no.2 / pp.39-49 / 2004
  • An auditory-perceptual rating scale has long been utilized in the evaluation of Parkinsonian speech. This study attempted to investigate a set of relevant variables and an appropriate equal-interval rating scale for each variable. We collected speech samples from 27 patients with Parkinsonian speech disorders. A total of 25 variables, each with a description, covering the phonatory, resonatory, and articulatory dimensions, were included in the rating scale. The descriptive part of each variable could increase the objectivity of the rating scale.


SPATIAL EXPLANATIONS OF SPEECH PERCEPTION: A STUDY OF FRICATIVES

  • Choo, Won;Mark Huckvale
    • Proceedings of the KSPS conference / 1996.10a / pp.399-403 / 1996
  • This paper addresses issues of perceptual constancy in speech perception through the use of a spatial metaphor for speech sound identity, as opposed to a more conventional characterisation in terms of multiple interacting acoustic cues. This spatial representation leads to a correlation between phonetic, acoustic and auditory analyses of speech sounds which can serve as the basis for a model of speech perception based on the general auditory characteristics of sounds. The correlations between the phonetic, perceptual and auditory spaces of the set of English voiceless fricatives /f θ s ʃ h/ are investigated. The results show that the perception of fricative segments may be explained in terms of a 2-dimensional auditory space in which each segment occupies a region. The dimensions of the space were found to be the frequency of the main spectral peak and the 'peakiness' of the spectrum. These results support the view that perception of a segment is based on its occupancy of a region in a multi-dimensional parameter space. In this way, final perceptual decisions on segments can be postponed until higher-level constraints can also be met.

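A small sketch of how the two auditory dimensions named above might be measured from a single fricative frame. The 'peakiness' measure here (peak-to-mean power ratio in dB) is an assumed stand-in, since the paper's own definition is not given in the abstract.

```python
import numpy as np
from scipy.signal import welch

def fricative_dimensions(frame, sr=16000):
    """Two illustrative spectral dimensions for a fricative frame:
    main spectral peak frequency and a simple 'peakiness' measure."""
    freqs, psd = welch(frame, fs=sr, nperseg=512)
    peak_freq = freqs[np.argmax(psd)]        # frequency of the main spectral peak
    # Assumed proxy for 'peakiness': peak-to-mean power ratio in dB.
    peakiness = 10 * np.log10(psd.max() / psd.mean())
    return peak_freq, peakiness

# Illustrative call on white noise standing in for a fricative segment.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
f, p = fricative_dimensions(noise)
print(f"peak = {f:.0f} Hz, peakiness = {p:.1f} dB")
```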

Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection (음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법)

  • Park Jin-Young;Lee Kwang-Seok;Hur Kang-In
    • Journal of the Institute of Convergence Signal Processing / v.7 no.3 / pp.116-121 / 2006
  • In this paper, we propose a method for reducing the dimension of the multi-dimensional speech feature vector for a real-time adaptation procedure in various noisy environments. The method reduces the dimensionality non-linearly by mapping the speech feature vector to the likelihoods of speech and noise models. The LRT (Likelihood Ratio Test) is then used to classify speech and non-speech. The detection results are similar to those obtained with the original multi-dimensional speech feature vector, and speech recognition results on the detected speech segments are likewise similar to those obtained with the multi-dimensional (10th-order MFCC, Mel-Frequency Cepstral Coefficient) speech feature vector.

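A sketch of the idea under stated assumptions: the 10-dimensional feature vector is mapped non-linearly to two scalar log-likelihoods (a speech model and a noise model, modelled here as small GMMs), and a likelihood ratio test decides speech versus non-speech. The training frames, model sizes, and threshold are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical training data: 10-dimensional MFCC-like frames labelled
# as speech or noise (placeholders for real labelled features).
rng = np.random.default_rng(0)
speech_frames = rng.standard_normal((500, 10)) + 1.0
noise_frames = rng.standard_normal((500, 10))

# Model each class with a small GMM; the per-frame log-likelihoods under
# the two models are the non-linear, low-dimensional representation.
speech_gmm = GaussianMixture(n_components=4, random_state=0).fit(speech_frames)
noise_gmm = GaussianMixture(n_components=4, random_state=0).fit(noise_frames)

def is_speech(frame, threshold=0.0):
    """Likelihood ratio test: log p(x|speech) - log p(x|noise) > threshold."""
    x = np.atleast_2d(frame)
    llr = speech_gmm.score_samples(x) - noise_gmm.score_samples(x)
    return llr[0] > threshold

test_frame = rng.standard_normal(10) + 1.0
print("speech" if is_speech(test_frame) else "non-speech")
```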

C-to-V coarticulation in horizontal and vertical dimensions and its implications for phonology

  • Lee, Joo-Kyeong
    • Speech Sciences / v.7 no.4 / pp.107-121 / 2000
  • In this paper, I investigate the acoustic correlates of a vowel's coarticulatory dynamics as manifested in the preceding and following consonants along two dimensions of the vocal tract: place of articulation and degree of constriction. Two-dimensional coarticulation is not necessarily executed either concomitantly or proportionally, and the modification induced by coarticulation with a vowel in CVC structures is restricted to the CV portion; that is, the prevocalic consonant is modified solely in its constriction location. This is consistent with the observation that C-to-V place assimilation does not accompany consonant lenition in phonology, which suggests that phonetic nature is effectively reflected in phonological patterns.


Reduction of Dimension of HMM parameters in MLLR Framework for Speaker Adaptation (화자적응시스템을 위한 MLLR 알고리즘 연산량 감소)

  • Kim Ji Un;Jeong Jae Ho
    • Proceedings of the KSPS conference / 2003.05a / pp.123-126 / 2003
  • We discuss how to reduce the number and dimensions of the inverse matrices required in the MLLR framework for speaker adaptation. To find a smaller set of variables with less redundancy, we employ PCA (principal component analysis) and ICA (independent component analysis), which give as good a representation as possible. The additional computation introduced by applying PCA or ICA is small enough to be disregarded. The dimension of the HMM parameters is reduced to about 1/3 to 2/7 of that of the SI (speaker-independent) model parameters, while the speech recognition system achieves a word recognition rate comparable to the ordinary MLLR framework. If the dimension of the SI model parameters is n, the computation of the inverse matrices in MLLR is proportional to O(n^4). So, compared with ordinary MLLR, the total computation required in speaker adaptation is reduced to about 1/80 to 1/150.

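A sketch of the dimension-reduction step and the rough arithmetic behind the reported savings, assuming a 39-dimensional SI model reduced to roughly one third of its size with scikit-learn's PCA; the mean vectors are placeholders and the MLLR transform itself is not shown.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical SI model: Gaussian mean vectors of dimension n = 39
# (placeholder values; the paper's actual model is not reproduced here).
rng = np.random.default_rng(0)
n = 39
means = rng.standard_normal((2000, n))       # stacked HMM mean vectors

# Project the means onto a lower-dimensional subspace (about n/3 here),
# so the MLLR regression matrices and their inverses are much smaller.
pca = PCA(n_components=13)
reduced_means = pca.fit_transform(means)     # shape (2000, 13)

# Rough cost comparison: inverse-matrix work in MLLR grows like O(n^4),
# so reducing n to n/3 cuts that term by roughly 3^4 = 81x, consistent
# with the 1/80 to 1/150 range reported above.
print(f"approximate speed-up of the O(n^4) term: {(n / 13) ** 4:.0f}x")
```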

Relationship between executive function and cue weighting in Korean stop perception across different dialects and ages

  • Kong, Eun Jong;Lee, Hyunjung
    • Phonetics and Speech Sciences / v.13 no.3 / pp.21-29 / 2021
  • The present study investigated how cognitive resources are related to speech perception by examining Korean speakers' executive function (EF) capacity and its association with voice onset time (VOT) and f0 sensitivity in identifying Korean stop laryngeal categories (/t'/ vs. /t/ vs. /th/). Previously, Kong et al. (under revision) reported that Korean listeners (N = 154) in Seoul and Changwon (Gyeongsang) showed differential group patterns in dialect-specific cue weightings across educational institutions (college, high school, and elementary school). We follow up on this study by further relating the listeners' EF control (working memory, mental flexibility, and inhibition) to their speech perception patterns, to examine whether better cognitive ability helps control attention to multiple acoustic dimensions. Partial correlation analyses revealed that better EFs in Korean listeners were associated with greater sensitivity to available acoustic details and with greater suppression of irrelevant acoustic information across subgroups, although only a small set of EF components turned out to be relevant. Unlike the Seoul participants, the Gyeongsang listeners' f0 use was not correlated with any EF task scores, reflecting dialect-specific cue primacy in which f0 serves as a secondary cue. The findings confirm the link between speech perception and general cognitive ability, providing experimental evidence from Korean listeners.
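A minimal sketch of a partial correlation of the kind used above: the correlation between an EF score and an f0 cue-weighting coefficient after regressing out a covariate (age here). The variable names and data are hypothetical, and the residual-based computation stands in for whatever software the study actually used.

```python
import numpy as np
from scipy.stats import pearsonr

def partial_corr(x, y, covars):
    """Correlation between x and y after regressing out the covariates."""
    design = np.column_stack([np.ones(len(x)), covars])
    # Residualize x and y on the covariates via least squares.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return pearsonr(rx, ry)

# Hypothetical per-listener values: an EF task score, an f0 cue-weighting
# coefficient from the identification model, and age as a covariate.
rng = np.random.default_rng(0)
age = rng.uniform(10, 25, 80)
ef_score = 0.5 * age + rng.standard_normal(80)
f0_weight = 0.3 * ef_score + rng.standard_normal(80)

r, p = partial_corr(ef_score, f0_weight, age)
print(f"partial r = {r:.3f}, p = {p:.3f}")
```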