• Title/Abstract/Keywords: speech dimensions


한국어 모음의 지각적 차원 -지각과 산출간의 연동- (Perceptual Dimensions of Korean Vowel: A Link between Perception and Production)

  • 최양규
    • 음성과학, Vol. 8, No. 2, pp. 181-191, 2001
  • The acoustic quality of a vowel is known to be determined mostly by the frequencies of the first formant (F1) and the second formant (F2). This study examined the perceptual (or psychological) dimensions of vowel perception and discussed the relationships among perceptual dimensions, acoustic dimensions (F1 and F2), and the articulatory gestures of vowels. An experiment using multidimensional scaling (MDS) was performed to identify the perceptual dimensions of Korean vowels. In the experiment, 8 speakers of standard Seoul Korean rated the similarity of 10 synthesized Korean vowels, and a two-dimensional MDS solution was obtained from the similarity ratings. The results showed that the two perceptual dimensions, D1 and D2, correlated strongly with F2 and F1 (r = -.895 and .878, respectively) and were therefore interpreted as 'vowel advancement' and 'vowel height', respectively. The relationship between the perceptual dimensions of vowels and the articulatory positions of the tongue suggested that perception may be directly linked to production. Problems for further research were discussed in the final section.

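
As a rough illustration of how a two-dimensional MDS solution can be derived from pairwise dissimilarities, the sketch below implements classical (Torgerson) MDS with NumPy. The abstract does not state which MDS variant was used, so this is an assumption, and the formant values are hypothetical placeholders rather than data from the study. The function name `classical_mds` is likewise illustrative.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix."""
    d2 = np.asarray(dissim, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]    # keep the largest n_dims
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical (F2, F1) targets in Hz for five vowels; NOT data from the paper.
formants = np.array([[2300.0, 300.0], [2000.0, 500.0], [1300.0, 750.0],
                     [900.0, 500.0], [800.0, 320.0]])
# Stand-in for listener dissimilarity ratings: Euclidean distance in formant space.
dissim = np.linalg.norm(formants[:, None, :] - formants[None, :, :], axis=-1)

coords = classical_mds(dissim)
# Correlate the first recovered dimension with F2, as the study did for D1.
corr_d1_f2 = np.corrcoef(coords[:, 0], formants[:, 0])[0, 1]
print(coords.shape, abs(corr_d1_f2))
```

The sign of each recovered dimension is arbitrary, which is why only the magnitude of the correlation is printed here.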

화자인식을 위한 퍼지-상관차원과 퍼지-리아프노프차원의 평가 (The Evaluation of the Fuzzy-Chaos Dimension and the Fuzzy-Lyapunov Dimension)

  • 유병욱;박현숙;김창석
    • 음성과학, Vol. 7, No. 3, pp. 167-183, 2000
  • In this paper, we propose two kinds of chaos dimensions, the fuzzy correlation dimension and the fuzzy Lyapunov dimension, for speaker recognition. The proposal rests on the observation that chaos analysis captures the nonlinear information contained in an individual's speech signal and thereby offers superior discrimination capability. We confirm that the proposed fuzzy chaos dimensions play an important role in raising the speaker recognition rate by absorbing the variations of the reference and test pattern attractors. To evaluate the proposed dimensions, we carry out speaker recognition with them; that is, we investigate the validity of the speaker recognition parameters by estimating the recognition error according to the discrimination error of an individual speaker from the reference pattern.


음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류 (Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features)

  • 여은정;김선희;정민화
    • 말소리와 음성과학, Vol. 13, No. 2, pp. 57-66, 2021
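  • This paper focuses on the automatic severity classification of dysarthria based on speech intelligibility. Speech intelligibility is affected by a range of speech-function characteristics, including respiration, phonation, resonance, articulation, and prosody. However, most previous studies used features from only one speech function for automatic severity classification. To capture the characteristics of disordered speech effectively, this paper incorporates a variety of speech-function features covering voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, HNR, the number of voice breaks, and the degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean, standard deviation, minimum, maximum, median, 25th percentile, 75th percentile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation comprises phoneme correctness (percentage of correct consonants, percentage of correct vowels, percentage of correct phonemes overall) and vowel distortion [VSA (vowel space area), FCR (formant centralized ratio), VAI (vowel articulatory index), F2 ratio]. Automatic severity classification was carried out with various feature combinations. In the experiments, the best performance, an F1-score of 80.15%, was obtained when all three feature sets (voice quality, prosody, and pronunciation) were used for classification. This suggests that voice quality, prosody, and pronunciation features should all be considered together in the automatic severity classification of dysarthria.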
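
The vowel-distortion features named in the abstract can be sketched as follows, assuming the commonly used corner-vowel definitions over /i/, /a/, /u/ (triangular VSA, FCR, VAI as the reciprocal of FCR, and F2 ratio as F2 of /i/ over F2 of /u/). The abstract does not give the paper's exact formulas or vowel set, so these definitions, the function name, and the formant values are all assumptions for illustration.

```python
def vowel_distortion_metrics(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """Corner-vowel metrics from F1/F2 (Hz) of /i/, /a/, /u/ (assumed definitions)."""
    # Triangular vowel space area: area of the /i/-/a/-/u/ triangle in the F1-F2 plane.
    vsa = 0.5 * abs(f1_i * (f2_a - f2_u) + f1_a * (f2_u - f2_i) + f1_u * (f2_i - f2_a))
    # FCR rises as the vowels centralize; VAI is its reciprocal.
    fcr = (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)
    vai = 1.0 / fcr
    # F2 ratio: front-back contrast between /i/ and /u/.
    f2_ratio = f2_i / f2_u
    return {"VSA": vsa, "FCR": fcr, "VAI": vai, "F2_ratio": f2_ratio}

# Hypothetical formant values in Hz; NOT measurements from the paper.
metrics = vowel_distortion_metrics(f1_i=300.0, f2_i=2300.0,
                                   f1_a=750.0, f2_a=1300.0,
                                   f1_u=320.0, f2_u=800.0)
print(metrics)
```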

Information Dimensions of Speech Phonemes

  • Lee, Chang-Young
    • 음성과학, Vol. 3, pp. 148-155, 1998
  • As an application of dimensional analysis in the theory of chaos and fractals, we estimated the information dimension for various phonemes. By constructing phase-space vectors from the time-series speech signals, we calculated the natural measure and the Shannon information from the trajectories. The information dimension was then obtained as the slope of the plot of information versus space division order. The information dimension proved to be highly sensitive to the waveform and the time delay. By averaging over frames for various phonemes, we found that the information dimension ranges from 1.2 to 1.4.

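
The procedure described in this abstract (delay embedding, natural measure over a grid, Shannon information, slope against division order) can be sketched roughly as below. This is a minimal sketch under stated assumptions: a two-dimensional embedding, a grid of side 2^-k at division order k, and a pure sine wave standing in for a speech frame. None of these choices are taken from the paper, and the function name is hypothetical.

```python
import numpy as np

def information_dimension(signal, delay, orders=(3, 4, 5, 6)):
    """Estimate the information dimension of a 2-D delay embedding."""
    x = np.asarray(signal, dtype=float)
    # Phase-space vectors (x[n], x[n + delay]).
    points = np.column_stack([x[:-delay], x[delay:]])
    # Normalise each axis to [0, 1] so division order k gives cells of side 2**-k.
    span = points.max(axis=0) - points.min(axis=0)
    points = (points - points.min(axis=0)) / span
    info = []
    for k in orders:
        cells = np.minimum((points * 2**k).astype(int), 2**k - 1)
        _, counts = np.unique(cells, axis=0, return_counts=True)
        p = counts / counts.sum()               # natural measure of each visited cell
        info.append(-np.sum(p * np.log2(p)))    # Shannon information in bits
    slope, _ = np.polyfit(orders, info, 1)      # information vs. division order
    return slope

# A sine wave traces a closed curve in phase space, so its dimension should be near 1.
n = np.arange(20000)
sine = np.sin(0.0617 * n)
dim = information_dimension(sine, delay=25)
print(dim)
```

The estimate depends on the delay and on the range of division orders, which is consistent with the sensitivity the abstract reports.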

파킨슨증으로 인한 마비말장애에 대한 청지각적 평가척도 (An Auditory-perceptual Rating Scale of Dysarthric Speech of Patients with Parkinsonism)

  • 김향희;이미숙;김선우;최성희;이원용
    • 음성과학, Vol. 11, No. 2, pp. 39-49, 2004
  • Auditory-perceptual rating scales have long been used in the evaluation of Parkinsonian speech. This study investigated various variables and an appropriate equal-interval rating scale for each variable. We collected speech samples from 27 patients with Parkinsonian speech disorders. A total of 25 variables, with a description for each, across the phonatory, resonatory, and articulatory dimensions were included in the rating scale. The descriptive part of each variable could increase the objectivity of the rating scale.


SPATIAL EXPLANATIONS OF SPEECH PERCEPTION: A STUDY OF FRICATIVES

  • Choo, Won;Mark Huckvale
    • 대한음성학회:학술대회논문집, 대한음성학회 October 1996 Conference Proceedings, pp. 399-403, 1996
  • This paper addresses issues of perceptual constancy in speech perception through the use of a spatial metaphor for speech sound identity as opposed to a more conventional characterisation with multiple interacting acoustic cues. This spatial representation leads to a correlation between phonetic, acoustic and auditory analyses of speech sounds which can serve as the basis for a model of speech perception based on the general auditory characteristics of sounds. The correlations between the phonetic, perceptual and auditory spaces of the set of English voiceless fricatives /f θ s ʃ h/ are investigated. The results show that the perception of fricative segments may be explained in terms of a 2-dimensional auditory space in which each segment occupies a region. The dimensions of the space were found to be the frequency of the main spectral peak and the 'peakiness' of spectra. These results support the view that perception of a segment is based on its occupancy of a multi-dimensional parameter space. In this way, final perceptual decisions on segments can be postponed until higher level constraints can also be met.


음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법 (Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection)

  • 박진영;이광석;허강인
    • 융합신호처리학회논문지, Vol. 7, No. 3, pp. 116-121, 2006
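  • As a prerequisite for applying real-time adaptation in various noise environments, this paper proposes a method for reducing multidimensional speech feature vectors to a low dimension. The proposed method reduces the dimension nonlinearly by mapping feature vectors to probabilistic likelihood values, and speech/non-speech classification is performed with a likelihood ratio test (LRT). Experiments confirmed that the classification results were comparable to those obtained with the high-dimensional feature vectors. In addition, in speech recognition experiments using the speech data detected by the proposed method, the recognition rate was comparable to that obtained when classification used 10th-order MFCCs (Mel-frequency cepstral coefficients).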

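
The idea of collapsing a multidimensional feature vector into likelihood values and deciding speech versus non-speech with a likelihood ratio test can be sketched as follows. The abstract does not specify the probability models, so diagonal-covariance Gaussians, the zero threshold, and the synthetic 10-dimensional frames are assumptions made purely for illustration; all function names are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Per-dimension mean and variance of a set of feature frames."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def log_likelihood(frames, mean, var):
    """Log-density of each frame under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var, axis=1)

def log_likelihood_ratio(frames, speech_model, noise_model):
    """Map each multidimensional frame to a single scalar score."""
    return log_likelihood(frames, *speech_model) - log_likelihood(frames, *noise_model)

# Synthetic stand-ins for 10-dimensional feature frames; NOT real MFCC data.
rng = np.random.default_rng(0)
dim = 10
speech_model = fit_diag_gaussian(rng.normal(1.0, 1.0, size=(500, dim)))
noise_model = fit_diag_gaussian(rng.normal(-1.0, 1.0, size=(500, dim)))

test_frames = np.vstack([rng.normal(1.0, 1.0, size=(200, dim)),
                         rng.normal(-1.0, 1.0, size=(200, dim))])
labels = np.array([1] * 200 + [0] * 200)

llr = log_likelihood_ratio(test_frames, speech_model, noise_model)
decisions = (llr > 0.0).astype(int)      # LRT with an assumed threshold of 0
accuracy = np.mean(decisions == labels)
print(llr.shape, accuracy)
```

Because each frame is reduced to one score, only that scalar and its threshold would need to be tracked when adapting to a changing noise environment, which is presumably what makes the reduction attractive for real-time use.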

C-to-V coarticulation in horizontal and vertical dimensions and its implications for phonology

  • Lee, Joo-Kyeong
    • 음성과학, Vol. 7, No. 4, pp. 107-121, 2000
  • In this paper, I investigate the acoustic correlates of a vowel's coarticulatory dynamics manifested in preceding and following consonants along two dimensions of the vocal tract: place of articulation and degree of constriction. Coarticulation along the two dimensions is not necessarily executed either concomitantly or proportionally, and the modification induced by coarticulation with a vowel in CVC structures is restricted to the CV portion; that is, the prevocalic consonant is modified solely in its constriction location. This is consistent with the observation that C-to-V place assimilation does not accompany consonant lenition in phonology, which suggests that phonetic nature is effectively reflected in phonological patterns.


화자적응시스템을 위한 MLLR 알고리즘 연산량 감소 (Reduction of Dimension of HMM parameters in MLLR Framework for Speaker Adaptation)

  • 김지운;정재호
    • 대한음성학회:학술대회논문집, 대한음성학회 May 2003 Conference Proceedings, pp. 123-126, 2003
  • We discuss how to reduce the number of matrix inversions, and the dimensions of the matrices involved, required in the MLLR framework for speaker adaptation. To find a smaller set of variables with less redundancy that still gives as good a representation as possible, we employ PCA (principal component analysis) and ICA (independent component analysis). The additional computation incurred by applying PCA or ICA is small enough to be disregarded. The dimension of the HMM parameters is reduced to about 1/3 to 2/7 of the dimension of the SI (speaker-independent) model parameters, while the speech recognition system achieves a word recognition rate comparable to that of the ordinary MLLR framework. If the dimension of the SI model parameters is n, the amount of computation for matrix inversion in MLLR is proportional to O(n^4). Compared with ordinary MLLR, the total computation required for speaker adaptation is therefore reduced to about 1/80 to 1/150.

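
The quoted savings follow directly from the stated n^4 scaling: cutting the dimension to 1/3 or 2/7 of its original size scales the inversion cost by the fourth power of that ratio. The short check below reproduces the arithmetic; the function name `relative_cost` is illustrative, and only the n^4 proportionality comes from the abstract.

```python
from fractions import Fraction

def relative_cost(dimension_ratio, exponent=4):
    """Cost after reduction relative to the original, assuming cost ~ n**exponent."""
    return dimension_ratio ** exponent

for ratio in (Fraction(1, 3), Fraction(2, 7)):
    cost = relative_cost(ratio)
    print(f"dimension x {ratio} -> inversion cost x {cost} "
          f"(about 1/{float(1 / cost):.0f})")
```

The two ratios give 1/81 and 16/2401 (roughly 1/150), in line with the range of about 1/80 to 1/150 reported in the abstract.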

Relationship between executive function and cue weighting in Korean stop perception across different dialects and ages

  • Kong, Eun Jong;Lee, Hyunjung
    • 말소리와 음성과학, Vol. 13, No. 3, pp. 21-29, 2021
  • The present study investigated how one's cognitive resources are related to speech perception by examining Korean speakers' executive function (EF) capacity and its association with voice onset time (VOT) and f0 sensitivity in identifying Korean stop laryngeal categories (/t'/ vs. /t/ vs. /tʰ/). Previously, Kong et al. (under revision) reported that Korean listeners (N = 154) in Seoul and Changwon (Gyeongsang) showed differential group patterns in dialect-specific cue weightings across educational institutions (college, high school, and elementary school). We follow up this study by further relating their EF control (working memory, mental flexibility, and inhibition) to their speech perception patterns to examine whether better cognitive ability would control attention to multiple acoustic dimensions. Partial correlation analyses revealed that better EFs in Korean listeners were associated with greater sensitivity to available acoustic details and with greater suppression of irrelevant acoustic information across subgroups, although only a small set of EF components turned out to be relevant. Unlike Seoul participants, Gyeongsang listeners' f0 use was not correlated with any EF task scores, reflecting dialect-specific cue primacy using f0 as a secondary cue. The findings confirm the link between speech perception and general cognitive ability, providing experimental evidence from Korean listeners.