• Title/Abstract/Keywords: speech features

An Analysis Method of Strange Attractor for the Feature Extraction

  • 김태식
    • Speech Sciences / Vol. 9, No. 2 / pp. 147-155 / 2002
  • In speech processing, raw signals have conventionally been presented in 2D form. Such representations limit how well characteristics can be extracted from the signal, and generally little information can be detected from a 2D signal. The strange attractor from chaos theory provides a 3D representation. In recognition problems the choice of signal representation is very important, because good features can be detected from a good representation. This paper discusses a new feature extraction method that extracts features from a cycle of the strange attractor. A neural network is used to check whether the method extracts suitable features. The results are favorable and may be applicable to some areas of signal processing.
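The abstract does not say how the attractor is constructed. A common way to obtain a 3D trajectory from a 1D signal is time-delay embedding, and the sketch below assumes that approach. The function name `delay_embed`, the delay `tau`, and the synthetic frame are illustrative choices, not details taken from the paper.

```python
import math

def delay_embed(signal, dim=3, tau=5):
    """Map a 1D signal to points (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n_points = len(signal) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal too short for this dim and tau")
    return [tuple(signal[t + k * tau] for k in range(dim))
            for t in range(n_points)]

# Synthetic stand-in for a voiced speech frame: a 100 Hz tone sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]

# Each element is one 3D point on the reconstructed trajectory.
points = delay_embed(frame, dim=3, tau=5)
```

For a periodic signal the points trace a closed loop, so one cycle of the trajectory corresponds to one pitch period; features would then be measured on that loop.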

The Lombard effect on the speech of children with intellectual disability

  • 이현주;이지윤;김유경
    • Phonetics and Speech Sciences / Vol. 8, No. 4 / pp. 115-122 / 2016
  • This study investigates the acoustic-phonetic features and speech intelligibility of Lombard speech in children with intellectual disability by examining the Lombard effect at three noise levels: no noise, 55 dB, and 65 dB. Eight children with intellectual disability read sentences and played speaking games, and their speech was analyzed in terms of intensity, pitch, vowel space of /a/, /i/, and /u/, VAI(3), articulation rate, and speech intelligibility. The results showed, first, that intensity and pitch increased as the noise level increased; second, that VAI(3) increased as the noise level increased; third, that articulation rate decreased as the noise level increased; and finally, that speech intelligibility increased as the noise level increased. Lombard speech thus also changed the VAI(3), vowel space, articulation rate, and speech intelligibility of the children with intellectual disability. This study suggests that Lombard speech will be clinically useful for persons who have intellectual disability and difficulties with self-control.
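As a rough illustration of the two vowel measures named above, the sketch below computes a triangular vowel space area and a vowel articulation index from the corner vowels /a/, /i/, and /u/. The VAI is written in the form commonly given in the literature, (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/); whether the study's VAI(3) uses exactly this form is an assumption. The formant values are hypothetical, not data from the study.

```python
def vowel_space_area(a, i, u):
    """Area of the /a/-/i/-/u/ triangle in the F1-F2 plane (Hz^2), shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = a, i, u
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

def vowel_articulation_index(a, i, u):
    """Assumed VAI form: (F2i + F1a) / (F1i + F1u + F2u + F2a)."""
    f1a, f2a = a
    f1i, f2i = i
    f1u, f2u = u
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)

# Hypothetical (F1, F2) values in Hz, chosen only for illustration.
a, i, u = (850.0, 1300.0), (300.0, 2500.0), (350.0, 800.0)
area = vowel_space_area(a, i, u)
vai = vowel_articulation_index(a, i, u)
```

Both values grow as the corner vowels move apart, which is the direction of change the abstract reports under noise.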

Pauses Characteristics in Slowed Speech of Treated Stutterer

  • 전희숙
    • Speech Sciences / Vol. 15, No. 4 / pp. 189-197 / 2008
  • In speech therapy that applies a behavioral modification strategy, fluency is first induced by intentionally slowing the speech rate at an early stage; as fluency is acquired, the speech rate increases. The purpose of this study was therefore to investigate the pause characteristics in the intentionally slowed speech of treated stutterers. Ten developmental stutterers with well-established fluency participated. From each, we collected 200-syllable samples of greatly slowed speech and slightly slowed speech in a reading task. Pauses were measured by total frequency, total duration, average duration, and proportion. The findings were as follows: both the total duration and the total frequency of pauses were higher in greatly slowed speech than in slightly slowed speech. However, neither the average duration nor the proportion of pauses was significantly higher in greatly slowed speech than in slightly slowed speech.
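The four pause measures listed above can be sketched as follows. The abstract does not define them precisely, so this assumes that pauses are given as (start, end) intervals in seconds and that "proportion" means pause time divided by total sample time. The function name and the intervals are hypothetical.

```python
def pause_measures(pauses, total_duration):
    """pauses: list of (start, end) in seconds; total_duration: sample length in seconds."""
    durations = [end - start for start, end in pauses]
    total = sum(durations)
    count = len(durations)
    return {
        "frequency": count,
        "total_duration": total,
        "average_duration": total / count if count else 0.0,
        # Assumed definition: share of the sample spent pausing.
        "proportion": total / total_duration,
    }

# Hypothetical pause intervals in a 20-second reading sample.
sample = pause_measures([(1.0, 1.5), (4.0, 5.0), (8.0, 8.5)], total_duration=20.0)
```

Under these definitions, total frequency and total duration can rise together while the average duration stays flat, which matches the pattern the abstract describes.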

Korean speakers hyperarticulate vowels in polite speech

  • Oh, Eunhae;Winter, Bodo;Idemaru, Kaori
    • Phonetics and Speech Sciences / Vol. 13, No. 3 / pp. 15-20 / 2021
  • In line with recent attention to the multimodal expression of politeness, the present study examined the association between polite speech and acoustic features through the analysis of vowels produced in casual and polite speech contexts in Korean. Fourteen adult native speakers of Seoul Korean produced utterances in two social conditions designed to elicit polite (professor) and casual (friend) speech. Vowel duration and the first (F1) and second (F2) formants of seven sentence- and phrase-initial monophthongs were measured. The results showed that polite speech shares acoustic similarities with vowel production in clear speech: speakers showed greater vowel space expansion in polite than in casual speech in an effort to enhance perceptual intelligibility. In particular, female speakers hyperarticulated (front) vowels in polite speech, independent of speech rate. The implications for the acoustic encoding of social stance in polite speech are further discussed.
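One simple way to quantify vowel space expansion is the mean distance of each vowel from the centroid of the vowel set in the F1-F2 plane. The abstract does not state which expansion measure the authors used, so this dispersion measure is an assumption, and the formant values below are hypothetical.

```python
import math

def vowel_dispersion(formants):
    """Mean Euclidean distance of (F1, F2) points from their centroid, in Hz."""
    n = len(formants)
    c1 = sum(f1 for f1, _ in formants) / n
    c2 = sum(f2 for _, f2 in formants) / n
    return sum(math.hypot(f1 - c1, f2 - c2) for f1, f2 in formants) / n

# Hypothetical (F1, F2) values in Hz for the same three vowels in two conditions.
casual = [(750.0, 1350.0), (330.0, 2300.0), (370.0, 900.0)]
polite = [(850.0, 1300.0), (300.0, 2500.0), (350.0, 800.0)]

# True when the polite vowels lie farther from their centroid than the casual ones.
expanded = vowel_dispersion(polite) > vowel_dispersion(casual)
```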

Effective Combination of Temporal Information and Linear Transformation of Feature Vector in Speaker Verification

  • 서창우;조미화;임영환;전성채
    • Phonetics and Speech Sciences / Vol. 1, No. 4 / pp. 127-132 / 2009
  • The feature vectors used in conventional speaker recognition (SR) systems may be highly correlated with their neighbors. To improve SR performance, many researchers have adopted linear transformation methods such as principal component analysis (PCA). In general, the linear transformation is applied to the concatenation of the static features and their dynamic features. However, a linear transformation based on both the static and the dynamic features is more complex than one based on the static features alone because of the higher feature order. To overcome this problem, we propose an efficient method that combines linear transformation with temporal information of the features to reduce complexity and improve performance in speaker verification (SV). The proposed method first applies a linear transformation using PCA coefficients. The delta parameters carrying temporal information are then obtained from the transformed features. The proposed method requires a covariance matrix only 1/4 the size of the one needed when PCA is applied to the static features together with their dynamic features. In addition, the delta parameters are extracted from the linearly transformed features after the dimension of the static features has been reduced. Compared with PCA and conventional methods in terms of equal error rate (EER) in SV, the proposed method performs better while requiring less storage space and complexity.
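The order of operations described above (transform the static features first, then take deltas) and the 1/4 figure can be sketched as follows. The PCA step itself is omitted: `transformed` is a stand-in for already-transformed static features. The delta formula is the widely used regression form, and the edge handling by frame repetition and the dimension 12 are assumptions, not details from the paper.

```python
def delta(frames, window=2):
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2)."""
    n_frames, dim = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def frame(t):
        # Repeat the first or last frame when the window runs past an edge.
        return frames[min(max(t, 0), n_frames - 1)]

    return [
        [sum(n * (frame(t + n)[k] - frame(t - n)[k])
             for n in range(1, window + 1)) / denom
         for k in range(dim)]
        for t in range(n_frames)
    ]

def covariance_entries(dim):
    """Number of entries in a dim x dim covariance matrix."""
    return dim * dim

# Stand-in for PCA-transformed static features: one coefficient rising linearly.
transformed = [[float(t)] for t in range(7)]
deltas = delta(transformed)

# Static-only PCA (dimension 12) versus static+delta PCA (dimension 24).
size_ratio = covariance_entries(12) / covariance_entries(24)
```

Doubling the feature dimension quadruples the covariance matrix, which is where the 1/4 saving comes from when PCA is restricted to the static features.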

Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features

  • Park, Taejin;Beack, SeungKwan;Lee, Taejin
    • IEIE Transactions on Smart Processing and Computing / Vol. 3, No. 5 / pp. 259-266 / 2014
  • In this paper, we propose a novel technique for noise-robust automatic speech recognition (ASR). The development of ASR techniques has made it possible to recognize isolated words with near-perfect accuracy. However, in a highly noisy environment, a distinct mismatch between the trained speech and the test data results in significantly degraded word recognition accuracy (WRA). Unlike conventional ASR systems that employ Mel-frequency cepstral coefficients (MFCCs) and a hidden Markov model (HMM), this study employs histogram of oriented gradient (HOG) features and a support vector machine (SVM) for ASR tasks to overcome this problem. Our proposed ASR system is less vulnerable to external interference noise and achieves a higher WRA than a conventional ASR system equipped with MFCCs and an HMM. The performance of the proposed system was evaluated using a phonetically balanced word (PBW) set mixed with artificially added noise.
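The core of a HOG feature is a histogram of gradient orientations weighted by gradient magnitude. The sketch below computes that histogram for a single cell of a 2D array, such as a patch of a spectrogram. It is a simplified illustration: it omits cell tiling and block normalization, and the bin count and the use of a spectrogram patch as input are assumptions rather than the paper's settings.

```python
import math

def orientation_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    rows, cols = len(image), len(image[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # central difference along columns
            gy = image[r + 1][c] - image[r - 1][c]  # central difference along rows
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Toy patch whose values rise only along the column axis.
image = [[float(c) for c in range(5)] for _ in range(5)]
hist = orientation_histogram(image)
```

In a full system, histograms like this one would be concatenated across cells and passed to a classifier such as an SVM.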

A Comparison of Front-Ends for Robust Speech Recognition

  • Kim, Doh-Suk;Jeong, Jae-Hoon;Lee, Soo-Young;Kil, Rhee M.
    • The Journal of the Acoustical Society of Korea / Vol. 17, No. 3E / pp. 3-11 / 1998
  • The zero-crossings with peak amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, the performance of the ZCPA model is further improved by incorporating conventional speech processing techniques into the model output. Spectral and cepstral representations of the ZCPA model output are compared, and the incorporation of dynamic features with several different lengths of time-derivative window is evaluated. Comparative evaluations with other front-ends in real-world noisy environments are also performed, and the results demonstrate the superiority of the ZCPA model.
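The idea behind the model's name can be illustrated on a single channel: the interval between successive upward zero crossings gives a frequency estimate, and the peak amplitude within that interval gives its weight. The sketch below is a simplified illustration only. It omits the auditory filterbank and the frequency histogram that a full front-end would need, and the function name and test tone are assumptions.

```python
import math

def zcpa_estimates(signal, fs):
    """Return (frequency_hz, peak) for each interval between upward zero crossings."""
    crossings = [n for n in range(1, len(signal))
                 if signal[n - 1] < 0 <= signal[n]]
    estimates = []
    for start, end in zip(crossings, crossings[1:]):
        frequency = fs / (end - start)   # inverse of the crossing interval
        peak = max(signal[start:end])    # peak amplitude inside the interval
        estimates.append((frequency, peak))
    return estimates

# Synthetic 100 Hz tone at 8 kHz; the phase offset keeps crossings off sample points.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(400)]
estimates = zcpa_estimates(tone, fs)
```

Because additive noise perturbs zero-crossing intervals less where the signal peak is large, weighting by peak amplitude is what makes this kind of representation attractive in noisy conditions.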

Performance Improvement of Robust Speaker Verification According to Various Standard Deviations of a Reference Distribution in Histogram Transformation

  • 권철홍
    • Phonetics and Speech Sciences / Vol. 2, No. 3 / pp. 127-134 / 2010
  • Additive noise and channel mismatch strongly degrade the performance of speaker verification systems because they distort the features of speech. This paper presents a histogram transformation technique to improve the robustness of text-independent speaker verification systems. The technique transforms the features extracted from speech so that their histogram conforms to a reference distribution. The effect of different standard deviations of the reference distribution is investigated. Experimental results indicate that, in channel-mismatched environments, the proposed technique offers significant improvements over existing techniques. The performance improvement of the proposed method is also verified statistically.
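A histogram transformation of this kind can be sketched by mapping each feature value through its empirical CDF and then through the inverse CDF of the reference distribution. The sketch assumes a zero-mean Gaussian reference and a rank-based CDF estimate; the abstract specifies neither. The parameter `ref_sigma` stands for the standard deviation being varied, and the input values are hypothetical.

```python
from statistics import NormalDist

def histogram_transform(values, ref_sigma=1.0):
    """Map values so their histogram follows N(0, ref_sigma^2), preserving rank order."""
    ref = NormalDist(mu=0.0, sigma=ref_sigma)
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the CDF estimate strictly inside (0, 1).
        out[i] = ref.inv_cdf((rank + 0.5) / n)
    return out

# Hypothetical values of one feature dimension over a few frames.
feature = [3.0, 1.0, 2.0, 5.0, 4.0]
narrow = histogram_transform(feature, ref_sigma=1.0)
wide = histogram_transform(feature, ref_sigma=2.0)
```

Changing `ref_sigma` rescales the transformed features without changing their order, which is the knob the study investigates.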

The effects of speakers' age on temporal features of speech among healthy young, middle-aged, and older adults

  • 김예지;이송민;최민경;정상민;성지은;이영미
    • Phonetics and Speech Sciences / Vol. 14, No. 1 / pp. 37-47 / 2022
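  • The purpose of this study was to analyze whether the temporal characteristics of speech production differ significantly across age groups in healthy adult speakers, and to identify which speech production variables significantly discriminate young speakers from older speakers. To this end, we examined speech rate (overall speech rate and articulation rate), pause frequency per utterance, pause duration, and pause location in young, middle-aged, and older adults. From the Seoul Reading Speech Corpus, an open corpus distributed by the National Institute of Korean Language, we selected utterances from 30 speakers, 10 each from the young, middle-aged, and older groups, and analyzed the temporal characteristics of their speech. The results showed significant group differences in overall speech rate, articulation rate, total pause frequency, between-eojeol (word phrase) pause frequency, total pause duration, and between-eojeol pause duration. Post hoc tests showed that the middle-aged and older groups each had a slower speech rate, more frequent pauses, and longer pause durations than the young group. In contrast, there were no significant group differences in the frequency or duration of within-eojeol pauses, which are inappropriate pauses for healthy adults. Among these variables, overall speech rate was the one that significantly distinguished the young group from the older group. The results also showed that when older adults pause, each pause is similar in length to those of young and middle-aged adults, but they pause much more often. These results suggest that the temporal characteristics of speech production change with age.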
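The two rate measures named above differ only in whether pause time is counted. The sketch below uses the usual definitions: overall speech rate is syllables divided by total time, and articulation rate is syllables divided by time excluding pauses. The abstract does not spell out its formulas, so these definitions and the numbers are assumptions.

```python
def speech_rates(n_syllables, total_duration, pause_duration):
    """Return (overall speech rate, articulation rate) in syllables per second."""
    overall = n_syllables / total_duration
    articulation = n_syllables / (total_duration - pause_duration)
    return overall, articulation

# Hypothetical utterance: 200 syllables in 50 s, of which 10 s are pauses.
overall_rate, articulation_rate = speech_rates(200, 50.0, 10.0)
```

A speaker who pauses more often lowers the overall rate even when the articulation rate is unchanged, which is why the two are reported separately.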

A Validity Study on Measurement of Mental Fatigue Using Speech Technology

  • 송승규;김종열;장준수;권철홍
    • Phonetics and Speech Sciences / Vol. 5, No. 1 / pp. 3-10 / 2013
  • This study proposes a method for measuring mental fatigue using speech technology, which has not been used in previous research and is simpler than existing complex methods. It aims to establish a relationship between the human voice and mental fatigue through experiments that measure the influence of mental fatigue on the voice. Two monotonous simple-calculation tasks, such as finding the sum of three one-digit numbers, were used to measure the feeling of monotony, and two sets of subjective questionnaires were used to measure mental fatigue. While thirty subjects performed the experiment, questionnaire responses and speech data were collected. Speech features related to the speech source and the vocal tract filter were extracted from the speech data. According to the results, the speech parameters closely related to mental fatigue are the mean and standard deviation of the fundamental frequency, jitter, and shimmer. This study shows that speech technology is a useful method for measuring mental fatigue.
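The four parameters named above can be computed from per-cycle pitch periods and peak amplitudes. The sketch below uses the common "local" definitions of jitter and shimmer: the mean absolute difference between consecutive cycles divided by the mean value. Whether the study used these exact variants is an assumption, and the period and amplitude values are hypothetical.

```python
from statistics import mean, stdev

def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)

# Hypothetical per-cycle pitch periods (s) and peak amplitudes for a short vowel.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.82, 0.79, 0.81]

f0 = [1.0 / p for p in periods]               # per-cycle fundamental frequency (Hz)
f0_mean, f0_sd = mean(f0), stdev(f0)
jitter = relative_perturbation(periods)       # cycle-to-cycle period variation
shimmer = relative_perturbation(amplitudes)   # cycle-to-cycle amplitude variation
```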