• Title/Abstract/Keywords: speech rates

271 search results (processing time: 0.029 s)

The Comparison of Speech Feature Parameters for Emotion Recognition

  • 김원구
    • Korean Institute of Intelligent Systems: Conference Proceedings / Proceedings of the Korea Fuzzy Logic and Intelligent Systems Society 2004 Spring Conference, Vol. 14, No. 1 / pp. 470-473 / 2004
  • This paper compares speech feature parameters for emotion recognition from speech signals. For this purpose, a corpus of emotional speech data, recorded and classified by emotion through subjective evaluation, was used to build statistical feature vectors such as the average, standard deviation, and maximum value of pitch and energy. MFCC parameters and their derivatives, with or without cepstral mean subtraction, were also used to evaluate the performance of conventional pattern matching algorithms. Pitch and energy parameters were used as prosodic information, and MFCC parameters were used as phonetic information. In the experiments, a vector quantization based emotion recognition system was used for speaker- and context-independent emotion recognition. Experimental results showed that the vector quantization based emotion recognizer using MFCC parameters performed better than the one using pitch and energy parameters. The vector quantization based emotion recognizer achieved a recognition rate of 73.3% for speaker- and context-independent classification.

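The feature construction and classifier described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (`prosodic_features`, `vq_distortion`, `classify`) are hypothetical, the per-emotion codebooks are assumed to be already trained (the abstract does not say how), and Euclidean distance is an assumed distortion measure.

```python
import math
from statistics import mean, pstdev


def prosodic_features(pitch, energy):
    """Build a statistical feature vector: [mean, std, max] of the pitch
    contour followed by [mean, std, max] of the energy contour."""
    feats = []
    for contour in (pitch, energy):
        feats.extend([mean(contour), pstdev(contour), max(contour)])
    return feats


def vq_distortion(frames, codebook):
    """Average Euclidean distance from each frame to its nearest codeword."""
    return mean(min(math.dist(f, c) for c in codebook) for f in frames)


def classify(frames, codebooks):
    """Return the emotion label whose codebook yields the lowest
    average distortion for the given feature frames (e.g. MFCC vectors)."""
    return min(codebooks, key=lambda emo: vq_distortion(frames, codebooks[emo]))


# Toy example with made-up 2-dimensional frames and one-codeword codebooks.
toy_codebooks = {"neutral": [(0.0, 0.0)], "angry": [(5.0, 5.0)]}
toy_frames = [(4.0, 5.0), (6.0, 5.0)]
label = classify(toy_frames, toy_codebooks)
```

In this sketch each emotion has its own codebook, and an utterance is assigned to the emotion whose codebook quantizes it with the least distortion, which is one common way to use vector quantization as a classifier.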

A Packet Loss Concealment Algorithm Based on Multiple Adaptive Codebooks Using Comfort Noise

  • 박남인;김홍국
    • Institute of Electronics Engineers of Korea: Conference Proceedings / Institute of Electronics Engineers of Korea 2008 Summer Conference / pp. 873-874 / 2008
  • In this paper, we propose a packet loss concealment (PLC) algorithm for CELP speech coders that is based on multiple adaptive codebooks and uses comfort noise to recover lost packets. The multiple adaptive codebooks consist of a conventional adaptive codebook that models the periodic excitation of speech and a second adaptive codebook that provides a better estimate of the excitation when packets are lost in the speech onset region. The proposed PLC algorithm is evaluated by implementing it in the G.729 decoder and comparing it with the PLC algorithm employed in the G.729 decoder by means of perceptual evaluation of speech quality (PESQ). Experiments under different degrees of burstiness at packet loss rates of 3% and 5% show that the proposed PLC algorithm provides higher PESQ scores than the G.729 PLC algorithm.

The relationship between segmental production by Japanese learners of Korean and pronunciation evaluation

  • 홍혜진;류혁수;정민화
    • Phonetics and Speech Sciences / Vol. 6, No. 4 / pp. 101-108 / 2014
  • This study investigates the effects of Japanese learners' Korean segmental production on pronunciation evaluation by native Korean raters. Read speech from 24 learners whose native language is Japanese is transcribed at the phonemic level, and confusion matrices are generated from the phonemic transcriptions. Deviations from the canonical pronunciation found in the learners' speech are analyzed in terms of phoneme substitutions, vowel insertions, and consonant deletions. Each learner's pronunciation is rated impressionistically by 5 native Korean raters. The results show that deviation from the canonical pronunciation is strongly correlated with the pronunciation evaluation scores. In particular, the rates of phoneme substitutions and vowel insertions are very strongly correlated with the pronunciation evaluation scores.

The Specificity of Environmental Influence: Home Environment Affects Korean-Chinese Children's Early Language Development via Maternal Speech

  • 전효정;이귀옥;박혜원
    • Korean Journal of Child Studies / Vol. 25, No. 5 / pp. 163-178 / 2004
  • This study tested the hypothesis that children whose families differ in socioeconomic status (SES) and educational level differ in their rates of productive language development because they have different language-learning experiences. Naturalistic interactions between mothers and their children were videotaped. Transcripts of these interactions provided the basis for estimating the growth in children's productive vocabularies and the properties of maternal speech. Sixty children aged 1 to 3 were selected in Yanji, China. The results show that the children of highly educated mothers showed greater growth in mean length of utterance than the children of less educated mothers. Properties of maternal speech that differed as a function of the mother's educational level fully accounted for this difference. Implications of these findings for mechanisms of environmental influence on child development are discussed.

Training Method and Speaker Verification Measures for Recurrent Neural Network based Speaker Verification System

  • 김태형
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 34, No. 3C / pp. 257-267 / 2009
  • This paper presents a training method for neural networks and the use of MSE (mean squared error) values as the basis for deciding on a speaker's identity claim in a recurrent neural network based speaker verification system. Recurrent neural networks (RNNs) are employed to capture the temporally dynamic characteristics of the speech signal. During supervised learning of the RNNs, target outputs are generated automatically so that they represent the temporal variation of the input speech sounds. To increase the capability of discriminating between the true speaker and an impostor, a discriminative training method for RNNs is presented. The paper shows the use and effectiveness of the MSE value, obtained from the Euclidean distance between the target outputs and the network outputs for a speaker's test speech sounds, as the basis of speaker verification. Experiments on a Korean speech database show that, in terms of equal error rate, the proposed speaker verification system outperforms a conventional hidden Markov model based speaker verification system.
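The MSE-based decision and the equal error rate mentioned in the abstract above can be illustrated with a short sketch. This is an assumption-laden example rather than the paper's method: the function names are hypothetical, the MSE is taken as the squared Euclidean distance between target and output vectors averaged over frames, a claim is accepted when the score falls at or below a threshold, and the equal error rate is approximated by a simple threshold sweep over the observed scores.

```python
def mse_score(targets, outputs):
    """Mean over frames of the squared Euclidean distance between
    each target vector and the corresponding network output vector."""
    total = 0.0
    for t, o in zip(targets, outputs):
        total += sum((ti - oi) ** 2 for ti, oi in zip(t, o))
    return total / len(targets)


def accept(score, threshold):
    """Accept the identity claim when the MSE is small enough."""
    return score <= threshold


def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds and return (threshold, rate) at the
    point where false acceptance and false rejection rates are closest."""
    best = None
    for th in sorted(set(genuine) | set(impostor)):
        frr = sum(s > th for s in genuine) / len(genuine)    # true speaker rejected
        far = sum(s <= th for s in impostor) / len(impostor)  # impostor accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, th, (far + frr) / 2)
    return best[1], best[2]


# Made-up scores for illustration only.
genuine_scores = [0.1, 0.2, 0.3, 0.6]
impostor_scores = [0.4, 0.7, 0.8, 0.9]
threshold, eer = equal_error_rate(genuine_scores, impostor_scores)
```

Lower scores mean the network output tracked the target more closely, so the threshold trades false acceptances against false rejections; the equal error rate summarizes that trade-off in a single number.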

Rule-based Named Entity (NE) Recognition from Speech

  • 김지환
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 58 / pp. 45-66 / 2006
  • In this paper, a rule-based (transformation-based) NE recognition system is proposed. This system uses Brill's rule inference approach. The performance of the rule-based system is compared with that of IdentiFinder, one of the most successful stochastic systems. In the baseline case (no punctuation and no capitalisation), both systems show almost equal performance. They also perform similarly when given additional information such as punctuation, capitalisation and name lists. The performance of both systems degrades linearly with the number of speech recognition errors, and their rates of degradation are almost equal. These results show that automatic rule inference is a viable alternative to the HMM-based approach to NE recognition while retaining the advantages of a rule-based approach.

Background Noise Classification in Noisy Speech of Short Time Duration Using Improved Speech Parameter

  • 최재승
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 20, No. 9 / pp. 1673-1678 / 2016
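  • In speech recognition, background noise can cause speech input to be misjudged as background noise, which lowers the speech recognition rate. Because countermeasures against this kind of noise are not simple, more advanced noise processing techniques are needed. This paper therefore describes an algorithm that distinguishes stationary or non-stationary background noise from speech of short duration in noisy environments. As an important means of distinguishing different types of noise from speech, the algorithm uses an improved speech feature parameter. The paper then describes an algorithm that estimates the type of noise with a multilayer perceptron network. The experiments confirmed that noise and speech can be distinguished.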

The effects of speakers' age on temporal features of speech among healthy young, middle-aged, and older adults

  • 김예지;이송민;최민경;정상민;성지은;이영미
    • Phonetics and Speech Sciences / Vol. 14, No. 1 / pp. 37-47 / 2022
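  • This study analyzes whether the temporal characteristics of speech production differ significantly across age groups of healthy adult speakers, and examines which speech production variables can significantly distinguish young speakers from older speakers. To this end, speech rate (overall speech rate and articulation rate), pause frequency per utterance, pause duration, and pause location were examined for young, middle-aged, and older adults. Utterances from 30 speakers, 10 each from the young, middle-aged, and older groups, were selected from the Seoul read speech corpus, an open corpus distributed by the National Institute of Korean Language, and the temporal characteristics of their speech production were analyzed. The results showed significant between-group differences in overall speech rate, articulation rate, total pause frequency, between-word pause frequency, total pause duration, and between-word pause duration. Post hoc tests showed that both the middle-aged and the older groups had slower speech rates, more frequent pauses, and longer pause durations than the young group. In contrast, there were no significant between-group differences in within-word pause frequency or within-word pause duration, which represent inappropriate pauses for healthy adults. Among these variables, overall speech rate was the one that significantly distinguished the young group from the older group. The results also showed that a single pause by an older speaker is similar in length to one by a young or middle-aged speaker, but that older speakers pause much more often. These results suggest that the temporal characteristics of speech production change with age.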
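The temporal measures named in the abstract above can be made concrete with a small sketch. The abstract does not define them, so this example assumes the commonly used definitions: overall speech rate counts syllables over the total duration including pauses, while articulation rate excludes pause time. The function name, the syllables-per-second unit, and the input values are all hypothetical.

```python
def temporal_measures(n_syllables, total_dur, pauses):
    """Compute basic temporal measures for one utterance.

    n_syllables: number of syllables produced
    total_dur:   utterance duration in seconds, pauses included
    pauses:      list of pause durations in seconds
    """
    pause_time = sum(pauses)
    return {
        # Syllables per second over the whole utterance.
        "speech_rate": n_syllables / total_dur,
        # Syllables per second over the time actually spent articulating.
        "articulation_rate": n_syllables / (total_dur - pause_time),
        "pause_count": len(pauses),
        "mean_pause_dur": pause_time / len(pauses) if pauses else 0.0,
    }


# Made-up utterance: 20 syllables in 5 s with two 0.5 s pauses.
measures = temporal_measures(20, 5.0, [0.5, 0.5])
```

Under these definitions, a speaker who pauses more often at similar pause lengths shows a lower overall speech rate even when the articulation rate changes less, because only the overall speech rate includes pause time.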

Alveolar Fricative Sound Errors by the Type of Morpheme in the Spontaneous Speech of 3- and 4-Year-Old Children

  • 김수진;김정미;윤미선;장문수;차재은
    • Phonetics and Speech Sciences / Vol. 4, No. 3 / pp. 129-136 / 2012
  • Korean alveolar fricatives are late-developing speech sounds. Most previous research on phonemes elicited sounds with individual words or pseudowords, but word-level phonological analysis does not always reflect a child's practical articulation ability. There has also been limited research on articulation development that looks at speech production by grammatical morphemes, despite their importance in the Korean language. Therefore, this research examines the articulation development and phonological patterns of the /s/ phoneme by morphological type in children's spontaneous conversational speech. The subjects were twenty-two typically developing 3- and 4-year-old Korean children. All children showed normal levels in three screening tests: hearing, vocabulary, and articulation. Spontaneous conversational samples were recorded at the children's homes. The results are as follows. Error rates decreased with increasing age in all morphological contexts. Error percentages within an age group were significantly lower in lexical morphemes than in grammatical morphemes. The stopping of fricative sounds was the main error pattern in all morphological contexts and decreased with age. This research shows that articulation performance can differ significantly by morphological context. The present study provides data that can be used to identify difficult contexts for the articulatory evaluation and therapy of alveolar fricative sounds.

Robust End Point Detection for Robot Speech Recognition Using Double Talk Detection

  • 문성규;박진수;고한석
    • The Journal of the Acoustical Society of Korea / Vol. 31, No. 3 / pp. 161-169 / 2012
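  • This paper proposes a speech end point detection method that is robust in robot environments with strong echo. In echo environments where the echo-to-signal ratio is -5 dB or lower, such as with two-way conversational robots, the performance of the echo canceller degrades and leaves residual echo whose energy is comparable to that of the user's speech. Even existing noise-robust speech end point detection methods misdetect such residual echo as speech, which makes accurate end point detection difficult. To make end point detection robust to echo, this paper builds a speech end point detector by combining, with an AND operation, the output of double talk detection, which performs well at discriminating speech segments from echo segments, with an existing speech end point detection method. To evaluate the proposed method, isolated word recognition was tested in a strongly echoic environment, and across various experimental conditions recognition performance improved by more than 30% on average over the existing speech end point detection method.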
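The AND combination described in the abstract above can be sketched at the frame level as follows. This is only an illustration under assumptions: both detectors are assumed to emit one boolean per frame, a true double talk flag is assumed to mean that the user's speech is present, and the function names and flag values are hypothetical.

```python
def combine_epd_dtd(epd_flags, dtd_flags):
    """Frame-wise AND of an end point detector's speech flags and a
    double talk detector's flags. A frame counts as speech only when
    both detectors agree."""
    return [e and d for e, d in zip(epd_flags, dtd_flags)]


def end_points(flags):
    """Return (first, last) indices of frames flagged as speech,
    or None when no frame is flagged."""
    idx = [i for i, f in enumerate(flags) if f]
    return (idx[0], idx[-1]) if idx else None


# Made-up flags: the end point detector also fires on residual echo
# in frames 1 and 4, which the double talk detector does not confirm.
epd = [False, True, True, True, True, False]
dtd = [False, False, True, True, False, False]
combined = combine_epd_dtd(epd, dtd)
segment = end_points(combined)
```

The AND operation discards frames where the energy-based detector reacts to residual echo alone, so the detected segment shrinks to the frames both detectors mark as user speech.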