• 제목/요약/키워드: speech rates

검색결과 271건 처리시간 0.195초

순환 신경망 모델을 이용한 한국어 음소의 음성인식에 대한 연구 (A Study on the Speech Recognition of Korean Phonemes Using Recurrent Neural Network Models)

  • 김기석;황희영
    • 대한전기학회논문지
    • /
    • 제40권8호
    • /
    • pp.782-791
    • /
    • 1991
  • In the fields of pattern recognition such as speech recognition, several new techniques using Artifical Neural network Models have been proposed and implemented. In particular, the Multilayer Perception Model has been shown to be effective in static speech pattern recognition. But speech has dynamic or temporal characteristics and the most important point in implementing speech recognition systems using Artificial Neural Network Models for continuous speech is the learning of dynamic characteristics and the distributed cues and contextual effects that result from temporal characteristics. But Recurrent Multilayer Perceptron Model is known to be able to learn sequence of pattern. In this paper, the results of applying the Recurrent Model which has possibilities of learning tedmporal characteristics of speech to phoneme recognition is presented. The test data consist of 144 Vowel+ Consonant + Vowel speech chains made up of 4 Korean monothongs and 9 Korean plosive consonants. The input parameters of Artificial Neural Network model used are the FFT coefficients, residual error and zero crossing rates. The Baseline model showed a recognition rate of 91% for volwels and 71% for plosive consonants of one male speaker. We obtained better recognition rates from various other experiments compared to the existing multilayer perceptron model, thus showed the recurrent model to be better suited to speech recognition. And the possibility of using Recurrent Models for speech recognition was experimented by changing the configuration of this baseline model.

A New Pruning Method for Synthesis Database Reduction Using Weighted Vector Quantization

  • Kim, Sanghun;Lee, Youngjik;Keikichi Hirose
    • The Journal of the Acoustical Society of Korea
    • /
    • 제20권4E호
    • /
    • pp.31-38
    • /
    • 2001
  • A large-scale synthesis database for a unit selection based synthesis method usually retains redundant synthesis unit instances, which are useless to the synthetic speech quality. In this paper, to eliminate those instances from the synthesis database, we proposed a new pruning method called weighted vector quantization (WVQ). The WVQ reflects relative importance of each synthesis unit instance when clustering the similar instances using vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through the objective and subjective evaluations of the synthetic speech quality: one to simply limit maximum number of instance, and the other based on normal VQ-based clustering. The proposed method showed the best performance under 50% reduction rates. Over 50% of reduction rates, the synthetic speech quality is not seriously but perceptibly degraded. Using the proposed method, the synthesis database can be efficiently reduced without serious degradation of the synthetic speech quality.

  • PDF

Overlapping of /o/ and /u/ in modern Seoul Korean: focusing on speech rate in read speech

  • Igeta, Takako;Hiroya, Sadao;Arai, Takayuki
    • 말소리와 음성과학
    • /
    • 제9권1호
    • /
    • pp.1-7
    • /
    • 2017
  • Previous studies have reported on the overlapping of $F_1$ and $F_2$ distribution for the vowels /o/ and /u/ produced by young Korean speakers of the Seoul dialect. It has been suggested that the overlapping of /o/ and /u/ occurs due to sound change. However, few studies have examined whether speech rate influences the overlapping of /o/ and /u/. On the other hand, previous studies have reported that the overlapping of /o/ and /u/ in syllable produced by male speakers is smaller than by female speakers. Few reports have investigated on the overlapping of the two vowels in read speech produced by male speakers. In the current study, we examined whether speech rates affect overlapping of /o/ and /u/ in read speech by male and female speakers. Read speech produced by twelve young adult native speakers of Seoul dialect were recorded in three speech rates. For female speakers, discriminant analysis showed that the discriminant rate became lower as the speech rate increases from slow to fast. Thus, this indicates that speech rate is one of the factors affecting the overlapping of /o/ and /u/. For male speakers, on the other hand, the discriminant rate was not correlated with speech rate, but the overlapping was larger than that of female speakers in read speech. Moreover, read speech by male speakers was less clear than by female speakers. This indicates that the overlapping may be related to unclear speech by sociolinguistic reasons for male speakers.

정상 성인의 조음밸브에 대한 내${\cdot}$외전 비율 (Fast ab/adduction Rate of Articulation Valves in Normal Adults)

  • 박희준;한지연
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2007년도 한국음성과학회 공동학술대회 발표논문집
    • /
    • pp.149-151
    • /
    • 2007
  • This study was designed to investigate fast ab/adduction rate of articulation valves in normal adults. The measurement of fast ab/aduction rate has traditionally been used for assessment, diagnosis and therapy in patients who suffered from dysarthria, functional articulation disorders or apraxia of speech. Fast ab/adduction rate shows the documented structural and physiological changes in the central nervous system and the peripheral components of oral and speech production mechanism. Fast ab/adduction rates were obtained from 20 normal subjects by producing the repetition of vocal function (/ihi/), tongue function (/t${\wedge}$/), velopharyngeal function (/m/), and labial function (/p${\wedge}$/). The Aerophone II was used for data recording. The results of finding as follows: average fast ab/adduction rates were vocal function(6.21cps), tongue function(7.42cps), velopharyngeal function(5.23cps), labial function (6.93cps). The results of this study are guidelines of normal diadochokinetic rates. In addition, they can indicate the severity of diseases and evaluation of treatment.

  • PDF

VQ/HMM에 의한 화자독립 음성인식에서 다수 후보자를 인식 대상으로 제출하는 방법에 관한 연구 (A Study on the Submission of Multiple Candidates for Decision in Speaker-Independent Speech Recognition by VQ/HMM)

  • 이창영;남호수
    • 음성과학
    • /
    • 제12권3호
    • /
    • pp.115-124
    • /
    • 2005
  • We investigated on the submission of multiple candidates in speaker-independent speech recognition by VQ/HMM. Submission of fixed number of multiple candidates has first been examined. As the number of candidates increases by two, three, and four, the recognition error rates were found to decrease by 41%, 58%, and 65%, respectively compared to that of a single candidate. We tried another approach that the candidates within a range of Viterbi scores are submitted. The number of candidates showed geometric increase as the admitted range becomes large. For a practical application, a combination of the above two methods was also studied. We chose the candidates within some range of Viterbi scores and limited the maximum number of candidates submitted to five. Experimental results showed that recognition error rates of less than 10% could be achieved with average number of candidates of 3.2 by this method.

  • PDF

자동차 환경에서 Oak DSP 코어 기반 음성 인식 시스템 실시간 구현 (A Real-Time Implementation of Speech Recognition System Using Oak DSP core in the Car Noise Environment)

  • 우경호;양태영;이충용;윤대희;차일환
    • 음성과학
    • /
    • 제6권
    • /
    • pp.219-233
    • /
    • 1999
  • This paper presents a real-time implementation of a speaker independent speech recognition system based on a discrete hidden markov model(DHMM). This system is developed for a car navigation system to design on-chip VLSI system of speech recognition which is used by fixed point Oak DSP core of DSP GROUP LTD. We analyze recognition procedure with C language to implement fixed point real-time algorithms. Based on the analyses, we improve the algorithms which are possible to operate in real-time, and can verify the recognition result at the same time as speech ends, by processing all recognition routines within a frame. A car noise is the colored noise concentrated heavily on the low frequency segment under 400 Hz. For the noise robust processing, the high pass filtering and the liftering on the distance measure of feature vectors are applied to the recognition system. Recognition experiments on the twelve isolated command words were performed. The recognition rates of the baseline recognizer were 98.68% in a stopping situation and 80.7% in a running situation. Using the noise processing methods, the recognition rates were enhanced to 89.04% in a running situation.

  • PDF

Speech recognition rates and acoustic analyses of English vowels produced by Korean students

  • Yang, Byunggon
    • 말소리와 음성과학
    • /
    • 제14권2호
    • /
    • pp.11-17
    • /
    • 2022
  • English vowels play an important role in verbal communication. However, Korean students tend to experience difficulty pronouncing a certain set of vowels despite extensive education in English. The aim of this study is to apply speech recognition software to evaluate Korean students' pronunciation of English vowels in minimal pair words and then to examine acoustic characteristics of the pairs in order to check their pronunciation problems. Thirty female Korean college students participated in the recording. Speech recognition rates were obtained to examine which English vowels were correctly pronounced. To compare and verify the recognition results, such acoustic analyses as the first and second formant trajectories and durations were also collected using Praat. The results showed an overall recognition rate of 54.7%. Some students incorrectly switched the tense and lax counterparts and produced the same vowel sounds for qualitatively different English vowels. From the acoustic analyses of the vowel formant trajectories, some of these vowel pairs were almost overlapped or exhibited slight acoustic differences at the majority of the measurement points. On the other hand, statistical analyses on the first formant trajectories of the three vowel pairs revealed significant differences throughout the measurement points, a finding that requires further investigation. Durational comparisons revealed a consistent pattern among the vowel pairs. The author concludes that speech recognition and analysis software can be useful to diagnose pronunciation problems of English-language learners.

잡음하의 음성인식을 위한 스펙트럴 보상과 주파수 가중 HMM (A Frequency Weighted HMM with Spectral Compensation for Noisy Speech Recognition)

  • 이광석
    • 한국정보통신학회논문지
    • /
    • 제5권3호
    • /
    • pp.443-449
    • /
    • 2001
  • 잡음환경에서의 음성인식은 실제의 환경에서의 음성인식에서 매우 중요한 애로기술로써 이를 해결하기 위한 연구는 꾸준히 연구되고 있다. 따라서 본 연구는 음성인식분야에서 가장 많이 사용하고 있는 HMM처리 시잡음처리의 문제점을 주파수 가중치 부가 HMM으로 해결하는 방법을 제안하고 그 성능을 인식실험을 통하여 검토하였다. 그 결과 SS처리를 함께 사용하는 $MCE-\mu$, MCE-$\rho$가 가장 잡음에 강한 방식임을 알 수 있었다.

  • PDF

음성명령기반 26관절 보행로봇 실시간 작업동작제어에 관한 연구 (A Study on Real-Time Walking Action Control of Biped Robot with Twenty Six Joints Based on Voice Command)

  • 조상영;김민성;양준석;구영목;정양근;한성현
    • 제어로봇시스템학회논문지
    • /
    • 제22권4호
    • /
    • pp.293-300
    • /
    • 2016
  • The Voice recognition is one of convenient methods to communicate between human and robots. This study proposes a speech recognition method using speech recognizers based on Hidden Markov Model (HMM) with a combination of techniques to enhance a biped robot control. In the past, Artificial Neural Networks (ANN) and Dynamic Time Wrapping (DTW) were used, however, currently they are less commonly applied to speech recognition systems. This Research confirms that the HMM, an accepted high-performance technique, can be successfully employed to model speech signals. High recognition accuracy can be obtained by using HMMs. Apart from speech modeling techniques, multiple feature extraction methods have been studied to find speech stresses caused by emotions and the environment to improve speech recognition rates. The procedure consisted of 2 parts: one is recognizing robot commands using multiple HMM recognizers, and the other is sending recognized commands to control a robot. In this paper, a practical voice recognition system which can recognize a lot of task commands is proposed. The proposed system consists of a general purpose microprocessor and a useful voice recognition processor which can recognize a limited number of voice patterns. By simulation and experiment, it was illustrated the reliability of voice recognition rates for application of the manufacturing process.

잡음음성 음향모델 적응에 기반한 잡음에 강인한 음성인식 (Noise Robust Speech Recognition Based on Noisy Speech Acoustic Model Adaptation)

  • 정용주
    • 말소리와 음성과학
    • /
    • 제6권2호
    • /
    • pp.29-34
    • /
    • 2014
  • In the Vector Taylor Series (VTS)-based noisy speech recognition methods, Hidden Markov Models (HMM) are usually trained with clean speech. However, better performance is expected by training the HMM with noisy speech. In a previous study, we could find that Minimum Mean Square Error (MMSE) estimation of the training noisy speech in the log-spectrum domain produce improved recognition results, but since the proposed algorithm was done in the log-spectrum domain, it could not be used for the HMM adaptation. In this paper, we modify the previous algorithm to derive a novel mathematical relation between test and training noisy speech in the cepstrum domain and the mean and covariance of the Multi-condition TRaining (MTR) trained noisy speech HMM are adapted. In the noisy speech recognition experiments on the Aurora 2 database, the proposed method produced 10.6% of relative improvement in Word Error Rates (WERs) over the MTR method while the previous MMSE estimation of the training noisy speech produced 4.3% of relative improvement, which shows the superiority of the proposed method.