• Title/Summary/Keyword: speech process

Search Result 525, Processing Time 0.029 seconds

Development of English Speech Recognizer for Pronunciation Evaluation (발성 평가를 위한 영어 음성인식기의 개발)

  • Park Jeon Gue;Lee June-Jo;Kim Young-Chang;Hur Yongsoo;Rhee Seok-Chae;Lee Jong-Hyun
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.37-40
    • /
    • 2003
  • This paper presents the preliminary result of the automatic pronunciation scoring for non-native English speakers, and shows the developmental process for an English speech recognizer for the educational and evaluational purposes. The proposed speech recognizer, featuring two refined acoustic model sets, implements the noise-robust data compensation, phonetic alignment, highly reliable rejection, key-word and phrase detection, easy-to-use language modeling toolkit, etc., The developed speech recognizer achieves 0.725 as the average correlation between the human raters and the machine scores, based on the speech database YOUTH for training and K-SEC for test.

  • PDF

Automatic Synthesis Method Using Prosody-Rich Database (대용량 운율 음성데이타를 이용한 자동합성방식)

  • 김상훈
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.08a
    • /
    • pp.87-92
    • /
    • 1998
  • In general, the synthesis unit database was constructed by recording isolated word. In that case, each boundary of word has typical prosodic pattern like a falling intonation or preboundary lengthening. To get natural synthetic speech using these kinds of database, we must artificially distort original speech. However, that artificial process rather resulted in unnatural, unintelligible synthetic speech due to the excessive prosodic modification on speech signal. To overcome these problems, we gathered thousands of sentences for synthesis database. To make a phone level synthesis unit, we trained speech recognizer with the recorded speech, and then segmented phone boundaries automatically. In addition, we used laryngo graph for the epoch detection. From the automatically generated synthesis database, we chose the best phone and directly concatenated it without any prosody processing. To select the best phone among multiple phone candidates, we used prosodic information such as break strength of word boundaries, phonetic contexts, cepstrum, pitch, energy, and phone duration. From the pilot test, we obtained some positive results.

  • PDF

HMM-based missing feature reconstruction for robust speech recognition in additive noise environments (가산잡음환경에서 강인음성인식을 위한 은닉 마르코프 모델 기반 손실 특징 복원)

  • Cho, Ji-Won;Park, Hyung-Min
    • Phonetics and Speech Sciences
    • /
    • v.6 no.4
    • /
    • pp.127-132
    • /
    • 2014
  • This paper describes a robust speech recognition technique by reconstructing spectral components mismatched with a training environment. Although the cluster-based reconstruction method can compensate the unreliable components from reliable components in the same spectral vector by assuming an independent, identically distributed Gaussian-mixture process of training spectral vectors, the presented method exploits the temporal dependency of speech to reconstruct the components by introducing a hidden-Markov-model prior which incorporates an internal state transition plausible for an observed spectral vector sequence. The experimental results indicate that the described method can provide temporally consistent reconstruction and further improve recognition performance on average compared to the conventional method.

Patterns of consonant deletion in the word-internal onset position: Evidence from spontaneous Seoul Korean speech

  • Kim, Jungsun;Yun, Weonhee;Kang, Ducksoo
    • Phonetics and Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.45-51
    • /
    • 2016
  • This study examined the deletion of onset consonant in the word-internal structure in spontaneous Seoul Korean speech. It used the dataset of speakers in their 20s extracted from the Korean Corpus of Spontaneous Speech (Yun et al., 2015). The proportion of deletion of word-internal onset consonants was analyzed using the linear mixed-effects regression model. The factors that promoted the deletion of onsets were primarily the types of consonants and their phonetic contexts. The results showed that onset deletion was more likely to occur for a lenis velar stop [k] than the other consonants, and in the phonetic contexts, when the preceding vowel was a low central vowel [a]. Moreover, some speakers tended to more frequently delete onset consonants (e.g., [k] and [n]) than other speakers, which reflected individual differences. This study implies that word-internal onsets undergo a process of gradient reduction within individuals' articulatory strategies.

Error Correction and Praat Script Tools for the Buckeye Corpus of Conversational Speech (벅아이 코퍼스 오류 수정과 코퍼스 활용을 위한 프랏 스크립트 툴)

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences
    • /
    • v.4 no.1
    • /
    • pp.29-47
    • /
    • 2012
  • The purpose of this paper is to show how to convert the label files of the Buckeye Corpus of Spontaneous Speech [1] into Praat format and to introduce some of the Praat scripts that will enable linguists to study various aspects of spoken American English present in the corpus. During the conversion process, several types of errors were identified and corrected either manually or automatically by the use of scripts. The Praat script tools that have been developed can help extract from the corpus massive amounts of phonetic measures such as the VOT of plosives, the formants of vowels, word frequency information and speech rates that span several consecutive words. The script tools can extract additional information concerning the phonetic environment of the target words or allophones.

Flattening Techniques for Pitch Detection (피치 검출을 위한 스펙트럼 평탄화 기법)

  • 김종국;조왕래;배명진
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.381-384
    • /
    • 2002
  • In speech signal processing, it Is very important to detect the pitch exactly in speech recognition, synthesis and analysis. but, it is very difficult to pitch detection from speech signal because of formant and transition amplitude affect. therefore, in this paper, we proposed a pitch detection using the spectrum flattening techniques. Spectrum flattening is to eliminate the formant and transition amplitude affect. In time domain, positive center clipping is process in order to emphasize pitch period with a glottal component of removed vocal tract characteristic. And rough formant envelope is computed through peak-fitting spectrum of original speech signal in frequency domain. As a results, well get the flattened harmonics waveform with the algebra difference between spectrum of original speech signal and smoothed formant envelope. After all, we obtain residual signal which is removed vocal tract element The performance was compared with LPC and Cepstrum, ACF 0wing to this algorithm, we have obtained the pitch information improved the accuracy of pitch detection and gross error rate is reduced in voice speech region and in transition region of changing the phoneme.

  • PDF

On-line model compensation using noise masking effect for robust speech recognition (잡음 차폐를 이용한 온라인 모델 보상)

  • Jung Gue-Jun;Cho Hoon-Young;Oh Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.215-218
    • /
    • 2003
  • In this paper we apply PMC (parallel model combination) to speech recognition system online. As a representative of model based noise compensation techniques, PMC compensates environmental mismatch by combining pretrained clean speech models and real-time estimated noise information. This is very effective approach for compensating extreme environmental mismatch but is inadequate to use in on-line system for heavy computational cost. To reduce the computational cost and to apply PMC online, we use a noise masking effect - the energy in a frequency band is dominated either by clean speech energy or by noise energy - in the process of model compensation. Experiments on artificially produced noisy speech data confirm that the proposed technique is fast and effective for the on-line model compensation.

  • PDF

An Effective Storage Method During A Sampling of Speech Signals (음성신호를 표본화할 동안 효율적인 실시간 저장기법)

  • Bae, Myungjin;Lee, Inseop;ANN, Souguil
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.24 no.3
    • /
    • pp.394-399
    • /
    • 1987
  • It is necessary for the speech samples to be stored in memory buffer before speech analyzers without a real time processor process them. In this paper, we propose an algorithm that uses the buffer efficiently, when the analog speech signal is converted to the digital samples by the analog to digital converter. In order to implement this method in real time, the buffer is divided into the starting buffer and the remaining buffer. Until a voiced speech is found, the converted samples are sequentially stored in the starting buffer, and then the buffer is shifted. When a voiced speech is found, the next samples are sequentally recorded in the remaining buffer.

  • PDF

The Effects of Reading Pronunciation Training of Korean Phonological Process Words for Chinese Learners (중국인 학습자의 우리말 음운변동 단어의 읽기 발음 훈련효과)

  • Lee, Yu-Ra;Kim, Soo-Jin
    • Phonetics and Speech Sciences
    • /
    • v.1 no.1
    • /
    • pp.77-86
    • /
    • 2009
  • This study observes how the combined intervention program effects on the acquisition reading pronunciation of Korean phonological process words and the acquisition aspects of each phonological process rules to four Korean learners whose first language is Chinese. The training program is the combination of multisensory Auditory, Visual and Kinethetic (AVK) approach, wholistic approach, and metalinguistic approach. The training purpose is to evaluate how accurately they read the words of the phonological process which have fortisization, nasalization, lateralization, intermediate sound /ㅅ/ (/${\int}iot"$/). We access how they read the untrained words which include the four factors above. The intervention effects are analyzed by the multiple probe across subjects design. The results indicate that the combined phonological process rule explanation and the words activity intervention affects the four Chinese subjects in every type of word. The implications of the study are these: First, it suggests the effect of Korean pronunciation intervention in a concrete way. Second, it offers how to evaluate the phonological process and how to train people who are learning Korean language.

  • PDF

Implementation of Speech Recognition Filtering at Emergency (응급상황에서의 음성인식을 위한 필터기 구현)

  • Cho, Young-Im;Jang, Sung-Soon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.20 no.2
    • /
    • pp.208-213
    • /
    • 2010
  • Generally, the mal factor for speech recognition is the background noise in speech recognition. The noise is the reason to reduce the speech recognition performance. Owing to the fact, the place to recognize is very important. To improve the recognition performance from the sound having noise, we implemented the noise filtered Wiener filter at the signal process step which adopted the FIR filter. In FIR filter, it deal with the filtered speech signal which is appropriate frequency range of human speech frequency range. Therefore, we make the recognition system distinguish between noise and speech sound from the incoming speech signal.