• Title/Summary/Keyword: speech database


Diachronic Change of High Vowel Devoicing in Japanese Dialects (일본어 모음 무성화의 통시적 변화)

  • Byun, Hi-Gyung
    • Phonetics and Speech Sciences / v.5 no.4 / pp.171-184 / 2013
  • This study investigated the devoicing rate of Japanese high vowels, focusing on regional and generational differences, by acoustically analyzing vowels from two large speech databases. The first database was collected between 1986 and 1988 in 41 areas (prefectures) from 607 participants (299 high school students and 308 of their grandparents). The second was collected in 2006-2007 in seven areas as a follow-up to the first and comprises 463 participants aged 8 to 90. The results revealed generational as well as regional differences in the devoicing rate in almost all areas. Based on those results, a new distribution map reflecting the current devoicing rate of the younger generation is presented. Furthermore, by comparing the two data sets, this study confirmed that the age difference in the devoicing rate reflects not age-grading but a sound change in progress. The study also discusses the social factors behind changes in the devoicing rate in some areas and applies the devoicing rates of five areas to an S-curve model to predict future devoicing rates.
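
The abstract above does not say which S-curve was used. As a rough illustration only, the sketch below assumes a two-parameter logistic curve and fits it by linear regression on logit-transformed rates. The function names, the birth years, and the rates are all hypothetical (the rates are synthetic and noise-free), not data from the study.

```python
import math


def logistic(t, k, t0):
    """Assumed S-curve: share of devoiced tokens (0..1) at time t."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))


def fit_s_curve(times, rates):
    """Estimate slope k and midpoint t0 of the logistic curve.

    logit(p) = k * t - k * t0 is linear in t, so ordinary least squares
    on (t, logit(p)) recovers both parameters. Every rate must lie
    strictly between 0 and 1.
    """
    logits = [math.log(p / (1.0 - p)) for p in rates]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logits) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logits))
    k = sxy / sxx
    intercept = mean_y - k * mean_t
    t0 = -intercept / k
    return k, t0


# Hypothetical speaker groups by birth year, with synthetic rates
# generated from a known curve (k = 0.05, midpoint 1950).
birth_years = [1910, 1930, 1950, 1970, 1990]
observed = [logistic(year, 0.05, 1950) for year in birth_years]

k, t0 = fit_s_curve(birth_years, observed)
predicted_2010 = logistic(2010, k, t0)  # extrapolated rate for a later cohort
```

With real, noisy rates a nonlinear least-squares fit would usually be preferable, since the logit transform inflates errors near 0 and 1.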

Implementation of A Fast Preprocessor for Isolated Word Recognition (고립단어 인식을 위한 빠른 전처리기의 구현)

  • Ahn, Young-Mok
    • The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.96-99 / 1997
  • This paper proposes a very fast preprocessor for isolated word recognition. The proposed preprocessor extracts candidate words at a small computational cost by using a feature sorting algorithm instead of vector quantization. To show its effectiveness, we compared it with a speech recognition system based on the semi-continuous hidden Markov model and with a VQ-based preprocessor, measuring their performance on speaker-independent isolated word recognition. For the experiments, we used a speech database of 244 words uttered by 40 male speakers. The data from 20 of the speakers were used for training and the remaining data for testing. The proposed preprocessor achieved an accuracy of 99.9% with a 90% reduction rate on this speech database.


Implementation of text to speech terminal system by distributed database (데이터베이스 분산을 통한 소용량 문자-음성 합성 단말기 구현)

  • 김영길;박창현;양윤기
    • Proceedings of the IEEK Conference / 2003.07e / pp.2431-2434 / 2003
  • In this research, our goal is to realize a Korean distributed TTS system with server/client functions over a wireless network. The speech databases and some routines of the TTS system are placed on the server, which has greater processing capability. We built Korean speech databases and studied a DB design suited to distributed TTS. We also designed a terminal with the minimum configuration needed to run this TTS, together with a suitable protocol, so that the operation of the distributed TTS can be checked.


GMM based Speaker Identification using Pitch Information (피치 정보를 이용한 GMM 기반의 화자 식별)

  • Park Taesun;Hahn Minsoo
    • MALSORI / no.47 / pp.121-129 / 2003
  • This paper describes the use of pitch information for speaker identification. The recognition system is GMM-based and uses a speech database of four connected Korean digits. The mean of the pitch period in voiced sections of speech is shown to be useful for discriminating between speakers. Using this feature with a Gaussian mixture model in the speaker identification system gave a marked improvement, up to 6% over the baseline Gaussian mixture model.
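
The abstract does not state how the pitch feature is combined with the spectral features. The sketch below shows only the generic GMM identification step: each speaker has a diagonal-covariance mixture, and the speaker whose model gives the highest total log-likelihood wins. Treating the last element of every feature vector as a pitch-period value (in ms) is an assumption made for illustration, and the models and frames are hypothetical hand-set values rather than trained parameters.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one speaker's GMM.

    gmm is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
        peak = max(logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(v - peak) for v in logs))
    return total


def identify(frames, speaker_models):
    """Return the speaker label whose GMM scores the frames highest."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(frames, speaker_models[name]),
    )


# Hypothetical 2-D features: [one cepstral coefficient, pitch period in ms].
speaker_models = {
    "A": [(0.5, [0.0, 8.0], [1.0, 0.25]), (0.5, [1.0, 8.5], [1.0, 0.25])],
    "B": [(0.5, [0.0, 5.0], [1.0, 0.25]), (0.5, [1.0, 5.5], [1.0, 0.25])],
}
frames = [[0.2, 8.1], [0.9, 8.4], [0.5, 7.9]]
best_speaker = identify(frames, speaker_models)
```

In this toy setup the two speakers share the same cepstral statistics, so the pitch dimension alone separates them.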


Development of a Baseline Platform for Spoken Dialog Recognition System (대화음성인식 시스템 구현을 위한 기본 플랫폼 개발)

  • Chung Minhwa;Seo Jungyun;Lee Yong-Jo;Han Myungsoo
    • Proceedings of the KSPS conference / 2003.05a / pp.32-35 / 2003
  • This paper describes our recent work on developing a baseline platform for Korean spoken dialog recognition. We have collected a speech corpus of about 65 hours with auditory transcriptions. Linguistic information at various levels, such as morphology, syntax, semantics, and discourse, is attached to the speech database using automatic or semi-automatic tagging tools.


Phonetic Evaluation in Speech Sciences and Issues in Phonetic Transcription (음성 평가의 다학문적 현황과 표기의 과제)

  • Kim, Jong-Mi
    • Speech Sciences / v.10 no.2 / pp.259-280 / 2003
  • The paper discusses how speech sounds are evaluated and transcribed in various fields of the speech sciences and suggests ways to transcribe them more accurately. The academic fields explored are phonetics, speech processing, speech pathology, and foreign language education. The discussion centers on the International Phonetic Alphabet (IPA), the convention most commonly used in these fields, and on less widely accepted transcription conventions such as the TOnes and Break Indices (ToBI), the Speech Assessment Methods Phonetic Alphabet (SAMPA), an extension of the official Korean Romanization (KORBET), and the American-English transcription system in the TIMIT database (TIMITBET). These conventions are examined for Korean, English, and Korean-accented English. The paper demonstrates that each transcription convention can be recommended for a specific need in a particular academic field. Because it is widely known, the IPA is best suited for phonetic evaluation in phonetics, speech pathology, and foreign language education. The other conventions are useful for keyboard entry of phonetically evaluated data from all these fields, as well as for sound transcription in speech engineering, because they use letter symbols that are convenient for typing, searching, and programming. Several practical suggestions are made for maintaining transcriptional efficiency and consistency in the face of intra- and inter-transcriber variability.


Performance Improvement in the Multi-Model Based Speech Recognizer for Continuous Noisy Speech Recognition (연속 잡음 음성 인식을 위한 다 모델 기반 인식기의 성능 향상에 대한 연구)

  • Chung, Yong-Joo
    • Speech Sciences / v.15 no.2 / pp.55-65 / 2008
  • Recently, the multi-model based speech recognizer has been used quite successfully for noisy speech recognition. To select the reference HMM (hidden Markov model) that best matches the noise type and SNR (signal-to-noise ratio) of the input test speech, the multi-model framework has so far estimated the SNR value with a VAD (voice activity detection) algorithm and classified the noise type with a GMM (Gaussian mixture model) as two separate steps. Because the SNR estimation process is vulnerable to errors, we propose an efficient method that classifies the SNR value and the noise type simultaneously. The KL (Kullback-Leibler) distance between the single Gaussian distributions of the noise signal in training and in testing is used for the classification. Recognition experiments on the Aurora 2 database show the usefulness of the model compensation method in the multi-model based speech recognizer. Further performance improvement was achieved by combining the probability density function of the MCT (multi-condition training) model with that of the reference HMM compensated by D-JA (data-driven Jacobian adaptation) in the multi-model based speech recognizer.
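
The joint classification described above can be sketched as a nearest-model search under the KL divergence between single Gaussians. The code below is a minimal illustration assuming diagonal covariances and the one-directional KL(test || reference); whether the paper uses this direction or a symmetrized form is not stated in the abstract. The reference table and test statistics are hypothetical numbers, not Aurora 2 values.

```python
import math


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def classify_noise(test_mu, test_var, references):
    """Pick the (noise_type, snr_db) key whose Gaussian is closest in KL."""
    return min(
        references,
        key=lambda key: kl_gaussian(test_mu, test_var, *references[key]),
    )


# Hypothetical reference noise Gaussians, one per (noise type, SNR in dB),
# each stored as (mean vector, variance vector).
references = {
    ("babble", 10): ([1.0, 0.5], [0.4, 0.3]),
    ("babble", 20): ([0.2, 0.1], [0.2, 0.2]),
    ("car", 10): ([2.0, -1.0], [0.5, 0.6]),
}

# Hypothetical statistics estimated from the noise portion of a test utterance.
best = classify_noise([1.9, -0.8], [0.5, 0.5], references)
```

Because each key carries both the noise type and the SNR, a single minimum over the table yields both labels at once, which is the point of classifying them jointly.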


Emotion Robust Speech Recognition using Speech Transformation (음성 변환을 사용한 감정 변화에 강인한 음성 인식)

  • Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.5 / pp.683-687 / 2010
  • This paper studies methods that use frequency warping, one of the speech transformation methods, to develop a speech recognition system that is robust to emotional variation. For this purpose, the effect of emotional variation on the speech signal was studied using a speech database containing various emotions. It was observed that the speech spectrum is affected by emotional variation and that this effect is one of the reasons the performance of the speech recognition system degrades. A new training method that uses frequency warping in the training process is presented to reduce the effect of emotional variation, and a speech recognition system based on vocal tract length normalization is developed for comparison with the proposed system. Experimental results from isolated word recognition using HMMs showed that the new training method reduced the error rate of the conventional recognition system on speech signals containing various emotions.
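
The abstract does not give the warping function. As one common choice, the sketch below assumes the piecewise-linear warping often used for vocal tract length normalization: frequencies are scaled by a factor alpha up to a cutoff, then mapped linearly so that the band edge stays fixed. The parameter names, the 8 kHz band edge, the cutoff ratio, and the list of warping factors are all assumptions for illustration.

```python
def warp_frequency(f, alpha, f_max=8000.0, cut_ratio=0.85):
    """Piecewise-linear frequency warping (an assumed, common VTLN form).

    Below the cutoff f0 = cut_ratio * f_max the frequency is scaled by
    alpha; above it, a second linear segment joins (f0, alpha * f0) to
    (f_max, f_max) so the warped axis still ends at f_max.
    """
    f0 = cut_ratio * f_max
    if alpha * f0 >= f_max:
        raise ValueError("alpha too large for this cutoff")
    if f <= f0:
        return alpha * f
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)


# Hypothetical filterbank centre frequencies (Hz) and warping factors.
centers = [250.0, 1000.0, 2500.0, 5000.0, 7500.0]
alphas = [0.9, 1.0, 1.1]

# One warped frequency axis per factor. Training on features computed
# along several warped axes is one way to expose the models to spectral
# shifts of the kind the abstract attributes to emotional variation.
warped_axes = {a: [warp_frequency(f, a) for f in centers] for a in alphas}
```

With alpha = 1.0 the function is the identity, so the unwarped features are one member of the training set.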

Study on Efficient Generation of Dictionary for Korean Vocabulary Recognition (한국어 음성인식을 위한 효율적인 사전 구성에 관한 연구)

  • Lee Sang-Bok;Choi Dae-Lim;Kim Chong-Kyo
    • Proceedings of the KSPS conference / 2002.11a / pp.41-44 / 2002
  • This paper concerns improving the speech recognition rate by using an enhanced pronunciation dictionary. Modern large-vocabulary continuous speech recognition systems have pronunciation dictionaries. A pronunciation dictionary provides pronunciation information for each word in the vocabulary in phonemic units, which are modeled in detail by the acoustic models. In most speech recognition systems based on the hidden Markov model, however, actual pronunciation variations are disregarded. Without pronunciation variations in the system, the phonetic transcriptions in the dictionary do not match the actual occurrences in the database. In this paper, we propose adding an unvoiced rule for semivowels, one of the allophone rules, to the pronunciation dictionary. In experiments, the speech recognition system with this dictionary performed better than with existing pronunciation dictionaries.


Analysis of Error Patterns in Korean Connected Digit Telephone Speech Recognition (한국어 연속 숫자음 전화 음성 인식에서의 오인식 유형 분석)

  • Kim Min Sung;Jung Sung Yun;Son Jong Mok;Bae Keun Sung;Kim Sang Hun
    • MALSORI / no.46 / pp.77-86 / 2003
  • Channel distortion and coarticulation effects in Korean connected digit telephone speech make it difficult to achieve high connected digit recognition performance in the telephone environment. In this paper, as basic research toward improving the recognition performance for Korean connected digit telephone speech, recognition error patterns are investigated and analyzed. The Korean connected digit telephone speech database released by SiTEC and the HTK system are used for the recognition experiments. The DWFBA and MRTCN methods are used for feature extraction and channel compensation, respectively. The experimental results are discussed along with our findings.
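
The abstract does not describe how the error patterns were tabulated. One generic way to do it, sketched below, is to align each reference digit string with the recognizer output by minimum edit distance and count the resulting substitutions, deletions, and insertions. The function name and the digit strings are hypothetical; this is not the paper's procedure or HTK's own scoring tool.

```python
from collections import Counter


def align_errors(ref, hyp):
    """Align two digit sequences by edit distance and list the errors.

    Returns tuples ("sub", ref_digit, hyp_digit), ("del", ref_digit, None)
    or ("ins", None, hyp_digit) in left-to-right order.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end, preferring match/substitution steps.
    events = []
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            if mismatch:
                events.append(("sub", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            events.append(("del", ref[i - 1], None))
            i -= 1
        else:
            events.append(("ins", None, hyp[j - 1]))
            j -= 1
    return events[::-1]


# Hypothetical (reference, recognized) digit strings.
pairs = [("1234", "1284"), ("5501", "501"), ("72", "722")]

error_counts = Counter()
for ref, hyp in pairs:
    error_counts.update(align_errors(ref, hyp))
```

Sorting `error_counts` by frequency then shows which digit confusions dominate, which is the kind of pattern the analysis is after.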
