Search | Korea Science

A Study on the Voice Conversion with HMM-based Korean Speech Synthesis (HMM 기반의 한국어 음성합성에서 음색변환에 관한 연구)

Kim, Il-Hwan; Bae, Keun-Sung
- MALSORI / v.68 / pp.65-74 / 2008
Statistical parametric speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years because it requires less memory and less computation than a corpus-based unit-concatenation text-to-speech (TTS) system, which makes it suitable for embedded systems. It also has the advantage that the voice characteristics of the synthetic speech can be modified easily by transforming the HMM parameters appropriately. In this paper, we present experimental results of voice characteristics conversion using an HMM-based Korean speech synthesis system. The results show that voice characteristics can be converted using only a few sentences uttered by a target speaker. Synthetic speech generated from models adapted with only ten sentences was very close to that from speaker-dependent models trained on 646 sentences.

An Implementation of Real-Time Speaker Verification System on Telephone Voices Using DSP Board (DSP보드를 이용한 전화음성용 실시간 화자인증 시스템의 구현에 관한 연구)

Lee Hyeon Seung; Choi Hong Sub
- MALSORI / no.49 / pp.145-158 / 2004
This paper aims to implement a real-time speaker verification system using a DSP board. The Dialog/4 board, which is based on a microprocessor and a DSP processor, was selected because it makes it easy to control telephone signals and to process audio/voice signals. The speaker verification system performs signal processing and feature extraction after receiving a voice sample and the claimed ID. Then, by computing the likelihood ratio of the claimed speaker model to the background model, it decides in real time whether to accept or reject the claim. For the verification experiments, a total of 15 speaker models and 6 background models are used. The experimental results show a verification accuracy of 99.5% when using telephone speech-based speaker models.
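The accept/reject step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the model type, so a single diagonal Gaussian stands in for each speaker and background model, the background score is assumed to be the average likelihood over the background models, and the names `gaussian_loglik`, `verify`, and `threshold` are hypothetical.

```python
import math


def gaussian_loglik(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian.

    The Gaussian is a stand-in model chosen for illustration only.
    """
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def verify(frames, claimed, backgrounds, threshold=0.0):
    """Return (accepted, llr) for a claimed identity.

    claimed and each entry of backgrounds are (mean, var) pairs.
    The threshold value is an assumption, not taken from the paper.
    """
    claimed_ll = gaussian_loglik(frames, *claimed)
    bg_lls = [gaussian_loglik(frames, *bg) for bg in backgrounds]
    # Average the background scores in the likelihood domain (log-mean-exp),
    # shifting by the peak value for numerical stability.
    peak = max(bg_lls)
    background_ll = peak + math.log(
        sum(math.exp(ll - peak) for ll in bg_lls) / len(bg_lls)
    )
    llr = claimed_ll - background_ll
    return llr > threshold, llr


# Illustrative two-dimensional feature frames (made-up values).
claimed_model = ([0.0, 0.0], [1.0, 1.0])
background_models = [([3.0, 3.0], [1.0, 1.0]), ([-3.0, -3.0], [1.0, 1.0])]
genuine = verify([[0.1, -0.2], [0.0, 0.1]], claimed_model, background_models)
impostor = verify([[3.0, 3.1]], claimed_model, background_models)
```

In a deployed system the threshold would be tuned on held-out data to trade false acceptances against false rejections.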

A Study on Exceptional Pronunciations For Automatic Korean Pronunciation Generator (한국어 자동 발음열 생성 시스템을 위한 예외 발음 연구)

Kim Sunhee
- MALSORI / no.48 / pp.57-67 / 2003
This paper presents a systematic description of exceptional pronunciations for automatic Korean pronunciation generation. An automatic pronunciation generator for Korean is an essential part of a Korean speech recognition system and a TTS (Text-To-Speech) system. It is composed of a set of regular rules and an exceptional pronunciation dictionary. The exceptional pronunciation dictionary is created by extracting the words that have exceptional pronunciations, based on the characteristics of such words identified through phonological research and a systematic analysis of the entries of Korean dictionaries. The method thus helps improve the performance of automatic pronunciation generators for Korean, as well as that of Korean speech recognition and TTS systems.

DialogStudio: A Spoken Dialog System Workbench (음성대화시스템 워크벤취로서의 DialogStudio 개발)

Jung, Sang-Keun; Lee, Cheon-Jae; Lee, Geun-Bae
- Proceedings of the KSPS conference / 2007.05a / pp.311-314 / 2007
Spoken dialog system development includes many laborious and inefficient tasks. Because a spoken dialog system has many components, such as speech recognition, language understanding, dialog management, and knowledge management, a developer must edit the corpus and train each model separately. To reduce the cost of editing the corpus and training each model, a more systematic and efficient working environment is needed. As such an environment, we propose DialogStudio, a spoken dialog system workbench.

Design of Robust Speech Recognition System Using Tandem Architecture (탠덤 구조를 이용한 강인한 음성 인식 시스템 설계)

Yun, Young-Sun; Lee, Yun-Keun
- Proceedings of the KSPS conference / 2007.05a / pp.323-326 / 2007
Various studies have combined neural networks and hidden Markov models within a single system in the expectation that doing so may combine the advantages of both. Influenced by these studies, the tandem approach was proposed, which uses a neural network as the classifier and hidden Markov models as the decoder. In this paper, we apply the trend information of segmental features to the tandem architecture and use the posterior probabilities output by the neural network as inputs to the recognition system. Experiments are performed on the Aurora2 database to examine the potential of the trend-feature-based tandem architecture. The proposed method shows better results than the baseline system in very low SNR environments.

An Implementation of a 3D Audio Production System Using Stereo Loudspeakers for Virtual Reality (가상현실을 위한 스테레오 스피커 기반 3차원 입체음향 재생 시스템 구현)

Kim, Yong-Guk; Lee, Young-Han; Kim, Hong-Kook
- Proceedings of the KSPS conference / 2006.11a / pp.113-116 / 2006
In this paper, we first implement an audio playback system for virtual reality that provides 3D audio effects to listeners. In general, such a 3D audio playback system uses a sound localization technique based on the head-related transfer function (HRTF) to generate the 3D audio effect. However, the 3D audio effect is degraded by crosstalk in a stereo loudspeaker environment. To enhance the 3D sound effect, we implement the crosstalk cancellation technique proposed by Atal and Schroeder and apply it to the 3D audio system.

A computational algorithm for F0 contour generation in Korean developed with prosodically labeled databases using K-ToBI system (K-ToBI 기호에 준한 F0 곡선 생성 알고리듬)

Lee YongJu; Lee Sook-hyang; Kim Jong-Jin; Go Hyeon-Ju; Kim Yeong-Il; Kim Sang-Hun; Lee Jeong-Cheol
- MALSORI / no.35_36 / pp.131-143 / 1998
This study describes an algorithm for an F0 contour generation system for Korean sentences and its evaluation results. The data consisted of 400 K-ToBI labeled utterances read by one male and one female announcer. The F0 contour generation system uses two classification trees to predict K-ToBI labels for the input text and 11 regression trees to predict F0 values for the labels. Evaluation of the system showed 77.2% prediction accuracy for IP boundaries and 72.0% prediction accuracy for AP boundaries. Voicing and duration information of the segments was left unchanged for F0 contour generation and its evaluation. In the F0 generation experiment using labelling information from the original speech data, the system showed an RMS error of 23.5 Hz and a correlation coefficient of 0.55.

The research on agreement statistics analysis between factors of diagnosis (사상체질 진단요소들 간의 일치도 분석연구)

Jang, Eun-Su; Kim, Ho-Seok; Lee, Si-Woo; Kim, Jong-Yeol
- Korean Journal of Oriental Medicine / v.12 no.2 s.17 / pp.103-113 / 2006
Objectives: This study examined how closely the results agree among three diagnostic instruments commonly used in clinics to diagnose Sasang Constitution: QSCCII, PSSC (Phonetic System for Sasang Constitution)-2004, and body measurement. Methods: We diagnosed Sasang Constitution using QSCCII, PSSC-2004, and body measurement as diagnostic factors, used the Kappa coefficient to estimate agreement between the diagnostic factors, and analyzed the data with SPSS 12.0K. Results and conclusions: 1. The frequency ranking of the constitution types differed by instrument: Soeum-in was the most frequent and Taeum-in the least frequent in QSCCII; Soeum-in was the most frequent and Soyang-in the least frequent in PSSC; and Taeum-in was the most frequent and Soyang-in the least frequent in body measurement. From this we inferred that Sasang Constitution diagnosis contains inaccuracies. 2. Among 443 subjects, 156 (35.3%) received the same diagnosis from all three factors. Agreement among the diagnostic factors is therefore very low, and research on how they relate to one another is needed, especially for Soyang-in. 3. Overall, applying these factors to Sasang Constitution diagnosis is not robust; in particular, the agreement statistic between two kinds of diagnosis was only 0.358 to 0.380. However, the agreement statistic rose to 0.526 to 0.592 among three kinds of diagnosis, which suggests that agreement may increase as more diagnostic factors are used. To evaluate the accuracy of Sasang Constitution diagnosis, data must be collected from subjects whose diagnosis is supported by evidence such as herbal medicine, disease, and observation of ordinary symptoms. With such data, the accuracy of each separate diagnostic method should be evaluated and the methods should be linked.
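The Kappa coefficient mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which variant was computed in SPSS, so the example assumes Cohen's kappa for two instruments rating the same subjects; the function name `cohen_kappa` and the sample labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two label sequences over the same subjects."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of subjects given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each instrument's marginal frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both instruments always give the same single label
    return (observed - expected) / (1.0 - expected)


# Made-up labels for illustration only.
same = cohen_kappa(
    ["Soeum", "Soeum", "Taeum", "Soyang"],
    ["Soeum", "Soeum", "Taeum", "Soyang"],
)
chance = cohen_kappa(["x", "x", "y", "y"], ["x", "y", "x", "y"])
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance, which is the scale on which the reported statistics are read.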

A Study on the Multilingual Speech Recognition using International Phonetic Language (IPA를 활용한 다국어 음성 인식에 관한 연구)

Kim, Suk-Dong; Kim, Woo-Sung; Woo, In-Sung
- Journal of the Korea Academia-Industrial cooperation Society / v.12 no.7 / pp.3267-3274 / 2011
Recently, speech recognition technology has developed dramatically as the range of mobile devices in use has grown and a variety of speech recognition software has appeared. However, in multilingual speech recognition, a lack of understanding of multilingual lexical models and limited system capacity hinder improvements in the recognition rate. It is not easy to represent speech in multiple languages with a single acoustic model, and systems that use several acoustic models have lower recognition rates. It is therefore necessary to research and develop a multilingual speech recognition system that represents speech from various languages in a single acoustic model. Building on research into using a multilingual acoustic model on mobile devices, this paper studies a system that recognizes Korean and English using the International Phonetic Language (IPA). Focusing on finding an IPA model that covers both Korean and English phonemes, we obtained a speech recognition rate of 94.8% for Korean and 95.36% for English.
https://doi.org/10.5762/KAIS.2011.12.7.3267

Morpheme-based Korean broadcast news transcription (형태소 기반의 한국어 방송뉴스 인식)

Park Young-Hee; Ahn Dong-Hoon; Chung Minhwa
- Proceedings of the KSPS conference / 2002.11a / pp.123-126 / 2002
In this paper, we describe our LVCSR system for Korean broadcast news transcription. The main focus is to find the most suitable morpheme-based lexical model for Korean broadcast news recognition, one that can deal with the inflectional flexibility of Korean. There are trade-offs between lexicon size and lexical coverage, and between the length of the lexical unit and WER. In our system, we analyzed the training corpus to obtain a small 24k-morpheme lexicon with 98.8% coverage. The lexicon is then optimized by combining morphemes using statistics from the training corpus under a monosyllable constraint or a maximum length constraint. In experiments, our system reduced the share of monosyllable morphemes in the lexicon from 52% to 29% and obtained a WER of 13.24% for anchor speech and 24.97% for reporter speech.
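For readers unfamiliar with the WER figures quoted here, the metric can be sketched as the word-level edit distance divided by the reference length. This is the standard definition rather than the authors' scoring tool; the abstract does not say whether errors were counted over morphemes or other units, and the function name `word_error_rate` and the sample tokens are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        current = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(reference)


# Made-up tokens: one substitution and one deletion against four reference tokens.
example_wer = word_error_rate("a b c d".split(), "a x c".split())
```

Because insertions are counted against the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.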
