Search | Korea Science

The Optimum Fuzzy Vector Quantizer for Speech Synthesis

Lee, Jin-Rhee;Kim, Hyung-Seuk;Ko, Nam-kon;Lee, Kwang-Hyung
- Proceedings of the Korean Institute of Intelligent Systems Conference / 1993.06a / pp.1321-1325 / 1993
This paper investigates the use of a fuzzy vector quantizer (FVQ) in speech synthesis. To compress speech data, we employ the K-means algorithm to design a codebook; in the analysis part, the FVQ technique is then used to analyze input speech vectors based on that codebook. In the FVQ synthesis part, the analysis data vectors generated by FVQ analysis are used to synthesize the speech. We have found that synthesized speech quality depends on the fuzziness values in FVQ, and that the optimum fuzziness values, which maximize the SQNR of the synthesized speech, are related to the variance of the input speech vectors. This approach is tested on a sentence, and we compare speech synthesized by a conventional VQ with speech synthesized by an FVQ with optimum fuzziness values.
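A minimal Python sketch of the FVQ analysis/synthesis idea described above, assuming fuzzy c-means-style memberships with a fuzziness exponent `m > 1` and reconstruction as the membership-weighted average of the codewords. The abstract does not give the exact formulas, so these are assumptions, and the function names `fuzzy_memberships`, `fvq_reconstruct`, and `sqnr_db` are hypothetical. The codebook is taken as given (in the paper it is designed with K-means).

```python
import math


def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means-style membership of vector x in each codeword (assumes m > 1)."""
    dists = [math.dist(x, c) for c in codebook]
    # If x coincides with a codeword, give that codeword full membership.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(codebook))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in dists) for di in dists]


def fvq_reconstruct(x, codebook, m=2.0):
    """Synthesize a vector as the membership-weighted average of codewords (assumption)."""
    u = fuzzy_memberships(x, codebook, m)
    dim = len(codebook[0])
    return [sum(ui * c[k] for ui, c in zip(u, codebook)) for k in range(dim)]


def sqnr_db(signal, reconstructed):
    """Signal-to-quantization-noise ratio in dB over lists of vectors."""
    num = sum(v * v for vec in signal for v in vec)
    err = sum((a - b) ** 2
              for va, vb in zip(signal, reconstructed)
              for a, b in zip(va, vb))
    return 10.0 * math.log10(num / err)
```

For example, with codebook `[[0.0, 0.0], [2.0, 0.0]]`, `m=2.0`, and input `[0.5, 0.0]`, the memberships are 0.9 and 0.1 and the reconstruction is `[0.2, 0.0]` (error 0.3), whereas hard VQ would return `[0.0, 0.0]` (error 0.5). As `m` approaches 1 the memberships become crisp and the scheme approaches conventional VQ; as `m` grows they approach uniform weights, which is why an intermediate fuzziness value can maximize SQNR.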

Basic consideration for assessment of Korean TTS system (한국어 TTS 시스템의 객관적인 성능평가를 위한 기초검토)

Ko, Lag-Hwan;Kim, Young-Il;Kim, Bong-Wan;Lee, Yong-Ju
- Proceedings of the KSPS conference / 2005.04a / pp.37-40 / 2005
Recently, owing to the rapid development of corpus-based speech synthesis, the performance of TTS systems, which convert text into speech, has improved, and they are applied in various fields. However, a procedure for objectively assessing the performance of such systems is not well established in Korea. Establishing such a procedure is essential both for developers assessing the systems they build and as a standard for users choosing a suitable system. In this paper, we report the results of basic research toward a systematic standard procedure for the objective assessment of Korean TTS system performance, with reference to related efforts in Korea and other countries.

Computerization and Application of the Korean Standard Pronunciation Rules (한국어 표준발음법의 전산화 및 응용)

이계영;임재걸
- Language and Information / v.7 no.2 / pp.81-101 / 2003
This paper introduces a computerized version of the Korean Standard Pronunciation Rules that can be used in speech engineering systems such as Korean speech synthesis and recognition systems. For this purpose, we build Petri net models for each item of the Standard Pronunciation Rules and then integrate them into a sound conversion table. Applying the Korean Standard Pronunciation Rules in reverse determines how sounds map back to grammatically correct written characters. This paper presents not only the sound conversion table but also the character conversion table obtained by reversing the sound conversion table. Making use of these tables, we have implemented a Korean character-to-sound conversion system and a Korean sound-to-character conversion system, and tested them with various data sets reflecting all the items of the Standard Pronunciation Rules to verify the soundness and completeness of our tables. The test results show that, in addition to being sound and complete, the tables improve processing speed.
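The paper's Petri net models are not reproduced here. As a rough illustration of what one entry of a table-driven character-to-sound conversion can look like, the Python sketch below applies only the basic liaison rule (a syllable-final consonant is carried over as the onset of a following syllable that begins with the silent ㅇ), using the standard Unicode arithmetic for precomposed Hangul syllables. The table `FINAL_TO_ONSET` and all function names are hypothetical; double final consonants, final ㅎ, palatalization, and the distinction the actual rules draw between grammatical and lexical morphemes are deliberately left out.

```python
SBASE = 0xAC00      # first precomposed Hangul syllable, 가
N_MEDIAL = 21       # number of vowel (jungseong) jamo
N_FINAL = 28        # number of final (jongseong) slots, 0 = no final
N_SYLLABLES = 19 * N_MEDIAL * N_FINAL
SILENT_ONSET = 11   # choseong index of ㅇ, which is silent as an onset

# Hypothetical fragment of a "sound conversion table": single final consonants
# (jongseong index) mapped to the same consonant as an onset (choseong index).
# Final ㅇ, final ㅎ and double finals are intentionally not included.
FINAL_TO_ONSET = {1: 0, 2: 1, 4: 2, 7: 3, 8: 5, 16: 6, 17: 7,
                  19: 9, 20: 10, 22: 12, 23: 14, 24: 15, 25: 16, 26: 17}


def is_syllable(ch):
    return 0 <= ord(ch) - SBASE < N_SYLLABLES


def decompose(ch):
    """Split a precomposed syllable into (onset, vowel, final) indices."""
    s = ord(ch) - SBASE
    return s // (N_MEDIAL * N_FINAL), (s % (N_MEDIAL * N_FINAL)) // N_FINAL, s % N_FINAL


def compose(onset, vowel, final):
    return chr(SBASE + (onset * N_MEDIAL + vowel) * N_FINAL + final)


def apply_liaison(text):
    """Move a single final consonant onto a following vowel-initial syllable."""
    out = list(text)
    for i in range(len(out) - 1):
        if not (is_syllable(out[i]) and is_syllable(out[i + 1])):
            continue
        o1, v1, f1 = decompose(out[i])
        o2, v2, f2 = decompose(out[i + 1])
        if f1 in FINAL_TO_ONSET and o2 == SILENT_ONSET:
            out[i] = compose(o1, v1, 0)
            out[i + 1] = compose(FINAL_TO_ONSET[f1], v2, f2)
    return "".join(out)
```

For example, `apply_liaison("옷이")` gives `"오시"` and `apply_liaison("먹어")` gives `"머거"`, while `"강아지"` is unchanged because a final ㅇ is not carried over. A reverse (sound-to-character) table would have to undo such mappings, which is ambiguous in general.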

A Comparative Study of Voice Activity Detection Algorithms in Adverse Environments (잡음 환경에서의 음성 검출 알고리즘 비교 연구)

Yang Kyong-Chul;Yook Dong-Suk
- Proceedings of the KSPS conference / 2006.05a / pp.45-48 / 2006
As speech recognition systems are used in many emerging applications, robust performance under extremely noisy conditions becomes more important. Voice activity detection (VAD) has been considered one of the important factors for robust speech recognition. In this paper, we investigate conventional VAD algorithms and analyze the strengths and weaknesses of each.
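One of the simplest conventional VAD approaches is frame-energy thresholding. The Python sketch below is a generic example, not one of the specific algorithms compared in the paper; it assumes the first few frames contain only background noise and labels a frame as speech when its energy exceeds the estimated noise floor by a fixed margin. The parameters `frame_len`, `noise_frames`, and `margin_db` are hypothetical defaults (160 samples is 20 ms at an assumed 8 kHz sampling rate).

```python
import math


def frame_energies_db(samples, frame_len):
    """Mean power of each non-overlapping frame, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # small floor avoids log(0)
    return energies


def energy_vad(samples, frame_len=160, noise_frames=5, margin_db=6.0):
    """Return one True/False speech decision per frame.

    Assumes the first `noise_frames` frames contain background noise only.
    """
    energies = frame_energies_db(samples, frame_len)
    n = min(noise_frames, len(energies))
    if n == 0:
        return []
    noise_floor = sum(energies[:n]) / n
    return [e > noise_floor + margin_db for e in energies]
```

With `samples = [0.01] * 800 + [0.5] * 320 + [0.01] * 160`, the default settings return five `False` frames, two `True` frames, and a final `False`. A fixed threshold like this is exactly what tends to fail at low SNR or with non-stationary noise, which motivates the more elaborate VAD algorithms the paper compares.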

The Performance Improvement of Speech Recognition System based on Stochastic Distance Measure

Jeon, B.S.;Lee, D.J.;Song, C.K.;Lee, S.H.;Ryu, J.W.
- International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.2 / pp.254-258 / 2004
In this paper, we propose a speech recognition system that is robust in noisy environments. Since noise severely degrades the performance of speech recognition systems, it is important to design recognition methods that are robust against noise. The proposed method adopts a new distance measure based on stochastic probability instead of the conventional minimum-error method. To evaluate the proposed method, we compared it with a conventional distance measure on 10 isolated Korean digits with car noise. The proposed method achieved a better recognition rate than the conventional distance measure in various car-noise environments.
https://doi.org/10.5391/IJFIS.2004.4.2.254
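The abstract does not define its stochastic distance measure. As an assumed stand-in, the Python sketch below contrasts a minimum Euclidean distance classifier with one that picks the class of highest diagonal-Gaussian log-likelihood, which also takes each class's variance into account. The class models and the function names `euclidean_classify`, `gaussian_log_likelihood`, and `stochastic_classify` are hypothetical.

```python
import math


def euclidean_classify(x, means):
    """Conventional rule: class whose mean vector is nearest to x."""
    return min(means, key=lambda label: math.dist(x, means[label]))


def gaussian_log_likelihood(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * math.log(2.0 * math.pi * v) - (xi - mi) ** 2 / (2.0 * v)
               for xi, mi, v in zip(x, mean, var))


def stochastic_classify(x, models):
    """Probabilistic rule: class with the highest log-likelihood.

    models maps label -> (mean vector, variance vector).
    """
    return max(models, key=lambda label: gaussian_log_likelihood(x, *models[label]))
```

With a narrow class `A` (mean 0.0, variance 0.1) and a broad class `B` (mean 3.0, variance 4.0), the input `[1.2]` is nearer to `A` in Euclidean terms but more probable under `B`, so the two rules disagree. Noise that spreads some classes more than others is one situation where a variance-aware measure can help.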

User-customized Interaction using both Speech and Face Recognition (음성인식과 얼굴인식을 사용한 사용자 환경의 상호작용)

Kim, Sung-Ill;Oh, Se-Jin;Lee, Sang-Yong;Hwang, Seung-Gook
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.04a / pp.397-400 / 2007
In this paper, we discuss user-customized interaction for intelligent home environments. The interactive system is based on integrated techniques using both speech and face recognition. As essential modules, speech recognition and synthesis were used for virtual interaction between the user and the proposed system. In the experiments, a real-time speech recognizer based on the HM-Net (Hidden Markov Network) was incorporated into the integrated system. In addition, face identification was adopted to customize the home environment for a specific user. The evaluation showed that the proposed system was easy to use in intelligent home environments, even though the speech recognizer did not perform satisfactorily owing to the noisy environment.

Education System to Learn the Skills of Management Decision-Making by Using Business Simulator with Speech Recognition Technology

Sakata, Daiki;Akiyama, Yusuke;Kaneko, Masaaki;Kumagai, Satoshi
- Industrial Engineering and Management Systems / v.13 no.3 / pp.267-277 / 2014
In this paper, we propose an educational system that involves a business game simulator and a related curriculum. To develop these two elements, we examined the decision-making process in business management and identified several significant skills it involves. We then created an original simulator, named BizLator (http://bizlator.com), to help students develop these skills efficiently, and developed a curriculum suited to the simulator. We confirmed the effectiveness of the simulator and curriculum in a business-game-based class at Aoyama Gakuin University in Tokyo. On this basis, we compared our education system with a conventional system, which allowed us to identify the advantages of and issues with our proposed system. Furthermore, we proposed a speech recognition support system named BizVoice to provide teachers with more meaningful feedback, such as the level of students' understanding. Concretely, BizVoice captures students' speech during in-game discussions and converts the voice data to text with speech recognition technology. Teachers can then gauge students' level of understanding, and students can in turn take a more effective class using BizLator. We also confirmed the effectiveness of the system in a class at Aoyama Gakuin University.
https://doi.org/10.7232/iems.2014.13.3.267

A Multimodal Emotion Recognition Using the Facial Image and Speech Signal

Go, Hyoun-Joo;Kim, Yong-Tae;Chun, Myung-Geun
- International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.1 / pp.1-6 / 2005
In this paper, we propose an emotion recognition method using facial images and speech signals. Six basic emotions are investigated: happiness, sadness, anger, surprise, fear, and dislike. Facial expression recognition is performed using multi-resolution analysis based on the discrete wavelet transform, and the feature vectors are obtained through ICA (Independent Component Analysis). For the speech signal, the recognition algorithm is performed independently on each wavelet subband, and the final recognition result is obtained with a multi-decision-making scheme. By merging the facial and speech emotion recognition results, we obtained better performance than previous methods.
https://doi.org/10.5391/IJFIS.2005.5.1.001
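The abstract does not say how the facial and speech results are merged. A common choice is weighted late fusion of per-class scores; the Python sketch below assumes each recognizer outputs non-negative scores for the six emotions in a fixed order. The weight `w_face` and the function names `normalize` and `fuse` are hypothetical.

```python
EMOTIONS = ("happiness", "sadness", "anger", "surprise", "fear", "dislike")


def normalize(scores):
    """Scale non-negative scores so they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def fuse(face_scores, speech_scores, w_face=0.5):
    """Weighted late fusion; returns (best emotion, fused scores)."""
    face, speech = normalize(face_scores), normalize(speech_scores)
    fused = [w_face * f + (1.0 - w_face) * s for f, s in zip(face, speech)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused
```

For example, if the face scores slightly favor happiness (0.40) over anger (0.35) while the speech scores strongly favor anger (0.60 against 0.10 for happiness), equal weights give anger a fused score of 0.475 against 0.25 for happiness, so the combined decision follows the more confident modality.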

Pattern Recognition Methods for Emotion Recognition with speech signal

Park Chang-Hyun;Sim Kwee-Bo
- International Journal of Fuzzy Logic and Intelligent Systems / v.6 no.2 / pp.150-154 / 2006
In this paper, we apply several pattern recognition algorithms to an emotion recognition system using speech signals and compare the results. First, emotional speech databases are needed, and the speech features for emotion recognition are determined in the database analysis step. Second, recognition algorithms are applied to these speech features; the algorithms we try are an artificial neural network, Bayesian learning, Principal Component Analysis, and the LBG algorithm. The performance differences among these methods are presented in the experimental results section.
https://doi.org/10.5391/IJFIS.2006.6.2.150
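The LBG algorithm named above designs a vector quantization codebook by repeatedly splitting codewords and refining them with K-means-style iterations. A minimal textbook-style Python sketch follows; the additive split perturbation `eps`, the fixed number of refinement passes `iters`, and the requirement that `size` be a power of two are assumptions of this sketch, not settings reported in the paper.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def lbg(data, size, eps=0.01, iters=20):
    """LBG codebook design by binary splitting plus K-means refinement.

    The codebook doubles at each split, so `size` is assumed to be a power of two.
    """
    codebook = [centroid(data)]
    while len(codebook) < size:
        # Split each codeword into two copies nudged by +eps and -eps.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Nearest-neighbour partition of the training vectors.
            cells = [[] for _ in codebook]
            for x in data:
                nearest = min(range(len(codebook)),
                              key=lambda j: math.dist(x, codebook[j]))
                cells[nearest].append(x)
            # Centroid update; an empty cell keeps its old codeword.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook
```

For the data `[[0.0], [0.2], [10.0], [10.2]]` with `size=2`, the result has codewords near `[10.1]` and `[0.1]`. For emotion recognition, one plausible use (an assumption, since the abstract does not describe it) is to train one codebook per emotion and assign an utterance to the emotion whose codebook yields the lowest average quantization distortion.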

Consecutive Vowel Segmentation of Korean Speech Signal using Phonetic-Acoustic Transition Pattern (음소 음향학적 변화 패턴을 이용한 한국어 음성신호의 연속 모음 분할)

Park, Chang-Mok;Wang, Gi-Nam
- Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.801-804 / 2001
This article is concerned with the automatic segmentation of two adjacent vowels in speech signals. Every kind of transition between adjacent vowels can be characterized by a spectrogram. First, voiced speech is extracted by histogram analysis of a vowel indicator consisting of wavelet low-pass components. Second, given the phonetic transcription and the transition-pattern spectrograms, the voiced-speech portion containing consecutive vowels is automatically segmented by template matching. The cross-correlation function is adopted as the template matching method, and the modified correlation coefficient is calculated for all frames. The largest value in the modified correlation coefficient series indicates the boundary between the two consecutive vowel sounds. The experiment was performed on 154 vowel transition sets: the 154 spectrogram templates were gathered from 154 words (PRW Speech DB), and 161 test words (PBW Speech DB) uttered by 5 speakers were tested. The experimental results show the validity of the method.
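The boundary search described above can be sketched in Python with a plain Pearson correlation coefficient between a spectrogram template and each same-length window of frames. The paper's modified correlation coefficient is not specified, so the ordinary coefficient is used here as an assumption, and the template is assumed to be centered on the vowel transition so that the center of the best window marks the boundary. The names `pearson` and `find_boundary` are hypothetical.

```python
import math


def pearson(a, b):
    """Ordinary Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / math.sqrt(va * vb)


def find_boundary(frames, template):
    """Slide a transition template over spectral frames.

    frames, template: lists of per-frame feature vectors (e.g. spectrogram columns).
    Returns (boundary frame index, list of correlation scores per window start).
    """
    w = len(template)
    if len(frames) < w:
        raise ValueError("utterance is shorter than the template")
    flat_template = [v for frame in template for v in frame]
    scores = []
    for start in range(len(frames) - w + 1):
        window = [v for frame in frames[start:start + w] for v in frame]
        scores.append(pearson(window, flat_template))
    best = max(range(len(scores)), key=scores.__getitem__)
    # Assumes the template is centered on the transition.
    return best + w // 2, scores
```

With five frames of `[1.0, 0.0, 0.0]` followed by five frames of `[0.0, 0.0, 1.0]` and the four-frame template `[[1.0, 0.0, 0.0]] * 2 + [[0.0, 0.0, 1.0]] * 2`, the best window starts at frame 3 with a coefficient of 1.0, giving a boundary at frame 5, the first frame of the second vowel.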
