• Title/Summary/Keyword: Speech Synthesis

381 search results

The Smoothing Method of the Concatenation Parts in Speech Waveform by using the Forward/Backward LPC Technique (전, 후방향 LPC법에 의한 음성 파형분절의 연결부분 스므딩법)

  • 이미숙
    • Proceedings of the Acoustical Society of Korea Conference / 1991.06a / pp.15-20 / 1991
  • In a text-to-speech system, sound units (e.g., phonemes, words, or phrases) are concatenated to produce the required utterance. The quality of the resulting speech depends on factors including the phonological/prosodic contour, the quality of the basic concatenation units, and how well the units join together. Thus, even when the quality of each basic sound unit is high, a discontinuity at a concatenation point degrades the quality of the synthesized speech. To solve this problem, a smoothing operation should be carried out at the concatenation parts. A major problem, however, is that no method of parameter smoothing has yet been available for joining the segments together.
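The joint-smoothing problem can be illustrated with a minimal sketch: a plain linear cross-fade over the overlap region of two waveform segments. This is a generic smoothing technique, not the paper's forward/backward LPC method; the function name and parameters are hypothetical.

```python
def crossfade_join(seg_a, seg_b, overlap):
    """Join two waveform segments (lists of samples), linearly cross-fading
    the last `overlap` samples of seg_a into the first `overlap` of seg_b.
    NOTE: illustrative only -- the cited paper smooths LPC parameters instead."""
    assert 0 <= overlap <= min(len(seg_a), len(seg_b))
    head = seg_a[:-overlap] if overlap else seg_a[:]
    tail = seg_b[overlap:]
    faded = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight ramps from 0 toward 1 across the joint
        faded.append((1.0 - w) * seg_a[len(seg_a) - overlap + i] + w * seg_b[i])
    return head + faded + tail
```

A direct waveform cross-fade like this reduces the click at the joint but cannot fix spectral mismatch, which is why parameter-domain (LPC) smoothing is of interest.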

A Study on Objective Quality Assessment of Synthesized Speech by Rule (규칙 합성음의 객관적 품질평가에 관한 연구)

  • 홍진우
    • Proceedings of the Acoustical Society of Korea Conference / 1991.06a / pp.67-72 / 1991
  • This paper evaluates the quality of synthesized speech by rule, using the LPC CD as an objective measure, and then compares the result with the subjective assessment. By evaluating the quality of rule-synthesized speech objectively, we have tried to resolve the problems that arise when the evaluation is done subjectively (growth in evaluation time and scale, and variability within the analysis results). Also, by comparing intelligibility, the index for the subjective quality evaluation of rule-synthesized speech, with evaluation results obtained using MOS and with the objective evaluation, we have demonstrated the validity of the objective analysis, thereby providing a guide that should be useful in the R&D and marketing of synthesis-by-rule methods.
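Assuming "LPC CD" denotes the LPC cepstral distance, the standard distance in dB between two LPC-derived cepstrum vectors can be sketched as follows (the function name is ours, not the paper's):

```python
import math

def lpc_cepstral_distance(cep_ref, cep_test):
    """Cepstral distance in dB between two LPC cepstrum vectors c[1..p]
    (c0, the gain term, is conventionally excluded).
    CD = (10 / ln 10) * sqrt(2 * sum_k (c_ref[k] - c_test[k])^2)"""
    diff_sq = sum((a - b) ** 2 for a, b in zip(cep_ref, cep_test))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * diff_sq)
```

Averaging this distance over frames of reference versus synthesized speech gives a single objective score that can be correlated with MOS or intelligibility results.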

A Word List Construction and Measurement Method for Intelligibility Assessment of Synthesized Speech by Rule (규칙 합성음의 이해성 평가를 위한 단어표 구성 및 실험법)

  • 김성한;홍진우;김순협
    • Journal of the Korean Institute of Telematics and Electronics B / v.29B no.1 / pp.43-49 / 1992
  • As a result of recent progress in speech synthesis techniques, new services using these techniques are being introduced into the telephone communication system. In setting standards, voice quality is obviously an important criterion. It is therefore very important to develop a quality-evaluation method for synthesized speech, both for the diagnostic assessment of system algorithms and for fair comparison of assessment values. This paper describes several basic concepts and criteria for the quality (intelligibility) assessment of synthesized speech by rule; it then proposes a word-selection method and the word list to be used in the word intelligibility test. Finally, a test method for word intelligibility is described.
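The core scoring step of a word intelligibility test is simple: present words from the list, collect listeners' transcriptions, and compute the percentage correctly identified. A minimal sketch (names and the exact-match criterion are our assumptions, not the paper's protocol):

```python
def word_intelligibility(trials):
    """Percent of presented words correctly transcribed in a word
    intelligibility test. `trials` is a list of (presented, response)
    pairs; exact string match counts as correct."""
    if not trials:
        raise ValueError("no trials to score")
    correct = sum(1 for presented, response in trials if presented == response)
    return 100.0 * correct / len(trials)
```

Real test designs additionally balance the word list phonetically, which is exactly the construction problem this paper addresses.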

A Study on Extraction of Pitch and TSIUVC in Continuous Speech (연속음성신호에서 피치와 TSIUVC 추출에 관한 연구)

  • Lee See-Woo
    • Journal of Internet Computing and Services / v.6 no.4 / pp.85-92 / 2005
  • In this paper, I propose a new method for extracting pitch pulses and TSIUVC in continuous speech. The TSIUVC search and extraction method is based on a zero-crossing rate and on an individual pitch-pulse extraction method using a FIR-STREAK filter. As a result, the extraction rate of individual pitch pulses was 96% for male voices and 85% for female voices. The TSIUVC extraction rates were 94.9% or higher in 88% of male voices and in 84.8% of female voices. This method can be applied to a new Voiced/Silence/TSIUVC speech coder, to speech analysis, and to speech synthesis.
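One ingredient named above, the zero-crossing rate, is straightforward to compute per frame; a minimal sketch (the FIR-STREAK pitch-pulse stage is not reproduced here):

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.
    High ZCR is a classic cue for unvoiced/fricative regions,
    low ZCR for voiced speech."""
    if len(frame) < 2:
        raise ValueError("frame too short")
    crossings = sum(1 for i in range(1, len(frame))
                    if (frame[i - 1] >= 0.0) != (frame[i] >= 0.0))
    return crossings / (len(frame) - 1)
```

Thresholding this rate per frame is one common way to separate the voiced, silent, and transitional regions that such a coder must treat differently.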

Virtual Interaction based on Speech Recognition and Fingerprint Verification (음성인식과 지문식별에 기초한 가상 상호작용)

  • Kim Sung-Ill;Oh Se-Jin;Kim Dong-Hun;Lee Sang-Yong;Hwang Seung-Gook
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.192-195 / 2006
  • In this paper, we discuss user-customized interaction for intelligent home environments. The interactive system integrates speech recognition and fingerprint verification. As essential modules, speech recognition and synthesis were used for the virtual interaction between the user and the proposed system. In the experiments, in particular, a real-time speech recognizer based on the HM-Net (Hidden Markov Network) was incorporated into the integrated system. In addition, fingerprint verification was adopted to customize the home environment for a specific user. In the evaluation, the results showed that the proposed system was easy to use in intelligent home environments, even though the performance of the speech recognizer fell short of the simulation results owing to the noisy environments.

Dialogic Male Voice Triphone DB Construction (남성 음성 triphone DB 구축에 관한 연구)

  • Kim, Yu-Jin;Baek, Sang-Hoon;Han, Min-Soo;Chung, Jae-Ho
    • The Journal of the Acoustical Society of Korea / v.15 no.2 / pp.61-71 / 1996
  • In this paper, the construction of a dialogic triphone database for a triphone synthesis system is discussed. In particular, dialogic speech data were collected from broadcast media, and three different transcription steps were taken. A total of 10 hours of speech data were collected; six hours were used for the triphone database construction, and the remaining four hours were reserved. Constructing a dialogic speech database is far different from constructing a recited-speech database. This paper describes the various steps necessary for dialogic triphone database construction, from collecting speech data to triphone-unit labeling.
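The final labeling step assigns each phoneme a label carrying its left and right context. A minimal sketch using the common `L-C+R` triphone notation (the notation and boundary symbol are conventions we assume, not details from the paper):

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phoneme sequence into context-dependent triphone labels
    of the form left-center+right, padding utterance edges with a
    boundary symbol (here 'sil', an assumed convention)."""
    padded = [boundary] + list(phones) + [boundary]
    return ["{}-{}+{}".format(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]
```

Counting the distinct labels this produces over a corpus is how one measures the triphone coverage of a six-hour database like the one described.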

An Implementation of Speech DB Gathering System Using VoiceXML (VoiceXML을 이용한 음성 DB 수집 시스템 구현)

  • Kim Dong-Hyun;Roh Yong-Wan;Hong Kwang-Seok
    • Journal of Internet Computing and Services / v.6 no.1 / pp.39-50 / 2005
  • A speech DB is a basic requirement for studies of phonetics, speech recognition, speech synthesis, and so on. The quantity and quality of the speech DB determine the performance of the system being developed, so the speech DB is an extremely important factor. Recently, with the development of various telephone service technologies such as voice portals, collecting telephone speech DBs has become a necessity. Existing IVR-based telephone speech DB collection systems used the C/C++ language or dedicated development tools, so reusing resources across application services is difficult and much labor and time are required. VoiceXML, by contrast, is a tag-based language predicated on XML, with an easy and simple grammar, so applications can be written easily with little effort, reducing labor and time. VoiceXML also has the advantage that various telephone speech DBs can be gathered simply by changing the contents of the DB. In this paper, we introduce a telephone speech DB gathering system, the most important factor in the development of speech-information-processing technology.
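As a sketch of why a VoiceXML collection dialog stays short, the fragment below prompts the caller, records one utterance, and posts the audio to a collection server. Element and attribute names follow the VoiceXML 2.0 recommendation, but the URL and variable names are placeholders, not the paper's actual system:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="collect">
    <record name="utterance" beep="true" maxtime="10s" dtmfterm="true">
      <prompt>Please read the sentence aloud after the beep.</prompt>
      <filled>
        <!-- upload the recorded audio to the (hypothetical) collection server -->
        <submit next="http://example.com/upload" namelist="utterance"
                method="post" enctype="multipart/form-data"/>
      </filled>
    </record>
  </form>
</vxml>
```

Changing the prompt text or the target sentence list alters the collected DB without touching any platform code, which is the reuse advantage the abstract claims.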

Measurement of the vocal tract area of vowels By MRI and their synthesis by area variation (MRI에 의한 모음의 성도 단면적 측정 및 면적 변이에 따른 합성 연구)

  • Yang, Byung-Gon
    • Speech Sciences / v.4 no.1 / pp.19-34 / 1998
  • The author collected and compared midsagittal, coronal, coronal-oblique, and transversal images of the Korean monophthongs /a, i, e, o, u, i, v/ produced by a healthy male speaker using a 1.5 T MR scanner (VISION). Area was measured with computer software after tracing the cross-section at different points along the tract. Results showed that the widths of the oral and pharyngeal cavities varied compensatorily with each other in the midsagittal dimension. Formant frequency values estimated from the area functions of the seven vowels showed a strong correlation (r=0.978) with those analyzed from the spoken vowels. Moreover, almost all of the 35 students who listened to the vowels synthesized from the area data perceived them as equivalent to the spoken ones. Moving the constriction point of the vowel /u/ while widening the lip opening made it sound like /i/ and led to slight changes in vowel quality. Jaw and tongue movement led to major volume variation within anatomical limits. Each corner vowel varied systematically around a somewhat constant volume of the average area. Thus, the author proposes that any simulation study of vocal tract area variation should reflect this constant volume. The results may help verify exact measurement of the vocal tract area through vowel synthesis and simulation before any operation on the vocal tract.
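The link from an area function to formants can be illustrated with the crudest acoustic model: a uniform tube closed at the glottis and open at the lips resonates at odd quarter-wavelengths, F_k = (2k-1)c/4L. This textbook model is far simpler than estimating formants from a measured MRI area function, and the function name is ours:

```python
def uniform_tube_formants(length_cm, n=3, c_cm_per_s=35000.0):
    """Resonance frequencies (Hz) of a uniform tube of length L (cm),
    closed at one end and open at the other: F_k = (2k-1) * c / (4L),
    with c the speed of sound (~35000 cm/s in warm air)."""
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * k - 1) * c_cm_per_s / (4.0 * length_cm) for k in range(1, n + 1)]
```

A 17.5 cm tube, roughly an adult male vocal tract, gives formants near 500, 1500, and 2500 Hz, close to a schwa; realistic vowels require the non-uniform area functions the paper measures.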

Overlap and Add Sinusoidal Synthesis Method of Speech Signal using Amplitude-weighted Phase Error Function (정현파 크기로 가중치 된 위상 오류 함수를 사용한 음성의 중첩합산 정현파 합성 방법)

  • Park, Jong-Bae;Kim, Gyu-Jin;Hyeok, Jeong-Gyu;Kim, Jong-Hark;Lee, In-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.12C / pp.1149-1155 / 2007
  • In this paper, we propose a new overlap-and-add speech synthesis method with improved continuity. The proposed method uses an amplitude-weighted phase error function and minimizes the waveform discontinuity of the synthesized signal, rather than the phase discontinuity, to estimate the mid-point phase. Experimental results show that the proposed method improves the continuity between synthesized signals relative to the existing method.
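The idea of weighting phase mismatch by sinusoid amplitude can be sketched with one plausible form of such an error function (this exact formula is our assumption, not necessarily the paper's):

```python
import math

def weighted_phase_error(amps, phases_a, phases_b):
    """Amplitude-weighted phase mismatch between two sinusoidal frames.
    Each partial contributes a_k * (1 - cos(dphi_k)): the 1 - cos term is
    non-negative and 2*pi-periodic, and weighting by amplitude means the
    loudest partials dominate, so minimizing this favors waveform
    continuity rather than raw phase continuity."""
    return sum(a * (1.0 - math.cos(pa - pb))
               for a, pa, pb in zip(amps, phases_a, phases_b))
```

A synthesizer would search over candidate mid-point phases and keep the one minimizing this error before overlap-adding adjacent frames.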

Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation

  • Kwon, Hye-Jeong;Kim, Min-Jeong;Baek, Ji-Won;Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.2 / pp.713-725 / 2022
  • For the most part, artificial intelligence does not show any definite change in emotion, which makes it hard to demonstrate empathy in communication with humans. If frequency modification is applied to neutral emotion, or if a different emotional frequency is added to it, it becomes possible to develop artificial intelligence with emotions. This study proposes emotion conversion using voice frequency synthesis based on a Generative Adversarial Network (GAN). The proposed method extracts frequencies from the speech data of twenty-four actors and actresses; in other words, it extracts the voice features of their different emotions, preserves the linguistic features, and converts only the emotions. It then generates frequencies with a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) in order to model prosody while preserving linguistic information, which makes it possible to learn speech features in parallel. Finally, it corrects the frequency by employing amplitude scaling: using spectral conversion on a logarithmic scale, the frequency is converted in a way that takes human hearing characteristics into account. Accordingly, the proposed technique provides emotion conversion of speech so that emotions can be expressed in artificially generated voices or speech.
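The final correction step, amplitude scaling on a logarithmic scale, amounts to applying a gain specified in decibels to spectral magnitudes. A minimal sketch (a generic dB gain, not the paper's learned scaling):

```python
def scale_magnitude_db(magnitudes, gain_db):
    """Scale spectral magnitudes by a gain given in dB, i.e. on the
    logarithmic scale that matches human loudness perception:
    +20 dB multiplies each magnitude by 10, 0 dB leaves it unchanged."""
    factor = 10.0 ** (gain_db / 20.0)
    return [m * factor for m in magnitudes]
```

Working in dB rather than in linear amplitude means equal scaling steps correspond to roughly equal perceived loudness changes, which is the "human hearing features" rationale the abstract gives.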