Search | Korea Science

A Study on the Voice Conversion with HMM-based Korean Speech Synthesis (HMM 기반의 한국어 음성합성에서 음색변환에 관한 연구)

Kim, Il-Hwan;Bae, Keun-Sung
- MALSORI / v.68 / pp.65-74 / 2008
Statistical parametric speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years because it requires less memory and less computation than a corpus-based unit-concatenation text-to-speech (TTS) system, which makes it suitable for embedded systems. It also has the advantage that the voice characteristics of the synthetic speech can be modified easily by transforming the HMM parameters appropriately. In this paper, we present experimental results of voice characteristics conversion using an HMM-based Korean speech synthesis system. The results show that voice characteristics can be converted using only a few sentences uttered by a target speaker. Synthetic speech generated from models adapted with only ten sentences was very close to that from speaker-dependent models trained on 646 sentences.

The Comparison of Characteristics in various Speaker Adaptation Methods (여러 화자 적응 방법들의 특성 비교)

황영수
- Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.339-342 / 1998
In this paper, we propose various speaker adaptation methods and study their performance. The methods studied are MAPE (maximum a posteriori probability estimation) and ARTMAP. To evaluate these methods, we used Korean isolated digits as the experimental data. The hybrid speaker adaptation method, which unifies MAPE, linear spectral estimation, and the output probability of the SCHMM, showed better recognition results than the other methods. The method using ARTMAP showed results similar to those of the hybrid method.

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

Sohee Han;Jisub Um;Hoirin Kim
- Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, which can now closely imitate natural human speech. In particular, TTS models that offer varied voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. The proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. In the objective evaluation, the proposed English and Korean one-shot multi-speaker TTS systems obtained a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in both naturalness and speaker similarity.
https://doi.org/10.13064/KSSS.2024.16.1.067

LED Driving Circuit Design of Ultrasonic Speaker System for Sign Board (싸인 보드용 초음파 스피커 상태표시를 위한 LED 구동 회로의 설계)

Lee, Kyung-Ryang;Yeo, Sung-Dae;Jang, Young-Jin;Cha, Jae-Sang;Kim, Jin-Tae;Shin, Jae-Kwon;Kim, Seong-Kweon
- Journal of Satellite, Information and Communications / v.8 no.4 / pp.17-20 / 2013
In this study, we introduce an LED driving circuit that indicates the information state, namely the audio signal gain and radiation pattern, of an ultrasonic speaker system for a sign board. An ultrasonic speaker system reduces energy loss and transmits sound farther. With these characteristics, an ultrasonic speaker can be widely used in daily life. The proposed LED circuit indicates the information state, taken from the interface of the ultrasonic speaker system, as a linear change in LED brightness. The designed circuit was verified using the $0.35{\mu}m$ CMOS process by Dong-bu.

A study on the robust speaker recognition algorithm in noise surroundings (주변 잡음 환경에 강한 화자인식 알고리즘 연구)

Jung Jong-Soon
- Journal of the Korea Society of Computer and Information / v.10 no.6 s.38 / pp.47-54 / 2005
In most speaker recognition systems, the speaker's characteristics are extracted from acoustic parameters by speech analysis and used to build the speaker's reference pattern. The parameters used in a speaker recognition system should fully express the speaker's characteristics and vary little from one utterance to the next. We therefore suggest the following to address this problem. This paper proposes using spectral characteristics, which are strong in noise-free conditions, and prosodic information in noisy conditions. In the codebook generation stage, we generate the amount of data needed to combine the spectral characteristics and the prosodic information. Acceptance or rejection is decided by comparing the distance between the test pattern and each model. As a result, we obtained a higher recognition rate than when using the spectral and prosodic information, and in particular we obtained a stable recognition rate in noisy conditions.

Investigation on Vibration Characteristics of Micro Speaker Diaphragms for Various Shape Designs (마이크로 스피커 진동판의 형상설계에 따른 진동특성 고찰)

Kim, Kyeong Min;Kim, Seong Keol;Park, Keun
- Journal of the Korean Society for Precision Engineering / v.30 no.8 / pp.790-796 / 2013
Micro-speaker diaphragms play an important role in generating a desired audio response. The diaphragm is generally a circular membrane, and the cross section is a double dome, with an inner dome and an outer dome. To improve the sound quality of the speaker, a number of corrugations may be included in the outer dome region. In this study, the role of these corrugations is investigated using two kinds of finite element method (FEM) calculations. Structural FEM modeling was carried out to investigate the change in stiffness of the diaphragm when the corrugations were included. Modal FEM modeling was then carried out to compare the natural frequencies and the resulting vibrational modes of the plain and corrugated diaphragms. The effects of the corrugations on the vibration characteristics of the diaphragm are discussed.
https://doi.org/10.7736/KSPE.2013.30.8.790

A Study on the Intention to Use AI Speakers: focusing on extended technology acceptance model (인공지능(AI)스피커 사용의도에 관한 연구: 확장된 기술수용모델을 중심으로)

Kim, Bae Sung;Woo, Hyung Jin
- The Journal of the Korea Contents Association / v.19 no.9 / pp.1-10 / 2019
The purpose of this study is to investigate the influence of exogenous variables on the intention to use AI speakers. An online survey was administered to 305 AI speaker users to examine the effect of personal characteristics (self-efficacy, innovativeness, suitability, and enjoyment) and social impact (social conformity and social image) on perceived usefulness and perceived easiness. The results indicate that (1) self-efficacy and social conformity have a positive effect on perceived easiness; (2) suitability and social image have a positive effect on perceived usefulness, whereas innovativeness has a negative effect on perceived usefulness; and (3) perceived usefulness and perceived easiness have a significant effect on the intention to use AI speakers.
https://doi.org/10.5392/JKCA.2019.19.09.001

Dynamic Characteristic Analysis and Transfer Function Estimate of Acoustic System for Transformer Noise Control (변압기 소음제어를 위한 음향 시스템의 동특성 해석 및 전달함수 추정)

김영달;정창경;심재명
- Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.13 no.3 / pp.17-24 / 1999
This paper presents an active noise control (ANC) method for transformer noise that uses a speaker and microphone pair. The main focus of the study is on identifying the dynamic characteristics of the speaker-amplifier-microphone path. We present a theoretical method for identifying the dynamic characteristics of speaker-microphone pairs. The transfer functions of the microphone-speaker pair were estimated using the sequential least squares (SLS) algorithm. We confirmed that the estimated transfer function has stable poles and zeros in the z-plane. This paper also proposes a noise cancellation architecture to which we applied the estimated transfer function.

Selective Adaptation of Speaker Characteristics within a Subcluster Neural Network

Haskey, S.J.;Datta, S.
- Proceedings of the KSPS conference / 1996.10a / pp.464-467 / 1996
This paper aims to exploit inter/intra-speaker phoneme sub-class variations as criteria for adaptation in a phoneme recognition system based on a novel neural network architecture. The system uses a subcluster neural network design based on One-Class-in-One-Network (OCON) feed-forward subnets, similar to those proposed by Kung (2) and Jou (1), joined by a common front-end layer. The idea is to adapt only the neurons within the common front-end layer of the network, so that the adaptation is concentrated primarily on the speaker's vocal characteristics. Since the adaptation occurs in an area common to all classes, convergence on a single class will improve the recognition of the remaining classes in the network. Results show that adaptation towards a phoneme in the vowel sub-class for speakers MDAB0 and MWBT0 improves the recognition of the remaining vowel sub-class phonemes from the same speaker.

Improved speech emotion recognition using histogram equalization and data augmentation techniques (히스토그램 등화와 데이터 증강 기법을 이용한 개선된 음성 감정 인식)

Heo, Woon-Haeng;Kwon, Oh-Wook
- Phonetics and Speech Sciences / v.9 no.2 / pp.77-83 / 2017
We propose a new method to reduce emotion recognition errors caused by variation in speaker characteristics and speech rate. First, to reduce variation in speaker characteristics, we adjust the features from a test speaker to fit the distribution of all training data using the histogram equalization (HE) algorithm. Second, to deal with variation in speech rate, we augment the training data with speech generated at various speech rates. In computer experiments using EMO-DB, KRN-DB, and eNTERFACE-DB, the proposed method yields relative improvements in weighted accuracy of 34.7%, 23.7%, and 28.1%, respectively.
https://doi.org/10.13064/KSSS.2017.9.2.077

Search Result 255, Processing Time 0.027 seconds
