• Title/Summary/Keyword: TTS system

Search Result 145, Processing Time 0.029 seconds

The implementation of database for high quality Embedded Text-to-speech system (고품질 내장형 음성합성 시스템을 위한 음성합성 DB구현)

  • Kwon, Oh-Il
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.4 s.304
    • /
    • pp.103-110
    • /
    • 2005
  • Speech Database is one of the most important part of Text-to-speech(TTS) system Especially, the embedded TTS system needs more small size of database than that of the server TTS system So, the compression and statistical reduction or database is a very important factor in the embedded TTS system But this compression and statistical reduction of database always rise a loss of quality of the synthesised speech. In this paper, we propose a method of constructing database for high quality embedded TTS system and verify the quality of synthesised speech with MOS(Mean Opinion Score) test.

A Korean Multi-speaker Text-to-Speech System Using d-vector (d-vector를 이용한 한국어 다화자 TTS 시스템)

  • Kim, Kwang Hyeon;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.3
    • /
    • pp.469-475
    • /
    • 2022
  • To train the model of the deep learning-based single-speaker TTS system, a speech DB of tens of hours and a lot of training time are required. This is an inefficient method in terms of time and cost to train multi-speaker or personalized TTS models. The voice cloning method uses a speaker encoder model to make the TTS model of a new speaker. Through the trained speaker encoder model, a speaker embedding vector representing the timbre of the new speaker is created from the small speech data of the new speaker that is not used for training. In this paper, we propose a multi-speaker TTS system to which voice cloning is applied. The proposed TTS system consists of a speaker encoder, synthesizer and vocoder. The speaker encoder applies the d-vector technique used in the speaker recognition field. The timbre of the new speaker is expressed by adding the d-vector derived from the trained speaker encoder as an input to the synthesizer. It can be seen that the performance of the proposed TTS system is excellent from the experimental results derived by the MOS and timbre similarity listening tests.

Implementation of text to speech terminal system by distributed database (데이터베이스 분산을 통한 소용량 문자-음성 합성 단말기 구현)

  • 김영길;박창현;양윤기
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2431-2434
    • /
    • 2003
  • In this research, our goal is to realize Korean Distribute TTS system with server/client function in wireless network. The speech databases and some routines of TTS system is stuck with the server which has strong functions and we made Korean speech databases and accomplished research about DB which is suitable for distributed TTS. We designed a terminal has the minimum setting which operate this TTS and designed proper protocol so we will check action of Distributed TTS.

  • PDF

Implementation of Wideband Waveform Interpolation Coder for TTS DB Compression (TTS DB 압축을 위한 광대역 파형보간 부호기 구현)

  • Yang, Hee-Sik;Hahn, Min-Soo
    • MALSORI
    • /
    • v.55
    • /
    • pp.143-158
    • /
    • 2005
  • The adequate compression algorithm is essential to achieve high quality embedded TTS system. in this paper, we Propose waveform interpolation coder for TTS corpus compression after many speech coder investigation. Unlike speech coders in communication system, compression rate and anality are more important factors in TTS DB compression than other performance criteria. Thus we select waveform interpolation algorithm because it provides good speech quality under high compression rate at the cost of complexity. The implemented coder has bit rate 6kbps with quality degradation 0.47. The performance indicates that the waveform interpolation is adequate for TTS DB compression with some further study.

  • PDF

Characteristics of directly sputtered AI cathode film using twin target sputtering system for OLEDs

  • Moon, Jong-Min;Lee, Sang-Hyeon;Kim, Han-Ki
    • 한국정보디스플레이학회:학술대회논문집
    • /
    • 2007.08a
    • /
    • pp.655-658
    • /
    • 2007
  • Characteristics of Al cathode films deposited by using specially designed twin target sputter (TTS) system were investigated. It was found that Al cathode films prepared by TTS were amorphous structure with nanocrystallines due to low substrate temperature and OLEDs fabricated using TTS system have low leakage current density at reverse bias because of effective confinement of energetic particles during sputtering process.

  • PDF

Implementation of Korean TTS Service on Android OS (안드로이드 OS 기반 한국어 TTS 서비스의 설계 및 구현)

  • Kim, Tae-Guon;Kim, Bong-Wan;Choi, Dae-Lim;Lee, Yong-Ju
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.1
    • /
    • pp.9-16
    • /
    • 2012
  • Though Android-based smart phones are being released in Korea, Korean TTS engine is not built on them and Google has not announced service or software developer's kit related to Korean TTS officially. Thus, application developers who want to include Korean TTS capability in their application have difficulties. In this paper, we design and implement Android OS-based Korean TTS system and service. For speed, text preprocessing and synthesis libraries are implemented using Android NDK. By using Java's thread mechanism and the AudioTrack class, the response time of TTS is minimized. For the test of implemented service, an application that reads incoming SMS is developed. The test shows that synthesized speech are generated in real-time for random sentences. By using the implemented Korean TTS service, Android application developers can transmit information easily through voice. Korean TTS service proposed and implemented in this paper overcomes shortcomings of the existing restrictive synthesis methods and provides the benefit for application developers and users.

Performance Comparison of State-of-the-Art Vocoder Technology Based on Deep Learning in a Korean TTS System (한국어 TTS 시스템에서 딥러닝 기반 최첨단 보코더 기술 성능 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.6 no.2
    • /
    • pp.509-514
    • /
    • 2020
  • The conventional TTS system consists of several modules, including text preprocessing, parsing analysis, grapheme-to-phoneme conversion, boundary analysis, prosody control, acoustic feature generation by acoustic model, and synthesized speech generation. But TTS system with deep learning is composed of Text2Mel process that generates spectrogram from text, and vocoder that synthesizes speech signals from spectrogram. In this paper, for the optimal Korean TTS system construction we apply Tacotron2 to Tex2Mel process, and as a vocoder we introduce the methods such as WaveNet, WaveRNN, and WaveGlow, and implement them to verify and compare their performance. Experimental results show that WaveNet has the highest MOS and the trained model is hundreds of megabytes in size, but the synthesis time is about 50 times the real time. WaveRNN shows MOS performance similar to that of WaveNet and the model size is several tens of megabytes, but this method also cannot be processed in real time. WaveGlow can handle real-time processing, but the model is several GB in size and MOS is the worst of the three vocoders. From the results of this study, the reference criteria for selecting the appropriate method according to the hardware environment in the field of applying the TTS system are presented in this paper.

An end-to-end synthesis method for Korean text-to-speech systems (한국어 text-to-speech(TTS) 시스템을 위한 엔드투엔드 합성 방식 연구)

  • Choi, Yeunju;Jung, Youngmoon;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.39-48
    • /
    • 2018
  • A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate passing through each module. An end-to-end TTS system could avoid such problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google's Tacotron, which is an end-to-end TTS system based on a sequence-to-sequence model with attention mechanism. We used 4392 utterances spoken by a Korean female speaker, an amount that corresponds to 37% of the dataset Google used for training Tacotron. Our system obtained mean opinion score (MOS) 2.98 and degradation mean opinion score (DMOS) 3.25. We will discuss the factors which affected training of the system. Experiments demonstrate that the post-processing network needs to be designed considering output language and input characters and that according to the amount of training data, the maximum value of n for n-grams modeled by the encoder should be small enough.

A Korean TTS System for Educational Purpose (교육용 한국어 TTS 플랫폼 개발)

  • Lee Jungchul;Lee Sangho
    • MALSORI
    • /
    • no.50
    • /
    • pp.41-50
    • /
    • 2004
  • Recently, there has been considerable progress in the natural language processing and digital signal processing components and this progress has led to the improved synthetic speech qualify of many commercial TTS systems. But there still remain many obstacles to overcome for the practical application of TTS. To resolve the problems, the cooperative research among the related areas is highly required and a common Korean TTS platform is essential to promote these activities. This platform offers a general framework for building Korean speech synthesis systems and a full C/C++ source for modules supports to implement and test his own algorithm. In this paper we described the aspect of a Korean TTS platform to be developed and a developing plan.

  • PDF

Twin Target Sputtering System with Ladder Type Magnet Array for Direct Al Cathode Sputtering on Organic Light Emitting Diodes

  • Moon, Jong-Min;Kim, Han-Ki
    • Journal of Information Display
    • /
    • v.8 no.3
    • /
    • pp.5-10
    • /
    • 2007
  • Twin target sputtering (TTS) system with a configuration of vertically parallel facing Al targets and a substrate holder perpendicular to the Al target plane has been designed to realize a direct Al cathode sputtering on organic light emitting diodes (OLEDs). The TTS system has a linear twin target gun with ladder type magnet array for effective and uniform confinement of high density plasma. It is shown that OLEDs with Al cathode deposited by the TTS show a relatvely lower leakage current density $({\sim}1{\times}10^{-5}mA/cm^2)$ at reverse bias of -6V, compared to that ($1{\times}10^{-2}{\sim}10^{-3}$ $mA/cm^2$ at -6V) of OLEDs with Al cathodes grown by conventional DC magnetron sputtering. In addition, it was found that Al cathode films prepared by TTS were amorphous structure with nanocrystallines due to low substrate temperature. This demonstrates that there is no plasma damage caused by the bombardment of energetic particles. This indicates that the TTS system with ladder type magnet array could be useful plasma damage free deposition technique for direct Al cathode sputtering on OLEDs or flexible OLEDs.