Search | Korea Science

Implementation of Korean TTS Service on Android OS (안드로이드 OS 기반 한국어 TTS 서비스의 설계 및 구현)

Kim, Tae-Guon;Kim, Bong-Wan;Choi, Dae-Lim;Lee, Yong-Ju
- The Journal of the Korea Contents Association / v.12 no.1 / pp.9-16 / 2012
Although Android-based smartphones are being released in Korea, no Korean TTS engine is built into them, and Google has not officially announced a service or software development kit for Korean TTS. Application developers who want to include Korean TTS capability in their applications therefore face difficulties. In this paper, we design and implement an Android OS-based Korean TTS system and service. For speed, the text preprocessing and synthesis libraries are implemented using the Android NDK. Java's thread mechanism and the AudioTrack class are used to minimize the TTS response time. To test the implemented service, we developed an application that reads incoming SMS messages aloud. The test shows that synthesized speech is generated in real time for random sentences. With the implemented Korean TTS service, Android application developers can easily convey information by voice. The Korean TTS service proposed and implemented in this paper overcomes the shortcomings of existing restrictive synthesis methods and benefits both application developers and users.
https://doi.org/10.5392/JKCA.2012.12.01.009
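
The abstract above says that a thread mechanism and the AudioTrack class keep the response time low, but it does not give the design. One plausible reading is a producer/consumer pipeline in which playback of the first chunk starts while later chunks are still being synthesized. The sketch below illustrates that idea in Python as a stand-in for the paper's Java code; `synthesize_sentence`, `stream_tts`, and the `play` callback are hypothetical names, and the "audio" is dummy bytes rather than real synthesized speech.

```python
import queue
import threading


def synthesize_sentence(sentence):
    """Hypothetical stand-in for a native synthesis call; returns dummy bytes."""
    return f"<audio:{sentence}>".encode("utf-8")


def stream_tts(sentences, play, max_buffered=4):
    """Synthesize on a worker thread while the caller's thread plays chunks.

    `play` stands in for writing PCM data to an audio sink
    (the role AudioTrack plays in the paper); it is an assumption here.
    """
    buffer = queue.Queue(maxsize=max_buffered)
    end_marker = object()  # unique sentinel signalling the end of the stream

    def producer():
        for sentence in sentences:
            buffer.put(synthesize_sentence(sentence))
        buffer.put(end_marker)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    # Playback begins as soon as the first chunk arrives,
    # without waiting for the whole text to be synthesized.
    while True:
        chunk = buffer.get()
        if chunk is end_marker:
            break
        play(chunk)
    worker.join()


if __name__ == "__main__":
    stream_tts(["first sentence", "second sentence"], play=print)
```

The bounded queue limits how far synthesis can run ahead of playback, which caps memory use on a handset; the bound of 4 chunks is an arbitrary illustrative value.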

A Korean Multi-speaker Text-to-Speech System Using d-vector (d-vector를 이용한 한국어 다화자 TTS 시스템)

Kim, Kwang Hyeon;Kwon, Chul Hong
- The Journal of the Convergence on Culture Technology / v.8 no.3 / pp.469-475 / 2022
Training a deep learning-based single-speaker TTS model requires a speech DB of tens of hours and a long training time. Applying the same procedure to multi-speaker or personalized TTS models is inefficient in terms of time and cost. The voice cloning method instead uses a speaker encoder model to build the TTS model of a new speaker. The trained speaker encoder produces a speaker embedding vector representing the timbre of the new speaker from a small amount of that speaker's speech data, which was not used for training. In this paper, we propose a multi-speaker TTS system to which voice cloning is applied. The proposed TTS system consists of a speaker encoder, a synthesizer, and a vocoder. The speaker encoder applies the d-vector technique used in the speaker recognition field. The timbre of the new speaker is expressed by adding the d-vector derived from the trained speaker encoder as an input to the synthesizer. Experimental results from MOS and timbre similarity listening tests indicate that the proposed TTS system performs well.
https://doi.org/10.17703/JCCT.2022.8.3.469
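
The speaker-conditioning idea in the abstract above can be illustrated with a toy sketch. In the usual d-vector formulation, frame-level embeddings from a speaker-recognition network are averaged into one utterance-level vector. The code below assumes the frame embeddings are already available as plain lists of floats, and it assumes the vector is concatenated to every synthesizer input step; the abstract only says the d-vector is added "as an input", so the concatenation and all function names are illustrative, not the authors' implementation.

```python
import math


def d_vector(frame_embeddings):
    """Average frame-level embeddings and L2-normalize the result."""
    if not frame_embeddings:
        raise ValueError("need at least one frame embedding")
    dim = len(frame_embeddings[0])
    count = len(frame_embeddings)
    mean = [sum(frame[i] for frame in frame_embeddings) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        raise ValueError("mean embedding has zero norm")
    return [x / norm for x in mean]


def cosine_similarity(a, b):
    """Cosine similarity, a common way to compare two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def condition_on_speaker(text_encodings, speaker_vector):
    """Assumed conditioning scheme: append the d-vector to every encoder step."""
    return [list(step) + list(speaker_vector) for step in text_encodings]


if __name__ == "__main__":
    # Toy 2-dimensional "frame embeddings" for one short utterance.
    speaker = d_vector([[1.0, 0.0], [0.0, 1.0]])
    conditioned = condition_on_speaker([[0.1, 0.2], [0.3, 0.4]], speaker)
    print(len(conditioned), len(conditioned[0]))
```

Because the speaker encoder is trained separately, a new voice needs only a forward pass over a short recording, not retraining of the synthesizer.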

A Korean TTS System for Educational Purpose (교육용 한국어 TTS 플랫폼 개발)

Lee Jungchul;Lee Sangho
- MALSORI / no.50 / pp.41-50 / 2004
Recently, there has been considerable progress in natural language processing and digital signal processing components, and this progress has improved the synthetic speech quality of many commercial TTS systems. However, many obstacles remain to be overcome before TTS can be applied in practice. Resolving these problems requires cooperative research among the related areas, and a common Korean TTS platform is essential to promote such activities. The platform offers a general framework for building Korean speech synthesis systems and full C/C++ source code for its modules, so that researchers can implement and test their own algorithms. In this paper, we describe the planned Korean TTS platform and its development plan.

Electrophysiologic Characteristics of Combined Idiopathic Carpal Tunnel Syndrome and Tarsal Tunnel Syndrome (동반이환된 특발성 수근관증후군과 족근관증후군의 전기생리학적 특징)

Kim, Sung-Hyouk;Yang, Ji-Won;Sung, Young-Hee;Park, Kee-Hyung;Park, Hyeon-Mi;Shin, Dong-Jin;Lee, Yeong-Bae
- Annals of Clinical Neurophysiology / v.13 no.1 / pp.31-37 / 2011
Background: Carpal tunnel syndrome (CTS) and tarsal tunnel syndrome (TTS) are thought to share a similar pathophysiology: compression of the median and plantar nerves by the carpal tunnel and flexor retinaculum. A few reports have described a relationship between idiopathic CTS and TTS, but without definite evidence of coexistence. The current study was designed to analyze the electrophysiologic characteristics of combined idiopathic CTS and TTS by comparing them with those of idiopathic CTS alone and idiopathic TTS alone. Methods: We retrospectively collected patients with combined idiopathic CTS and TTS (CTS-TTS group) from June 2001 to February 2009. Patients with idiopathic CTS alone or TTS alone were collected as controls. Electrophysiologic data of the median and plantar nerves were compared between the CTS-TTS group and the controls. Results: The CTS-TTS group comprised 31 patients. The control groups comprised 50 CTS patients and 49 TTS patients. In the comparison of median nerve conduction studies between the CTS-TTS group and the CTS control group, decreased compound muscle action potential amplitude (p<0.001), decreased median sensory nerve action potential amplitude (p<0.001) and sensory nerve conduction velocity at finger stimulation (p=0.013) were prominent in the CTS-TTS group. Decreased medial plantar sensory nerve action potential amplitude (p=0.034) was found when the CTS-TTS group and the TTS control group were compared. Conclusions: If the electrophysiologic study of a patient with CTS or TTS suggests a severe degree of nerve injury, it would be helpful to consider the possibility of combined CTS and TTS.

Implementation of text to speech terminal system by distributed database (데이터베이스 분산을 통한 소용량 문자-음성 합성 단말기 구현)

김영길;박창현;양윤기
- Proceedings of the IEEK Conference / 2003.07e / pp.2431-2434 / 2003
In this research, our goal is to realize a Korean distributed TTS system with server/client functionality over a wireless network. The speech databases and some routines of the TTS system reside on the server, which has greater processing capability. We built Korean speech databases and studied a DB organization suitable for distributed TTS. We designed a terminal with the minimum configuration needed to operate this TTS, together with a suitable protocol, which we will use to verify the operation of the distributed TTS.

Implementation of Wideband Waveform Interpolation Coder for TTS DB Compression (TTS DB 압축을 위한 광대역 파형보간 부호기 구현)

Yang, Hee-Sik;Hahn, Min-Soo
- MALSORI / v.55 / pp.143-158 / 2005
An adequate compression algorithm is essential for a high-quality embedded TTS system. In this paper, after investigating many speech coders, we propose a waveform interpolation coder for TTS corpus compression. Unlike speech coders in communication systems, compression rate and quality are more important factors in TTS DB compression than other performance criteria. We therefore select the waveform interpolation algorithm because it provides good speech quality at a high compression rate, at the cost of complexity. The implemented coder has a bit rate of 6 kbps with a quality degradation of 0.47. This performance indicates that waveform interpolation is adequate for TTS DB compression, subject to some further study.
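
To put the 6 kbps figure in context, the arithmetic below compares it with uncompressed PCM. The abstract does not state the source format, so the 16 kHz, 16-bit mono input and the 10-hour database size are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(source_kbps, coded_kbps):
    """How many times smaller the coded stream is than the source."""
    if coded_kbps <= 0:
        raise ValueError("coded bit rate must be positive")
    return source_kbps / coded_kbps


def stored_megabytes(duration_hours, kbps):
    """Storage in megabytes (10^6 bytes) for audio of the given duration."""
    return duration_hours * 3600 * kbps * 1000 / 8 / 1e6


if __name__ == "__main__":
    source = pcm_bit_rate_kbps(16000, 16)   # assumed wideband PCM source
    coded = 6.0                             # bit rate reported in the abstract
    print(f"compression ratio: {compression_ratio(source, coded):.1f}x")
    # Hypothetical 10-hour speech DB, before and after coding.
    print(f"raw:   {stored_megabytes(10, source):.0f} MB")
    print(f"coded: {stored_megabytes(10, coded):.0f} MB")
```

Under these assumptions the coded database is roughly 43 times smaller than the raw PCM, which is the kind of saving that matters for an embedded TTS system.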

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

Sohee Han;Jisub Um;Hoirin Kim
- Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, to the point where it can closely imitate natural human speech. In particular, TTS models that offer various voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. The proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in terms of both naturalness and speaker similarity.
https://doi.org/10.13064/KSSS.2024.16.1.067

Electrophysiological Study of Medial Plantar Nerve in Idiopathic Tarsal Tunnel Syndrome (특발성 발목터널 증후군에서 내측 발바닥 신경의 전기 생리학적 검사)

An, Jae Young;Kim, Byoung Joon
- Annals of Clinical Neurophysiology / v.8 no.2 / pp.146-151 / 2006
Background: Tarsal tunnel syndrome (TTS) is an entrapment neuropathy of the tibial nerve within the fibrous tunnel on the medial side of the ankle. Most cases of TTS are idiopathic. This retrospective study aims to define the electrophysiological characteristics of idiopathic TTS. Methods: We reviewed the medical and electrophysiological records of consecutive patients with foot sensory symptoms referred to the electromyography laboratory. Patients were included on the basis of clinical findings suggestive of TTS. Among them, patients with any other possible cause of sensory symptoms in the foot were excluded. Control data were obtained from 19 age-matched people with no sensory symptoms or signs. Routine motor and sensory nerve conduction studies (NCS), including the medial plantar nerve (MPN), were performed using surface electrodes. Results: Twenty-one patients (13 women, 8 men; 9 unilateral, 12 bilateral) were enrolled as having idiopathic TTS (31 feet in total). Tinel's sign was positive in 16 feet (51.6%) in the TTS group and in four feet (10.5%) in the control group. The statistically significant electrophysiological parameter was the difference in sensory conduction velocity (SCV) between the sural nerve and the MPN. The sensory nerve action potential amplitude and SCV of the MPN did not differ significantly between idiopathic TTS feet and controls. Conclusion: Bilateral development was more common in idiopathic TTS. Tinel's sign and the difference in SCV between the sural nerve and the MPN may be helpful for the diagnosis of idiopathic TTS.

Characteristics of directly sputtered Al cathode film using twin target sputtering system for OLEDs

Moon, Jong-Min;Lee, Sang-Hyeon;Kim, Han-Ki
- Proceedings of the Korean Information Display Society Conference (한국정보디스플레이학회:학술대회논문집) / 2007.08a / pp.655-658 / 2007
The characteristics of Al cathode films deposited using a specially designed twin target sputter (TTS) system were investigated. The Al cathode films prepared by TTS had an amorphous structure containing nanocrystallites, owing to the low substrate temperature. OLEDs fabricated using the TTS system showed a low leakage current density at reverse bias because energetic particles were effectively confined during the sputtering process.

Performance Comparison of State-of-the-Art Vocoder Technology Based on Deep Learning in a Korean TTS System (한국어 TTS 시스템에서 딥러닝 기반 최첨단 보코더 기술 성능 비교)

Kwon, Chul Hong
- The Journal of the Convergence on Culture Technology / v.6 no.2 / pp.509-514 / 2020
The conventional TTS system consists of several modules, including text preprocessing, parsing analysis, grapheme-to-phoneme conversion, boundary analysis, prosody control, acoustic feature generation by an acoustic model, and synthesized speech generation. A deep learning-based TTS system, by contrast, is composed of a Text2Mel process that generates a spectrogram from text and a vocoder that synthesizes speech signals from the spectrogram. In this paper, to construct an optimal Korean TTS system, we apply Tacotron2 to the Text2Mel process, and as vocoders we implement WaveNet, WaveRNN, and WaveGlow to verify and compare their performance. Experimental results show that WaveNet has the highest MOS and a trained model hundreds of megabytes in size, but its synthesis time is about 50 times real time. WaveRNN shows MOS performance similar to that of WaveNet with a model size of several tens of megabytes, but it also cannot run in real time. WaveGlow can run in real time, but its model is several GB in size and its MOS is the worst of the three vocoders. From these results, this paper presents reference criteria for selecting the appropriate method according to the hardware environment in which the TTS system is applied.
https://doi.org/10.17703/JCCT.2020.6.2.509
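
The "about 50 times the real time" statement in the abstract above corresponds to what is commonly called the real-time factor (RTF): synthesis time divided by the duration of the audio produced. The sketch below shows the calculation; the function names and the example timings (500 s of computation for 10 s of audio) are hypothetical values chosen to reproduce a factor of 50, not measurements from the paper.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def runs_in_real_time(rtf):
    """A vocoder keeps up with playback only if its RTF is at most 1."""
    return rtf <= 1.0


if __name__ == "__main__":
    # Hypothetical timing: 10 s of speech that takes 500 s to synthesize.
    rtf = real_time_factor(500.0, 10.0)
    print(rtf, runs_in_real_time(rtf))
```

An RTF above 1 means playback would stall unless the whole utterance is synthesized in advance, which is why the abstract treats real-time capability as a selection criterion alongside MOS and model size.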
