• Title/Summary/Keyword: Google TTS (Google Text-to-Speech)

Search Result 11, Processing Time 0.036 seconds

An end-to-end synthesis method for Korean text-to-speech systems (한국어 text-to-speech(TTS) 시스템을 위한 엔드투엔드 합성 방식 연구)

  • Choi, Yeunju;Jung, Youngmoon;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.39-48
    • /
    • 2018
  • A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate passing through each module. An end-to-end TTS system could avoid such problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google's Tacotron, which is an end-to-end TTS system based on a sequence-to-sequence model with attention mechanism. We used 4392 utterances spoken by a Korean female speaker, an amount that corresponds to 37% of the dataset Google used for training Tacotron. Our system obtained mean opinion score (MOS) 2.98 and degradation mean opinion score (DMOS) 3.25. We will discuss the factors which affected training of the system. Experiments demonstrate that the post-processing network needs to be designed considering output language and input characters and that according to the amount of training data, the maximum value of n for n-grams modeled by the encoder should be small enough.

Development of Speech Recognition and Synthetic Application for the Hearing Impairment (청각장애인을 위한 음성 인식 및 합성 애플리케이션 개발)

  • Lee, Won-Ju;Kim, Woo-Lin;Ham, Hye-Won;Yun, Sang-Un
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2020.07a
    • /
    • pp.129-130
    • /
    • 2020
  • 본 논문에서는 청각장애인의 의사소통을 위한 안드로이드 애플리케이션 시스템 구현 결과를 보인다. 구글 클라우드 플랫폼(Google Cloud Platform)의 STT(Speech to Text) API를 이용하여 음성 인식을 통해 대화의 내용을 텍스트의 형태로 출력한다. 그리고 TTS(Text to Speech)를 이용한 음성 합성을 통해 텍스트를 음성으로 출력한다. 또한, 포그라운드 서비스(Service)에서 가속도계 센서(Accelerometer Sensor)를 이용하여 스마트폰을 2~3회 흔들었을 때 해당 애플리케이션을 실행할 수 있도록 하여 애플리케이션의 활용성을 높인 시스템을 개발하였다.

  • PDF

A Design and Implementation of Speech Recognition and Synthetic Application for Hearing-Impairment

  • Kim, Woo-Lin;Ham, Hye-Won;Yun, Sang-Un;Lee, Won Joo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.12
    • /
    • pp.105-110
    • /
    • 2021
  • In this paper, we design and implement an Android mobile application that helps hearing impaired people communicate based on STT(Speech-to-Text) and TTS(Text-to-Speech) APIs and accelerometer sensor of a smartphone. This application provides the ability to record what the hearing-Impairment person's interlocutor is saying with a microphone, convert it to text using the STT API, and display it to the hearing-Impairment person. In addition. In addition, when a hearing-impaired person inputs a text using the TTS API, it is converted into voice and told to the interlocutor. When a hearing-impaired person shakes their smartphone, an accelerometer based background service function is provided to run the application. The application implemented in this paper provides a function that allows hearing impaired people to communicate easily with other people when communicating with others without using sign language as a video call.

Implementation of Yoga Posture Training Application Using Google ML Kit (Google ML Kit를 이용한 요가 자세 훈련 애플리케이션 구현)

  • Kim, Hyoung Min;Yoon, Jong Hyeon;Park, Su Hyun;Yu, Yun Seop
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.178-180
    • /
    • 2022
  • An application implementation that allows users to train yoga posture based on the landmark of yoga posture of yoga instructors obtained from the Google Firebase ML Kit was introduced. Using the ML Kit, the user's posture is classified and landmarks corresponding to each joint are obtained. The accuracy measurement reference value for the yoga posture is set through the angle formed by the joints of the obtained landmark. The accuracy between the reference landmark for the yoga posture of professional yoga instructors and the landmark for the user's pose through the ML Kit was compared. According to the accuracy reference value, information on malfunction and correct motion is provided to the user through Text-to-Speech (TTS). Users are managed effectively with Firebase, and a system that displays the amount of exercise through a counter and timer when the user performs an exercise that meets the accuracy reference value was explained.

  • PDF

Designing Voice Interface for The Disabled (장애인을 위한 음성 인터페이스 설계)

  • Choi, Dong-Wook;Lee, Ji-Hoon;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.05a
    • /
    • pp.697-699
    • /
    • 2019
  • IT 기술의 발달에 따라 전자기기의 이용량은 증가하였지만, 시각장애인들이나 지체 장애인들이 이용하는 데에 어려움이 있다. 따라서 본 논문에서는 Google Cloud API를 활용하여 음성으로 프로그램을 제어할 수 있는 음성 인터페이스를 제안한다. Google Cloud에서 제공하는 STT(Speech To Text)와 TTS(Text To Speech) API를 이용하여 사용자의 음성을 인식하면 텍스트로 변환된 음성이 시스템을 통해 응용 프로그램을 제어할 수 있도록 설계한다. 이 시스템은 장애인들이 전자기기를 사용하는데 많은 편리함을 줄 것으로 예상하며 나아가 장애인들뿐 아니라 비장애인들도 활용 가능할 것으로 기대한다.

Implementation of Korean TTS Service on Android OS (안드로이드 OS 기반 한국어 TTS 서비스의 설계 및 구현)

  • Kim, Tae-Guon;Kim, Bong-Wan;Choi, Dae-Lim;Lee, Yong-Ju
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.1
    • /
    • pp.9-16
    • /
    • 2012
  • Though Android-based smart phones are being released in Korea, Korean TTS engine is not built on them and Google has not announced service or software developer's kit related to Korean TTS officially. Thus, application developers who want to include Korean TTS capability in their application have difficulties. In this paper, we design and implement Android OS-based Korean TTS system and service. For speed, text preprocessing and synthesis libraries are implemented using Android NDK. By using Java's thread mechanism and the AudioTrack class, the response time of TTS is minimized. For the test of implemented service, an application that reads incoming SMS is developed. The test shows that synthesized speech are generated in real-time for random sentences. By using the implemented Korean TTS service, Android application developers can transmit information easily through voice. Korean TTS service proposed and implemented in this paper overcomes shortcomings of the existing restrictive synthesis methods and provides the benefit for application developers and users.

Design and Implementation of a Navigation System for Visually Impaired Persons (시각장애인을 위한 네비게이션 시스템 설계 및 구현)

  • Jang, Su-Min;Hwang, Dong-Gyo;Kang, Soo;Kim, Eun-Ju;Park, Jun-Ho;Jang, Ki-Hun;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.1
    • /
    • pp.38-47
    • /
    • 2012
  • In order to extend the activity range of visually impaired persons, we design and implement a navigation system that supports road information services and points of interest. The proposed navigation system consists of route creation modules and storage modules for visually impaired persons. In particular, the main interface of the navigation system are implemented using TTS(Text-to-Speech) program for sound and braille module that outputs braille with sense of touch. We also use google map APIs that can provide latest map information for the navigation system.

A Study on Development of Applications which Provides Step-by-step CPR Guidelines and Learning Materials for Non Health-related Person (비보건계열 일반인을 위한 단계별 CPR 가이드라인과 학습자료 제공 어플리케이션 개발 연구)

  • Kim, Jong-Min
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.649-651
    • /
    • 2021
  • In Korea, there are around 30,000 cardiac arrest patients annually. Gradually the number is increasing. Against this background, CPR education and publicity programs were expanded nationwide, but the rate of witness CPR by the general public was 4.4%, which is significantly lower than the 20%~70% rate in other countries. Therefore, in this paper, we analyzed the factors affecting the performance of CPR by witnesses who discovered cardiac arrest patients. Based on the results, an application planning and development study was conducted to provide users with correct cardiorespiratory response tips and step-by-step CPR guidelines to help users effectively assist in increasing the rate of CPR by general eyewitnesses.

  • PDF

HearCAM Embedded Platform Design (히어 캠 임베디드 플랫폼 설계)

  • Hong, Seon Hack;Cho, Kyung Soon
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.10 no.4
    • /
    • pp.79-87
    • /
    • 2014
  • In this paper, we implemented the HearCAM platform with Raspberry PI B+ model which is an open source platform. Raspberry PI B+ model consists of dual step-down (buck) power supply with polarity protection circuit and hot-swap protection, Broadcom SoC BCM2835 running at 700MHz, 512MB RAM solered on top of the Broadcom chip, and PI camera serial connector. In this paper, we used the Google speech recognition engine for recognizing the voice characteristics, and implemented the pattern matching with OpenCV software, and extended the functionality of speech ability with SVOX TTS(Text-to-speech) as the matching result talking to the microphone of users. And therefore we implemented the functions of the HearCAM for identifying the voice and pattern characteristics of target image scanning with PI camera with gathering the temperature sensor data under IoT environment. we implemented the speech recognition, pattern matching, and temperature sensor data logging with Wi-Fi wireless communication. And then we directly designed and made the shape of HearCAM with 3D printing technology.

Development of Korean-to-English and English-to-Korean Mobile Translator for Smartphone (스마트폰용 영한, 한영 모바일 번역기 개발)

  • Yuh, Sang-Hwa;Chae, Heung-Seok
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.3
    • /
    • pp.229-236
    • /
    • 2011
  • In this paper we present light weighted English-to-Korean and Korean-to-English mobile translators on smart phones. For natural translation and higher translation quality, translation engines are hybridized with Translation Memory (TM) and Rule-based translation engine. In order to maximize the usability of the system, we combined an Optical Character Recognition (OCR) engine and Text-to-Speech (TTS) engine as a Front-End and Back-end of the mobile translators. With the BLEU and NIST evaluation metrics, the experimental results show our E-K and K-E mobile translation equality reach 72.4% and 77.7% of Google translators, respectively. This shows the quality of our mobile translators almost reaches the that of server-based machine translation to show its commercial usefulness.