• Title/Summary/Keyword: TTS(Text-to-Speech)

Search Result 139, Processing Time 0.031 seconds

Implementation of Yoga Posture Training Application Using Google ML Kit (Google ML Kit를 이용한 요가 자세 훈련 애플리케이션 구현)

  • Kim, Hyoung Min;Yoon, Jong Hyeon;Park, Su Hyun;Yu, Yun Seop
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.178-180
    • /
    • 2022
  • An application implementation that allows users to train yoga posture based on the landmark of yoga posture of yoga instructors obtained from the Google Firebase ML Kit was introduced. Using the ML Kit, the user's posture is classified and landmarks corresponding to each joint are obtained. The accuracy measurement reference value for the yoga posture is set through the angle formed by the joints of the obtained landmark. The accuracy between the reference landmark for the yoga posture of professional yoga instructors and the landmark for the user's pose through the ML Kit was compared. According to the accuracy reference value, information on malfunction and correct motion is provided to the user through Text-to-Speech (TTS). Users are managed effectively with Firebase, and a system that displays the amount of exercise through a counter and timer when the user performs an exercise that meets the accuracy reference value was explained.

  • PDF

A New Vocoder based on AMR 7.4Kbit/s Mode for Speaker Dependent System (화자 의존 환경의 AMR 7.4Kbit/s모드에 기반한 보코더)

  • Min, Byung-Jae;Park, Dong-Chul
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.9C
    • /
    • pp.691-696
    • /
    • 2008
  • A new vocoder of Code Excited Linear Predictive (CELP) based on Adaptive Multi Rate (AMR) 7.4kbit/s mode is proposed in this paper. The proposed vocoder achieves a better compression rate in an environment of Speaker Dependent Coding System (SDSC) and is efficiently used for systems, such as OGM(Outgoing message) and TTS(Text To Speech), which needs only one person's speech. In order to enhance the compression rate of a coder, a new Line Spectral Pairs(LSP) code-book is employed by using Centroid Neural Network (CNN) algorithm. In comparison with original(traditional) AMR 7.4 Kbit/s coder, the new coder shows 27% higher compression rate while preserving synthesized speech quality in terms of Mean Opinion Score(MOS).

Improvement of Naturalness for a HMM-based Korean TTS using the prosodic boundary information (운율경계정보를 이용한 HMM기반 한국어 TTS 자연성 향상 연구)

  • Lim, Gi-Jeong;Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.9
    • /
    • pp.75-84
    • /
    • 2012
  • HMM-based Text-to-Speech systems generally utilize context dependent tri-phone units from a large corpus speech DB to enhance the synthetic speech. To downsize a large corpus speech DB, acoustically similar tri-phone units are clustered based on the decision tree using context dependent information. Context dependent information includes phoneme sequence as well as prosodic information because the naturalness of synthetic speech highly depends on the prosody such as pause, intonation pattern, and segmental duration. However, if the prosodic information was complicated, many context dependent phonemes would have no examples in the training data, and clustering would provide a smoothed feature which will generate unnatural synthetic speech. In this paper, instead of complicate prosodic information we propose a simple three prosodic boundary types and decision tree questions that use rising tone, falling tone, and monotonic tone to improve naturalness. Experimental results show that our proposed method can improve naturalness of a HMM-based Korean TTS and get high MOS in the perception test.

Primary Study for dialogue based on Ordering Chatbot

  • Kim, Ji-Ho;Park, JongWon;Moon, Ji-Bum;Lee, Yulim;Yoon, Andy Kyung-yong
    • Journal of Multimedia Information System
    • /
    • v.5 no.3
    • /
    • pp.209-214
    • /
    • 2018
  • Today is the era of artificial intelligence. With the development of artificial intelligence, machines have begun to impersonate various human characteristics today. Chatbot is one instance of this interactive artificial intelligence. Chatbot is a computer program that enables to conduct natural conversations with people. As mentioned above, Chatbot conducted conversations in text, but Chatbot, in this study evolves to perform commands based on speech-recognition. In order for Chatbot to perfectly emulate a human dialogue, it is necessary to analyze the sentence correctly and extract appropriate response. To accomplish this, the sentence is classified into three types: objects, actions, and preferences. This study shows how objects is analyzed and processed, and also demonstrates the possibility of evolving from an elementary model to an advanced intelligent system. By this study, it will be evaluated that speech-recognition based Chatbot have improved order-processing time efficiency compared to text based Chatbot. Once this study is done, speech-recognition based Chatbot have the potential to automate customer service and reduce human effort.

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

  • 이기승;김정수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.2
    • /
    • pp.135-144
    • /
    • 2003
  • A method for the automatic segmentation of speech signals is described. The method is dedicated to the construction of a large database for a Text-To-Speech (TTS) synthesis system. The main issue of the work involves the refinement of an initial estimation of phone boundaries which are provided by an alignment, based on a Hidden Market Model(HMM). Multi-layer perceptron (MLP) was used as a phone boundary detector. To increase the performance of segmentation, a technique which individually trains an MLP according to phonetic transition is proposed. The optimum partitioning of the entire phonetic transition space is constructed from the standpoint of minimizing the overall deviation from hand labelling positions. With single speaker stimuli, the experimental results showed that more than 95% of all phone boundaries have a boundary deviation from the reference position smaller than 20 ms, and the refinement of the boundaries reduces the root mean square error by about 25%.

Implementation of Text-to-Speech System using ABS/OLA Sinusoidal Model (ABS/OLA Sinusoidal 모델을 이용한 문서-음성 변환시스템의 구현)

  • Bae Jae-Hyun;Byeon Heo-Jin;Oh Yung-Hwan
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.17-20
    • /
    • 1999
  • 본 논문에서는 중첩 가산 Sinusoidal 합성방식에서 위상계승에 의한 단위음 연결법과 다프레임간 정현파 크기의 보간법을 제안한다. 그리고 합성 프레임의 중심이 pitch onset time이라고 가정하고, 음성에서 분리한 성도 모델의 위상을 음성 전체의 위상으로 사용하는 방법을 제안한다. 제안한 방법으로 문서-음성 변환 시스템 (Text-to-Speech System, TTS System)을 구현한 결과 단위음 연결시 연결부분의 파형 왜곡이 감소함을 알 수 있었고, 부드럽게 연결된 합성음을 얻을 수 있었다.

  • PDF

Designing Voice Interface for The Disabled (장애인을 위한 음성 인터페이스 설계)

  • Choi, Dong-Wook;Lee, Ji-Hoon;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.05a
    • /
    • pp.697-699
    • /
    • 2019
  • IT 기술의 발달에 따라 전자기기의 이용량은 증가하였지만, 시각장애인들이나 지체 장애인들이 이용하는 데에 어려움이 있다. 따라서 본 논문에서는 Google Cloud API를 활용하여 음성으로 프로그램을 제어할 수 있는 음성 인터페이스를 제안한다. Google Cloud에서 제공하는 STT(Speech To Text)와 TTS(Text To Speech) API를 이용하여 사용자의 음성을 인식하면 텍스트로 변환된 음성이 시스템을 통해 응용 프로그램을 제어할 수 있도록 설계한다. 이 시스템은 장애인들이 전자기기를 사용하는데 많은 편리함을 줄 것으로 예상하며 나아가 장애인들뿐 아니라 비장애인들도 활용 가능할 것으로 기대한다.

A Drowsiness Detection System using ChatGPT and Image Processing (ChatGPT와 영상처리를 이용한 졸음 감지 시스템)

  • Hyeon-Jun Lee;Hyeon-Sang Soon;Seong-Hun Jo;Chang-Hui Seo;Ji-Yun Kang;Se-Jin Oh
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.259-260
    • /
    • 2024
  • 졸음운전으로 인한 교통사고는 매년 꾸준하게 일어나 이에 대한 다방면의 해결책이 요구되고 있다. 본 논문에서는 위 문제를 개선하고자 ChatGPT와 영상처리를 이용한 졸음 감지 시스템을 구현하였다. 이 시스템은 운전자의 얼굴 부분을 영상처리로 인식하여 눈동자의 종횡비를 구해 PERCLOS 공식에 따른 운전자의 졸음을 판별시키고, 경고와 동시에 ChatGPT가 운전자에게 특정 주제를 키워드로 TTS와 STT를 통해 대화한다. 운전자의 졸음을 판별하기 위해 임베디드 보드에서 연결된 캠을 통해 졸음 판별을 하고, ChatGPT도 마찬가지로 보드에서 연결한 스피커, 마이크를 통해 운전자와 대화한다. 이를 활용하여 운전자의 졸음 자각을 통한 안전운전 및 사고 발생률의 감소를 기대할 수 있다.

  • PDF

An HMM-based Korean TTS synthesis system using phrase information (운율 경계 정보를 이용한 HMM 기반의 한국어 음성합성 시스템)

  • Joo, Young-Seon;Jung, Chi-Sang;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2011.07a
    • /
    • pp.89-91
    • /
    • 2011
  • In this paper, phrase boundaries in sentence are predicted and a phrase break information is applied to an HMM-based Korean Text-to-Speech synthesis system. Synthesis with phrase break information increases a naturalness of the synthetic speech and an understanding of sentences. To predict these phrase boundaries, context-dependent information like forward/backward POS(Part-of-Speech) of eojeol, a position of eojeol in a sentence, length of eojeol, and presence or absence of punctuation marks are used. The experimental results show that the naturalness of synthetic speech with phrase break information increases.

  • PDF

Voice transformation for HTS using correlation between fundamental frequency and vocal tract length (기본주파수와 성도길이의 상관관계를 이용한 HTS 음성합성기에서의 목소리 변환)

  • Yoo, Hyogeun;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.9 no.1
    • /
    • pp.41-47
    • /
    • 2017
  • The main advantage of the statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech(TTS) system can be implemented by combining a speech synthesis system and a voice transformation system, and it is widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of speech signal can be independently modified to convert the voice characteristics. Also it is important to maintain naturalness of the transformed speech. In this paper, a speech synthesis system based on Hidden Markov Model(HMM-based speech synthesis, HTS) using the STRAIGHT vocoder is constructed and voice transformation is conducted by modifying the fundamental frequency and spectral envelope. The fundamental frequency is transformed in a scaling method, and the spectral envelope is transformed through frequency warping method to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method using the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores(MOS) for naturalness of synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than baseline systems while maintaining the naturalness of the speech quality.