• Title/Summary/Keyword: TTS system

Search Result 145, Processing Time 0.028 seconds

Development of IoT System Based on Context Awareness to Assist the Visually Impaired

  • Song, Mi-Hwa
    • International Journal of Advanced Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.320-328
    • /
    • 2021
  • As the number of visually impaired people steadily increases, interest in independent walking is also increasing. However, there are various inconveniences in the independent walking of the visually impaired at present, reducing the quality of life of the visually impaired. The white cane, which is an existing walking aid for the visually impaired, has difficulty in recognizing upper obstacles and obstacles outside the effective distance. In addition, it is inconvenient to cross the street because the sound signal to help the visually impaired cross the crosswalk is lacking or damaged. These factors make it difficult for the visually impaired to walk independently. Therefore, we propose the design of an embedded system that provides traffic light recognition through object recognition technology, voice guidance using TTS, and upper obstacle recognition through ultrasonic sensors so that blind people can realize safe and high-quality independent walking.

The Study on Korean Prosody Generation using Artificial Neural Networks (인공 신경망의 한국어 운율 발생에 관한 연구)

  • Min Kyung-Joong;Lim Un-Cheon
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.337-340
    • /
    • 2004
  • The exactly reproduced prosody of a TTS system is one of the key factors that affect the naturalness of synthesized speech. In general, rules about prosody had been gathered either from linguistic knowledge or by analyzing the prosodic information from natural speech. But these could not be perfect and some of them could be incorrect. So we proposed artificial neural network(ANN)s that can be trained to team the prosody of natural speech and generate it. In learning phase, let ANNs learn the pitch and energy contour of center phoneme by applying a string of phonemes in a sentence to ANNs and comparing the output pattern with target pattern and making adjustment in weighting values to get the least mean square error between them. In test phase, the estimation rates were computed. We saw that ANNs could generate the prosody of a sentence.

  • PDF

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

  • 이기승;김정수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.2
    • /
    • pp.135-144
    • /
    • 2003
  • A method for the automatic segmentation of speech signals is described. The method is dedicated to the construction of a large database for a Text-To-Speech (TTS) synthesis system. The main issue of the work involves the refinement of an initial estimation of phone boundaries which are provided by an alignment, based on a Hidden Market Model(HMM). Multi-layer perceptron (MLP) was used as a phone boundary detector. To increase the performance of segmentation, a technique which individually trains an MLP according to phonetic transition is proposed. The optimum partitioning of the entire phonetic transition space is constructed from the standpoint of minimizing the overall deviation from hand labelling positions. With single speaker stimuli, the experimental results showed that more than 95% of all phone boundaries have a boundary deviation from the reference position smaller than 20 ms, and the refinement of the boundaries reduces the root mean square error by about 25%.

Contents Development of IrobiQ on School Violence Prevention Program for Young Children (지능형 로봇 아이로비큐(IrobiQ)를 활용한 학교폭력 예방 프로그램 개발)

  • Hyun, Eunja;Lee, Hawon;Yeon, Hyemin
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.9
    • /
    • pp.455-466
    • /
    • 2013
  • The purpose of this study was to develop a school violence prevention program "Modujikimi" for young children to be embedded in IrobiQ, the teacher assistive robot. The themes of this program consisted of basic character education, bullying prevention education and sexual violence prevention education. The activity types included large group, individual and small group activities, free choice activities, and finally parents' education, which included poems, fairy tales, music, art, sharing stories. Finally, the multi modal functions of the robot were employed: image on the screen, TTS (Text To Speech), touch function, recognition of sound and recording system. The robot content was demonstrated to thirty early childhood educators whose acceptability of the content was measured using questionnaires. And also the content was applied to children in daycare center. As a result, majority of them responded positively in acceptability. The results of this study suggest that the further research is needed to improve two-way interactivity of teacher assistive robot.

Speech Synthesis for the Korean large Vocabulary Through the Waveform Analysis in Time Domains and Evauation of Synthesized Speech Quality (시간영역에서의 파형분석에 의한 무제한 어휘 합성 및 음절 유형별 규칙합성음 음질평가)

  • Kang, Chan-Hee;Chin, Yong-Ohk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.1
    • /
    • pp.71-83
    • /
    • 1994
  • This paper deals with the improvement of the synthesized speech quality and naturality in the Korean TTS(Text-to-Speech) system. We had extracted the parameters(table2) such as its amplitude, duration and pitch period in a syllable through the analysis of speech waveforms(table1) in the time domain and synthesized syllables using them. To the frequencies of the Korean pronunciation large vocabulary dictionary we had synthesized speeches selected 229 syllables such as V types are 19, CV types are 80. VC types are 30 and CVC types are 100. According to the 4 Korean syllable types from the data format dictionary(table3) we had tested each 15 syllables with the objective MOS(Mean Opinion Score) evaluation method about the 4 items i.e., intelligibility, clearness, loudness, and naturality after selecting random group without the knowledge of them. As the results of experiments the qualities of them are very clear and we can control the prosodic elements such as durations, accents and pitch periods (fig9, 10, 11, 12).

  • PDF

Development of Korean-to-English and English-to-Korean Mobile Translator for Smartphone (스마트폰용 영한, 한영 모바일 번역기 개발)

  • Yuh, Sang-Hwa;Chae, Heung-Seok
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.3
    • /
    • pp.229-236
    • /
    • 2011
  • In this paper we present light weighted English-to-Korean and Korean-to-English mobile translators on smart phones. For natural translation and higher translation quality, translation engines are hybridized with Translation Memory (TM) and Rule-based translation engine. In order to maximize the usability of the system, we combined an Optical Character Recognition (OCR) engine and Text-to-Speech (TTS) engine as a Front-End and Back-end of the mobile translators. With the BLEU and NIST evaluation metrics, the experimental results show our E-K and K-E mobile translation equality reach 72.4% and 77.7% of Google translators, respectively. This shows the quality of our mobile translators almost reaches the that of server-based machine translation to show its commercial usefulness.

Development of Functional Fishing Field Game for the Elderly Based on Virtual Reality (가상현실 기반 고령자를 위한 기능성 낚시터 게임 개발)

  • Kim, Min-jeong;Kim, Young-june;Oh, Ha-hyun;Lee, Chung-ho
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.308-311
    • /
    • 2021
  • This paper describes the development of a functional game for preventing dementia for the elderly based on virtual reality. The game was developed using the Unity 3D engine to create a fishing spot, a virtual reality space. In consideration of the fact that the object of the game is an elderly person relatively unfamiliar with the virtual reality, the number of operation buttons is minimized and the sense of resistance and fatigue are reduced by an intuitive game configuration. In addition, the game was designed to give people a sense of accomplishment with a score system after the game, and to encourage them to participate in the game Overall, the developed game consists of main, interface, stage, score, TTS, tutorials, and ending credit. Each category stage is divided into three stages and realized in one integrated environment, and VRHMD can be used to games that enhance memory, attention, and judgment.

  • PDF

A Study on Interactive Talking Companion Doll Robot System Using Big Data for the Elderly Living Alone (빅데이터를 이용한 독거노인 돌봄 AI 대화형 말동무 아가야(AGAYA) 로봇 시스템에 관한 연구)

  • Song, Moon-Sun
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.5
    • /
    • pp.305-318
    • /
    • 2022
  • We focused on the care effectiveness of the interactive AI robots. developed an AI toy robot called 'Agaya' to contribute to personalization with more human-centered care. First, by applying P-TTS technology, you can maximize intimacy by autonomously selecting the voice of the person you want to hear. Second, it is possible to heal in your own way with good memory storage and bring back memory function. Third, by having five senses of the role of eyes, nose, mouth, ears, and hands, seeking better personalised services. Fourth, it attempted to develop technologies such as warm temperature maintenance, aroma, sterilization and fine dust removal, convenient charging method. These skills will expand the effective use of interactive robots by elderly people and contribute to building a positive image of the elderly who can plan the remaining old age productively and independently

An HMM-based Korean TTS synthesis system using phrase information (운율 경계 정보를 이용한 HMM 기반의 한국어 음성합성 시스템)

  • Joo, Young-Seon;Jung, Chi-Sang;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2011.07a
    • /
    • pp.89-91
    • /
    • 2011
  • In this paper, phrase boundaries in sentence are predicted and a phrase break information is applied to an HMM-based Korean Text-to-Speech synthesis system. Synthesis with phrase break information increases a naturalness of the synthetic speech and an understanding of sentences. To predict these phrase boundaries, context-dependent information like forward/backward POS(Part-of-Speech) of eojeol, a position of eojeol in a sentence, length of eojeol, and presence or absence of punctuation marks are used. The experimental results show that the naturalness of synthetic speech with phrase break information increases.

  • PDF

Voice transformation for HTS using correlation between fundamental frequency and vocal tract length (기본주파수와 성도길이의 상관관계를 이용한 HTS 음성합성기에서의 목소리 변환)

  • Yoo, Hyogeun;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.9 no.1
    • /
    • pp.41-47
    • /
    • 2017
  • The main advantage of the statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech(TTS) system can be implemented by combining a speech synthesis system and a voice transformation system, and it is widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of speech signal can be independently modified to convert the voice characteristics. Also it is important to maintain naturalness of the transformed speech. In this paper, a speech synthesis system based on Hidden Markov Model(HMM-based speech synthesis, HTS) using the STRAIGHT vocoder is constructed and voice transformation is conducted by modifying the fundamental frequency and spectral envelope. The fundamental frequency is transformed in a scaling method, and the spectral envelope is transformed through frequency warping method to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method using the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores(MOS) for naturalness of synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than baseline systems while maintaining the naturalness of the speech quality.