• Title/Summary/Keyword: text-to-speech


Speech Generation Using Kinect Devices Using NLP

  • D. Suganthi
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.25-30
    • /
    • 2024
  • New technologies and assistive instruments are continually being introduced to improve the lives of people with disabilities. This project aims to help mute individuals express their views and ideas more efficiently and effectively, thereby helping them create their own place in the world. The proposed system traces gestures into text, which is in turn transformed into speech. Gesture identification and mapping are performed by the Kinect device, which is found to be cost-effective and reliable. A suitable text-to-speech converter then translates the text generated from the Kinect into speech. Although hardware complexity prevents the proposed system from being applied to person-to-person conversation, it could prove very useful in addressing environments such as auditoriums and classrooms.
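The pipeline the abstract describes can be sketched as a simple chain from recognized gestures to text to a speech back end. The gesture labels, their text mappings, and the `speak` placeholder below are illustrative assumptions, not the paper's actual vocabulary; a real system would receive gesture IDs from the Kinect SDK and hand the resulting text to a TTS engine.

```python
# Minimal sketch of a gesture -> text -> speech pipeline.
# Gesture labels and phrases are hypothetical examples.

GESTURE_TO_TEXT = {
    "wave_right_hand": "hello everyone",
    "raise_both_hands": "please pay attention",
    "point_forward": "let us begin",
}

def gestures_to_sentence(gesture_ids):
    """Map a sequence of recognized gesture labels to a text sentence."""
    phrases = []
    for gid in gesture_ids:
        phrase = GESTURE_TO_TEXT.get(gid)
        if phrase is None:
            continue  # skip gestures outside the known vocabulary
        phrases.append(phrase)
    return ". ".join(phrases)

def speak(text):
    """Placeholder for a TTS back end (e.g. a platform speech API)."""
    print(f"[TTS] {text}")

speak(gestures_to_sentence(["wave_right_hand", "point_forward"]))
```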

A Study on the Sound Effect for Improving Customer's Speech Recognition in the TTS-based Shop Music Broadcasting Service (TTS를 이용한 매장음원방송에서 고객의 인지도 향상을 위한 음향효과 연구)

  • Kang, Sun-Mee;Kim, Hyun-Deuc;Chang, Moon-Soo
    • Phonetics and Speech Sciences
    • /
    • v.1 no.4
    • /
    • pp.105-109
    • /
    • 2009
  • This paper describes a method for producing clear voice announcements in a shop music broadcasting service using Text-To-Speech (TTS) technology. Providing a high-quality TTS sound service for each shop individually is expensive. According to a report on architectural acoustics, room acoustic indexes such as reverberation time and early decay time are closely connected with the subjective perception of acoustics. Based on this result, customers should be able to recognize voice announcements better when sound effects are applied to the speech files produced by TTS. An aural comprehension test showed improvement on almost all parameters when a reverb effect was applied to the TTS sound.
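A reverb-like effect of the kind applied to the TTS files can be sketched with a single feedback comb filter, one building block of classic Schroeder reverberators. The delay and gain values below are illustrative; the paper tunes effects against room-acoustic indexes such as reverberation time, which this toy filter does not model.

```python
# Feedback comb filter: y[n] = x[n] + gain * y[n - delay]
# A minimal stand-in for a reverb effect on a TTS waveform.

def comb_reverb(samples, delay, gain):
    """Apply a feedback comb filter to a list of audio samples."""
    out = list(samples)
    for n in range(delay, len(out)):
        out[n] += gain * out[n - delay]
    return out

dry = [1.0] + [0.0] * 9               # impulse input
wet = comb_reverb(dry, delay=3, gain=0.5)
# the impulse now repeats every 3 samples with decaying amplitude
```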


Korean Prosody Generation Based on Stem-ML (Stem-ML에 기반한 한국어 억양 생성)

  • Han, Young-Ho;Kim, Hyung-Soon
    • MALSORI
    • /
    • no.54
    • /
    • pp.45-61
    • /
    • 2005
  • In this paper, we present a method of generating intonation contours for a Korean text-to-speech (TTS) system and a method of synthesizing emotional speech, both based on Soft template mark-up language (Stem-ML), a novel prosody generation model that combines mark-up tags and pitch generation in one. Evaluation shows that the intonation contour generated by Stem-ML is better than that of our previous work. We also find that Stem-ML is a useful tool for generating emotional speech by controlling a limited number of tags. A large emotional speech database is crucial for more extensive evaluation.


An emotional speech synthesis markup language processor for multi-speaker and emotional text-to-speech applications (다음색 감정 음성합성 응용을 위한 감정 SSML 처리기)

  • Ryu, Se-Hui;Cho, Hee;Lee, Ju-Hyun;Hong, Ki-Hyung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.5
    • /
    • pp.523-529
    • /
    • 2021
  • In this paper, we designed and developed an Emotional Speech Synthesis Markup Language (SSML) processor. Multi-speaker emotional speech synthesis technology that can express multiple voice colors and emotions has been developed, and we designed Emotional SSML by extending SSML to cover multiple voice colors and emotional expressions. The Emotional SSML processor has a graphical user interface and consists of the following four components: first, a multi-speaker emotional text editor that easily marks specific voice colors and emotions at desired positions; second, an Emotional SSML document generator that automatically creates an Emotional SSML document from the editor's output; third, an Emotional SSML parser that parses the Emotional SSML document; and last, a sequencer that controls a multi-speaker emotional Text-to-Speech (TTS) engine based on the parser's output. Because it is based on SSML, an open standard independent of programming language and platform, the Emotional SSML processor can easily be integrated with various speech synthesis engines and facilitates the development of multi-speaker emotional text-to-speech applications.
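The parser-plus-sequencer idea can be sketched by reading an SSML document extended with emotion markup and emitting cues for a TTS engine. The `<voice>` element and its `name` attribute are standard SSML; the `emotion` attribute below is an illustrative extension, not the paper's actual schema.

```python
# Parse an SSML-like document with a hypothetical "emotion" attribute
# and collect (speaker, emotion, utterance) cues for a TTS sequencer.
import xml.etree.ElementTree as ET

doc = """
<speak>
  <voice name="speakerA" emotion="happy">Good morning!</voice>
  <voice name="speakerB" emotion="sad">I lost my umbrella.</voice>
</speak>
"""

def parse_emotional_ssml(text):
    """Return (speaker, emotion, utterance) tuples in document order."""
    root = ET.fromstring(text)
    cues = []
    for voice in root.iter("voice"):
        cues.append((voice.get("name"),
                     voice.get("emotion", "neutral"),
                     (voice.text or "").strip()))
    return cues

for speaker, emotion, utterance in parse_emotional_ssml(doc):
    print(speaker, emotion, utterance)
```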

Enhancing Multimodal Emotion Recognition in Speech and Text with Integrated CNN, LSTM, and BERT Models (통합 CNN, LSTM, 및 BERT 모델 기반의 음성 및 텍스트 다중 모달 감정 인식 연구)

  • Edward Dwijayanto Cahyadi;Hans Nathaniel Hadi Soesilo;Mi-Hwa Song
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.1
    • /
    • pp.617-623
    • /
    • 2024
  • Identifying emotions from speech poses a significant challenge due to the complex relationship between language and emotion. Our paper takes on this challenge by employing feature engineering to identify emotions in a multimodal classification task involving both speech and text data. We evaluated two classifiers, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), each integrated with a BERT-based pre-trained model. Our assessment covers various performance metrics (accuracy, F-score, precision, and recall) across different experimental setups. The findings highlight the proficiency of both models in accurately discerning emotions from both text and speech data.
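One common way to combine a speech model with a text model, in the spirit of the multimodal setup above, is late fusion of their per-emotion probability vectors. The emotion labels and probabilities below are made-up numbers for illustration; the paper's classifiers are CNN/LSTM networks paired with a BERT-based encoder, not this toy averaging.

```python
# Late fusion sketch: average two per-emotion probability vectors,
# then pick the emotion with the highest combined score.

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def fuse(speech_probs, text_probs):
    """Elementwise average of the two modalities, then argmax."""
    avg = [(s + t) / 2 for s, t in zip(speech_probs, text_probs)]
    return EMOTIONS[max(range(len(avg)), key=avg.__getitem__)]

label = fuse([0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1])
print(label)
```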

Entrepreneur Speech and User Comments: Focusing on YouTube Contents (기업가 연설문의 주제와 시청자 댓글 간의 관계 분석: 유튜브 콘텐츠를 중심으로)

  • Kim, Sungbum;Lee, Junghwan
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.5
    • /
    • pp.513-524
    • /
    • 2020
  • YouTube's growth has recently drawn attention. YouTube is not only a content-consumption channel but also a space where consumers express their views, sharing opinions through comments. This study focuses on the text of global entrepreneurs' speeches and the comments those speeches received on YouTube. A content analysis was conducted for each speech and its comments using the text mining software Leximancer. We analyzed the theme of each entrepreneurial speech and derived topics related to the propensity and characteristics of the individual entrepreneurs. In the comments, we found the themes of money, work, and need to be common regardless of the content of each speech. Taking into account the different lengths of the texts, we additionally performed a Prominence Index analysis and derived time, future, better, best, change, life, business, and need as keywords common to both the speech contents and the viewer comments. Users who watched an entrepreneur's speech on YouTube responded equally to the topics of life, time, future, customer needs, and positive change.

A Study on the Text-to-Speech Conversion Using the Formant Synthesis Method (포만트 합성방식을 이용한 문자-음성 변환에 관한 연구)

  • Choi, Jin-San;Kim, Yin-Nyun;See, Jeong-Wook;Bae, Geun-Sune
    • Speech Sciences
    • /
    • v.2
    • /
    • pp.9-23
    • /
    • 1997
  • Through iterative analysis and synthesis experiments on Korean monosyllables, a Korean text-to-speech system was implemented using a phoneme-based formant synthesis method. Since the formants of initial and final consonants in this system vary considerably with the medial vowel, the database for each phoneme comprises formants conditioned on the medial vowel as well as duration information for the transition region. These techniques were needed to improve the intelligibility of the synthetic speech. This paper also investigates methods of concatenating the synthesis units to improve the quality of the synthetic speech.
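The core of formant synthesis can be sketched as summing damped sinusoids at the formant frequencies. The formant and bandwidth values below are rough textbook figures for a vowel like /a/, used only for illustration; the paper's system instead stores per-phoneme formants conditioned on the medial vowel, plus transition-duration data, in a database.

```python
# Toy formant synthesis: a vowel-like sound as a sum of exponentially
# damped sine waves, one per formant.
import math

def synth_formants(formants_hz, bandwidths_hz, sr=16000, dur=0.05):
    """Return a mono sample list of summed damped sinusoids."""
    n = int(sr * dur)
    out = [0.0] * n
    for f, bw in zip(formants_hz, bandwidths_hz):
        decay = math.exp(-math.pi * bw / sr)   # per-sample amplitude decay
        for i in range(n):
            out[i] += (decay ** i) * math.sin(2 * math.pi * f * i / sr)
    return out

# rough /a/-like formants (Hz) with illustrative bandwidths
samples = synth_formants([700.0, 1220.0, 2600.0], [80.0, 90.0, 120.0])
```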


An Android Application for Speech Communication of People with Speech Disorders (언어장애인을 위한 안드로이드 기반 의사소통보조 어플리케이션)

  • Choi, Yoonjung;Hong, Ki-Hyung
    • Phonetics and Speech Sciences
    • /
    • v.6 no.4
    • /
    • pp.141-148
    • /
    • 2014
  • Voice is the most common means of communication, but some people have difficulty producing voice due to congenital or acquired disorders. Individuals with speech disorders may lose their speaking ability due to hearing impairment, encephalopathy or cerebral palsy accompanied by motor-skill impairments, or autism caused by mental problems. They still need to communicate, so some of them use various types of Augmentative and Alternative Communication (AAC) devices to meet their communication needs. In this paper, a mobile application for literate people with speech disorders was designed and implemented, with accurate and fast sentence-completion functions for efficient user interaction. From a user study and a previous study on Korean text-based communication for adults with speech difficulties, we identified functionality and usability requirements. In particular, the user interface with scanning features was designed in consideration of the users' motor skills when using the touch screen of a mobile device. Finally, we conducted a usability test, whose results show that the application is easy to learn and efficient for communication by people with speech disorders.
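The sentence-completion idea can be sketched as prefix matching over stored sentences, ranked by how often the user has selected each one. The sentence list and counts below are illustrative examples, not data from the study.

```python
# Sentence completion for an AAC-style app: suggest stored sentences
# that start with the typed prefix, most frequently used first.
from collections import Counter

usage = Counter({
    "I would like some water": 5,
    "I would like to rest": 3,
    "I am feeling fine today": 2,
})

def complete(prefix, history, limit=2):
    """Return up to `limit` matching sentences, ordered by usage count."""
    matches = [s for s in history if s.startswith(prefix)]
    matches.sort(key=lambda s: -history[s])
    return matches[:limit]

print(complete("I would", usage))
```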

A Study on Exceptional Pronunciations For Automatic Korean Pronunciation Generator (한국어 자동 발음열 생성 시스템을 위한 예외 발음 연구)

  • Kim Sunhee
    • MALSORI
    • /
    • no.48
    • /
    • pp.57-67
    • /
    • 2003
  • This paper presents a systematic description of exceptional pronunciations for automatic Korean pronunciation generation. An automatic pronunciation generator is an essential part of both a Korean speech recognition system and a TTS (Text-To-Speech) system. It is composed of a set of regular rules and an exceptional pronunciation dictionary. The dictionary is created by extracting words with exceptional pronunciations, based on characteristics of such words identified through phonological research and a systematic analysis of the entries of Korean dictionaries. The method thus contributes to improving the performance of the automatic pronunciation generator as well as that of Korean speech recognition and TTS systems.
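The rules-plus-exceptions architecture can be sketched as a lookup that consults the exception dictionary first and falls back to the regular rules otherwise. The example entry and the trivial letter-by-letter rule below are illustrative stand-ins; real Korean grapheme-to-phoneme conversion applies systematic phonological rules over hangul jamo.

```python
# Exception dictionary + regular rules, the structure of the
# automatic pronunciation generator described above.

EXCEPTIONS = {
    # word -> pronunciation (romanized here purely for illustration)
    "kimchi": "k i m ch i",
}

def regular_rules(word):
    """Stand-in for the rule component: spell letters one by one."""
    return " ".join(word)

def pronounce(word):
    """The exception dictionary takes priority over the regular rules."""
    return EXCEPTIONS.get(word) or regular_rules(word)

print(pronounce("kimchi"))  # resolved by the exception dictionary
print(pronounce("abc"))     # resolved by the regular rules
```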


The Impact of Speech-To-Text-based Class on Learners' Cognitive Abilities

  • HyunMin Kang;SunKwan Han
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.1
    • /
    • pp.287-293
    • /
    • 2024
  • This research studied the cognitive impact of classes using artificial intelligence on aviation technical school students. We developed two kinds of classes: one based on traditional presentation materials and one composed of speech-to-text (STT)-based artificial intelligence materials. A total of 133 students from an aviation education institution participated in the two types of classes. We measured students' cognitive load and mind-wandering test results before and after class, and conducted an achievement evaluation. The analysis confirmed that extraneous cognitive load was reduced, content concentration increased, and achievement improved. We hope that AI-based STT classes will come to be widely used in schools that teach technology.