Search | Korea Science

A Design and Implementation of Speech Recognition and Synthetic Application for Hearing-Impairment

Kim, Woo-Lin;Ham, Hye-Won;Yun, Sang-Un;Lee, Won Joo
- Journal of the Korea Society of Computer and Information / v.26 no.12 / pp.105-110 / 2021
In this paper, we design and implement an Android mobile application that helps hearing-impaired people communicate, based on STT (Speech-to-Text) and TTS (Text-to-Speech) APIs and the accelerometer sensor of a smartphone. The application records what the hearing-impaired person's interlocutor is saying with a microphone, converts it to text using the STT API, and displays it to the hearing-impaired person. In addition, when a hearing-impaired person inputs text, it is converted into voice using the TTS API and played to the interlocutor. An accelerometer-based background service launches the application when the hearing-impaired person shakes their smartphone. The application implemented in this paper allows hearing-impaired people to communicate easily with others without resorting to sign language over a video call.
https://doi.org/10.9708/jksci.2021.26.12.105

An Android Application for Speech Communication of People with Speech Disorders (언어장애인을 위한 안드로이드 기반 의사소통보조 어플리케이션)

Choi, Yoonjung;Hong, Ki-Hyung
- Phonetics and Speech Sciences / v.6 no.4 / pp.141-148 / 2014
Voice is the most common means of communication, but some people have difficulty producing voice due to congenital or acquired disorders. Individuals with speech disorders might lose their speaking ability due to hearing impairment, encephalopathy or cerebral palsy accompanied by motor skill impairments, or autism caused by mental problems. They still need to communicate, so some of them use various types of AAC (Augmentative & Alternative Communication) devices. In this paper, a mobile application for literate people with speech disorders was designed and implemented by developing accurate and fast sentence-completion functions for efficient user interaction. From a user study and a previous study on Korean text-based communication for adults having difficulty in speech communication, we identified functionality and usability requirements. Specifically, the user interface with scanning features was designed by considering the users' motor skills in using the touch-screen of a mobile device. Finally, we conducted a usability test of the application. The results show that the application is easy to learn and efficient to use for communication by people with speech disorders.
https://doi.org/10.13064/KSSS.2014.6.4.141

Development of Speech Recognition and Synthetic Application for the Hearing Impairment (청각장애인을 위한 음성 인식 및 합성 애플리케이션 개발)

Lee, Won-Ju;Kim, Woo-Lin;Ham, Hye-Won;Yun, Sang-Un
- Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.129-130 / 2020
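In this paper, we present the results of implementing an Android application system for communication by hearing-impaired people. Using the STT (Speech to Text) API of Google Cloud Platform, the content of a conversation is output as text through speech recognition. Speech synthesis using TTS (Text to Speech) then outputs text as voice. In addition, we developed a system that improves the usability of the application by using the accelerometer sensor in a foreground service so that the application can be launched when the smartphone is shaken two or three times.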

MPEG-4 TTS (Text-to-Speech)

한민수
- Proceedings of the IEEK Conference / 1999.06a / pp.699-707 / 1999
It cannot be disputed that speech is the most natural interfacing tool between men and machines. In order to realize acceptable speech interfaces, highly advanced speech recognizers and synthesizers are indispensable. Text-to-Speech (TTS) technology has been attracting a lot of interest among speech engineers because of its benefits, namely its possible application to talking computers, spoken emergency alarm systems, speech output devices for the speech-impaired, and so on. Hence, many researchers have made significant progress in speech synthesis techniques for their own languages, and as a result, the quality of currently available speech synthesizers is believed to be acceptable to normal users. These are partly why the MPEG group decided to include TTS technology as one of its MPEG-4 functionalities. ETRI has made major contributions to the current MPEG-4 TTS among the various MPEG-4 functionalities. They are: 1) use of original prosody for synthesized speech output, 2) trick mode functions for general users without breaking synthesized speech prosody, 3) interoperability with Facial Animation (FA) tools, and 4) dubbing a moving/animated picture with lip-shape pattern information.

A Voice-enabled Chatbot Mobile Application (음성지원 챗봇 모바일 애플리케이션)

Choi, In-Kyung;Choi, Yun-Jeong;Lee, Ye-Rin
- Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.438-439 / 2019
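Interest in chatbot services is steadily increasing owing to social issues and advances in artificial intelligence technology, and as a result, assistive programs based on TTS (Text to Speech) and STT (Speech to Text) technologies are being developed in a variety of mobile environments. In this paper, we propose a 'Voice-enabled Chatbot Mobile Application': a voice-enabled chatbot system built with TTS technology, which converts text to sound, and STT technology, which converts sound to text, and implemented as an Android-based mobile application. We also introduce the related technologies and the expected benefits.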
https://doi.org/10.3745/PKIPS.y2019m05a.438

Implementation of Music Broadcasting Service System in the Shopping Center Using Text-To-Speech Technology (TTS를 이용한 매장 음악 방송 서비스 시스템 구현)

Chang, Moon-Soo;Kang, Sun-Mee
- Speech Sciences / v.14 no.4 / pp.169-178 / 2007
This paper describes the development of a service system for small shops that supports not only music broadcasting but also editing and generating voice announcements using TTS (Text-To-Speech) technology. The system has been developed for a web environment so that it is easily accessible whenever and wherever it is needed. The system controls sound using a Silverlight media player based on ASP.NET 2.0 technology, without any additional application software. Use of the Ajax control allows for multiple users to get the maximum load when needed. TTS runs on the server side, so the service can be provided without relying on the user's computer. Owing to the convenience and usefulness of the system, the business sector can provide better service to many shops. Additional functions such as statistical analysis will further help shop management provide desirable services.

A Design and Implementation of The Deep Learning-Based Senior Care Service Application Using AI Speaker

Mun Seop Yun;Sang Hyuk Yoon;Ki Won Lee;Se Hoon Kim;Min Woo Lee;Ho-Young Kwak;Won Joo Lee
- Journal of the Korea Society of Computer and Information / v.29 no.4 / pp.23-30 / 2024
In this paper, we propose a deep learning-based personalized senior care service application. For user convenience, the proposed application uses Speech to Text technology to convert the user's speech into text and passes it as input to Autogen, an interactive multi-agent large-scale language model developed by Microsoft. Autogen uses data from previous conversations between the senior and ChatBot to understand the other user's intent and respond accordingly, and then uses a back-end agent to create a wish list, a shared calendar, and a greeting message in the other user's voice through a deep learning model for voice cloning. Additionally, the application can perform home IoT services with SKT's AI speaker (NUGU). The proposed application is expected to contribute to future AI-based senior care technology.
https://doi.org/10.9708/jksci.2024.29.04.023

A 3D Audio-Visual Animated Agent for Expressive Conversational Question Answering

Martin, J.C.;Jacquemin, C.;Pointal, L.;Katz, B.
- Proceedings of the Korea Information Convergence Society Conference (한국정보컨버전스학회:학술대회논문집) / 2008.06a / pp.53-56 / 2008
This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1) perceptual experiments (e.g. perception of expressivity and 3D movements in both audio and visual channels); 2) design of human-computer interfaces requiring head models at different resolutions and the integration of the talking head in virtual scenes. The target application of this expressive ACA is a real-time speech-based question answering system developed at LIMSI (RITEL). The architecture of the system is based on distributed modules exchanging messages through a network protocol. The main component of the system is RITEL, a question answering system that searches raw text and produces a text (the answer) together with attitudinal information. This attitudinal information is then processed to deliver expressive tags, and the text is converted into phoneme, viseme, and prosodic descriptions. Audio speech is generated by the LIMSI selection-concatenation text-to-speech engine. Visual speech uses MPEG4 keypoint-based animation and is rendered in real time by Virtual Choreographer (VirChor), a GPU-based 3D engine. Finally, visual and audio speech is played in a 3D audio and visual scene. The project also puts considerable effort into realistic visual and audio 3D rendering. A new model of phoneme-dependent human radiation patterns is included in the speech synthesis system, so that the ACA can move in the virtual scene with realistic 3D visual and audio rendering.

A Korean TTS System for Educational Purpose (교육용 한국어 TTS 플랫폼 개발)

Lee Jungchul;Lee Sangho
- MALSORI / no.50 / pp.41-50 / 2004
Recently, there has been considerable progress in the natural language processing and digital signal processing components, and this progress has led to improved synthetic speech quality in many commercial TTS systems. But there still remain many obstacles to overcome for the practical application of TTS. To resolve these problems, cooperative research among the related areas is highly required, and a common Korean TTS platform is essential to promote these activities. This platform offers a general framework for building Korean speech synthesis systems and full C/C++ source code for its modules, so that researchers can implement and test their own algorithms. In this paper, we describe the Korean TTS platform to be developed and the development plan.

APPLICATION OF KOREAN TEXT-TO-SPEECH FOR X.400 MHS SYSTEM

Kim, Hee-Dong;Koo, Jun-Mo;Choi, Ho-Joon;Kim, Sang-Taek
- Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.885-892 / 1994
This paper presents a Korean text-to-speech (TTS) algorithm with speed and intonation control capability, and describes the development of a voice message delivery system employing this TTS algorithm. This system allows Interpersonal Messaging (IPM) Service users of a Message Handling System (MHS) to send their text messages to a recipient over a telephone line using synthetic voice. The X.400 MHS recommendation does not specify protocols and service elements for a voice message delivery system. Thus, we defined an access protocol and service elements for the Voice Access Unit, based on the application program interface for message transfers between the X.400 Message Transfer Agent and the Voice Access Unit. The system architecture and operations are also presented.
