• Title/Summary/Keyword: TTS(Text-to-Speech)


Development of Voice Information System for Safe Navigation in Marine Simulator (시뮬레이터 기반 음성을 이용한 항행정보 안내시스템의 개발)

  • Son N. S.;Kim S. Y.
    • Journal of the Korean Society for Marine Environment & Energy
    • /
    • v.5 no.3
    • /
    • pp.28-34
    • /
    • 2002
  • As Speech Recognition (SR) and Text-To-Speech (TTS) technologies develop rapidly, voice control and guidance systems are expected to be very helpful for safe navigation. However, the Voice Control and Guidance System (VCGS) is not yet widely included in Navigation Supporting Systems (NSS). The main reason is that VCGS is so complicated and user-unfriendly that navigation officers hesitate to use it. Frequent operating errors caused by a low SR rate are another reason. To make VCGS more practical for safe navigation, we design a user-friendly VCGS. First, through interviews, we survey the functions and procedures that navigation officers want included in VCGS. Second, to raise the SR rate, we tune the system to the environmental noise on the bridge, and to reduce operating errors caused by a low SR rate, we design self-correction functions. We also apply a user-independent SR engine so that speaker training procedures are essentially unnecessary. The functions and procedures of the user-friendly VCGS are evaluated in simulator experiments, and the evaluation results are fed back into the design. As a result, we can design a VCGS that is more helpful for safe navigation. In this paper, we describe the features of the user-friendly VCGS for safe navigation and discuss the results of the simulator experiments.


VoiceXML Dialog System Based on RSS for Contents Syndication (콘텐츠 배급을 위한 RSS 기반의 VoiceXML 다이얼로그 시스템)

  • Kwon, Hyeong-Joon;Kim, Jung-Hyun;Lee, Hyon-Gu;Hong, Kwang-Seok
    • The KIPS Transactions:PartB
    • /
    • v.14B no.1 s.111
    • /
    • pp.51-58
    • /
    • 2007
  • This paper proposes a prototype dialog system that combines VXML (VoiceXML), the W3C's standard XML format for specifying interactive voice dialogues between humans and computers, with RSS (RDF Site Summary or Really Simple Syndication), a representative semantic web technology for syndicating and subscribing to updated web content. The merits of the proposed system are as follows: 1) It is a new method that recognizes spoken input over wired and wireless telephone networks and then delivers content to the user via STT (Speech-to-Text) and TTS (Text-to-Speech), instead of the traditional web-only method. 2) It retains the advantages of RSS, because subscribed, updated content is converted to VXML without modifying the traditional way of providing an RSS service. 3) For users, it reduces the time and space restrictions on searching content provided by RSS, because it uses wired and wireless telephone networks rather than an internet environment. 4) For information providers, it requires no special component for syndicating the newest content using speech recognition and synthesis technology. We implemented a news service system using VXML and RSS to evaluate the performance of the proposed system. In the experiments, we measured the response time and the speech recognition rate when subscribing to and searching real-world content, and confirmed that the proposed system can deliver content provided through an RSS feed.
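
The RSS-to-VXML conversion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an RSS 2.0 feed (`<channel>` with `<item>`/`<title>` children), and the function name `rss_to_vxml`, the form id `headlines`, and the sample feed are hypothetical. Each headline becomes a VoiceXML `<prompt>` that a voice browser could read aloud through TTS.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def rss_to_vxml(rss_text: str, max_items: int = 5) -> str:
    """Turn RSS 2.0 item titles into a VoiceXML document with one prompt per title."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: missing <channel>")

    # Hypothetical layout: a single form whose block speaks each headline in order.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "headlines"})
    block = ET.SubElement(form, "block")
    for item in channel.findall("item")[:max_items]:
        title = (item.findtext("title") or "").strip()
        if title:  # skip items without a usable title
            ET.SubElement(block, "prompt").text = title
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical sample feed, used only for illustration.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example News</title>
  <item><title>First headline</title></item>
  <item><title>Second headline</title></item>
</channel></rss>"""

if __name__ == "__main__":
    print(rss_to_vxml(SAMPLE_RSS))
```

Because the feed itself is left untouched and only a VXML view is generated from it, the sketch mirrors merit 2) above: the existing RSS service needs no modification.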

Development of Half-Mirror Interface System and Its Application for Ubiquitous Environment (유비쿼터스 환경을 위한 하프미러형 인터페이스 시스템 개발과 응용)

  • Kwon Young-Joon;Kim Dae-Jin;Lee Sang-Wan;Bien Zeungnam
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.11 no.12
    • /
    • pp.1020-1026
    • /
    • 2005
  • In the era of ubiquitous computing, human-friendly man-machine interfaces are attracting more attention because they can offer convenient services. In this paper, we introduce the 'Half-Mirror Interface System (HMIS)' as a novel type of human-friendly man-machine interface. HMIS consists of a half-mirror, a USB webcam, a microphone, a 2-channel speaker, and a high-speed processing unit. HMIS has two principal operation modes, selected according to whether a user is present in front of it. The first, 'mirror-mode', is activated when the user's face is detected via the USB webcam. In this mode, HMIS provides three basic functions: 1) make-up assistance, which magnifies a facial region of interest and gives TTS (Text-To-Speech) guidance for appropriate make-up, 2) daily weather information via a WWW service, and 3) a health monitoring/diagnosis service based on Chinese medicine knowledge. The second, 'display-mode', is designed to show decorative pictures, family photos, art paintings, and so on. This mode is activated when the user's face has not been detected for some time. In display-mode, we also added a 'healing-window' function and a 'healing-music player' function for the user's psychological comfort and/or relaxation. All these functions are accessible through a commercially available voice synthesis/recognition package.
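
The switching between the two modes can be pictured as a small state machine. The sketch below is only an illustration of the behaviour described in the abstract: the class name `HalfMirrorModeSelector`, the 10-second default timeout, and the idea of passing in a boolean face-detection result are all assumptions, since the abstract does not specify them.

```python
class HalfMirrorModeSelector:
    """Chooses 'mirror' or 'display' mode from face-detection results (hypothetical)."""

    def __init__(self, idle_timeout_s: float = 10.0):
        # Assumed value: how long the face may be absent before display-mode returns.
        self.idle_timeout_s = idle_timeout_s
        self.mode = "display"
        self._last_face_time = None  # time of the most recent face detection

    def update(self, face_detected: bool, now_s: float) -> str:
        """Feed one detection result with its timestamp; returns the current mode."""
        if face_detected:
            # A detected face activates mirror-mode immediately.
            self._last_face_time = now_s
            self.mode = "mirror"
        elif (self._last_face_time is None
              or now_s - self._last_face_time >= self.idle_timeout_s):
            # No face for long enough: fall back to display-mode.
            self.mode = "display"
        return self.mode


if __name__ == "__main__":
    selector = HalfMirrorModeSelector(idle_timeout_s=5.0)
    for t, seen in [(0.0, False), (1.0, True), (3.0, False), (6.0, False)]:
        print(t, selector.update(seen, t))
```

Keeping the timeout as a parameter reflects that the abstract only says the face is absent "for some time"; the actual threshold used in HMIS is not stated.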

A Study on Development of Applications which Provides Step-by-step CPR Guidelines and Learning Materials for Non Health-related Person (비보건계열 일반인을 위한 단계별 CPR 가이드라인과 학습자료 제공 어플리케이션 개발 연구)

  • Kim, Jong-Min
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.649-651
    • /
    • 2021
  • In Korea, there are around 30,000 cardiac arrest patients annually, and the number is gradually increasing. Against this background, CPR education and publicity programs have been expanded nationwide, but the rate of bystander CPR by the general public was 4.4%, significantly lower than the 20%~70% rate in other countries. Therefore, in this paper, we analyzed the factors affecting CPR performance by witnesses who discovered cardiac arrest patients. Based on the results, we conducted an application planning and development study that provides users with correct cardiopulmonary response tips and step-by-step CPR guidelines, with the aim of increasing the rate of CPR by general bystanders.


Prosodic-Boundary Prediction for Korean Text-to-Speech System (한국어 TTS 시스템을 위한 운율구 경계 예측)

  • Chun Jin-wook;Kim Han Woo;Kim Dong gun;Lee Yanghee
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.77-82
    • /
    • 2002
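  • Prosody is a property of speech related to its suprasegmental aspects, and speakers ordinarily use prosody to aid the listener's understanding when conveying speech. This paper aims to improve the prediction of the positions of prosodic phrases, one of the components of prosody. Based on K-ToBI, one of the labeling conventions for Korean prosodic information, prosodic phrase boundaries and their levels are represented as Break Indices, and a Support Vector Machine (SVM), proposed in the field of statistics, is used to improve the system's prediction rate. Using the tree-based model employed in previous methods, we extracted the linguistic features that most strongly influence Korean prosody and applied them in the experiments. Comparing the prediction rates of the existing tree model and the SVM model, the proposed method achieved higher prediction rates both for predicting the presence of a boundary and for predicting boundary information with four levels, experimentally demonstrating that the approach presented in this study is more effective for predicting prosodic phrase boundary information.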


A study on the Prosody Generation of Korean Sentences using Artificial Neural networks (인공 신경망을 이용한 한국어 문장단위 운율 발생에 관한 연구)

  • 이일구;민경중;강찬구;임운천
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.105-108
    • /
    • 1999
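  • To improve the naturalness of synthesized speech in a TTS (Text-To-Speech) system, the prosodic rules that exist for a language must be implemented accurately. Extracting these rules requires building a vast amount of language data. However, this approach cannot fully capture the prosody even of language data that contains the existing prosodic phenomena, so it cannot improve the quality of synthesized speech. This paper proposes two artificial neural networks for learning the prosody of Korean speech. One network learns the pitch variation of each phoneme in a sentence, and the other learns the energy variation. Both are BP (backpropagation) networks with 11 inputs representing 11 phonemes, and they output the polynomial coefficients that approximate the pitch and energy contours of the middle phoneme. Before training and evaluating the neural network system, we composed meaningful sentences from phonetically balanced isolated words. A male speaker read the sentences, and the recordings were used to build a speech DB. Prosodic information for each phoneme was collected from the speech DB to create target patterns and training patterns suited to the networks. The target patterns consist of second-order polynomial coefficients for pitch and energy, obtained from trend lines computed by regression analysis. Training the networks to fit these target patterns yielded good results.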
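
The target patterns described above, second-order polynomial coefficients fitted to a phoneme's pitch or energy contour by regression, can be sketched as an ordinary least-squares fit. This is an illustration under assumptions, not the authors' code: the function name `fit_quadratic`, the time units, and the sample pitch values are hypothetical.

```python
def fit_quadratic(times, values):
    """Least-squares fit of values ~ a*t**2 + b*t + c; returns (a, b, c)."""
    if len(times) != len(values) or len(times) < 3:
        raise ValueError("need at least three (time, value) pairs")
    # Normal equations (X^T X) w = X^T y for the basis [t^2, t, 1].
    rows = [[t * t, t, 1.0] for t in times]
    lhs = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    m = [lhs[i] + [rhs[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("time points are degenerate; cannot fit a quadratic")
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, 3):
            factor = m[k][col] / m[col][col]
            for c in range(col, 4):
                m[k][c] -= factor * m[col][c]
    # Back-substitution.
    w = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        w[i] = (m[i][3] - sum(m[i][j] * w[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(w)


if __name__ == "__main__":
    # Hypothetical pitch samples (Hz) across one phoneme, at normalized times 0..1.
    t = [0.0, 0.25, 0.5, 0.75, 1.0]
    f0 = [118.0, 124.0, 127.0, 125.0, 119.0]
    print(fit_quadratic(t, f0))
```

In a setup like the one the abstract describes, the three coefficients for pitch (and three more for energy) would serve as the network's regression targets for the middle phoneme of each 11-phoneme window.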


Kubernetes-based Framework for Improving Traffic Light Recognition Performance: Convergence Vision AI System based on YOLOv5 and C-RNN with Visual Attention (신호등 인식 성능 향상을 위한 쿠버네티스 기반의 프레임워크: YOLOv5와 Visual Attention을 적용한 C-RNN의 융합 Vision AI 시스템)

  • Cho, Hyoung-Seo;Lee, Min-Jung;Han, Yeon-Jee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.851-853
    • /
    • 2022
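  • As the population ages, the number of drivers aged 65 and over is rising sharply, and the growing share of traffic accidents involving elderly drivers is emerging as an urgent social problem. This study therefore proposes a Kubernetes-based framework that combines object detection and recognition models, recognizes traffic lights, and announces them via Text-To-Speech (TTS). In the object detection stage, the performance of several YOLOv5 models was compared before one was adopted, and in the object recognition stage, a C-RNN-based attention-OCR model was used. This model recognizes the whole image rather than only the internal LED region of the traffic light, which reduces sources of false detection and raises the recognition rate. On 1,628 test images, the framework achieved an accuracy of 0.997 and an F1-score of 0.991, demonstrating its validity. This study allows follow-up research to incorporate models from various fields without restricting the deep learning models to a specific domain, and it can help prevent traffic accidents caused by elderly drivers and signal violations.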

A Study on Verification of Back TranScription(BTS)-based Data Construction (Back TranScription(BTS)기반 데이터 구축 검증 연구)

  • Park, Chanjun;Seo, Jaehyung;Lee, Seolhwa;Moon, Hyeonseok;Eo, Sugyeong;Lim, Heuiseok
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.11
    • /
    • pp.109-117
    • /
    • 2021
  • Recently, speech-based interfaces have been increasingly used as a means of human-computer interaction (HCI). Accordingly, interest in post-processors that correct errors in speech recognition results is also increasing. However, building the data needed to train a sequence-to-sequence (S2S) speech recognition post-processor requires a great deal of human labor. To alleviate this limitation of existing construction methodologies, a new data construction method called Back TranScription (BTS) was proposed. BTS is a technique that combines TTS and STT to create a pseudo-parallel corpus. This methodology eliminates the role of the phonetic transcriber and can automatically generate vast amounts of training data, saving cost. Extending the existing BTS research, this paper verifies through experiments that data should be constructed with text style and domain in mind rather than without any criteria.
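
The BTS idea of chaining TTS and STT to obtain (recognized text, original text) pairs can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `back_transcribe` is a hypothetical helper that takes any TTS and STT callables, and `fake_tts` and `fake_stt` are stand-ins that only imitate typical recognition noise (lost casing and punctuation) so the example runs without a speech engine.

```python
from typing import Callable, Iterable, List, Tuple


def back_transcribe(
    sentences: Iterable[str],
    tts: Callable[[str], bytes],
    stt: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """Build a pseudo-parallel corpus of (STT output, original sentence) pairs."""
    pairs = []
    for gold in sentences:
        audio = tts(gold)           # text -> synthesized speech
        noisy = stt(audio)          # speech -> recognized text, possibly with errors
        pairs.append((noisy, gold)) # source = noisy text, target = clean text
    return pairs


def fake_tts(text: str) -> bytes:
    """Stand-in for a real TTS engine: just encodes the text as bytes."""
    return text.encode("utf-8")


def fake_stt(audio: bytes) -> str:
    """Stand-in for a real STT engine: drops casing and punctuation."""
    text = audio.decode("utf-8").lower()
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


if __name__ == "__main__":
    for source, target in back_transcribe(["Hello, world."], fake_tts, fake_stt):
        print(repr(source), "->", repr(target))
```

Each pair can then train a post-processor that maps the noisy source back to the clean target, which is why no human transcriber is needed. Per the abstract's finding, the input sentences would be chosen with text style and domain in mind.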

Prediction of Prosodic Break Using Syntactic Relations and Prosodic Features (구문 관계와 운율 특성을 이용한 한국어 운율구 경계 예측)

  • Jung, Young-Im;Cho, Sun-Ho;Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science
    • /
    • v.19 no.1
    • /
    • pp.89-105
    • /
    • 2008
  • In this paper, we propose a rule-based system for predicting natural prosodic phrase breaks in Korean text. To implement the rule-based system, (1) sentence constituents are sub-categorized according to their syntactic functions, (2) syntactic phrases are recognized using the dependency relations among the sub-categorized constituents, and (3) rules for predicting prosodic phrase breaks are created. In addition, (4) the length of syntactic phrases and sentences, the position of syntactic phrases in a sentence, and the sense information of contextual words are considered in order to determine variable prosodic phrase breaks. Based on these rules and features, we obtained an accuracy of over 90% in predicting the positions of major breaks and no-breaks, which correlate highly with the syntactic structure of the sentence. For overall accuracy across all prosodic phrase breaks, the proposed system achieves a Break_Correct of 87.18% and a Juncture Correct of 89.27%, which are higher than those of other models.
