• Title/Summary/Keyword: Text-to-Speech system


Education System to Learn the Skills of Management Decision-Making by Using Business Simulator with Speech Recognition Technology

  • Sakata, Daiki;Akiyama, Yusuke;Kaneko, Masaaki;Kumagai, Satoshi
    • Industrial Engineering and Management Systems
    • /
    • v.13 no.3
    • /
    • pp.267-277
    • /
    • 2014
  • In this paper, we propose an educational system consisting of a business game simulator and a related curriculum. To develop these two elements, we examined the decision-making process in business management and identified the significant skills involved. We then created an original simulator, named BizLator (http://bizlator.com), to help students develop these skills efficiently, and developed a curriculum suited to the simulator. We confirmed the effectiveness of the simulator and curriculum in a business-game-based class at Aoyama Gakuin University in Tokyo. On this basis, we compared our education system with a conventional system, which allowed us to identify the advantages of and issues with the proposed system. Furthermore, we propose a speech recognition support system named BizVoice that gives teachers more meaningful feedback, such as the students' level of understanding. Concretely, BizVoice captures the students' spoken discussion during the game and converts the voice data to text with speech recognition technology. Teachers can then grasp indicators of the students' understanding, and the students in turn can take more effective classes using BizLator. We also confirmed the effectiveness of this system in a class at Aoyama Gakuin University.

Text-dependent Speaker Verification System Over Telephone Lines (전화망을 위한 어구 종속 화자 확인 시스템)

  • 김유진;정재호
    • Proceedings of the IEEK Conference
    • /
    • 1999.11a
    • /
    • pp.663-667
    • /
    • 1999
  • In this paper, we review conventional speaker verification algorithms and present a text-dependent speaker verification system for use over telephone lines, together with experimental results. To train speaker models effectively with limited enrollment data, we apply a blind-segmentation algorithm, which segments speech into sub-word units without linguistic information. A world model created from the PBW DB is used for score normalization. Experiments on the implemented system, using a database constructed to simulate a field test, show an equal error rate (EER) of 3.3%.
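
The two evaluation ideas in this abstract, world-model score normalization and the equal error rate, can be illustrated with a minimal Python sketch. The function names, the threshold sweep, and the toy scores are assumptions made for illustration; they are not taken from the paper.

```python
def normalized_score(loglik_speaker, loglik_world):
    """Log-likelihood ratio: claimed-speaker model versus world model."""
    return loglik_speaker - loglik_world


def equal_error_rate(genuine, impostor):
    """Return (eer, threshold); higher scores mean 'more likely the claimed speaker'.

    Sweeps every observed score as a threshold and keeps the one where the
    false acceptance rate and false rejection rate are closest.
    """
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(s < thr for s in genuine) / len(genuine)      # true speakers rejected
        far = sum(s >= thr for s in impostor) / len(impostor)   # impostors accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Toy normalized scores (hypothetical values, not experimental data).
genuine_scores = [2.1, 1.8, 0.4, 2.5, 1.9]
impostor_scores = [-1.0, 0.2, 0.6, -0.3, -1.4]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.1%} at threshold {threshold}")
```

With real data the sweep would run over many more trials, and the reported EER is usually interpolated between neighboring thresholds rather than taken at the closest one.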


Implementation of Text-to-Speech System using ABS/OLA Sinusoidal Model (ABS/OLA Sinusoidal 모델을 이용한 문서-음성 변환시스템의 구현)

  • Bae Jae-Hyun;Byeon Heo-Jin;Oh Yung-Hwan
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.17-20
    • /
    • 1999
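  • In this paper, we propose a unit-concatenation method based on phase inheritance and an interpolation method for sinusoidal amplitudes across multiple frames, both for overlap-add sinusoidal synthesis. We also propose a method that assumes the center of each synthesis frame is the pitch onset time and uses the phase of the vocal tract model separated from the speech as the phase of the whole speech signal. A text-to-speech (TTS) system implemented with the proposed methods showed reduced waveform distortion at the joins between concatenated units and produced smoothly connected synthetic speech.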


Variation of the Verification Error Rate of Automatic Speaker Recognition System With Voice Conditions (다양한 음성을 이용한 자동화자식별 시스템 성능 확인에 관한 연구)

  • Hong Soo Ki
    • MALSORI
    • /
    • no.43
    • /
    • pp.45-55
    • /
    • 2002
  • Forensic applications require automatic speaker recognition to be highly reliable regardless of voice conditions. Audio recordings in real cases are not consistent in voice conditions such as duration, time interval between recordings, whether the speech is a given text or conversational, and transmission channel. In this study, we investigated how the verification error rate of an automatic speaker recognition (ASR) system varies with these voice conditions. The results indicate that, to decrease both the false rejection rate and the false acceptance rate, varied voices should be used for training, and the training voices should be longer in duration than the test voices.


Text Transliteration System and Number Transliteration Disambiguation for TTS (음성합성을 위한 텍스트 음역 시스템과 숫자 음역 모호성 처리)

  • Park, Jeong Yeon;Shin, Hyeong Jin;Yuk, Dae Bum;Lee, Jae Sung
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.449-452
    • /
    • 2018
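  • TTS (Text-to-Speech) is a speech synthesis technology that takes a string as input and converts it to speech. However, real input sentences contain not only Hangul but also English words and numbers. English words may be read differently depending on capitalization, and when they are used as units they are abbreviations and should not be read letter by letter. Numbers are also read differently depending on the words they appear with. In this paper, we build a system that classifies sentences mixing Hangul, numbers, units, and English words and transliterates them, and we introduce a method that uses word vectors to resolve the ambiguity of numbers and units.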
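
The number-reading ambiguity described in this abstract can be illustrated with a small Python sketch. It deliberately swaps the paper's word-vector method for a hand-written lookup table, so it only shows the problem, not the authors' solution. The counter list is a small hypothetical sample, and only single digits from 1 to 9 are handled.

```python
# Sino-Korean and native Korean readings for the digits 1 to 9.
SINO = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
NATIVE = ["", "한", "두", "세", "네", "다섯", "여섯", "일곱", "여덟", "아홉"]

# Assumption: a tiny sample of counters that take native Korean numerals.
# A real system would need far broader coverage or a learned model.
NATIVE_COUNTERS = {"개", "명", "시", "살", "마리"}


def read_number(digit, counter):
    """Spell out a single digit followed by a counter or unit word."""
    if not 1 <= digit <= 9:
        raise ValueError("this sketch only handles digits 1 to 9")
    table = NATIVE if counter in NATIVE_COUNTERS else SINO
    return f"{table[digit]} {counter}"


print(read_number(3, "시"))  # hours take the native reading
print(read_number(3, "분"))  # minutes take the Sino-Korean reading
```

The same digit is read two different ways depending on the following word, which is why the context of a number has to be modeled before transliteration.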


Study of Speech Recognition System Using the Java (자바를 이용한 음성인식 시스템에 관한 연구)

  • Choi, Kwang-Kook;Kim, Cheol;Choi, Seung-Ho;Kim, Jin-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.6
    • /
    • pp.41-46
    • /
    • 2000
  • In this paper, we implement a speech recognition system in Java, based on continuous-distribution HMMs and a browser-embedded model. It is developed for speech analysis, processing, and recognition on the Web. Using a Java applet, the client performs end-point detection and extracts MFCC, energy, and delta coefficients, then sends this speech information to the server through a socket. The server consists of an HMM recognizer and a trained DB; it recognizes the speech and returns the recognized text to the client for display. Although the Java-based speech recognition system has a high error rate, it is platform-independent across the network. The significance of the implemented system is that it can be merged into multimedia applications and shows the possibility of new information and communication services in the future.
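
The delta coefficients mentioned in this abstract are commonly computed with a regression over neighboring frames. The following minimal sketch is written in Python rather than the paper's Java; the window size `N=2` and the edge-padding rule are assumptions, since the abstract does not state them.

```python
def delta(frames, N=2):
    """Delta (first-order dynamic) features via the standard regression formula.

    frames: list of feature vectors (lists of floats), one per time step.
    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated at the edges.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, N + 1):
                ahead = frames[min(t + n, T - 1)][k]   # clamp at the last frame
                behind = frames[max(t - n, 0)][k]      # clamp at the first frame
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


# One-dimensional toy "cepstral" track that rises by 1.0 per frame.
track = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(track)
print(deltas)
```

In the middle of the track the delta equals the slope of the feature; near the edges it is smaller because of the clamping.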


Implementation of Information Access Embedded System for the Blind People (시각 장애인을 위한 정보접근 임베디드 시스템의 구현)

  • Kim, Si-Woo;Lee, Jae-Kyun;Lee, Chae-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.2C
    • /
    • pp.167-172
    • /
    • 2008
  • Since a 2-dimensional (2D) bar code can retrieve data and information quickly, it is widely used and recognized as a useful tool for many industrial applications. However, the information capacity of the 2D bar code is still limited. Recently, the analog-digital code (AD code), which has the largest storage capacity yet contained in a code, has been developed; by overcoming the data capacity limitation, it expands the bar code's range of applications. In this paper, we present the AD code and implement an effective embedded system that transforms text information into voice using the 2D AD code and Text To Speech (TTS). This voice information can be delivered to blind people, as well as the elderly, by capturing the AD code on paper or in books.

Object Detection Algorithm for Explaining Products to the Visually Impaired (시각장애인에게 상품을 안내하기 위한 객체 식별 알고리즘)

  • Park, Dong-Yeon;Lim, Soon-Bum
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.10
    • /
    • pp.1-10
    • /
    • 2022
  • Visually impaired people have great difficulty using retail stores because products lack braille information and no other support system exists. In this paper, we propose a basic algorithm for a system that recognizes products in retail stores and describes them by voice. First, a deep learning model detects hand objects and product objects in the input image. Then, the algorithm compares the coordinates of the detected objects to find the product object that overlaps most with the hand object. This product is taken to be the one selected by the user, and the system reads out its nutritional information using text-to-speech. The evaluation confirmed high performance of the learning model. The proposed algorithm can be used to build a system that supports the visually impaired in using retail stores.
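
The selection step in this abstract, picking the product box that overlaps most with the hand box, can be sketched as follows. The box format `(x1, y1, x2, y2)`, the use of raw intersection area as the overlap measure, and all names are assumptions; the abstract does not specify them.

```python
def intersection_area(a, b):
    """Overlap area of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def select_product(hand_box, products):
    """Return the detected product that overlaps the hand most, or None."""
    best = max(
        products,
        key=lambda p: intersection_area(hand_box, p["box"]),
        default=None,
    )
    if best is None or intersection_area(hand_box, best["box"]) == 0:
        return None  # no product touches the hand
    return best


# Hypothetical detections from one frame.
hand = (40, 40, 80, 80)
detections = [
    {"name": "cereal", "box": (0, 0, 50, 50)},
    {"name": "milk", "box": (60, 30, 120, 100)},
]
selected = select_product(hand, detections)
print(selected["name"] if selected else "nothing selected")
```

A normalized measure such as intersection over union could be substituted for the raw area if large products should not be favored.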

Design and Implementation of Simple Text-to-Speech System using Phoneme Units (음소단위를 이용한 소규모 문자-음성 변환 시스템의 설계 및 구현)

  • Park, Ae-Hee;Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.3
    • /
    • pp.49-60
    • /
    • 1995
  • This paper studies the design and implementation of a Korean text-to-speech system intended for small, simple systems. We choose a parametric synthesis method and use PARCOR (PARtial autoCORrelation) coefficients, one of the representations obtained from LPC analysis. The phoneme, the basic unit of speech, is used as the synthesis unit. For voiced sounds, the synthesis parameters are PARCOR coefficients, pitch, and amplitude; for unvoiced sounds, they are the residual signal and PARCOR coefficients. Using the residual signal as the excitation signal for unvoiced sounds, we obtained 60% intelligibility. The synthesis experiments show that word-level synthesis is feasible, while sentence-level synthesis requires control of phoneme duration. The synthesis system was set up with a 486 PC, a 70 Hz to 4.5 kHz band-pass filter for speech input/output, an amplifier, and a TMS320C30 DSP board.
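
PARCOR coefficients such as those used in this abstract are typically obtained as the reflection coefficients of the Levinson-Durbin recursion. The sketch below shows that standard recursion in Python; it is not the paper's DSP implementation, and the sign convention (prediction error `e[n] = x[n] + sum(a[j] * x[n-j])`) is an assumption, since conventions differ between texts.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of one analysis frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def parcor(x, order):
    """Levinson-Durbin recursion.

    Returns (reflection coefficients k[1..order], predictor a[0..order],
    final prediction error energy).
    """
    r = autocorrelation(x, order)
    if r[0] == 0:
        raise ValueError("frame has zero energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # i-th reflection (PARCOR) coefficient
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # update lower-order coefficients
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # error energy shrinks at each order
    return ks, a, err


# Toy frame: a decaying exponential, which a first-order predictor fits almost exactly.
frame = [0.5 ** n for n in range(30)]
ks, a, err = parcor(frame, order=2)
print(ks)
```

For a stable model every reflection coefficient has magnitude below 1, which is one reason PARCOR coefficients are convenient as synthesis parameters.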


A Design of the Emergency-notification and Driver-response Confirmation System(EDCS) for an autonomous vehicle safety (자율차량 안전을 위한 긴급상황 알림 및 운전자 반응 확인 시스템 설계)

  • Son, Su-Rak;Jeong, Yi-Na
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.14 no.2
    • /
    • pp.134-139
    • /
    • 2021
  • Level 3 autonomous vehicles are currently being commercialized, but they still require the driver's attention. Beyond level 3, the most notable aspect of level 4 autonomous vehicles is vehicle stability, because, unlike level 3, vehicles at level 4 and above must drive autonomously even when the driver is careless. Therefore, in this paper, we propose the Emergency-notification and Driver-response Confirmation System (EDCS) for autonomous vehicle safety, which notifies the driver of an emergency and recognizes the driver's reaction when the driver is inattentive. The EDCS uses an emergency situation delivery module to convert the emergency situation to text and deliver it to the driver by voice, and a driver response confirmation module to recognize the driver's reaction to the emergency and decide whether to pass driving permission to the driver. In the experiments, the HMM of the emergency delivery module learned speech 25% faster than an RNN and 42.86% faster than an LSTM. The Tacotron2 of the driver's response confirmation module converted text to speech about 20 ms faster than Deep Voice and 50 ms faster than DeepMind. Therefore, the EDCS can train its neural network models efficiently and check the driver's response in real time.