• Title/Summary/Keyword: Voice interface

Speaker Identification in Small Training Data Environment using MLLR Adaptation Method (MLLR 화자적응 기법을 이용한 적은 학습자료 환경의 화자식별)

  • Kim, Se-hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2005.11a / pp.159-162 / 2005
  • Speaker identification is the process of automatically identifying who is speaking on the basis of information obtained from speech waves. In the training phase, each speaker model is trained using that speaker's speech data. GMMs (Gaussian Mixture Models), which have been successfully applied to speaker modeling in text-independent speaker identification, are not effective when training data is insufficient. This paper proposes a speaker modeling method using MLLR (Maximum Likelihood Linear Regression), which is used for speaker adaptation in speech recognition. We build an SD-like model using the MLLR adaptation method instead of a speaker-dependent (SD) model. The proposed system outperforms GMMs in a small-training-data environment.
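
For reference, the mean transform at the core of MLLR can be written as below. This is the standard textbook form and not necessarily the exact configuration used in the paper; the symbols $W_s$, $\xi_m$, and $\gamma_m(t)$ are notation introduced here, and the base model is assumed to be a shared (for example, speaker-independent) GMM with component means $\mu_m$ and covariances $\Sigma_m$.

```latex
% Adapted mean of mixture component m for speaker s
\hat{\mu}_m^{(s)} = A_s \mu_m + b_s = W_s \xi_m,
\qquad W_s = [\, b_s \;\; A_s \,],
\qquad \xi_m = [\, 1 \;\; \mu_m^{\top} \,]^{\top}

% W_s maximizes the likelihood of the speaker's frames o_t,
% where \gamma_m(t) is the posterior of component m at frame t
W_s = \arg\max_{W} \sum_{t} \sum_{m} \gamma_m(t)\,
      \log \mathcal{N}\!\left(o_t ;\, W \xi_m ,\, \Sigma_m\right)
```

Because one transform is shared by all mixture components, far fewer parameters are estimated from the speaker's data than when every component is trained separately, which is why the approach suits small training sets.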

Synthetic Speech Quality Improvement by Glottal Parameter Interpolation - Preliminary Study on Open Quotient Interpolation in the Speech Corpus - (성대특성 보간에 의한 합성음의 음질향상 - 음성코퍼스 내 개구간 비 보간을 위한 기초연구 -)

  • Bae, Jae-Hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2005.11a / pp.63-66 / 2005
  • For large-corpus-based TTS, the consistency of the speech corpus is very important, because inconsistent speech quality within the corpus can cause distortion at concatenation points. Because of this inconsistency, a large corpus must be tuned repeatedly. One cause of the inconsistency is that glottal characteristics differ among the sentences in the corpus. In this paper, we adjust the glottal characteristics of the speech in the corpus to prevent this distortion, and we present experimental results.

Verification of AI Voice User Interface (VUI) Usability Evaluation: Focusing on Chinese Navigation VUI (인공지능 음성사용자 인터페이스 사용성 평가 기준 검증 : 중국 내비게이션 VUI를 중심으로)

  • Zhou, Yi Mou;Shang, Lin Rru;Lim, Hyun Chan;Hwang, Mi Kyung
    • Journal of Korea Multimedia Society / v.24 no.7 / pp.913-921 / 2021
  • After collecting the general usability evaluation criteria proposed by existing VUI researchers, this study verified how appropriate these criteria are for an AI VUI specialized in navigation and ranked them by suitability. After analyzing the navigation VUIs used in China, we surveyed a total of 195 Chinese users. The analysis showed that, in verifying conformance with general VUI evaluation criteria, the usability evaluation criteria for navigation VUI were extracted as three sub-factors: 'task accuracy', 'function satisfaction', and 'information reliability'. With the recent advent of self-driving cars, safety and response speed are becoming very important, so Chinese users also ranked responsiveness as the top priority in VUI design and rated its importance highly. Both men and women rated reactivity highest and multiplicity lowest. For humans and machines to interact effectively, a VUI needs a convenient and natural interface, verified through usability evaluation, that lets each party understand the other's intention.

Mobile Voice Web Browser for the Low Vision (저시력자를 위한 모바일 보이스 웹 브라우저 개발)

  • Park, Joo Hyun;Lee, Han Na;Shin, Ji Eun;Dong, Suh-Yeon;Lim, Soon-Bum
    • Journal of Korea Multimedia Society / v.23 no.11 / pp.1418-1427 / 2020
  • The web has become indispensable in our daily lives. We communicate with others, study, and get information through the web, and this behavior continues in the smartphone environment. The biggest problem is that a smartphone's small display reduces the accuracy with which people with low vision can select or manipulate content. To compensate, voice guidance services that combine touch and voice, such as VoiceOver and TalkBack, are currently provided on smartphones. However, these services have limitations in GUI handling, TTS control, and content enlargement and selection. In addition, unnecessary content is also read aloud, which is tiring for people with low vision. In this study, we propose a mobile web browser interface that selects a desired area of the web browser or its content and either enlarges it or outputs it by voice, so that people with low vision can use the mobile web browser easily. We propose a context-selective focusing function that enables selection of each element of web content, and we intend to develop a mobile voice web browser that can enlarge the selected content or output it by voice.

Development of Voice Activity Detection Algorithm for Elderly Voice based on the Higher Order Differential Energy Operator (고차 미분에너지 기반 노인 음성에서의 음성 구간 검출 알고리즘 연구)

  • Lee, JiYeoun
    • Journal of Digital Convergence / v.14 no.11 / pp.249-255 / 2016
  • Because elderly voices contain a lot of noise caused by physiological changes in respiration, phonation, and resonance, the performance of convergence health-care equipment that relies on elderly voices, such as speech recognition, synthesis, and analysis programs, deteriorates. Research is therefore needed to make health-care instruments operable with elderly voices. In this study, a voice activity detection method using a symmetric higher-order differential energy operator (SHODEO) was developed and compared with the autocorrelation function (ACF) and the average magnitude difference function (AMDF). It was confirmed to perform better than the other methods in voice interval detection. The voice activity detection method will be applied to a voice interface for the elderly to improve the accessibility of smart devices.
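
To make the idea of energy-operator-based voice activity detection concrete, here is a minimal sketch. It uses the second-order Teager-Kaiser energy operator, the simplest member of the differential energy operator family, and is not the paper's SHODEO. The function names, the 80-sample frame length, the 0.01 threshold, and the synthetic test signal are all assumptions made for illustration.

```python
import math

def teager_energy(x):
    """Second-order Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def detect_voice_frames(x, frame_len=80, threshold=0.01):
    """Return one flag per frame: True where the mean |psi| exceeds the threshold."""
    psi = teager_energy(x)
    flags = []
    for start in range(0, len(psi) - frame_len + 1, frame_len):
        frame = psi[start:start + frame_len]
        flags.append(sum(abs(v) for v in frame) / frame_len > threshold)
    return flags

# Synthetic input (assumption): silence, a 200 Hz tone burst, then silence,
# sampled at 8 kHz. A real evaluation would use recorded speech instead.
FS = 8000
burst = [math.sin(2 * math.pi * 200 * n / FS) for n in range(400)]
signal = [0.0] * 400 + burst + [0.0] * 400
flags = detect_voice_frames(signal)
```

For a pure sinusoid the operator output is roughly constant (amplitude squared times the squared sine of the digital frequency), so the flags are True only for the frames that overlap the burst. The threshold would need tuning on real recordings.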

Implementation of the automatic switching device for the voice communications between heterogeneous devices (이종 기기 간 음성통신을 위한 자동전환장치의 구현)

  • Lew, Chang-Guk;Lee, Bae-Ho
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.10 no.12 / pp.1321-1328 / 2015
  • A radio is a half-duplex voice communication device that uses PTT (Push To Talk) and occupies a single line during transmission. For voice communication between heterogeneous devices, such as a telephone and a UHF or VHF radio, an interface device that switches automatically between the two is required. The performance of the voice switching apparatus, which detects the voice to be transmitted from the input signal, therefore significantly affects how much of the audio signal is lost. The conventional method sets a threshold simply on the amplitude of the input signal, in other words its energy level, and therefore has the problem of responding to noise. This paper uses audio signal processing techniques to discriminate voice within the input signal and implements a device for automatic voice transmission between heterogeneous devices. The proposed approach was confirmed to improve the performance of the automatic voice switching device and to enable lossless voice transmission between heterogeneous devices.
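
The weakness of the conventional level-based method described above can be illustrated with a small sketch. The function name, the 160-sample frame length, the 0.1 level, and the synthetic signals are hypothetical choices; the paper's own detector is not specified in the abstract and is not reproduced here.

```python
import math
import random

def level_trigger(samples, frame_len=160, level=0.1):
    """Conventional detector: key the transmitter when mean |amplitude| exceeds level."""
    keyed = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        keyed.append(sum(abs(s) for s in frame) / frame_len > level)
    return keyed

# Four 160-sample frames (assumption): silence, broadband noise, silence, and a
# 200 Hz tone at 8 kHz standing in for voiced speech.
rng = random.Random(0)
silence = [0.0] * 160
noise = [rng.uniform(-0.5, 0.5) for _ in range(160)]
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
keyed = level_trigger(silence + noise + silence + tone)
```

The noise frame keys the transmitter exactly as the tone frame does, because a level threshold looks only at signal magnitude. Telling the two apart requires features beyond the energy level, which is the gap the paper addresses.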

An Implementation of Travel Information Service Using VoiceXML and GPS (VoiceXML과 GPS를 이용한 여행정보 서비스의 구현)

  • Oh, Jae-Gyu;Kim, Sun-Hyung
    • Journal of the Korea Academia-Industrial Cooperation Society / v.8 no.6 / pp.1443-1448 / 2007
  • In this paper, we implement a travel information service based on a distributed computing environment that offers web (internet) and speech interfaces at the same time and can make use of location information. It uses voice- and web-browser-based VoiceXML together with GPS to escape the limitations of traditional web (internet)-based travel information services. Because the IVR (Interactive Voice Response) of a traditional call center operates according to a pre-installed scenario, service takes a long time, and speech must be re-recorded for the revised scenario whenever the response content changes. In contrast, the proposed VoiceXML- and GPS-based travel information service is easy to reconfigure, because individual conversation scenarios are written as files (documents) and then updated on the server. It can also provide useful travel information under environmentally restricted conditions such as remote regions, because our prototype finds the user's current location from GPS information and provides travel information based on it.
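
As a rough illustration of how a GPS fix could drive a VoiceXML dialogue document, the sketch below picks the nearest point of interest and writes a one-prompt VoiceXML 2.0 form. The place list, the approximate coordinates, and all function names are hypothetical; the paper's actual scenario files and server update procedure are not described in the abstract.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical points of interest (name, latitude, longitude); coordinates are
# approximate and used for illustration only.
PLACES = [
    ("Gyeongbokgung Palace", 37.5796, 126.9770),
    ("Haeundae Beach", 35.1587, 129.1604),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two GPS fixes."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_place(lat, lon):
    """Return the (name, lat, lon) entry closest to the given position."""
    return min(PLACES, key=lambda p: haversine_km(lat, lon, p[1], p[2]))

def build_vxml(lat, lon):
    """Build a VoiceXML 2.0 document that announces the nearest place."""
    name, plat, plon = nearest_place(lat, lon)
    dist = haversine_km(lat, lon, plat, plon)
    vxml = ET.Element("vxml", {"version": "2.0",
                               "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "nearby"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = f"The nearest attraction is {name}, about {dist:.1f} kilometers away."
    return ET.tostring(vxml, encoding="unicode")

document = build_vxml(37.5665, 126.9780)  # a GPS fix in central Seoul
```

Because the dialogue is an ordinary document generated on the server, changing the response text means regenerating a file rather than re-recording audio, which is the maintenance advantage the abstract attributes to the VoiceXML approach.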

A Survey Study on the Utilization Status and User Perception of the VUI of Smartphones (스마트폰 음성 인터페이스의 사용 현황 및 사용자 인식에 대한 조사 연구)

  • Choe, Jaeho;Kim, Hoontae
    • The Journal of Society for e-Business Studies / v.21 no.4 / pp.29-40 / 2016
  • Voice User Interface (VUI) is the most familiar and comfortable interface for humans. Recently, with the development of cloud and AI technologies, VUI has been applied to various products. The aim of this study was to identify the problems of current VUI and to find directions for future study by investigating the utilization status and user perception of the VUI of smartphones. A survey was conducted with 163 college students using Google Forms. The results showed that awareness of VUI is high but the rate of usage is very low, and that many users are dissatisfied with the voice recognition rate, reaction speed, and operation method. Most of the survey participants tried VUI out of curiosity, but only a small portion of them found it useful enough to continue using it. Many participants disliked talking to machines and did not want others to overhear them. The study results will guide future research efforts to improve the utilization of VUI.

Handwriting and Voice Input using Transparent Input Overlay (투명한 입력오버레이를 이용한 필기 및 음성 입력)

  • Kim, Dae-Hyun;Kim, Myoung-Jun;Lee, Zin-O
    • Journal of KIISE:Software and Applications / v.35 no.4 / pp.245-254 / 2008
  • This paper proposes a unified multi-modal input framework that interfaces recognition engines, such as IBM ViaVoice and the Microsoft handwriting-recognition system, with general window applications, particularly for pen-input displays. As soon as the user pushes a hardware button attached to the pen-input display with one hand, the currently focused window, such as an internet search window or a word processor, is overlaid with a transparent window covering the whole desktop, on which the user writes by hand with the other hand without losing focus on the working context. Besides freeform handwriting on this transparent input overlay, which serves as a sketch pad, the user can dictate words and draw diagrams to communicate with the system.