• Title/Summary/Keyword: Speech Interface

Modified Mel Frequency Cepstral Coefficient for Korean Children's Speech Recognition (한국어 유아 음성인식을 위한 수정된 Mel 주파수 캡스트럼)

  • Yoo, Jae-Kwon;Lee, Kyoung-Mi
    • The Journal of the Korea Contents Association / v.13 no.3 / pp.1-8 / 2013
  • This paper proposes a new feature extraction algorithm to improve children's speech recognition in Korean. The proposed algorithm combines three methods. The first is vocal tract length normalization, which compensates the acoustic features because children's vocal tracts are shorter than adults'. The second uses filters of uniform bandwidth because children's voices are concentrated in high spectral regions. Finally, the algorithm applies a smoothing filter to make the recognizer robust in real environments. The paper shows that the new feature extraction algorithm improves children's speech recognition performance.
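
As a rough illustration of the three ideas this abstract combines, the sketch below warps the frequency axis (a crude vocal tract length normalization), swaps mel-spaced bands for a uniform-bandwidth filter bank, and smooths the log energies before the cepstral DCT. It is a minimal sketch, not the paper's algorithm; the warping factor, filter count, and FFT size are illustrative assumptions.

```python
import numpy as np

def uniform_filterbank(n_filters, n_fft, sr):
    """Triangular filters with uniform (not mel-scaled) bandwidths."""
    edges = np.linspace(0, sr / 2, n_filters + 2)         # equal spacing in Hz
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def warped_spectrum(frame, n_fft, alpha=1.2):
    """Crude VTLN: children's shorter vocal tracts shift formants upward,
    so alpha > 1 (an illustrative value) maps the axis back down."""
    spec = np.abs(np.fft.rfft(frame, n_fft))
    src = np.clip(np.arange(len(spec)) * alpha, 0, len(spec) - 1)
    return np.interp(src, np.arange(len(spec)), spec)

def modified_mfcc(frame, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    spec = warped_spectrum(frame, n_fft)
    log_e = np.log(uniform_filterbank(n_filters, n_fft, sr) @ spec ** 2 + 1e-10)
    log_e = np.convolve(log_e, np.ones(3) / 3, mode="same")  # smoothing filter
    n = np.arange(n_filters)                                 # DCT-II, as in MFCC
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e

coeffs = modified_mfcc(np.random.randn(400))   # one 25 ms frame at 16 kHz
```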

Robust Speech Endpoint Detection in Noisy Environments for HRI (Human-Robot Interface) (인간로봇 상호작용을 위한 잡음환경에 강인한 음성 끝점 검출 기법)

  • Park, Jin-Soo;Ko, Han-Seok
    • The Journal of the Acoustical Society of Korea / v.32 no.2 / pp.147-156 / 2013
  • In this paper, a new speech endpoint detection method for moving robot platforms in noisy environments is proposed. In the conventional method, the endpoint of speech is obtained by applying an edge detection filter that finds abrupt changes in the feature domain. However, since the frame-energy feature is unstable in such noisy environments, it is difficult to locate the endpoint of speech accurately. Therefore, a novel feature extraction method based on the twice-iterated fast Fourier transform (TIFFT) and statistical models of speech is proposed. The proposed feature was applied to an edge detection filter for effective detection of the endpoint of speech. Representative experiments show a substantial improvement over the conventional method.
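
A minimal sketch of the general approach described above, not the authors' implementation: an FFT is applied twice per frame (the magnitude spectrum is transformed again), the resulting feature track is convolved with a step-edge kernel, and the strongest positive and negative responses are taken as the speech onset and offset. Frame shapes and the kernel width are illustrative assumptions.

```python
import numpy as np

def tifft_feature(frames, n_fft=512):
    """Twice-iterated FFT feature: transform each frame, then transform the
    magnitude spectrum again and pool its low bins, which tends to be more
    stable under noise than raw frame energy."""
    spec = np.abs(np.fft.rfft(frames, n_fft, axis=1))    # first FFT
    spec2 = np.abs(np.fft.rfft(spec, n_fft, axis=1))     # second FFT
    return np.log(spec2[:, :n_fft // 8].sum(axis=1) + 1e-10)

def detect_endpoints(feat, half=5):
    """Edge detection filter: response ~ (future mean - past mean), so the
    largest positive value marks speech onset, the most negative the offset."""
    kernel = np.concatenate([np.ones(half), -np.ones(half)]) / half
    edge = np.convolve(feat, kernel, mode="same")
    return int(np.argmax(edge)), int(np.argmin(edge))

# Usage on dummy data: 100 frames of 400 samples, with a louder middle region
frames = 0.01 * np.random.randn(100, 400)
frames[30:70] += np.random.randn(40, 400)
start, end = detect_endpoints(tifft_feature(frames))
```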

Speech synthesis using acoustic Doppler signal (초음파 도플러 신호를 이용한 음성 합성)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.35 no.2 / pp.134-142 / 2016
  • In this paper, a method for synthesizing speech signals from 40 kHz ultrasonic signals reflected from the articulatory muscles is introduced and its performance is evaluated. When ultrasound signals are radiated toward an articulating face, Doppler effects caused by movements of the lips, jaw, and chin are observed: the received signals contain frequencies that differ from those of the transmitted signals. These ADS (Acoustic-Doppler Signals) were used to estimate the speech parameters in this study. Prior to synthesizing speech, a quantitative correlation analysis between the ADS and speech signals was carried out for each frequency bin, and the results validated the feasibility of ADS-based speech synthesis. ADS-to-speech transformation was achieved by joint Gaussian mixture model-based conversion rules. Experimental results from five subjects showed that filter bank energies and LPC (Linear Predictive Coding) cepstrum coefficients are the optimal features for the ADS and speech, respectively. In a subjective evaluation where the synthesized speech was generated using excitation sources extracted from the original speech signals, the ADS-to-speech conversion method yielded an average recognition rate of 72.2 %.
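
The joint Gaussian mixture model conversion rule mentioned here is a standard voice-conversion technique; a minimal sketch follows, assuming parallel ADS/speech feature matrices and using scikit-learn's GaussianMixture. The feature dimensions and component count are illustrative, not the paper's configuration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(x, y, n_components=8):
    """x: ADS features (N, dx); y: time-aligned speech features (N, dy)."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=0)
    gmm.fit(np.hstack([x, y]))
    return gmm

def convert(gmm, x, dx):
    """Map one ADS vector x to E[y | x] under the joint GMM."""
    mu_x, mu_y = gmm.means_[:, :dx], gmm.means_[:, dx:]
    sxx = gmm.covariances_[:, :dx, :dx]
    syx = gmm.covariances_[:, dx:, :dx]
    # posterior probability of each mixture component given x
    p = np.array([w * multivariate_normal.pdf(x, mu_x[k], sxx[k])
                  for k, w in enumerate(gmm.weights_)])
    p /= p.sum()
    y = np.zeros(mu_y.shape[1])
    for k in range(len(p)):
        y += p[k] * (mu_y[k] + syx[k] @ np.linalg.solve(sxx[k], x - mu_x[k]))
    return y

# Usage with dummy parallel data: 500 frames, 12-dim ADS, 13-dim speech
x, y = np.random.randn(500, 12), np.random.randn(500, 13)
gmm = fit_joint_gmm(x, y)
y_hat = convert(gmm, x[0], dx=12)
```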

An Experimental Multimodal Command Control Interface for Car Navigation Systems

  • Kim, Kyungnam;Ko, Jong-Gook;Choi, Seung-Ho;Kim, Jin-Young;Kim, Ki-Jung
    • Proceedings of the IEEK Conference / 2000.07a / pp.249-252 / 2000
  • An experimental multimodal system combining natural input modes such as speech, lip movement, and gaze is proposed in this paper. It benefits from novel human-computer interaction (HCI) modalities and from multimodal integration for tackling the HCI bottleneck. The system allows the user to select menu items on the screen by employing speech recognition, lip reading, and gaze tracking components in parallel. Face tracking serves as a supplementary component to gaze tracking and lip movement analysis. These key components are reviewed, and preliminary results of multimodal integration and user testing on the prototype system are shown. Notably, the system equipped with gaze tracking and lip reading is very effective in noisy environments, where the speech recognition rate is low and unstable. Our long-term goal is to build a user interface embedded in a commercial car navigation system (CNS).
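
The abstract does not spell out the integration rule; a common choice for this kind of parallel menu selection is weighted late fusion of per-item confidence scores, sketched below with illustrative weights that are assumptions, not the paper's values. Lowering the speech weight in noise matches the observation that lip reading and gaze tracking carry the interface when recognition degrades.

```python
def fuse_menu_scores(speech, lips, gaze, weights=(0.5, 0.2, 0.3)):
    """Each argument maps menu item -> confidence in [0, 1]; weights are
    illustrative and could be re-tuned per acoustic condition."""
    items = set(speech) | set(lips) | set(gaze)
    fused = {i: weights[0] * speech.get(i, 0.0)
              + weights[1] * lips.get(i, 0.0)
              + weights[2] * gaze.get(i, 0.0) for i in items}
    return max(fused, key=fused.get)

# Usage: speech is unsure, but lips and gaze agree on "zoom" vs "route"
choice = fuse_menu_scores({"zoom": 0.4, "route": 0.3},
                          {"zoom": 0.7},
                          {"route": 0.8, "zoom": 0.1})
```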

A Mobile Newspaper Application Interface to Enhance Information Accessibility of the Visually Impaired (시각장애인의 정보 접근성 향상을 위한 모바일 신문 어플리케이션 인터페이스)

  • Lee, Seung Hwan;Hong, Seong Ho;Ko, Seung Hee;Choi, Hee Yeon;Hwang, Sung Soo
    • Journal of the HCI Society of Korea / v.11 no.3 / pp.5-12 / 2016
  • The number of visually impaired people using smartphones is increasing with the help of Text-to-Speech (TTS). TTS converts the text data in a mobile application into sound, but it only allows sequential search, so the locations of buttons and contents inside an application must be chosen carefully. However, little attention has been paid to the TTS service environment during the development of mobile newspaper applications, which makes them difficult for visually impaired people to use. Furthermore, a mobile application interface that also reflects the needs of people with low vision is necessary. Therefore, this paper presents a mobile newspaper interface that considers the accessibility and needs of various visually impaired people. To this end, the proposed interface places buttons with the TTS service environment in mind and provides search functionality. It also enables visually impaired people to use the application smoothly by filtering out words that are pronounced improperly and providing a proper explanation for every button. Finally, several functions, such as font enlargement and color reversal, are implemented for users with low vision. Simulation results show that the proposed interface achieves better performance than other applications in terms of search speed and usability.
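
A minimal sketch of two of the interface ideas above, under assumed data structures: ordering UI elements so a sequential screen reader reaches important items first, and substituting words the TTS engine mispronounces before they are spoken. The element names and the pronunciation table are hypothetical.

```python
# Hypothetical table of words the TTS engine garbles -> spoken replacements
PRONUNCIATION_FIXES = {"UI": "user interface"}

def spoken_label(text):
    """Substitute mispronounced words before handing text to the TTS engine."""
    return " ".join(PRONUNCIATION_FIXES.get(w, w) for w in text.split())

def tts_reading_order(elements):
    """elements: (label, priority) pairs; lower priority is reached first
    by the screen reader's sequential traversal."""
    return [spoken_label(label)
            for label, prio in sorted(elements, key=lambda e: e[1])]

order = tts_reading_order([("Settings", 3), ("Today's Headlines", 1), ("Search", 2)])
```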

Real-time implementation of the G.723.1 voice coder using DSP56362 (DSP56362를 이용한 G.723.1 음성코덱의 실시간 구현)

  • Lee, Jae-Sik;Son, Yong-Ki;Chang, Tae-Gyu;Min, Byoung-Ki
    • Speech Sciences / v.7 no.2 / pp.225-234 / 2000
  • This paper describes a fixed-point DSP implementation of a CELP (Code-Excited Linear Prediction)-based speech coder. Effective realization methodologies that maximize the utilization of the DSP's architectural features, specifically parallel data moves and pipelining, are presented together with implementation results for the ITU-T standard G.723.1 on a Motorola DSP56362. The operation of the implemented speech coder is verified using the test vectors provided by the standard as well as the peripheral interface circuits designed for the coder's real-time operation.
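
The porting effort such a coder requires centers on fixed-point arithmetic; the sketch below emulates the Q15 saturating multiply-accumulate that DSPs like the DSP56362 execute in hardware alongside parallel data moves. Nothing here is taken from the paper's code; it is only an assumed illustration of the arithmetic style.

```python
# Q15 fixed point: 16-bit signed integers representing values in [-1, 1)
Q15_MAX, Q15_MIN = 0x7FFF, -0x8000

def sat16(x):
    """Saturate to the 16-bit range, as DSP accumulators do on overflow."""
    return max(Q15_MIN, min(Q15_MAX, x))

def q15_mul(a, b):
    """(a * b) >> 15 with rounding: the basic fixed-point multiply."""
    return sat16((a * b + (1 << 14)) >> 15)

def q15_mac(acc, a, b):
    """Multiply-accumulate, e.g. one tap of an FIR or LPC synthesis filter."""
    return sat16(acc + q15_mul(a, b))

# 0.5 * 0.5 in Q15: 16384 * 16384 -> 8192, i.e. 0.25
assert q15_mul(16384, 16384) == 8192
```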

Development of TTS for a Human-Robot Interface (휴먼-로봇 인터페이스를 위한 TTS의 개발)

  • Bae, Jae-Hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2006.05a / pp.135-138 / 2006
  • The communication method between human and robot is an important part of human-robot interaction, and speech is an easy and intuitive communication method for humans. By using speech as a communication method, we can interact with a robot in a familiar way. In this paper, we developed a TTS system for human-robot interaction. The synthesis algorithms were modified to make efficient use of the robot's restricted resources, and the synthesis database was reconstructed for efficiency. As a result, we could reduce the computation time with only slight degradation of speech quality.

Interactive Rehabilitation Support System for Dementia Patients

  • Kim, Sung-Ill
    • Journal of the Institute of Convergence Signal Processing / v.11 no.3 / pp.221-225 / 2010
  • This paper presents a preliminary study of an interactive rehabilitation support system for both dementia patients and their caregivers, the goal of which is to improve the quality of life (QOL) of patients suffering from dementia through virtual interaction. To achieve the virtual interaction, three kinds of recognition modules, for speech, facial images, and pen-mouse gestures, are studied. The results of both practical tests and questionnaire surveys show that the proposed system needs further improvement, especially in speech recognition and in the user interface, before real-world application. The surveys also revealed that pen-mouse gesture recognition, as one possible interactive aid, shows promise in compensating for the weaknesses of speech recognition.

IMPLEMENTATION OF REAL TIME RELP VOCODER ON THE TMS320C25 DSP CHIP

  • Kwon, Kee-Hyeon;Chong, Jong-Wha
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.957-962 / 1994
  • A real-time RELP vocoder is implemented on the TMS320C25 DSP chip. The implemented system is an IBM-PC add-on board composed of an analog in/out unit, a DSP unit, a memory unit, an IBM-PC interface unit, and its supporting assembly software. The speech analyzer and synthesizer are implemented in DSP assembly. Speech parameters such as LPC coefficients, baseband residuals, and signal gains are extracted by the autocorrelation method and an inverse filter, and speech is synthesized by the spectral folding method and a direct-form synthesis filter on this board. The real-time 9.6 kbps RELP vocoder is then run by downloading the program into the DSP program RAM.
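
A minimal sketch of the RELP processing chain the abstract outlines: LPC analysis by the autocorrelation method (Levinson-Durbin), inverse filtering for the residual, decimation to a baseband residual, and spectral folding (zero-insertion upsampling) to regenerate a full-band excitation for the direct-form synthesis filter. The LPC order, decimation factor, and frame size are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_autocorr(frame, order=10):
    """LPC by the autocorrelation method (Levinson-Durbin recursion)."""
    r = np.correlate(frame, frame, "full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err
        a[1:i] += k * a[1:i][::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def relp_frame(frame, decim=4, order=10):
    a = lpc_autocorr(frame, order)
    residual = lfilter(a, [1.0], frame)   # inverse filter -> prediction residual
    baseband = residual[::decim]          # baseband residual (a real coder would
                                          # low-pass filter before decimating)
    # Spectral folding: zero-insertion upsampling mirrors the baseband spectrum
    # into the high band, regenerating a full-band excitation signal.
    excitation = np.zeros(len(frame))
    excitation[::decim] = baseband * decim
    return lfilter([1.0], a, excitation)  # direct-form synthesis filter

synthesized = relp_frame(np.random.randn(240))   # one 30 ms frame at 8 kHz
```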
