• Title/Summary/Keyword: Korean Speech Engineering Systems


Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator

  • Park, Hyung Woo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.2
    • /
    • pp.962-977
    • /
    • 2019
  • The human voice is a convenient medium for transferring information between people, between people and machines, and between machines. With the development of information and communication technology, voice can now be carried farther than before: communication converts the voice into another form, transmits it, and converts it back into sound. In this process, the vocoder is the component that performs the conversion and re-conversion between voice and transmitted signal. The CELP (Code-Excited Linear Prediction) vocoder, one type of voice codec, has been adopted as a standard codec because it provides high-quality sound even though its transmission rate is relatively low. The EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), both variable-bit-rate vocoders, are used in mobile phones in 3G environments. In real-time vocoder implementations, degraded sound quality is a typical problem, and to improve it, knowing the size and shape of the noise is important. Existing quality-improvement methods either detect voice activity or apply statistical methods to large amounts of data; their disadvantage is that noise cannot be detected when the signal is continuous or when the noise changes strongly. This paper focuses on reducing the loss of sound quality in low-bit-rate transmission environments. Based on simulation results, it proposes a preprocessor that estimates the SNR (Signal-to-Noise Ratio) with a spectral SNR estimation method; the estimator adopts the IMBE (Improved Multi-Band Excitation) model rather than computing the SNR directly from the continuous speech signal. The preprocessor improves the vocoder's output by enhancing sound quality adaptively.
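As an illustration of the general idea named in this abstract, here is a minimal per-frame spectral SNR estimator in Python. It is a sketch under simplifying assumptions (the first few frames are noise-only and the noise floor is then fixed), not the paper's hybrid-domain, IMBE-based estimator, which is considerably more elaborate.

```python
import numpy as np

def spectral_snr_db(frames, noise_frames=10, eps=1e-12):
    """Per-frame a posteriori SNR estimate in the spectral domain.

    frames: 2-D array of windowed time-domain frames (n_frames x frame_len).
    The first `noise_frames` frames are assumed noise-only; this fixed
    noise floor stands in for a real adaptive noise tracker.
    """
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # per-frame power spectra
    noise_psd = spectra[:noise_frames].mean(axis=0)      # averaged noise floor
    snr_per_bin = spectra / (noise_psd + eps)            # a posteriori SNR per bin
    return 10.0 * np.log10(snr_per_bin.mean(axis=1) + eps)  # dB, one value per frame
```

In practice the signal would be segmented into overlapping windowed frames (e.g. 20 ms Hann) before calling this, and the noise estimate would be updated continuously rather than frozen at the start.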

Study on development of the remote control door lock system including speaker verification function in real time (화자 인증 기능이 포함된 실시간 원격 도어락 제어 시스템 개발에 관한 연구)

  • Kwon, Soon-Ryang
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.6
    • /
    • pp.714-719
    • /
    • 2005
  • This paper designs and implements a system through which a home owner can remotely check a visitor's speech or image with a mobile phone. The system identifies the visitor through an automatic calling service to the owner's mobile phone, rather than through a short message, even when the owner is away from home. Door locks are generally controlled through a home server, but from a real-time point of view it is more effective to control them with DTMF signals. With the proposed technology, when a visitor arrives, a call is placed automatically to the home owner's mobile phone, so the visitor and owner can talk even while the owner is out, and, if necessary, the owner can control the door lock remotely. The owner is therefore not restricted by time or place when checking the visitor's identity and operating the lock. In addition, security is improved by replacing the existing password-only scheme with a combination of password and speaker verification for the verification steps required to control the door lock and configure the system, taking into account the risks that arise when the mobile phone is lost. Existing problems, such as having to reconnect to the network to control the door lock, are also solved by controlling the lock in real time with DTMF signals during the call.
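The real-time control channel in this entry is DTMF signaling during a phone call. As a hedged illustration of the receiving side only (telephony and door-lock hardware integration are out of scope), the following Python sketch detects which keypad symbol a frame of audio carries using the standard Goertzel algorithm; the tone frequencies and keypad layout are the DTMF standard, while the framing is an assumption.

```python
import math

DTMF_FREQS = [697, 770, 852, 941, 1209, 1336, 1477, 1633]  # standard row/column tones (Hz)
KEYPAD = {(697, 1209): '1', (697, 1336): '2', (697, 1477): '3',
          (770, 1209): '4', (770, 1336): '5', (770, 1477): '6',
          (852, 1209): '7', (852, 1336): '8', (852, 1477): '9',
          (941, 1209): '*', (941, 1336): '0', (941, 1477): '#'}

def goertzel_power(samples, fs, freq):
    """Power of one frequency component, computed with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq / fs)                      # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:                             # second-order resonator recursion
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_dtmf_key(samples, fs):
    """Return the keypad symbol whose row and column tones dominate the frame."""
    powers = {f: goertzel_power(samples, fs, f) for f in DTMF_FREQS}
    row = max(DTMF_FREQS[:4], key=powers.get)     # strongest row tone
    col = max(DTMF_FREQS[4:], key=powers.get)     # strongest column tone
    return KEYPAD.get((row, col))                 # None for the unused A-D column
```

A deployed detector would also check that the two winning tones stand sufficiently above the remaining ones before accepting a key press.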

Web-based Text-To-Sign Language Translating System (웹기반 청각장애인용 수화 웹페이지 제작 시스템)

  • Park, Sung-Wook;Wang, Bo-Hyeun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.3
    • /
    • pp.265-270
    • /
    • 2014
  • Hearing-impaired people have difficulty hearing, so it is also hard for them to learn letters, which represent sound, and text, which conveys complex and abstract concepts. Sign language, which employs facial expressions and hand and body motion, has therefore been the natural choice of communication for hearing-impaired people. However, the dominant communication media in daily life are text and speech, which are major obstacles for hearing-impaired people to access information, to learn and engage in intellectual activities, and to get jobs. As delivering information via the internet has become common, hearing-impaired people experience even more difficulty, since the internet presents information mostly in text form; this intensifies the imbalance in information accessibility. This paper reports a web-based text-to-sign-language translating system that helps web designers use sign language in web page design. Because the system is web-based, any designer with an ordinary internet-browsing environment can use it. The system's user interface takes the form of a bulletin board: when web designers write paragraphs and post them through the board to the translating server, the server translates the incoming text into sign language, animates it with a 3D avatar, and records the animation as an MP4 file. The board fetches the file addresses, enabling designers to embed the translated sign-language clips in their web pages using HTML5 or JavaScript. We also analyzed the text used on public-service web pages, identified words new to the translating system, and added them to improve translation. This addition is expected to encourage wide and easy adoption of web pages that give hearing-impaired people access to public services.
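The last step this abstract describes, embedding the recorded MP4 by the address fetched from the bulletin board, can be illustrated with a small helper that emits HTML5 markup. This is a sketch in Python for uniformity with the other examples in this listing; the URL and attributes are illustrative assumptions, not the system's actual output format.

```python
def sign_video_embed(mp4_url: str, width: int = 320) -> str:
    """Build an HTML5 snippet that embeds a translated sign-language clip.

    mp4_url is assumed to be the file address returned by the translating
    server via the bulletin board; the bulletin board and translation
    engine themselves are out of scope here.
    """
    return (
        f'<video width="{width}" controls aria-label="sign language translation">\n'
        f'  <source src="{mp4_url}" type="video/mp4">\n'
        f'  Your browser does not support HTML5 video.\n'
        f'</video>'
    )

# Example with a hypothetical file address:
print(sign_video_embed("https://example.org/sign/post-42.mp4"))
```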

Modeling of Sensorineural Hearing Loss for the Evaluation of Digital Hearing Aid Algorithms (디지털 보청기 알고리즘 평가를 위한 감음신경성 난청의 모델링)

  • Kim, Dong-Wook;Park, Young-Cheol
    • Journal of Biomedical Engineering Research
    • /
    • v.19 no.1
    • /
    • pp.59-68
    • /
    • 1998
  • Digital hearing aids offer many advantages over conventional analog hearing aids, and with the advent of high-speed digital signal processing chips, new digital techniques have been introduced to them. Evaluating new hearing-aid ideas, however, normally requires intensive subject-based clinical tests, which take much time and cost. This paper presents an objective method to evaluate and predict the performance of hearing-aid systems without such subject-based tests. In the hearing impairment simulation (HIS) algorithm, a sensorineural hearing impairment model is established from auditory test data of the impaired subject being simulated, and the nonlinear behavior of loudness recruitment is defined using hearing-loss functions generated from the measurements. To transform natural input sound into its impaired counterpart, a frequency-sampling filter is designed; the filter is continuously refreshed with the level-dependent frequency response provided by the impairment model. To assess performance, the HIS algorithm was implemented in real time on a floating-point DSP. Signals processed with the real-time system were presented to normal-hearing subjects, and their auditory data as modified by the system were measured. The simulated sensorineural hearing impairment was then tested: hearing-threshold and speech-discrimination tests demonstrated the system's effectiveness for hearing impairment simulation. Using the HIS system, we evaluated three typical hearing-aid algorithms.
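The core mechanism here, a frequency-sampling filter that is continuously refreshed with a level-dependent response, can be sketched as follows. This is a minimal illustration under loudly stated assumptions: the recruitment rule (full loss at low levels, tapering to none near 100 dB) and the level reference are crude stand-ins for the paper's measured, subject-specific hearing-loss functions.

```python
import numpy as np
from scipy.signal import firwin2, lfilter

def impaired_frame(frame, fs, audiogram_freqs, loss_db, numtaps=65):
    """Pass one frame through a level-dependent frequency-sampling filter.

    audiogram_freqs: frequencies (Hz) strictly inside (0, fs/2).
    loss_db: hearing loss in dB at those frequencies.
    """
    loss_db = np.asarray(loss_db, dtype=float)
    # Rough frame level; the +96 dB full-scale reference is an assumption.
    level_db = 20.0 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12) + 96.0
    # Crude recruitment: full loss at/below ~40 dB, none at/above ~100 dB.
    effective_loss = np.clip(loss_db * (100.0 - level_db) / 60.0, 0.0, loss_db)
    freqs = np.concatenate(([0.0], audiogram_freqs, [fs / 2.0]))
    gains_db = np.concatenate(([-effective_loss[0]], -effective_loss,
                               [-effective_loss[-1]]))
    taps = firwin2(numtaps, freqs, 10.0 ** (gains_db / 20.0), fs=fs)
    return lfilter(taps, [1.0], frame)   # per-frame; overlap handling omitted
```

"Refreshing" the filter means calling this per frame so the response tracks the input level, which is what makes the recruitment nonlinearity audible.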


Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies
    • /
    • v.14 no.1
    • /
    • pp.13-26
    • /
    • 2024
  • Multi-modal generation is the process of producing results from several kinds of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. This paper presents an AI system that uses speech and text recognition of a person's description to generate a montage image. While existing montage-generation technology is based on Western facial appearance, the montage generation system developed in this paper learns a model based on Korean facial features; it can therefore create more accurate and effective Korean montage images from Korean-specific multi-modal voice and text input. Because the app's output can be used as a draft montage, it can dramatically reduce the manual labor of existing montage production personnel. For training, we used the persona-based virtual-person montage data provided by AI-Hub of the National Information Society Agency; AI-Hub is an AI integration platform that aims to provide a one-stop service by building the training data needed for the development of AI technologies and services. The image generation system was implemented using VQGAN, a deep learning model for generating high-resolution images, and KoDALLE, a Korean-language image generation model. The trained AI model creates montage images of faces that closely match what was described by voice and text. To verify the practicality of the app, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be applied in various fields, such as criminal investigation, where facial features must be described and visualized.
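The abstract names a two-stage text-to-image design: a KoDALLE-style model maps the description to discrete image codes, and a VQGAN decoder maps those codes to pixels. The sketch below expresses only that structure; the interfaces `TextToTokens` and `TokenDecoder` are hypothetical stand-ins, since the abstract does not give the authors' actual model APIs.

```python
from typing import Protocol
import numpy as np

class TextToTokens(Protocol):
    """Stage 1 (hypothetical interface): Korean text -> discrete image codes."""
    def __call__(self, text: str) -> np.ndarray: ...

class TokenDecoder(Protocol):
    """Stage 2 (hypothetical interface): discrete codes -> H x W x 3 image."""
    def __call__(self, tokens: np.ndarray) -> np.ndarray: ...

def generate_montage(description_ko: str,
                     text_model: TextToTokens,
                     vqgan_decode: TokenDecoder) -> np.ndarray:
    """Two-stage, DALL-E-style montage generation from a Korean description."""
    tokens = text_model(description_ko)   # autoregressive text-to-code model
    return vqgan_decode(tokens)           # VQGAN decoder renders the codes
```

In the paper's app, the description itself arrives by speech recognition or typed text before reaching this stage.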