• Title/Summary/Keyword: Automatic Speech Recognition

A Study On the ASP Module in Conversational Automatic Speech Recognition Flight Information System (대화형 음성 인식 항공정보 시스템에서의 ASP 모듈에 관한 연구)

  • 윤재석;장준식
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.4 / pp.595-603 / 2002
  • This research shows how a computer can recognize and understand spoken natural language, and how that language can be symbolized using VoiceXML and Grammar Specific Language, in developing a telephone-based conversational automatic speech recognition flight information system. So that users hear correct information, the ASP module was revised and its effectiveness was tested on the voice portal airplane information system platform.

ASR (Automatic Speech Recognition)-based welfare information search model to prevent digital alienation of the elderly (고령층의 디지털 소외 방지를 위한 ASR(Automatic Speech Recognition, 음성 인식 기술) 기반 복지 정보 검색 모델 연구)

  • Jang-Won Ha;Hwa-Rang Im;Dong-Gue Jung;Hye-won Lee;Youngjong Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.771-772 / 2023
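  • To address the digital alienation of elderly people who have a limited understanding of welfare information and internet use, we propose the development of an "ASR-based welfare information search model to prevent digital alienation of the elderly" that uses technologies such as an elderly-friendly UI/UX and speech recognition.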

Acoustic-Phonetic Phenotypes in Pediatric Speech Disorders: An Interdisciplinary Approach

  • Bunnell, H. Timothy
    • Proceedings of the KSPS conference / 2006.11a / pp.31-36 / 2006
  • Research in the Center for Pediatric Auditory and Speech Sciences (CPASS) is attempting to characterize or phenotype children with speech delays based on acoustic-phonetic evidence and relate those phenotypes to chromosome loci believed to be related to language and speech. To achieve this goal we have adopted a highly interdisciplinary approach that merges fields as diverse as automatic speech recognition, human genetics, neuroscience, epidemiology, and speech-language pathology. In this presentation I will trace the background of this project, and the rationale for our approach. Analyses based on a large amount of speech recorded from 18 children with speech delays will be presented to illustrate the approach we will be taking to characterize the acoustic phonetic properties of disordered speech in young children. The ultimate goal of our work is to develop non-invasive and objective measures of speech development that can be used to better identify which children with apparent speech delays are most in need of, or would receive the most benefit from the delivery of therapeutic services.

Noisy Speech Recognition using Probabilistic Spectral Subtraction (확률적 스펙트럼 차감법을 이용한 잡은 환경에서의 음성인식)

  • Chi, Sang-Mun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea / v.16 no.6 / pp.94-99 / 1997
  • This paper describes a probabilistic spectral subtraction technique that uses knowledge of both noise and speech to reduce automatic speech recognition errors in noisy environments. The spectral subtraction method estimates a noise prototype in non-speech intervals and obtains the spectrum of clean speech by subtracting this prototype from the spectrum of the noisy speech. Noise therefore cannot be suppressed effectively with a single noise prototype when the characteristics of the prototype differ from those of the noise in the input speech. To overcome this drawback, the probabilistic subtraction method uses multiple noise prototypes. In this paper, the probabilistic characteristics of the noise and the knowledge of speech embedded in hidden Markov models trained in clean environments are used to suppress noise. Furthermore, dynamic feature parameters are considered along with static feature parameters for effective noise suppression. The proposed method reduced error rates in the recognition of 50 Korean words. At an SNR (signal-to-noise ratio) of 10 dB, the recognition rate was 86.25% with probabilistic subtraction, 80.25% with spectral subtraction, and 72.75% without any noise suppression.
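
The subtraction step described in this abstract can be illustrated with a minimal sketch. It assumes magnitude spectra are already available as lists of floats. The function names, the spectral floor, and the fixed prototype weights are illustrative assumptions; the paper derives its weighting from noise statistics and HMM-based speech knowledge, which is not reproduced here.

```python
from typing import List, Sequence

Spectrum = List[float]


def estimate_noise_prototype(non_speech_frames: Sequence[Sequence[float]]) -> Spectrum:
    """Average the magnitude spectra of frames assumed to contain no speech."""
    n = len(non_speech_frames)
    bins = len(non_speech_frames[0])
    return [sum(frame[k] for frame in non_speech_frames) / n for k in range(bins)]


def spectral_subtract(noisy: Sequence[float], noise: Sequence[float],
                      floor: float = 0.0) -> Spectrum:
    """Subtract the noise prototype bin by bin, clamping at a spectral floor."""
    return [max(y - d, floor) for y, d in zip(noisy, noise)]


def weighted_subtract(noisy: Sequence[float],
                      prototypes: Sequence[Sequence[float]],
                      weights: Sequence[float],
                      floor: float = 0.0) -> Spectrum:
    """Blend several noise prototypes with the given weights, then subtract.

    The weights are hypothetical stand-ins for the probabilities that the
    paper's probabilistic method would assign to each prototype.
    """
    total = sum(weights)
    blended = [
        sum(w * p[k] for w, p in zip(weights, prototypes)) / total
        for k in range(len(noisy))
    ]
    return spectral_subtract(noisy, blended, floor)
```

With a single prototype, `weighted_subtract` reduces to plain spectral subtraction, which is the baseline the abstract compares against.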

Study on the Improvement of Speech Recognizer by Using Time Scale Modification (시간축 변환을 이용한 음성 인식기의 성능 향상에 관한 연구)

  • 이기승
    • The Journal of the Acoustical Society of Korea / v.23 no.6 / pp.462-472 / 2004
  • This paper proposes a method for compensating for the performance degradation of automatic speech recognition (ASR) that is mainly caused by speaking rate variation. Before the new method is proposed, a quantitative analysis of the performance of an HMM-based ASR system as a function of speaking rate is performed. This analysis showed that significant performance degradation often occurs for rapidly spoken speech signals. A quantitative measure that represents speaking rate is then introduced. Time scale modification (TSM) is employed to compensate for the speaking rate difference between input speech signals and training speech signals. Finally, a method for compensating for the performance degradation caused by speaking rate variation is proposed, in which TSM is applied selectively according to speaking rate. ASR experiments on 10-digit mobile phone numbers confirmed that the error rate was reduced by 15.5% when the proposed method was applied to speech signals with a high speaking rate.
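
The idea of applying TSM selectively can be sketched as a small decision function. This is only an illustration under stated assumptions: the abstract does not define its speaking-rate measure, so the rates here are hypothetical numbers (for example, syllables per second), and `tolerance` is an invented parameter. The time-stretching itself is not implemented.

```python
def tsm_scale_factor(input_rate: float, training_rate: float,
                     tolerance: float = 0.1) -> float:
    """Return the duration stretch factor to apply before recognition.

    A factor of 1.0 means the signal is left unchanged. A factor above 1.0
    means the signal should be lengthened by that ratio so that its speaking
    rate matches the (assumed) average rate of the training data.
    """
    if input_rate <= 0 or training_rate <= 0:
        raise ValueError("speaking rates must be positive")
    ratio = input_rate / training_rate
    # Apply TSM only to clearly fast speech, where degradation was observed.
    if ratio <= 1.0 + tolerance:
        return 1.0
    return ratio
```

For example, an utterance measured at 6.0 units against a training average of 4.0 would be stretched by a factor of 1.5, while one measured at 4.2 would be passed through unchanged.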

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.17-32 / 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. It plays an important role in automatic voice systems, and its importance is growing as portable devices, voice technology, and audio content continue to expand. Previous speaker recognition studies have aimed to determine automatically who the speaker is from voice files and to improve accuracy. Speech style is an important sociolinguistic subject: it carries useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The sentence-final ending used in a speaker's utterance determines the type of sentence and conveys information such as the speaker's intention, psychological attitude, or relationship to the listener. Because the use of sentence-final endings varies with the characteristics of the speaker, the type and distribution of the endings used by an unidentified speaker can help in recognizing that speaker. However, few existing text-based speaker recognition studies have considered speech style, and adding speech style information to speech signal-based speaker recognition techniques could further improve accuracy. The purpose of this paper is therefore to propose a novel method that uses speech style, expressed as sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing is proposed, which generates vector values from the type and frequency of the sentence-final endings appearing in the utterances of a specific person. To evaluate the proposed method, learning and performance evaluation were conducted with an actual drama script. The method proposed in this study can be used as a means to improve the performance of Korean speech recognition services.
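
A rough sketch of turning sentence-final endings into a per-speaker frequency vector is given below. It is not the paper's sentence sequencing procedure: the ending inventory `ENDINGS` is a tiny hypothetical list, and endings are found by plain suffix matching rather than morphological analysis.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical, deliberately tiny inventory of sentence-final endings.
ENDINGS = ("습니다", "어요", "거든")


def final_ending(sentence: str, endings: Sequence[str] = ENDINGS) -> Optional[str]:
    """Return the first (longest) listed ending that the sentence ends with."""
    text = sentence.rstrip(" .?!")
    for ending in sorted(endings, key=len, reverse=True):
        if text.endswith(ending):
            return ending
    return None


def ending_vector(sentences: Sequence[str],
                  endings: Sequence[str] = ENDINGS) -> List[float]:
    """Relative frequency of each listed ending across a speaker's sentences."""
    found = (final_ending(s, endings) for s in sentences)
    counts = Counter(e for e in found if e is not None)
    total = sum(counts.values())
    return [counts[e] / total if total else 0.0 for e in endings]
```

Vectors built this way for known speakers could then be compared with the vector of an unidentified speaker, which is the kind of use the abstract suggests.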

A new approach technique on Speech-to-Speech Translation (신호의 복원된 위상 공간을 이용한 오디오 상황 인지)

  • Le, Thanh Hien;Lee, Sung-young;Lee, Young-Koo
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.239-240 / 2009
  • We live in a flat world in which globalization fosters communication, travel, and trade among more than 150 countries and thousands of languages. To surmount the barriers among these languages, translation is required; speech-to-speech translation will automate the process. Thanks to recent advances in Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), one can now use a system to translate speech in a source language into speech in a target language, and vice versa, in an affordable manner. In the three-phase process, the source speech is first transcribed into (a set of) text in the source language (ASR), and the source text is then translated into the target text (MT). Finally, the target speech is synthesized from the target text (TTS).
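
The three-phase cascade described in this abstract can be sketched as function composition. The `make_s2st` helper and the toy stand-ins below are hypothetical; real ASR, MT, and TTS components would replace the lambdas, and nothing here reflects the authors' actual system.

```python
from typing import Callable


def make_s2st(asr: Callable[[bytes], str],
              mt: Callable[[str], str],
              tts: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Chain ASR, MT, and TTS into one speech-to-speech translation function."""
    def translate(source_audio: bytes) -> bytes:
        source_text = asr(source_audio)   # phase 1: source speech -> source text
        target_text = mt(source_text)     # phase 2: source text -> target text
        return tts(target_text)           # phase 3: target text -> target speech
    return translate


# Toy stand-ins so the sketch runs end to end; they treat bytes as text.
toy_asr = lambda audio: audio.decode("utf-8")
toy_mt = lambda text: {"hello": "annyeong"}.get(text, text)
toy_tts = lambda text: text.encode("utf-8")

pipeline = make_s2st(toy_asr, toy_mt, toy_tts)
```

Keeping the three stages as separate callables mirrors the cascade in the abstract and lets each stage be swapped independently.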

Development of Automatic Creating Web-Site Tool for the Blind (시각장애인용 웹사이트 자동생성 툴 개발)

  • Baek, Hyeun-Ki;Ha, Tai-Hyun
    • Journal of Digital Contents Society / v.8 no.4 / pp.467-474 / 2007
  • This paper documents the design and implementation of an automatic website creation tool that lets blind users build their own homepages as easily as non-disabled users, using both voice recognition and voice synthesis technology. With the tool, blind users can create voice mails, schedules, address lists, and bookmarks. Its information management system also facilitates communication with non-disabled people. The tool accepts basic commands through voice recognition and provides text-to-speech for voice output. Ultimately, the tool is intended to reduce the social isolation of blind people, allowing them to enjoy the information age like non-disabled people.

Spoken-to-written text conversion for enhancement of Korean-English readability and machine translation

  • HyunJung Choi;Muyeol Choi;Seonhui Kim;Yohan Lim;Minkyu Lee;Seung Yun;Donghyun Kim;Sang Hun Kim
    • ETRI Journal / v.46 no.1 / pp.127-136 / 2024
  • The Korean language has written (formal) and spoken (phonetic) forms that differ in their application, which can lead to confusion, especially when dealing with numbers and embedded Western words and phrases. This fact makes it difficult to automate Korean speech recognition models due to the need for a complete transcription training dataset. Because such datasets are frequently constructed using broadcast audio and their accompanying transcriptions, they do not follow a discrete rule-based matching pattern. Furthermore, these mismatches are exacerbated over time due to changing tacit policies. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that enhances the automatic conversion of numbers and Western phrases to improve automatic translation model performance.

A Study on Learning Mathematics for Machine Learning

  • Jun, Sang Pyo
    • Journal of the Korea Society of Computer and Information / v.24 no.1 / pp.257-263 / 2019
  • This paper studies the mathematics that underpins understanding and applying machine learning. Familiarity with the mathematics used in computer science makes it possible to create algorithms that diversify research and to implement them faster, so that many real-life ideas can be realized. There is no curriculum standard for mathematics in the field of machine learning, and much of the required mathematical content is missing from the curricula of existing universities. Machine learning now includes speech recognition systems, search engines, automatic driving systems, process automation, object recognition, and more. Many such applications combine a large amount of data with many variables in components that the programmer builds. As a result, the mathematical areas required of computer engineering (CS) practitioners and computer engineering educators have become diverse and complex. It is important to analyze the mathematical content that engineers and educators need and the mathematics required in the field. This paper attempts to present an effective scope for the essential course of study, from basic to advanced educational content, to support the development of further research.