• Title/Summary/Keyword: Automatic Speech Recognition

Search Result 213, Processing Time 0.028 seconds

The Study on Automatic Speech Recognizer Utilizing Mobile Platform on Korean EFL Learners' Pronunciation Development (자동음성인식 기술을 이용한 모바일 기반 발음 교수법과 영어 학습자의 발음 향상에 관한 연구)

  • Park, A Young
    • Journal of Digital Contents Society
    • /
    • v.18 no.6
    • /
    • pp.1101-1107
    • /
    • 2017
  • This study explored the effect of ASR-based pronunciation instruction, using a mobile platform, on EFL learners' pronunciation development. Particularly, this quasi-experimental study focused on whether using mobile ASR, which provides voice-to-text feedback, can enhance the perception and production of target English consonants minimal pairs (V-B, R-L, and G-Z) of Korean EFL learners. Three intact classes of 117 Korean university students were assigned to three groups: a) ASR Group: ASR-based pronunciation instruction providing textual feedback by the mobile ASR; b) Conventional Group: conventional face-to-face pronunciation instruction providing individual oral feedback by the instructor; and the c) Hybrid Group: ASR-based pronunciation instruction plus conventional pronunciation instruction. The ANCOVA results showed that the adjusted mean score for pronunciation production post-test on the Hybrid instruction group (M=82.71, SD =3.3) was significantly higher than the Conventional group (M=62.6, SD =4.05) (p<.05).

A Study on Analysis of Variant Factors of Recognition Performance for Lip-reading at Dynamic Environment (동적 환경에서의 립리딩 인식성능저하 요인분석에 대한 연구)

  • 신도성;김진영;이주헌
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.5
    • /
    • pp.471-477
    • /
    • 2002
  • Recently, lip-reading has been studied actively as an auxiliary method of automatic speech recognition(ASR) in noisy environments. However, almost of research results were obtained based on the database constructed in indoor condition. So, we dont know how developed lip-reading algorithms are robust to dynamic variation of image. Currently we have developed a lip-reading system based on image-transform based algorithm. This system recognize 22 words and this word recognizer achieves word recognition of up to 53.54%. In this paper we present how stable the lip-reading system is in environmental variance and what the main variant factors are about dropping off in word-recognition performance. For studying lip-reading robustness we consider spatial valiance (translation, rotation, scaling) and illumination variance. Two kinds of test data are used. One Is the simulated lip image database and the other is real dynamic database captured in car environment. As a result of our experiment, we show that the spatial variance is one of degradations factors of lip reading performance. But the most important factor of degradation is not the spatial variance. The illumination variances make severe reduction of recognition rates as much as 70%. In conclusion, robust lip reading algorithms against illumination variances should be developed for using lip reading as a complementary method of ASR.

Classification of Doppler Audio Signals for Moving Target Using Hidden Markov Model in Pulse Doppler Radar (펄스 도플러 레이더에서 HMM을 이용한 이동표적의 도플러 오디오 신호 식별)

  • Sim, Jae-Hun;Lee, Jung-Ho;Bae, Keun-Sung
    • Journal of IKEEE
    • /
    • v.22 no.3
    • /
    • pp.624-629
    • /
    • 2018
  • Classification of moving targets in Pulse Doppler Radar(PDR) for surveillance and reconnaissance purposes is generally carried out based on listening and training experience of Doppler audio signals by radar operator. In this paper, we proposed the automatic classification method to identify the class of moving target with Doppler audio signals using the Mel Frequency Cepstral Coefficients(MFCC) and the Hidden Markov Model(HMM) algorithm which are widely used in speech recognition and the classification performance was analyzed and verified by simulations.

A Study of Hybrid Automatic Interpret Support System (하이브리드 자동 통역지원 시스템에 관한 연구)

  • Lim, Chong-Gyu;Gang, Bong-Gyun;Park, Ju-Sik;Kang, Bong-Kyun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.28 no.3
    • /
    • pp.133-141
    • /
    • 2005
  • The previous research has been mainly focused on individual technology of voice recognition, voice synthesis, translation, and bone transmission technical. Recently, commercial models have been produced using aforementioned technologies. In this research, a new automated translation support system concept has been proposed by combining established technology of bone transmission and wireless system. The proposed system has following three major components. First, the hybrid system consist of headset, bone transmission and other technologies will recognize user's voice. Second, computer recognized voice (using small server attached to the user) of the user will be converted into digital signal. Then it will be translated into other user's language by translation algorithm. Third, the translated language will be wirelessly transmitted to the other party. The transmitted signal will be converted into voice in the other party's computer using the hybrid system. This hybrid system will transmit the clear message regardless of the noise level in the environment or user's hearing ability. By using the network technology, communication between users can also be clearly transmitted despite the distance.

A Study on subtitle synchronization calibration to enhance hearing-impaired persons' viewing convenience of e-sports contents or game streamer contents (청각장애인의 이스포츠 중계방송 및 게임 스트리머 콘텐츠 시청 편의성 증대를 위한 자막 동기화 보정 연구)

  • Shin, Dong-Hwan;Kim, Jeong-Soo;Kim, Chang-Won
    • Journal of Korea Game Society
    • /
    • v.19 no.1
    • /
    • pp.73-84
    • /
    • 2019
  • This study is intended to suggest ways to improve the quality of the service of subtitles provided for the convenience of viewing for deaf people on e-sports broadcast content and game streamer content. Generally, subtitling files of broadcast content are manually written on air by stenographers, so a delay of 3 to 5 seconds is inevitable compared to the original content. Therefore, the present study proposed the formation of an automatic synchronization calibration system using speech recognition technology. In addition, a content application experiment using this system was conducted, and the final result confirmed that the time of synchronization error of subtitling data could be reduced to less than 1 second.

A Korean Large Vocabulary Speech Recognition System for Automatic Telephone Number Query Service (자동 전화번호 안내를 위한 한국어 대용량 음성 인식 시스템)

  • 구준모;김형순;은종관
    • The Journal of the Acoustical Society of Korea
    • /
    • v.11 no.1E
    • /
    • pp.86-97
    • /
    • 1992
  • 인식어휘수가 1160단어이며 자동 전화번호 안내에 사용될 수 있는 한국어 대용량 음성 인식 시 스템에 관하여 소개하였다. 이 시스템은 네 개의 부시스템으로 구성되어 있다. 첫 번째는 HMM 방식으 로 입력음성중의 단어를 인식하는 처리부에서 인식할 어휘를 제한하므로써 인식시간을 감축시켜 주는 인식 시간 감축부이다. 이 부시스템은 언어학적 정보뿐만 아니라 음향학적 정보도 이용한다. 마지막은 음성인식 시스템의 파라미터를 새로운 화자의 음성에 신속하게 적응시켜 주는 화자적응부이다. 마지막 부시스템은 VQ 적응방식과 스펙트럼 mapping 방식에 근거한 HMM 파라미터 적응방식을 이용한다. 또 한, 본 논문에서는 대용량 음성인식 시스템의 성능을 향상시키기 위한 최근의 연구결과들에 관하여 살 펴보았다. 이 연구들은 화자 독립 음성인식을 위한 음향학적 처리부와 인식 시간 감축부의 성능향상에 초점이 맞추어져 있다. 마지막으로 화자적응을 위한 새로운 연구결과라도 기술하였다.

  • PDF

Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation

  • Jeon, Hyung-Bae;Lee, Soo-Young
    • ETRI Journal
    • /
    • v.38 no.3
    • /
    • pp.487-493
    • /
    • 2016
  • Two new methods are proposed for an unsupervised adaptation of a language model (LM) with a single sentence for automatic transcription tasks. At the training phase, training documents are clustered by a method known as Latent Dirichlet allocation (LDA), and then a domain-specific LM is trained for each cluster. At the test phase, an adapted LM is presented as a linear mixture of the now trained domain-specific LMs. Unlike previous adaptation methods, the proposed methods fully utilize a trained LDA model for the estimation of weight values, which are then to be assigned to the now trained domain-specific LMs; therefore, the clustering and weight-estimation algorithms of the trained LDA model are reliable. For the continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis, non-negative matrix factorization, and LDA with n-gram counting.

A Study On The ASP Module Using VoiceMXL in Automatic Speech Recognition System (VoiceXML을 이용한 음성 인식시스템에서의 ASP 모듈 연구)

  • 장준식;김민석;윤재석
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2001.10a
    • /
    • pp.609-612
    • /
    • 2001
  • In this research, it has been shown that how the computer can recognize and understand spoken natural language and its symbolization using VoiceXML and Grammar Specific Language. In order for user to hear correct information, ASP Module has been revised and its effectivities has been experimented on the Voice portal airplane information system platform.

  • PDF

A Study on the Development of Automatic Schedule Management System through Speech Recognition Text Analysis (음성인식 텍스트 분석을 통한 자동 일정 관리 시스템 개발에 관한 연구)

  • Lee, Hae-Mi;Cho, We-Duke
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.05a
    • /
    • pp.279-282
    • /
    • 2022
  • 컴퓨터가 마이크 등의 소리 센서를 통해 얻은 음향학적 신호를 단어나 문장으로 변환시키는 기술인 음성 인식 기술과 인공지능 기술을 결합한 음성 대화 시스템에 대한 연구 진행 및 제품 출시가 활발하게 이루어지고 있다. 기존의 시스템을 사용하면서 날짜와 시간 외의 정보 추출 정도가 빈약하거나 자동 등록이 되지 않는 문제점을 확인하였다. 음성 인식 기술을 통해 얻은 텍스트에서 보다 많은 정보를 추출하고, 자동 등록 및 알림과 맛집 등 추가 정보 제공 시스템을 구축하는 것을 목표로 하였다.

Joint CTC/Attention Korean ASR with CTC Ratio Scheduling (CTC Ratio Scheduling을 이용한 Joint CTC/Attention 한국어 음성인식)

  • Moon, YoungKi;Jo, YongRae;Cho, WonIk;Jo, GeunSik
    • Annual Conference on Human and Language Technology
    • /
    • 2020.10a
    • /
    • pp.37-41
    • /
    • 2020
  • 본 논문에서는 Joint CTC/Attention 모델에 CTC ratio scheduling을 이용한 end-to-end 한국어 음성인식을 연구하였다. Joint CTC/Attention은 CTC와 attention의 장점을 결합한 모델로서 attention, CTC 단일 모델보다 좋은 성능을 보여주지만, 학습이 진행될수록 CTC가 attention의 학습을 저해하는 요인이 된다. 본 논문에서는 이러한 문제를 해결하기 위해, 학습 진행에 따라 CTC의 비율(ratio)를 줄여나가는 CTC ratio scheduling 방법을 제안한다. CTC ratio scheduling를 이용하여 학습한 결과물은 기존 Joint CTC/Attention, 단일 attention 모델 대비 좋은 성능을 보여주는 것을 확인하였다.

  • PDF