• Title/Summary/Keyword: Automatic Speech Recognition

Search Result 213, Processing Time 0.034 seconds

Synchronization of VOD Content and Captions Using Speech Recognition and Modified Dynamic Programming (음성인식과 변경된 동적계획법을 이용한 VOD 콘텐트와 자막의 동기화)

  • Oh, Juhyun
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2021.06a
    • /
    • pp.131-134
    • /
    • 2021
  • 지상파 방송에서는 청각장애인을 위해 폐쇄자막(closed caption) 서비스가 제공되고 있지만, 이를 저장하여 VOD 서비스 등에 제공하고자 할 때는 영상과의 비동기화(desynchronization) 문제로 인해 활용할 수 없는 문제가 있다. 본 논문에서는 이를 해결하기 위해 자동 음성인식(automatic speech recognition)과, 자막 동기화 문제에 맞게 변경된 동적계획법(modified dynamic programming)을 이용하는 방법을 제안한다. 문자열 정렬에서 삽입과 삭제 등 간격(gap)의 발생을 제어하는 제약조건과 그에 따른 점수 구조를 적용함으로써 문자열 정렬 성능을 개선한다. 또한 정렬된 폐쇄자막과 음성인식 문자열로부터 시간 동기정보를 복원하고 동기화된 자막을 생성하는 방법을 제안한다. 실제 TV 프로그램과 자막에 적용하여 기존 방법에 비해 성능의 향상이 있음을 확인하였다.

  • PDF

Categorization and Analysis of Error Types in the Korean Speech Recognition System (한국어 음성 인식 시스템의 오류 유형 분류 및 분석)

  • Son, Junyoung;Park Chanjun;Seo, Jaehyung;Lim, Heuiseok
    • Annual Conference on Human and Language Technology
    • /
    • 2021.10a
    • /
    • pp.144-151
    • /
    • 2021
  • 딥러닝의 등장으로 자동 음성 인식 (Automatic Speech Recognition) 기술은 인간과 컴퓨터의 상호작용을 위한 가장 중요한 요소로 자리 잡았다. 그러나 아직까지 유사 발음 오류, 띄어쓰기 오류, 기호부착 오류 등과 같이 해결해야할 난제들이 많이 존재하며 오류 유형에 대한 명확한 기준 정립이 되고 있지 않은 실정이다. 이에 본 논문은 음성 인식 시스템의 오류 유형 분류 기준을 한국어에 특화되게 설계하였으며 이를 다양한 상용화 음성 인식 시스템을 바탕으로 질적 분석 및 오류 분류를 진행하였다. 실험의 경우 도메인과 어투에 따른 분석을 각각 진행하였으며 이를 통해 각 상용화 시스템별 강건한 부분과 약점인 부분을 파악할 수 있었다.

  • PDF

A Study on the Automatic Recognition of Korean Basic Spoken Digit Using Energy of Special Bandwidth (특정 대역 에너지를 이용한 한국어 기본 수자 음성의 백동 인식에 관한 연구)

  • Han, Hee;Kim, Soon-Hyob;Park, Kyu-Tae
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.19 no.3
    • /
    • pp.5-12
    • /
    • 1982
  • Through the use of energy ratio of special bandwidths of basic vowels, recognition of Korean basic spoken digit is performed in logical combination with a zero-crossing rate and an energy parameter. In the experiments for recognition of the digits, the speech signal of spoken digits is filtered by a lowpass filter of which the cutoff frequency is 10KHz, and then sampled at 20KHz of sampling rate, In the speech signal processing, we used four FIR digital filters, and the order of filter lengths is 61, 120, 25, 25respectively. The filters are designed by using Remetz exchange algorithm.[13],[14] As a result, the recognition rate of 92% for the three speakers is obstained.

  • PDF

Analysis of Korean Predicative Verb Forms in LAG Framework

  • Kim, Soora
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2002.02a
    • /
    • pp.177-186
    • /
    • 2002
  • Korean predicative verb forms obligatorily denote the three categories speech level, mood and sentence type which are not handled by most of the automatic word form recognition systems for this language. These categories are marked by special endings. This paper examines predicative verb forms concentrating on the lexical description of these endings in the framework of Left-Associative Grammar (LAG). Additionally this paper suggests a system to analyse verb forms in these aspects. The results of this study have been implemented using Malaga$^2$ and integrated into an automatic word form recognition system for Korerin called KMM (Korean Malaya Morphology).

  • PDF

Development of an Embedded System for Ship′s Steering Gear using Voice Recognition Module (음성인식모듈을 이용한 선박조타용 임베디드 시스템 개발)

  • 서기열;홍태호;김화영;박계각
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2004.04a
    • /
    • pp.144-148
    • /
    • 2004
  • Recently, various studies had been made for automatic control system of small ships, in order to improve maneuvering and to reduce labor and working on board. To achieve efficient operation of small ships, it had accomplished to rapid development of automatic technique, but the ship operation had been more complicated because of the need to handle various gauges and instruments. To solve these problems, there are examples to be applied to the speech information processing technologies which is one of the human interface methods in the system operation of ship, but the implementation of definite system is still incomplete. Therefore, the purpose of this paper is to implement the control system for ship steering using the voice recognition module.

  • PDF

Design and Implementation of Context-aware Application on Smartphone Using Speech Recognizer

  • Kim, Kyuseok
    • Journal of Advanced Information Technology and Convergence
    • /
    • v.10 no.2
    • /
    • pp.49-59
    • /
    • 2020
  • As technologies have been developing, our lives are getting easier. Today we are surrounded by the new technologies such as AI and IoT. Moreover, the word, "smart" is a very broad one because we are trying to change our daily environment into smart one by using those technologies. For example, the traditional workplaces have changed into smart offices. Since the 3rd industrial revolution, we have used the touch interface to operate the machines. In the 4th industrial revolution, however, we are trying adding the speech recognition module to the machines to operate them by giving voice commands. Today many of the things are communicated with human by voice commands. Many of them are called AI things and they do tasks which users request and do tasks more than what users request. In the 4th industrial revolution, we use smartphones all the time every day from the morning to the night. For this reason, the privacy using phone is not guaranteed sometimes. For example, the caller's voice can be heard through the phone speaker when accepting a call. So, it is needed to protect privacy on smartphone and it should work automatically according to the user context. In this aspect, this paper proposes a method to adjust the voice volume for call to protect privacy on smartphone according to the user context.

The Interactive Voice Services based on VoiceXML (VoiceXML 기반 음성인식시스템을 이용한 서비스 개발)

  • Kim Hak-Gyoon;Kim Eun-Hyang;Kim Jae-In;Koo Myoung-Wan
    • MALSORI
    • /
    • no.43
    • /
    • pp.113-125
    • /
    • 2002
  • As there are needs to search the Web information via wire or wireless telephones, VoiceXML forum was established to develop and promote the Voice eXtensible Markup Language (VoiceXML). VoiceXML simplifies the creation of personalized interactive voice response services on the Web, and allows voice and phone access to information on Web sites, call center databases. Also, it can utilize the Web-based technologies, such as CGI(Common Gateway Interface) scripts. In this paper, we have developed the voice portal service platform based on VoiceXML called TeleGateway. It enables integration of voice services with data services using the Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engines. Also, we have showed the various services on voice portal services.

  • PDF

Model based Stress Decision Method (모델 기반의 강세 판정 방법)

  • Kim, Woo-Il;Koh, Hoon;Ko, Han-Seok
    • Speech Sciences
    • /
    • v.7 no.4
    • /
    • pp.49-57
    • /
    • 2000
  • This paper proposes an effective decision method focused on evaluating the 'stress position'. Conventional methods usually extract the acoustic parameters and compare them to references in absolute scale, adversely producing unstable results as testing conditions change. To cope with environmental dependency, the proposed method is designed to be model-based and determines the stressed interval by making relative comparison over candidates. The stressed/unstressed models are then induced from normal phone models by adaptive training. The experimental results indicate that the proposed method is promising, and that it is useful for automatic detection of stress positions. The results also show that generating the stressed/unstressed model by adaptive training is effective.

  • PDF

An Enhancement of Japanese Acoustic Model using Korean Speech Database (한국어 음성데이터를 이용한 일본어 음향모델 성능 개선)

  • Lee, Minkyu;Kim, Sanghun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.5
    • /
    • pp.438-445
    • /
    • 2013
  • In this paper, we propose an enhancement of Japanese acoustic model which is trained with Korean speech database by using several combination strategies. We describe the strategies for training more than two language combination, which are Cross-Language Transfer, Cross-Language Adaptation, and Data Pooling Approach. We simulated those strategies and found a proper method for our current Japanese database. Existing combination strategies are generally verified for under-resourced Language environments, but when the speech database is not fully under-resourced, those strategies have been confirmed inappropriate. We made tyied-list with only object-language on Data Pooling Approach training process. As the result, we found the ERR of the acoustic model to be 12.8 %.

Automatic Generation of Training Data for Korean Speech Recognition Post-Processor (한국어 음성인식 후처리기를 위한 학습 데이터 자동 생성 방안)

  • Seonmin Koo;Chanjun Park;Hyeonseok Moon;Jaehyung Seo;Sugyeong Eo;Yuna Hur;Heuiseok Lim
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.465-469
    • /
    • 2022
  • 자동 음성 인식 (Automatic Speech Recognition) 기술이 발달함에 따라 자동 음성 인식 시스템의 성능을 높이기 위한 방법 중 하나로 자동 후처리기 연구(automatic post-processor)가 진행되어 왔다. 후처리기를 훈련시키기 위해서는 오류 유형이 포함되어 있는 병렬 말뭉치가 필요하다. 이를 만드는 간단한 방법 중 하나는 정답 문장에 오류를 삽입하여 오류 문장을 생성하여 pseudo 병렬 말뭉치를 만드는 것이다. 하지만 이는 실제적인 오류가 아닐 가능성이 존재한다. 이를 완화시키기 위하여 Back TranScription (BTS)을 이용하여 후처리기 모델 훈련을 위한 병렬 말뭉치를 생성하는 방법론이 존재한다. 그러나 해당 방법론으로 생성 할 경우 노이즈가 적을 수 있다는 관점이 존재하다. 이에 본 연구에서는 BTS 방법론과 인위적으로 노이즈 강도를 추가한 방법론 간의 성능을 비교한다. 이를 통해 BTS의 정량적 성능이 가장 높은 것을 확인했을 뿐만 아니라 정성적 분석을 통해 BTS 방법론을 활용하였을 때 실제 음성 인식 상황에서 발생할 수 있는 실제적인 오류를 더 많이 포함하여 병렬 말뭉치를 생성할 수 있음을 보여준다.

  • PDF