Hybrid CTC-Attention Based End-to-End Speech Recognition Using Korean Grapheme Unit

Park, Hosung; Lee, Donghyun; Lim, Minkyu; Kang, Yoseb; Oh, Junseok; Seo, Soonshin; Rim, Daniel; Kim, Ji-Hwan

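Annual Conference on Human and Language Technology (Korean Institute of Information Scientists and Engineers (KIISE), Special Interest Group on Language Engineering: Conference Proceedings, Hangul and Korean Language Information Processing)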

2018.10a, pp. 453-458 (2018), pISSN 2005-3053

Human and Language Technology (KIISE Special Interest Group on Language Engineering)

Korean title: Korean Grapheme-Based Hybrid CTC-Attention End-to-End Speech Recognition

Park, Hosung (Sogang University, Department of Computer Science and Engineering) ;
Lee, Donghyun (Sogang University, Department of Computer Science and Engineering) ;
Lim, Minkyu (Sogang University, Department of Computer Science and Engineering) ;
Kang, Yoseb (Sogang University, Department of Computer Science and Engineering) ;
Oh, Junseok (Sogang University, Department of Computer Science and Engineering) ;
Seo, Soonshin (Sogang University, Department of Computer Science and Engineering) ;
Rim, Daniel (Sogang University, Department of Computer Science and Engineering) ;
Kim, Ji-Hwan (Sogang University, Department of Computer Science and Engineering)

Published : 2018.10.12

Abstract

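This paper proposes end-to-end speech recognition based on a hybrid CTC-Attention model that uses Korean graphemes as the recognition unit. End-to-end speech recognition replaces a conventional pipeline of several separate modules (a DNN-HMM acoustic model, an N-gram language model, and a WFST-based decoding network) with a single DNN. In this paper, the end-to-end model produces its output as a sequence of graphemes. Building the network on graphemes reduces the number of output units to be estimated from 11,172 (the number of precomposed modern Hangul syllables) to 49, which makes training more efficient. To implement this, the end-to-end model combines CTC and an attention network, two DNN architectures commonly used for end-to-end training. In experiments, the model achieved a syllable error rate of 10.05%.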
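The 11,172-to-49 reduction follows from how modern Hangul is built: every precomposed syllable (Unicode U+AC00 to U+D7A3) is one of 19 initial consonants × 21 vowels × 28 final-consonant choices (including "no final"). The Python sketch below shows that grapheme decomposition and a syllable error rate computed as edit distance over syllables. It is a sketch under stated assumptions, not the authors' code: graphemes are represented as Hangul compatibility jamo with complex finals such as ㄳ kept as single symbols, so this inventory has 51 letters rather than the paper's 49 (the abstract does not define its exact symbol set). The function names and the choice to ignore spaces when scoring are hypothetical.

```python
# Sketch (assumption): Hangul syllable -> grapheme decomposition plus a syllable error rate.
# Compatibility jamo are used as grapheme symbols; this is NOT necessarily the paper's 49-unit set.

S_BASE = 0xAC00                                  # first precomposed syllable, U+AC00 '가'
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28        # final count includes "no final consonant"
N_SYLLABLES = N_INITIAL * N_MEDIAL * N_FINAL     # 11,172 syllable-level output classes

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # final index f maps to FINALS[f - 1]


def decompose(text: str) -> list[str]:
    """Split each precomposed Hangul syllable into initial, medial and optional final jamo."""
    graphemes = []
    for ch in text:
        idx = ord(ch) - S_BASE
        if 0 <= idx < N_SYLLABLES:
            i, rest = divmod(idx, N_MEDIAL * N_FINAL)
            m, f = divmod(rest, N_FINAL)
            graphemes.append(INITIALS[i])
            graphemes.append(MEDIALS[m])
            if f:  # f == 0 means the syllable has no final consonant
                graphemes.append(FINALS[f - 1])
        else:
            graphemes.append(ch)  # spaces, digits and punctuation pass through unchanged
    return graphemes


def syllable_error_rate(ref: str, hyp: str) -> float:
    """Levenshtein distance over syllables divided by reference length (spaces ignored)."""
    r = [c for c in ref if not c.isspace()]
    h = [c for c in hyp if not c.isspace()]
    if not r:
        raise ValueError("reference must contain at least one syllable")
    prev = list(range(len(h) + 1))  # distances from the empty reference prefix
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            sub = prev[j - 1] + (r[i - 1] != h[j - 1])  # substitution costs 1, match costs 0
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, sub)
        prev = cur
    return prev[-1] / len(r)


if __name__ == "__main__":
    print(N_SYLLABLES)                                       # 11172
    print(decompose("한국어"))                                # ['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅜ', 'ㄱ', 'ㅇ', 'ㅓ']
    print(syllable_error_rate("안녕 하세요", "안녕 하세오"))    # 0.2
```

Because an initial ㄱ and a final ㄱ share one symbol in this representation, regrouping a recognized grapheme string back into syllables needs a rule-based composer before syllable-level scoring; the abstract does not say how the paper handles that step.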

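The abstract does not state the training objective, but hybrid CTC-attention models are commonly trained on a weighted sum of the two losses, L = λ·L_CTC + (1 − λ)·L_att. The Python sketch below computes that combination for one utterance: the CTC term uses the forward algorithm over the blank-extended label sequence, and the attention term is the teacher-forced cross-entropy of the decoder. Assumptions: the weight λ (here `ctc_weight`, hypothetical default 0.3), blank index 0, and all function names are illustrative; the paper's actual weighting and decoding procedure are not given in the abstract.

```python
import math


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs: list[list[float]], target: list[int], blank: int = 0) -> float:
    """-log p_CTC(target | x) via the CTC forward algorithm in log space.

    log_probs: T frames x V grapheme classes (index `blank` is the CTC blank, an assumption).
    """
    ext = [blank]
    for y in target:                    # blank-extended labels: _ y1 _ y2 _ ... _
        ext += [y, blank]
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new = [-math.inf] * S
        for s in range(S):
            terms = [alpha[s]]                              # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])                  # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])                  # skip a blank between distinct labels
            new[s] = logsumexp(*terms) + frame[ext[s]]
        alpha = new
    # Valid alignments end on the last label or on the trailing blank.
    total = alpha[-1] if S == 1 else logsumexp(alpha[-1], alpha[-2])
    return -total


def attention_loss(dec_log_probs: list[list[float]], target: list[int]) -> float:
    """Teacher-forced cross-entropy, one decoder step per target symbol."""
    return -sum(step[y] for step, y in zip(dec_log_probs, target))


def hybrid_loss(ctc_log_probs: list[list[float]], dec_log_probs: list[list[float]],
                target: list[int], ctc_weight: float = 0.3) -> float:
    """ctc_weight * L_CTC + (1 - ctc_weight) * L_att (ctc_weight default is hypothetical)."""
    return (ctc_weight * ctc_loss(ctc_log_probs, target)
            + (1.0 - ctc_weight) * attention_loss(dec_log_probs, target))


if __name__ == "__main__":
    half = math.log(0.5)
    ctc_lp = [[half, half], [half, half]]   # 2 frames, vocabulary {blank=0, grapheme=1}
    dec_lp = [[half, half]]                 # 1 decoder step
    print(ctc_loss(ctc_lp, [1]))            # about 0.2877 = -log(0.75): paths "aa", "a_", "_a"
    print(hybrid_loss(ctc_lp, dec_lp, [1], ctc_weight=0.5))
```

The CTC branch enforces a monotonic alignment between frames and graphemes, while the attention branch models dependencies between output graphemes; interpolating the two losses is one common way to get both properties from a single network.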