Common Speech Database Collection for Telecommunications

Construction of a Korean Common Speech DB for Telecommunication Network Environments

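  • 김상훈 (Speech/Language Information Research Center, Computer and Software Research Laboratory, Electronics and Telecommunications Research Institute)
  • 박문환 (Speech/Language Information Research Center, Computer and Software Research Laboratory, Electronics and Telecommunications Research Institute)
  • 김현숙 (Speech/Language Information Research Center, Computer and Software Research Laboratory, Electronics and Telecommunications Research Institute)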
  • Published : 2003.05.01

Abstract

This paper presents the collection of a common speech database for telecommunication applications. Over a three-year project, we will construct very large-scale speech and text databases for speech recognition, speech synthesis, and speaker identification. The common speech database is designed to account for various communication environments and for the distribution of speakers by sex, age, and region. It consists of Korean continuous digits, isolated words, and sentences that reflect Korean phonetic coverage. In addition, it covers various speaking styles, such as read speech, dialogue speech, and semi-spontaneous speech. Sharing the common speech databases is intended to prevent duplicated investment of resources across the Korean speech industry. It is also expected to encourage the domestic speech industry and stimulate the domestic speech technology market.

Keywords