Automatic Conversion of English Pronunciation Using Sequence-to-Sequence Model

Lee, Kong Joo; Choi, Yong Seok

doi:10.3745/KTSDE.2017.6.5.267

KIPS Transactions on Software and Data Engineering (정보처리학회논문지:소프트웨어 및 데이터공학)

Volume 6 Issue 5 / Pages 267-278 / 2017 / 2287-5905 (pISSN) / 2734-0503 (eISSN)

Korea Information Processing Society (한국정보처리학회)



Sequence-to-Sequence Model을 이용한 영어 발음 기호 자동 변환

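Lee, Kong Joo (Department of Radio and Information Communications Engineering, Chungnam National University);
Choi, Yong Seok (Department of Electronics, Radio and Information Communications Engineering, Chungnam National University)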

Received : 2017.01.18
Accepted : 2017.02.10
Published : 2017.05.31

https://doi.org/10.3745/KTSDE.2017.6.5.267


Abstract

Because the same letter can be pronounced differently depending on the word in which it appears, a reader must consult the phonetic transcription in a lexicon to pronounce an English word correctly. Lexicons differ both in the phonetic alphabet they adopt and in the pronunciation they give for the same word. In this paper, we use a sequence-to-sequence (seq2seq) model, which is widely used in recent deep learning research, to convert pronunciations automatically from one lexicon's notation to another's. We implement 12 seq2seq models using pronunciation data extracted from four different lexicons. The exact-match accuracy of the models ranges from 74.5% to 89.6%. This study has two main aims. The first is to examine the phonetic alphabets and the pronunciation data used in each lexicon. The second is to examine the characteristics of seq2seq models through automatic pronunciation conversion and error analysis.
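
The counts and the metric in the abstract can be illustrated with a short sketch. It rests on two assumptions that the abstract does not state outright: that the 12 models correspond to one model per ordered (source, target) pair of the four lexicons (4 × 3 = 12), and that exact-match accuracy counts a converted pronunciation as correct only when every symbol equals the reference. The lexicon labels, function names, and toy pronunciations below are hypothetical and are not taken from the paper.

```python
from itertools import permutations

# Hypothetical labels: the abstract does not name the four lexicons.
LEXICONS = ["lexicon_A", "lexicon_B", "lexicon_C", "lexicon_D"]


def conversion_directions(lexicons):
    """Return every ordered (source, target) pair of distinct lexicons.

    Assumption: one conversion model per direction, which is one
    reading consistent with "12 models from 4 lexicons".
    """
    return list(permutations(lexicons, 2))


def exact_match_accuracy(predicted, reference):
    """Fraction of words whose predicted symbol sequence equals the
    reference sequence symbol for symbol (no partial credit)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if list(p) == list(r))
    return hits / len(reference)


# Toy data invented for illustration only.
toy_predicted = [["k", "æ", "t"], ["d", "ɔ", "g"]]
toy_reference = [["k", "æ", "t"], ["d", "ɒ", "g"]]

print(len(conversion_directions(LEXICONS)))
print(exact_match_accuracy(toy_predicted, toy_reference))
```

Under this strict metric a single wrong symbol makes the whole word count as an error, so the scores are lower than a per-symbol accuracy would be on the same output.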

