http://dx.doi.org/10.29220/CSAM.2020.27.3.313

Understanding recurrent neural network for texts using English-Korean corpora  

Lee, Hagyeong (Department of Statistics, Ewha Womans University)
Song, Jongwoo (Department of Statistics, Ewha Womans University)
Publication Information
Communications for Statistical Applications and Methods, Vol. 27, No. 3, 2020, pp. 313-326
Abstract
Deep learning is a key driver of recent progress in Artificial Intelligence (AI). Among the main neural network architectures such as MLP, CNN, and RNN, we focus on the Recurrent Neural Network (RNN), which differs from the others in how it handles sequential data, including time series and text. As one of the central tasks in Natural Language Processing (NLP), we consider Neural Machine Translation (NMT) with RNNs. We summarize the fundamental structures of recurrent networks and methods for representing words as numeric vectors, and we organize the estimation procedure from encoding the input source sequence to predicting the translated target sequence. We then fit several translation models with Gated Recurrent Units (GRUs) in Keras to English-Korean sentence pairs, about 26,000 in total, drawn from two corpora: colloquial speech and news. We examine several factors that influence the quality of training and find that, for short sequences, the loss decreases as the recurrent dimension grows and when a bidirectional RNN is used in the encoder. We also compute BLEU scores, the standard measure of translation performance, and compare them with the scores obtained from Google Translate on the same test sentences. We summarize the main difficulties in training an adequate translation model and in handling Korean text. Using Keras in Python for the entire pipeline, from processing raw text to evaluating the translation model, also lets us take advantage of its built-in functions and vocabulary utilities.
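The translation models described above are GRU-based encoder-decoder (Seq2Seq) networks built in Keras, with a bidirectional RNN option in the encoder. The following is a minimal sketch of such a model, assuming TensorFlow 2's bundled Keras; the vocabulary sizes, embedding dimension, and recurrent dimension are illustrative placeholders, not the settings used in the paper.

    from tensorflow.keras.layers import (Input, Embedding, GRU, Bidirectional,
                                         Dense, Concatenate)
    from tensorflow.keras.models import Model

    src_vocab, tgt_vocab = 10000, 10000   # hypothetical vocabulary sizes
    emb_dim, hid_dim = 128, 256           # hypothetical embedding / recurrent dimensions

    # Encoder: embed the source tokens and keep only the final recurrent states.
    enc_in = Input(shape=(None,), name="source_tokens")
    enc_emb = Embedding(src_vocab, emb_dim, mask_zero=True)(enc_in)
    _, fwd_state, bwd_state = Bidirectional(GRU(hid_dim, return_state=True))(enc_emb)
    enc_state = Concatenate()([fwd_state, bwd_state])   # fixed-length summary of the source

    # Decoder: initialized with the encoder state and trained with teacher forcing
    # to predict the next target token at every position.
    dec_in = Input(shape=(None,), name="target_tokens")
    dec_emb = Embedding(tgt_vocab, emb_dim, mask_zero=True)(dec_in)
    dec_seq, _ = GRU(2 * hid_dim, return_sequences=True,
                     return_state=True)(dec_emb, initial_state=enc_state)
    dec_pred = Dense(tgt_vocab, activation="softmax")(dec_seq)

    model = Model([enc_in, dec_in], dec_pred)
    model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")

During training the decoder input is the target sentence shifted by one token; at prediction time the translation is generated token by token from the learned states.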
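Translation quality is evaluated with BLEU scores and compared against Google Translate on the same test sentences. As a rough illustration only, a corpus-level BLEU score can be computed with NLTK (a tool choice assumed here, not stated in the abstract) on tokenized reference and hypothesis sentences; the Korean token sequences below are made up.

    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    # One list of references per test sentence (each reference is a token list),
    # and one hypothesis (model or Google Translate output) per test sentence.
    references = [[["나는", "오늘", "학교에", "간다"]],
                  [["그녀는", "매일", "책을", "읽는다"]]]
    hypotheses = [["나는", "오늘", "학교에", "갑니다"],
                  ["그녀는", "매일", "책을", "읽었다"]]

    smooth = SmoothingFunction().method1   # avoids zero scores on very short sentences
    print(corpus_bleu(references, hypotheses, smoothing_function=smooth))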
Keywords
RNN; NLP; Seq2Seq; Neural Machine Translation; Keras