http://dx.doi.org/10.13088/jiis.2017.23.2.071

Korean Sentence Generation Using Phoneme-Level LSTM Language Model  

Ahn, SungMahn (School of Business Administration, Kookmin University)
Chung, Yeojin (School of Business Administration, Kookmin University)
Lee, Jaejoon (Department of Data Science, Kookmin University)
Yang, Jiheon (Department of Data Science, Kookmin University)
Publication Information
Journal of Intelligence and Information Systems / v.23, no.2, 2017, pp. 71-88
Abstract
Language models were originally developed for speech recognition and language processing. Given a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but because they are probabilistic models based on the frequency of each unit in the training set, they cannot efficiently capture correlations among input units. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies among the objects entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012).

To train a neural language model, texts must be decomposed into words or morphemes. However, since a training set of sentences generally contains a huge number of distinct words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors during decomposition (Lankinen et al., 2016). This paper therefore proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit of which Korean text is composed.

We constructed language models with three or four LSTM layers. Each model was trained with the stochastic gradient algorithm and with more advanced optimization algorithms: Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras on top of Theano. After preprocessing, the dataset contained 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed each input vector from 20 consecutive characters, with the following (21st) character as the output. In total, 1,023,411 input-output pairs were obtained and divided into training, validation, and test sets in a 70:15:15 ratio. All simulations were conducted on a system equipped with a 16-core Intel Xeon CPU and an NVIDIA GeForce GTX 1080 GPU.

We compared the loss evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. All optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm, which also took the longest to train for both the 3- and 4-LSTM models. On average, the 4-LSTM layer model required 69% more training time than the 3-LSTM layer model, yet its validation loss and perplexity did not improve significantly and even worsened under some conditions. On the other hand, when the automatically generated sentences were compared, the 4-LSTM layer model tended to produce sentences closer to natural language than the 3-LSTM model.
Although the completeness of the generated sentences differed slightly between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models produced only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely applicable to Korean language processing and speech recognition, which are the basis of artificial intelligence systems.
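As an illustration of the phoneme-level preprocessing described above, the sketch below decomposes Hangul syllables into their constituent jamo (phonemes) using standard Unicode arithmetic and forms the 20-character input windows paired with the 21st character as the target. It is a minimal sketch in plain Python 3; the function names and the sample verse are illustrative, not the authors' actual preprocessing code.

```python
def to_jamo(text):
    """Decompose each Hangul syllable into its constituent phonemes (jamo)."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:            # composed Hangul syllable
            s = code - 0xAC00
            lead, vowel, tail = s // 588, (s % 588) // 28, s % 28
            out.append(chr(0x1100 + lead))       # initial consonant
            out.append(chr(0x1161 + vowel))      # medial vowel
            if tail:
                out.append(chr(0x11A7 + tail))   # final consonant, if any
        else:
            out.append(ch)                       # spaces, punctuation, etc.
    return out

def make_windows(seq, length=20):
    """Pair each window of 20 consecutive phonemes with the 21st as target."""
    return [(seq[i:i + length], seq[i + length])
            for i in range(len(seq) - length)]

phonemes = to_jamo("태초에 하나님이 천지를 창조하시니라.")  # Genesis 1:1
pairs = make_windows(phonemes)                   # (input, target) pairs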
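To make the model setup concrete, the following sketch shows a stacked-LSTM next-character model of the kind compared in the study, written against the modern tensorflow.keras API (the paper itself used Keras on Theano). The layer width of 256 is our assumption, not a setting reported by the authors; the vocabulary size and window length follow the abstract.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB = 74      # unique phonemes and punctuation marks after preprocessing
SEQ_LEN = 20    # length of the input window

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN, VOCAB)),         # one-hot phoneme window
    layers.LSTM(256, return_sequences=True),
    layers.LSTM(256, return_sequences=True),
    layers.LSTM(256),                            # third layer; add a fourth for the 4-LSTM variant
    layers.Dense(VOCAB, activation="softmax"),   # distribution over the next phoneme
])

# Any of the compared optimizers can be swapped in here: "sgd",
# "adagrad", "rmsprop", "adadelta", "adam", "adamax", or "nadam".
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Perplexity is the exponential of the mean cross-entropy (in nats)
# on held-out data, e.g.:
#   test_loss = model.evaluate(x_test, y_test)
#   perplexity = np.exp(test_loss)
```

Training with each optimizer in turn and comparing validation loss, test perplexity, and wall-clock time reproduces the structure of the comparison reported above.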
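Sentence generation then proceeds by repeatedly predicting the next phoneme from the last 20 phonemes and appending a sampled character. The loop below is a generic temperature-sampling sketch under the same assumptions as above; `model`, `char2idx`, and `idx2char` are illustrative names, and the generated jamo would finally be recomposed into Hangul syllables by inverting the decomposition shown earlier.

```python
import numpy as np

def generate(model, seed, char2idx, idx2char, n=200, temperature=1.0):
    """Extend a seed of at least 20 phonemes, one sampled phoneme at a time."""
    seq = list(seed)
    for _ in range(n):
        x = np.zeros((1, SEQ_LEN, VOCAB))
        for t, ch in enumerate(seq[-SEQ_LEN:]):
            x[0, t, char2idx[ch]] = 1.0          # one-hot encode the window
        probs = model.predict(x, verbose=0)[0]
        logits = np.log(probs + 1e-9) / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))
        seq.append(idx2char[int(np.random.choice(VOCAB, p=probs))])
    return "".join(seq)
```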
Keywords
Language model; Recurrent neural network; Long short-term memory model; Sentence generation model;
References
1 Cauchy, A. "Méthode générale pour la résolution des systèmes d'équations simultanées." Comp. Rend. Sci. Paris, Vol. 25 (1847), 536-538.
2 Chollet, F. "Keras." Available at https://github.com/fchollet/keras (downloaded 1 December, 2016).
3 Chung, J., Cho, K., and Bengio, Y. "A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation." arXiv:1603.06147 (2016).
4 Olah, C. "Understanding LSTM Networks." Colah's Blog. Available at http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (downloaded 1 December, 2016).
5 Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Senior, A., Tucker, P., Yang, K., Le, Q. V., et al. "Large Scale Distributed Deep Networks." In Advances in Neural Information Processing Systems (2012), 1223-1231.
6 Dozat, T. "Incorporating Nesterov Momentum into Adam." Technical report, Stanford University. Available at http://cs229.stanford.edu/proj2015/054report.pdf (2015).
7 Duchi, J., Hazan, E., and Singer, Y. "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization." Journal of Machine Learning Research, Vol. 12 (2011), 2121-2159.
8 Gers, F. A. and Schmidhuber, E. "LSTM Recurrent Networks Learn Simple Context-Free and Context-Sensitive Languages." IEEE Transactions on Neural Networks, Vol. 12, No. 6 (2001), 1333-1340.
9 Goodfellow, I., Bengio, Y., and Courville, A. "Deep Learning." MIT Press, Massachusetts, 2016.
10 Hinton, G., Srivastava, N., and Swersky, K. "Neural networks for machine learning." Coursera, video lectures, Available at https://www.coursera.org/learn/neural-networks (downloaded 1 December, 2016).
11 Hochreiter, S. and Schmidhuber, J. "Long Short-Term Memory." Neural Computation, Vol. 9, No. 8 (1997), 1735-1780.
12 Hutter, M. "The Human Knowledge Compression Prize." Available at http://prize.hutter1.net/ (2006).
13 Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., and Wu, Y. "Exploring the Limits of Language Modeling." arXiv:1602.02410 (2016).
14 Kim, Y., Jernite, Y., Sontag, D., and Rush, A. M. "Character-Aware Neural Language Models." arXiv:1508.06615 (2015).
15 Kingma, D. and Ba, J. "Adam: A Method for Stochastic Optimization." arXiv:1412.6980 (2014).
16 Kim, Y.-h., Hwang, Y.-k., Kang, T.-g., and Jung, K.-m. "LSTM Language Model Based Korean Sentence Generation." The Journal of Korean Institute of Communications and Information Sciences, Vol. 41, No. 5 (2016), 592-601.
17 Ahn, S. "Deep Learning Architectures and Applications." Journal of Intelligence and Information Systems, Vol. 22, No. 2 (2016), 127-142.
18 Bojanowski, P., Joulin, A., and Mikolov, T. "Alternative Structures for Character-Level RNNs." arXiv:1511.06303 (2015).
19 Lankinen, M., Heikinheimo, H., Takala, P., and Raiko, T. "A Character-Word Compositional Neural Language Model for Finnish." arXiv:1612.03266 (2016).
20 Lee, D., Oh, Kh., and Choi, H.-J. "Measuring the Syntactic Similarity between Korean Sentences Using RNN." In Proceedings of Korea Computer Congress (2016a), 792-794.
21 Lee, J., Cho, K., and Hofmann, T. "Fully Character-Level Neural Machine Translation without Explicit Segmentation." arXiv:1610.03017 (2016b).
22 Ling, W., Luis, T., Marujo, L., Astudillo, R. F., Amir, S., Dyer, C., Black, A. W., and Trancoso, I. "Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation." arXiv:1508.02096 (2015).
23 Mikolov, T., Karafiat, M., Burget, L., Cernocky, J., and Khudanpur, S. "Recurrent Neural Network Based Language Model." In Proceedings of Interspeech (2010), 1045-1048.
24 Mikolov, T. and Zweig, G. "Context Dependent Recurrent Neural Network Language Model." SLT (2012), 234-239.
25 Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research, Vol. 15, No. 1 (2014), 1929-1958.
26 Polyak, B. T. "Some Methods of Speeding Up the Convergence of Iteration Methods." USSR Computational Mathematics and Mathematical Physics, Vol. 4, No. 5 (1964), 1-17.
27 Rissanen, J. and Langdon, G. G. "Arithmetic Coding." IBM Journal of Research and Development, Vol. 23, No. 2 (1979), 149-162.
28 Socher, R. and Mundra, R. S. "CS 224D: Deep Learning for NLP." Available at http://cs224d.stanford.edu/ (downloaded 1 December, 2016).
29 Sundermeyer, M., Schlüter, R., and Ney, H. "LSTM Neural Networks for Language Modeling." In Proceedings of Interspeech (2012), 194-197.
30 Sutskever, I. and Martens, J. "Generating Text with Recurrent Neural Networks." In Proceedings of the 28th International Conference on Machine Learning (2011), 1017-1024.
31 Theano Development Team. "Theano: A Python Framework for Fast Computation of Mathematical Expressions." arXiv:1605.02688 (2016).
32 Ward, D. J., Blackwell, A. F., and MacKay, D. J. "Dasher-a Data Entry Interface Using Continuous Gestures and Language Models." In Proceedings of the 13th annual ACM symposium on User interface software and technology (2000), 129-137.
33 Zeiler, M. D. "ADADELTA: An Adaptive Learning Rate Method." arXiv:1212.5701 (2012).