Fig. 1. Data Composition by One-hot Encoding with Last Five Characters of the Word
Fig. 2. The Accuracy with the Training and Testing Data Sets. (A) Drop-out technique is not applied to any of the three stacked layers, (B) Drop-out technique is applied only to the bottom layer of the three stacked layers, (C) Drop-out technique is applied to all three stacked layers
Fig. 3. The Accuracy with the Training and Testing Data Sets. (A) Drop-out technique is applied to all three stacked layers, (B) Drop-out technique is applied to all four stacked layers, (C) Drop-out technique is applied to all five stacked layers
Fig. 4. The Framework of the Suggested Model
Fig. 5. The Screenshot of TensorBoard for the Suggested Model
Fig. 6. The Min, Max, and Average Accuracy for Each Epoch. The Red Point Stands for the Average Accuracy and the Blue Range Shows the Min and Max Accuracy
Fig. 7. The ROC Curve
Fig. 8. Average Confusion Matrix Values of 10-fold Cross Validation for Each Epoch
Table 1. The Examples of Automatic Translation by Google, Naver and Kakao Applications
Table 2. The Number of Vowel and Consonant Words in the Dataset
Table 3. The Number of Parts of Speech in the Dataset
Table 4. The Examples of Wrong Transliteration in Korean
Table 5. Data Classification Depending on the Korean Pronunciation. “1” in the Postposition Class Stands for “eul (을)” and “0” Means “reul (를)”
Table 6. The Number and Distribution of Data for Each Class
Table 7. The Number of Words for Each Length in the Dataset
Table 8. Confusion Matrix
References
- Songyi Lee, "A Study on Perception of English-Transliteration Words in Newspaper Articles," Studies in Linguistics, Vol.46, No.1, pp.313-333, 2018. https://doi.org/10.17002/sil..46.201801.313
- Google Translation, [Online]. Available: https://translate.google.com
- Naver Translation, [Online]. Available: https://papago.naver.com
- Kakao Translation, [Online]. Available: https://translate.kakao.com
- Donghyun Lee, Minkyu Lim, Hosung Park, and Ji-Hwan Kim, "LSTM RNN-based Korean Speech Recognition System Using CTC," Journal of Digital Contents Society, Vol.18, No.1, pp.93-99, 2017. https://doi.org/10.9728/dcs.2017.18.1.93
- Yoav Goldberg, "A Primer on Neural Network Models for Natural Language Processing," Journal of Artificial Intelligence Research, Vol.57, No.1, pp.345-420, 2016. https://doi.org/10.1613/jair.4992
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al., "Google's neural machine translation system: Bridging the gap between human and machine translation," arXiv preprint arXiv:1609.08144, 2016.
- English-Korean Transliteration [Internet], https://github.com/muik/transliteration
- WordNet: A Lexical Database for English [Internet], https://wordnet.princeton.edu/
- Edward Loper and Steven Bird, "NLTK: the Natural Language Toolkit," ETMTNLP '02 Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, Vol.1, pp.63-70, 2002.
- TensorFlow Release [Internet], https://www.tensorflow.org/, Retrieved 14 November 2018.
- Theano Release [Internet], http://www.deeplearning.net/software/theano/, Retrieved 17 September 2018.
- R. Collobert, S. Bengio, and J. Marithoz, "Torch: a modular machine learning software library," Technical Report IDIAP-RR 02-46, IDIAP, 2002.
- Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, "Caffe: Convolutional Architecture for Fast Feature Embedding," 2014.
- Víctor Martínez-Cagigal, ROC Curve [Internet], https://www.mathworks.com/matlabcentral/fileexchange/52442-roc-curve, MATLAB Central File Exchange. Retrieved February 7, 2019.