http://dx.doi.org/10.7236/IJIBC.2020.12.2.45

Subword Neural Language Generation with Unlikelihood Training  

Iqbal, Salahuddin Muhammad (Department of Computer Engineering, Dongseo University)
Kang, Dae-Ki (Department of Computer Engineering, Dongseo University)
Publication Information
International Journal of Internet, Broadcasting and Communication, Vol. 12, No. 2, 2020, pp. 45-50
Abstract
A language model built with neural networks is commonly trained with a likelihood loss so that the model learns the sequences of human text. State-of-the-art results have been achieved in various language generation tasks, e.g., text summarization, dialogue response generation, and text generation, by utilizing the language model's next-token output probabilities. Monotonous and repetitive outputs are a well-known problem of such models, yet only a few solutions have been proposed to address it. Several decoding techniques have been proposed to suppress repetitive tokens. Unlikelihood training addresses this problem by penalizing the probabilities of candidate tokens that have already appeared in previous steps. While the method successfully produces less repetitive generated text, it has a large memory footprint because training requires a large vocabulary. We effectively reduce the memory footprint by encoding words as sequences of subword units. Finally, we report results competitive with token-level unlikelihood training in several automatic evaluations compared to the previous work.
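To make the idea in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' code) of a token-level unlikelihood loss computed over subword ids in PyTorch: in addition to the usual maximum-likelihood term, it penalizes the probability mass the model assigns to subwords that already appeared earlier in the sequence. The function name, the weighting factor alpha, the padding id, and the tensor shapes are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, targets, alpha=1.0, pad_id=0):
    # logits: (batch, seq_len, vocab_size) subword-level scores from a language model
    # targets: (batch, seq_len) gold subword ids
    log_probs = F.log_softmax(logits, dim=-1)

    # Standard maximum-likelihood (cross-entropy) term.
    mle = F.nll_loss(log_probs.transpose(1, 2), targets, ignore_index=pad_id)

    # Negative candidates at step t: subwords that already appeared before step t,
    # excluding the current gold subword.
    batch, seq_len, vocab_size = logits.shape
    prev = targets.unsqueeze(1).expand(batch, seq_len, seq_len)
    lower = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                  device=targets.device), diagonal=-1)
    candidates = prev.masked_fill(~lower, pad_id)
    neg = torch.zeros_like(log_probs)
    neg.scatter_(2, candidates, 1.0)
    neg[:, :, pad_id] = 0.0
    neg.scatter_(2, targets.unsqueeze(-1), 0.0)  # never penalize the gold subword

    # Penalize probability mass assigned to previously seen subwords: -log(1 - p(c)).
    one_minus_p = torch.clamp(1.0 - log_probs.exp(), min=1e-5)
    ul = -(torch.log(one_minus_p) * neg).sum() / neg.sum().clamp(min=1.0)

    return mle + alpha * ul

In this sketch the penalty operates on subword ids from, e.g., a byte-pair-encoded vocabulary, which is what keeps the vocabulary, and hence the softmax and embedding matrices, small.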
Keywords
Subword units; Natural language processing; Neural language generation; Maximum likelihood training; Unlikelihood training