References
- Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi, The Curious Case of Neural Text Degeneration, in Proc. of International Conference on Learning Representations, 2020.
- Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, and Jason Weston, Neural Text Generation with Unlikelihood Training, in Proc. of International Conference on Learning Representations, 2020.
- OpenAI, Language Models are Unsupervised Multitask Learners. https://openai.com/blog/better-language-models/
- Alexander M. Rush, Yin-Wen Chang, and Michael Collins, Optimal Beam Search for Machine Translation, in Proc. of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 210-221, Oct. 18-21, 2013.
- Angela Fan, Mike Lewis, and Yann Dauphin, Hierarchical Neural Story Generation, in Proc. of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pp. 889-898, July 15-20, 2018. DOI: https://doi.org/10.18653/v1/P18-1082
- Liang Huang, Kai Zhao, and Mingbo Ma, When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size), in Proc. of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2134-2139, September 7-11, 2017. DOI: https://doi.org/10.18653/v1/D17-1227
- Rico Sennrich, Barry Haddow, and Alexandra Birch, Neural Machine Translation of Rare Words with Subword Units, in Proc. of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715-1725, August 2016. DOI: https://doi.org/10.18653/v1/P16-1162
- Taku Kudo, Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, in Proc. of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pp. 66-75, July 15-20, 2018. DOI: https://doi.org/10.18653/v1/P18-1007
- Rohan Chitnis and John DeNero, Variable-Length Word Encodings for Neural Translation Models, in Proc. of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2088-2093, September 17-21, 2015. DOI: https://doi.org/10.18653/v1/D15-1249
- Philip Gage, A New Algorithm for Data Compression, C Users Journal, Vol. 12, No. 2, pp. 23-38, February 1994. DOI: https://dl.acm.org/doi/10.5555/177910.177914
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, Attention Is All You Need, in Proc. of the 2017 Advances in Neural Information Processing Systems, pp. 5998-6008, 2017. DOI: https://dl.acm.org/doi/10.5555/3295222.3295349
- Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher, Pointer Sentinel Mixture Models, in Proc. of International Conference on Learning Representations, 2017.