Robustness of Differentiable Neural Computer Using Limited Retention Vector-based Memory Deallocation in Language Model

Donghyun Lee, Hosung Park, Soonshin Seo, Hyunsoo Son, Gyujin Kim, and Ji-Hwan Kim
(Department of Computer Science and Engineering, Sogang University)