Robustness of Differentiable Neural Computer Using Limited Retention Vector-based Memory Deallocation in Language Model

Donghyun Lee, Hosung Park, Soonshin Seo, Hyunsoo Son, Gyujin Kim, and Ji-Hwan Kim
(Department of Computer Science and Engineering, Sogang University)