A Study on Improved Comments Generation Using Transformer


  • Seong, So-yun (Dept. of Game and Multimedia Engineering, Korea Polytechnic University) ;
  • Choi, Jae-yong (Dept. of Game and Multimedia Engineering, Korea Polytechnic University) ;
  • Kim, Kyoung-chul (Dept. of Game and Multimedia Engineering, Korea Polytechnic University)
  • Received : 2019.09.09
  • Accepted : 2019.10.14
  • Published : 2019.10.31

Abstract

Since 2017 we have been studying a deep-learning program that can respond to other users' posts in online communities. However, the difficulty of processing Korean words caused by characteristics of the language, such as postpositional particles, together with the low GPU utilization typical of RNN models, forced us to limit training to a small amount of data. As natural language processing models have advanced rapidly in recent years, this study aims to apply these improved models to produce better results. To achieve this, we adopt a Transformer model built on the self-attention mechanism and apply MeCab, a Korean morphological analyzer, to alleviate the difficulty of processing Korean words.
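To make the abstract concrete, the sketch below shows morpheme-level tokenization with MeCab, the step the study uses to ease Korean word processing. It is a minimal illustration rather than the authors' actual pipeline: it assumes mecab-ko and the mecab-ko-dic dictionary are installed and reaches MeCab through the konlpy wrapper, which the paper does not specify; the sample sentence is likewise made up.

```python
# Minimal sketch: morpheme-level tokenization with MeCab (mecab-ko).
# Assumes mecab-ko and mecab-ko-dic are installed; the konlpy wrapper and
# the sample sentence are illustrative choices, not taken from the paper.
from konlpy.tag import Mecab

mecab = Mecab()  # loads the installed mecab-ko-dic dictionary by default

sentence = "온라인 커뮤니티에서 댓글을 생성한다"
print(mecab.morphs(sentence))  # morphemes; particles such as '에서' and '을' become separate tokens
print(mecab.pos(sentence))     # (morpheme, part-of-speech tag) pairs
```

The Transformer the study adopts is built around scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V, in which every token attends to every other token through dense matrix multiplications rather than the step-by-step recurrence of an RNN; this is the property behind the GPU-utilization point in the abstract. A toy NumPy version is sketched below; it is illustrative only and omits multi-head projections, masking, and the full encoder-decoder stack.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core operation of self-attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors

# Toy self-attention: 4 tokens with 8-dimensional representations,
# used simultaneously as queries, keys, and values.
x = np.random.rand(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```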

