Recent R&D Trends for Pretrained Language Model

  • Published: 2020.06.01

Abstract

Recently, the approach of pretraining a deep learning language model on a large corpus and then fine-tuning it for each application task has come into wide use in language processing. Pretrained language models achieve higher accuracy and better generalization than earlier methods. This paper surveys the major research trends in deep learning pretrained language models for language processing. We describe in detail the motivation, model architecture, training method, and results of BERT, which has had a significant influence on subsequent studies. We then review the language models proposed after BERT, focusing on SpanBERT, RoBERTa, ALBERT, BART, and ELECTRA. Finally, we introduce KorBERT, a pretrained language model that performs well on Korean, and describe techniques for applying pretrained language models to Korean, an agglutinative language in which words are formed by combining content and functional morphemes, unlike English, an inflectional language whose word endings change with usage.
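To make the pretrain-then-fine-tune workflow described above concrete, the following is a minimal illustrative sketch using the Hugging Face transformers library. It is not the authors' implementation: the model identifier bert-base-multilingual-cased is an assumed stand-in for a Korean-specific model such as KorBERT (which is distributed separately through ETRI AIOpen, reference [17]), and the two labeled sentences are toy data standing in for a real downstream dataset.

```python
# Illustrative sketch (assumptions noted above): load a pretrained BERT-style
# encoder and fine-tune it with a classification head on a downstream task.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder model; KorBERT itself is obtained from ETRI AIOpen, not from this identifier.
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labeled sentences standing in for a real task such as sentiment classification.
texts = ["이 영화 정말 재미있다", "서비스가 너무 실망스러웠다"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps on the toy batch
    outputs = model(**batch, labels=labels)  # forward pass also computes the classification loss
    outputs.loss.backward()                  # backpropagate through the whole pretrained encoder
    optimizer.step()
    optimizer.zero_grad()
```

The key design point of this paradigm is that only a small task-specific head is added on top of the pretrained encoder, and all parameters are updated with a small learning rate for a few epochs, which is what allows one pretrained model to transfer to many application tasks.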

Keywords

Acknowledgement

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (Ministry of Science and ICT) in 2020 [No. 2013-0-00131, (Exobrain - Overall/Sub-project 1) Development of Intelligence-Evolving WiseQA Platform Technology for Human Knowledge Augmented Services].

References

  1. J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. NAACL-HLT, Minneapolis, MN, USA, June 2-7, 2019, pp. 4171-4186.
  2. T. Mikolov et al., "Distributed representations of words and phrases and their compositionality," in Proc. Int. Conf. Neural Inf. Process. Syst., 2013, pp. 3111-3119, doi: 10.5555/2999792.2999959.
  3. P. Bojanowski et al., "Enriching word vectors with subword information," Trans. Assoc. Comput. Linguistics, vol. 5, Dec. 2017, pp. 135-146. https://doi.org/10.1162/tacl_a_00051
  4. A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 5998-6008.
  5. https://gluebenchmark.com/
  6. https://rajpurkar.github.io/SQuAD-explorer/
  7. Y. Sun et al., "ERNIE: Enhanced representation through knowledge integration," arXiv preprint arXiv:1904.09223, 2019.
  8. K. Song et al., "MASS: Masked sequence to sequence pre-training for language generation," in Proc. Int. Conf. Mach. Learning, Long Beach, CA, USA, 2019, pp. 5926-5936.
  9. L. Dong et al., "Unified language model pre-training for natural language understanding and generation," arXiv preprint arXiv:1905.03197, 2019.
  10. Z. Yang et al., "XLNet: Generalized autoregressive pretraining for language understanding," arXiv preprint arXiv:1906.08237, 2019.
  11. M. Joshi et al., "SpanBERT: Improving pre-training by representing and predicting spans," arXiv preprint arXiv:1907.10529, 2019.
  12. Y. Liu et al., "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
  13. Z. Lan et al., "ALBERT: A lite BERT for self-supervised learning of language representations," in Proc. Int. Conf. Learning Representations, Addis Ababa, Ethiopia, May 2020.
  14. M. Lewis et al., "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension," arXiv preprint arXiv:1910.13461, 2019.
  15. K. Clark et al., "ELECTRA: Pre-training text encoders as discriminators rather than generators," in Proc. Int. Conf. Learning Representations, Addis Ababa, Ethiopia, May 2020.
  16. H. Bao et al., "UniLMv2: Pseudo-masked language models for unified language model pre-training," arXiv preprint arXiv:2002.12804, 2020.
  17. http://aiopen.etri.re.kr/service_dataset.php