http://dx.doi.org/10.36498/kbigdt.2022.7.2.11

A Survey on Deep Learning-based Pre-Trained Language Models  

Sangun Park (Major in Management Information, Division of ICT Convergence, College of Software Management, Kyonggi University)
Publication Information
The Journal of Bigdata / v.7, no.2, 2022, pp.11-29
Abstract
Pre-trained language models are among the most important and widely used tools for natural language processing tasks. Because they have been pre-trained on large corpora, high performance can be expected even after fine-tuning with a small amount of task-specific data. Since the components required for implementation, such as a pre-trained tokenizer and a deep learning model with pre-trained weights, are distributed together, the cost and time of building natural language processing applications have been greatly reduced. Transformer variants are the most representative pre-trained language models that provide these advantages, and they are also being actively applied in other fields such as computer vision and audio processing. To help researchers understand pre-trained language models and apply them to natural language processing tasks, this paper defines the language model and the pre-trained language model, and reviews the development of pre-trained language models, focusing on representative Transformer variants.
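To illustrate the workflow the abstract describes, the following is a minimal sketch of downloading a pre-trained tokenizer together with pre-trained weights and fine-tuning them on a small labelled sample. It assumes the Hugging Face transformers library and PyTorch; the checkpoint name ("bert-base-multilingual-cased") and the two-sentence toy dataset are illustrative assumptions, not anything prescribed by the paper.

```python
# Minimal fine-tuning sketch (assumes: transformers, torch installed;
# checkpoint name and toy data are illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"             # any BERT-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)   # pre-trained tokenizer, distributed with the model
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)                           # pre-trained weights + new classification head

# Tiny labelled sample; fine-tuning is expected to need relatively little data.
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                                      # a few fine-tuning steps
    outputs = model(**batch, labels=labels)             # loss is computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(outputs.loss.item())                              # training loss after the last step
```

Because the tokenizer and the weights are fetched from the same checkpoint identifier, switching to a different Transformer variant typically requires changing only that name.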
Keywords
NLP; deep learning; language model; Transformer; BERT; GPT