[1] T. Lin et al., "A survey of transformers", AI Open, 2022.
[2] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale", arXiv preprint arXiv:2010.11929, 2020.
[3] M. Chen et al., "Generative pretraining from pixels", International Conference on Machine Learning, PMLR, 2020.
[4] C. Subakan et al., "Attention is all you need in speech separation", ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021.
[5] H. Akbari et al., "VATT: Transformers for multimodal self-supervised learning from raw video, audio and text", Advances in Neural Information Processing Systems, Vol.34, pp.24206-24221, 2021.
[6] H. Li, "Language models: past, present, and future", Communications of the ACM, Vol.65, No.7, pp.56-63, 2022.
[7] A. Radford et al., "Language models are unsupervised multitask learners", OpenAI Blog, Vol.1, No.8, p.9, 2019.
[8] A. Wang et al., "SuperGLUE: A stickier benchmark for general-purpose language understanding systems", Advances in Neural Information Processing Systems, Vol.32, 2019.
[9] Y. Liu et al., "RoBERTa: A robustly optimized BERT pretraining approach", arXiv preprint arXiv:1907.11692, 2019.
[10] Z. Lan et al., "ALBERT: A lite BERT for self-supervised learning of language representations", arXiv preprint arXiv:1909.11942, 2019.
[11] K. Clark et al., "ELECTRA: Pre-training text encoders as discriminators rather than generators", arXiv preprint arXiv:2003.10555, 2020.
[12] M. Lewis et al., "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension", arXiv preprint arXiv:1910.13461, 2019.
[13] https://aiopen.etri.re.kr/service_dataset.php, 2019.
[14] https://github.com/SKTBrain/KoBERT, 2019.
[15] S. Lee et al., "KR-BERT: A small-scale Korean-specific language model", arXiv preprint arXiv:2008.03979, 2020.
[16] https://github.com/monologg/KoELECTRA, 2020.
[17] https://huggingface.co/xlm-roberta-base
[18] https://aida.kisti.re.kr/data/107ca6f3-ebcb-4a64-87d5-cea412b76daf, 2021.
[19] https://github.com/SKT-AI/KoGPT2, 2020.
[20] https://github.com/haven-jeon/kogpt2-chatbot, 2022.
[21] https://github.com/kakaobrain/kogpt, 2021.
[22] https://github.com/SKT-AI/KoBART, 2020.
[23] Y. Bengio et al., "A neural probabilistic language model", Advances in Neural Information Processing Systems, Vol.13, 2000.
[24] J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding", arXiv preprint arXiv:1810.04805, 2018.
[25] T. Brown et al., "Language models are few-shot learners", Advances in Neural Information Processing Systems, Vol.33, pp.1877-1901, 2020.
[26] S. J. Pan and Q. Yang, "A survey on transfer learning", IEEE Transactions on Knowledge and Data Engineering, Vol.22, No.10, pp.1345-1359, 2009.
[27] T. Mikolov et al., "Efficient estimation of word representations in vector space", arXiv preprint arXiv:1301.3781, 2013.
[28] J. Sarzynska-Wawer et al., "Detecting formal thought disorder by deep contextualized word representations", Psychiatry Research, Vol.304, p.114135, 2021.
[29] A. Vaswani et al., "Attention is all you need", Advances in Neural Information Processing Systems, Vol.30, pp.5998-6008, 2017.
[30] J. L. Ba et al., "Layer normalization", arXiv preprint arXiv:1607.06450, 2016.
[31] Y. Wu et al., "Google's neural machine translation system: Bridging the gap between human and machine translation", arXiv preprint arXiv:1609.08144, 2016.
[32] A. Radford et al., "Improving language understanding by generative pre-training", 2018.
[33] A. Wang et al., "GLUE: A multi-task benchmark and analysis platform for natural language understanding", arXiv preprint arXiv:1804.07461, 2018.