http://dx.doi.org/10.15207/JKCS.2022.13.03.077

Korean and Multilingual Language Models Study for Cross-Lingual Post-Training (XPT)  

Son, Suhyune (Department of Computer Science and Engineering, Korea University)
Park, Chanjun (Department of Computer Science and Engineering, Korea University)
Lee, Jungseob (Department of Computer Science and Engineering, Korea University)
Shim, Midan (Department of Software Convergence, Kyung Hee University)
Lee, Chanhee (Naver Corporation)
Park, Kinam (Human-inspired Computing Research Center, Korea University)
Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
Publication Information
Journal of the Korea Convergence Society, v.13, no.3, 2022, pp. 77-89
Abstract
Many previous studies have shown that pretraining a language model on a large corpus improves performance on a wide range of natural language processing tasks. However, building such a large training corpus is difficult in language environments where resources are scarce. We therefore apply Cross-Lingual Post-Training (XPT) and analyze its efficiency in Korean, a low-resource language. XPT selectively reuses the parameters of a pretrained language model for English, a high-resource language, and trains adaptation layers to learn the relationship between the two languages. Experiments on relation extraction confirm that, with only a small amount of target-language data, XPT outperforms a language model pretrained on the target language. In addition, we analyze the characteristics of Korean monolingual and multilingual language models released by domestic and foreign researchers and companies.
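
The sketch below illustrates the XPT idea summarized in the abstract: the Transformer body of an English pretrained model is reused (frozen at first), while new target-language embeddings and small adaptation layers are trained to map Korean inputs into and out of the reused representation space. This is a minimal illustrative sketch, not the authors' released implementation; the class name, layer shapes, and two-linear-adapter structure are assumptions made for the example.

```python
# Illustrative sketch of Cross-Lingual Post-Training (XPT): reuse the parameters
# of an English pretrained encoder and learn adaptation layers plus new
# target-language embeddings. Names and sizes are assumptions, not the paper's code.
import torch
import torch.nn as nn


class XPTAdapterModel(nn.Module):
    def __init__(self, source_encoder: nn.Module, hidden_size: int, target_vocab_size: int):
        super().__init__()
        # New embeddings for the target (Korean) vocabulary, trained from scratch.
        self.target_embeddings = nn.Embedding(target_vocab_size, hidden_size)
        # Adaptation layers that learn the relationship between the two languages.
        self.input_adapter = nn.Linear(hidden_size, hidden_size)
        self.output_adapter = nn.Linear(hidden_size, hidden_size)
        # Reused English encoder; frozen during the initial post-training phase.
        self.source_encoder = source_encoder
        for param in self.source_encoder.parameters():
            param.requires_grad = False

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.target_embeddings(input_ids)   # (batch, seq, hidden)
        hidden = self.input_adapter(hidden)          # map into the source model's space
        hidden = self.source_encoder(hidden)         # reused English parameters
        return self.output_adapter(hidden)           # map back for target-language tasks


if __name__ == "__main__":
    # Stand-in for a pretrained English Transformer body (assumed for this demo).
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        num_layers=2,
    )
    model = XPTAdapterModel(encoder, hidden_size=256, target_vocab_size=32000)
    dummy_ids = torch.randint(0, 32000, (2, 16))     # toy Korean token ids
    print(model(dummy_ids).shape)                    # torch.Size([2, 16, 256])
```

In a downstream setting such as relation extraction, only the new embeddings and adaptation layers would need to be tuned at first, which is why a small amount of target-language data can suffice.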
Keywords
Pretrained Language Model; Transfer Learning; Korean Language Model; Cross-Lingual Language Model; Language Convergence;