Filter-mBART Based Neural Machine Translation Using Parallel Corpus Filtering
Hyeonseok Moon (Department of Computer Science and Engineering, Korea University)
Chanjun Park (Department of Computer Science and Engineering, Korea University)
Sugyeong Eo (Department of Computer Science and Engineering, Korea University)
JeongBae Park (Department of Human Inspired AI Research, Korea University)
Heuiseok Lim (Department of Computer Science and Engineering, Korea University)
1. K. Papineni, S. Roukos, T. Ward & W. J. Zhu. (2002, July). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318).
2. C. Park, Y. Yang, K. Park & H. Lim. (2020). Decoding strategies for improving low-resource machine translation. Electronics, 9(10), 1562.
3. T. Kudo & J. Richardson. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:1808.06226. DOI: 10.18653/v1/P18-1007
4. K. Song, X. Tan, T. Qin, J. Lu & T. Y. Liu. (2019). MASS: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450.
5. M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer & O. Levy. (2020). SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8, 64-77. DOI: 10.1162/tacl_a_00300
6. C. Park & H. Lim. (2020). A study on the performance improvement of machine translation using public Korean-English parallel corpus. Journal of Digital Convergence, 18(6), 271-277. DOI: 10.14400/JDC.2020.18.6.271
7. H. Khayrallah & P. Koehn. (2018). On the impact of various types of noise on neural machine translation. arXiv preprint arXiv:1805.12282. DOI: 10.18653/v1/w18-2709
8. Y. Liu et al. (2020). Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8, 726-742. DOI: 10.1162/tacl_a_00343
9. C. Park, Y. Lee, C. Lee & H. Lim. (2020). Quality, not quantity?: Effect of parallel corpus quantity and quality on neural machine translation. In The 32nd Annual Conference on Human and Cognitive Language Technology.
10. W. A. Gale & K. Church. (1993). A program for aligning sentences in bilingual corpora. Computational Linguistics, 19(1), 75-102.
11. M. Cettolo et al. (2017). Overview of the IWSLT 2017 evaluation campaign. In International Workshop on Spoken Language Translation (pp. 2-14).
12. M. Ott et al. (2019). fairseq: A fast, extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038. DOI: 10.18653/v1/n19-4009
13. P. Koehn, V. Chaudhary, A. El-Kishky, N. Goyal, P. J. Chen & F. Guzmán. (2020, November). Findings of the WMT 2020 shared task on parallel corpus filtering and alignment. In Proceedings of the Fifth Conference on Machine Translation (pp. 726-742).
14. M. Lewis et al. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
15. A. Vaswani et al. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
16. J. Devlin, M. W. Chang, K. Lee & K. Toutanova. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
17. G. Lample & A. Conneau. (2019). Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291.