http://dx.doi.org/10.14400/JDC.2021.19.7.199

Recent Automatic Post Editing Research  

Moon, Hyeonseok (Department of Computer Science and Engineering, Korea University)
Park, Chanjun (Department of Computer Science and Engineering, Korea University)
Eo, Sugyeong (Department of Computer Science and Engineering, Korea University)
Seo, Jaehyung (Department of Computer Science and Engineering, Korea University)
Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
Publication Information
Journal of Digital Convergence / v.19, no.7, 2021, pp. 199-208
Abstract
Automatic Post-Editing (APE) is the study of automatically correcting errors in machine-translated sentences. The goal of the APE task is to build error-correcting models that improve translation quality regardless of the underlying translation system. To train these models, triples consisting of the source sentence, the machine translation, and the post-edit, which is manually edited by a human translator, are used. In particular, recent APE research adopts multilingual pretrained language models before training on APE data. This study reviews the multilingual pretrained language models adopted in the latest APE research and the specific way each study applies them. Furthermore, based on the current research trend, we propose future research directions that utilize a translation model or the mBART model.
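As a minimal sketch of the training setup described above, the snippet below shows how one (source, machine translation, post-edit) triple could be used to fine-tune a multilingual pretrained encoder-decoder. The choice of the HuggingFace transformers library, the facebook/mbart-large-cc25 checkpoint, the English-German language pair, and the simple concatenation of source and machine translation are illustrative assumptions, not the specific configuration used in the surveyed systems.

```python
# Illustrative sketch: fine-tuning an mBART-style model on one APE triple.
# All names, the example sentences, and the input format are assumptions.
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="en_XX", tgt_lang="de_DE")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# One hypothetical WMT-style APE triple.
src = "The cat sat on the mat."          # original source sentence
mt = "Die Katze sass auf dem Matte."     # raw machine translation output
pe = "Die Katze sass auf der Matte."     # human post-edited reference

# A common formulation encodes src and mt jointly (here: plain concatenation
# with the tokenizer's separator token) and trains the decoder to emit pe.
inputs = tokenizer(src + " " + tokenizer.sep_token + " " + mt,
                   return_tensors="pt", truncation=True)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(pe, return_tensors="pt", truncation=True).input_ids

outputs = model(**inputs, labels=labels)  # cross-entropy loss over pe tokens
outputs.loss.backward()                   # one training step (optimizer omitted)
```

In practice the surveyed systems differ in how the source and machine translation are combined (multi-source encoders, joint encoding with special separators, adapter layers, etc.), but the triple-based supervision shown here is common to all of them.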
Keywords
Deep Learning; Natural Language Processing; Language Convergence; Machine Translation; Automatic Post Editing; Pretrained Model
  • Reference
1 Koehn, P., Chaudhary, V., El-Kishky, A., Goyal, N., Chen, P. J., & Guzman, F. (2020, November). Findings of the WMT 2020 shared task on parallel corpus filtering and alignment. In Proceedings of the Fifth Conference on Machine Translation, (pp. 726-742).
2 Pal, S., Herbig, N., Kruger, A., & van Genabith, J. (2018, October). A transformer-based multi-source automatic post-editing system. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers (pp. 827-835).
3 Ive, J., Specia, L., Szoc, S., Vanallemeersch, T., Van den Bogaert, J., Farah, E., ... & Khalilov, M. (2020, May). A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?. In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 3692-3697).
4 Specia, L. et al. (2020, November). Findings of the WMT 2020 shared task on quality estimation. In Proceedings of the Fifth Conference on Machine Translation (pp. 743-764).
5 Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzman, F., ... & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.
6 Park, C., & Lim, H. (2020). Automatic Post Editing Research. Journal of the Korea Convergence Society, 11(5), 1-8.   DOI
7 Park, C., Yang, Y., Park, K., & Lim, H. (2020). Decoding strategies for improving low-resource machine translation. Electronics, 9(10), 1562.   DOI
8 Chatterjee, R., Freitag, M., Negri, M., & Turchi, M. (2020, November). Findings of the WMT 2020 shared task on automatic post-editing. In Proceedings of the Fifth Conference on Machine Translation, (pp. 646-659).
9 Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10), 1345-1359.   DOI
10 Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., ... & Zettlemoyer, L. (2020). Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8, 726-742.   DOI
11 Wenzek, G., Lachaux, M. A., Conneau, A., Chaudhary, V., Guzman, F., Joulin, A., & Grave, E. (2019). Ccnet: Extracting high quality monolingual datasets from web crawl data. arXiv preprint arXiv:1911.00359.
12 Lee, D. (2020, November). Cross-Lingual Transformers for Neural Automatic Post-Editing. In Proceedings of the Fifth Conference on Machine Translation (pp. 772-776).
13 Shin, J., & Lee, J. H. (2018, October). Multi-encoder transformer network for automatic post-editing. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers (pp. 840-845).
14 Junczys-Dowmunt, M., & Grundkiewicz, R. (2018). MS-UEdin submission to the WMT2018 APE shared task: Dual-source transformer for automatic post-editing. arXiv preprint arXiv:1809.00188.
15 Conneau, A., Lample, G., Rinott, R., Williams, A., Bowman, S. R., Schwenk, H., & Stoyanov, V. (2018). XNLI: Evaluating cross-lingual sentence representations. arXiv preprint arXiv:1809.05053.
16 Correia, G. M., & Martins, A. F. (2019). A simple and effective approach to automatic post-editing with transfer learning. arXiv preprint arXiv:1906.06253.
17 Jihyung L, WonKee L, Young-Gil K, Jonghyeok L. (2020). Transfer Learning of Automatic Post-Editing with Cross-lingual Language Model. KIISE 2020, 392-394.
18 Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., ... & Gelly, S. (2019, May). Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning (pp. 2790-2799). PMLR.
19 Lee, J., Lee, W., Shin, J., Jung, B., Kim, Y. G., & Lee, J. H. (2020, November). POSTECH-ETRI's Submission to the WMT2020 APE Shared Task: Automatic Post-Editing with Cross-lingual Language Model. In Proceedings of the Fifth Conference on Machine Translation (pp. 777-782).
20 Allen, J., & Hogan, C. (2000, April). Toward the development of a post editing module for raw machine translation output: A controlled language perspective. In Third International Controlled Language Applications Workshop (CLAW-00) (pp. 62-71).
21 Simard, M., Goutte, C., & Isabelle, P. (2007, April). Statistical phrase-based post-editing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference (pp. 508-515).
22 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
23 Pires, T., Schlinger, E., & Garrette, D. (2019). How multilingual is multilingual bert?. arXiv preprint arXiv:1906.01502.
24 Lample, G., & Conneau, A. (2019). Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291.
25 Yang, H., Wang, M., Wei, D., Shang, H., Guo, J., Li, Z., ... & Chen, Y. (2020, November). HW-TSC's Participation at WMT 2020 Automatic Post Editing Shared Task. In Proceedings of the Fifth Conference on Machine Translation (pp. 797-802).
26 Park, C., Yang, Y., Lee, C., & Lim, H. (2020). Comparison of the Evaluation Metrics for Neural Grammatical Error Correction With Overcorrection. IEEE Access, 8, 106264-106272.   DOI
27 Park, C., Kim, K., Yang, Y., Kang, M., & Lim, H. (2020). Neural spelling correction: translating incorrect sentences to correct sentences for multimedia. Multimedia Tools and Applications, 1-18.
28 Wang, J., Wang, K., Fan, K., Zhang, Y., Lu, J., Ge, X., ... & Zhao, Y. (2020, November). Alibaba's Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT. In Proceedings of the Fifth Conference on Machine Translation (pp. 789-796).
29 Kim, H., Lee, J. H., & Na, S. H. (2017, September). Predictor-estimator using multilevel task learning with stack propagation for neural quality estimation. In Proceedings of the Second Conference on Machine Translation (pp. 562-568).
30 Lopes, A. V., Farajian, M. A., Correia, G. M., Trenous, J., & Martins, A. F. (2019). Unbabel's Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing. arXiv preprint arXiv:1905.13068.
31 Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
32 Lee, W., Shin, J., Jung, B., Lee, J., & Lee, J. H. (2020, November). Noising Scheme for Data Augmentation in Automatic Post-Editing. In Proceedings of the Fifth Conference on Machine Translation (pp. 783-788).
33 Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.