Study on Decoding Strategies in Neural Machine Translation

  • Seo, Jaehyung (Department of Computer Science and Engineering, Korea University) ;
  • Park, Chanjun (Department of Computer Science and Engineering, Korea University) ;
  • Eo, Sugyeong (Department of Computer Science and Engineering, Korea University) ;
  • Moon, Hyeonseok (Department of Computer Science and Engineering, Korea University) ;
  • Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
  • Received : 2021.08.10
  • Accepted : 2021.11.20
  • Published : 2021.11.28

Abstract

Neural machine translation based on deep neural networks has become the mainstream approach, and a great deal of investment and research into model architectures and parallel language pairs has been undertaken in pursuit of the best performance. However, most recent neural machine translation studies leave the decoding strategy to future work, offering few experiments and little concrete analysis of how the decoding process can be used to maximize translation quality. In machine translation, the decoding strategy optimizes the search path taken while generating the translated sentence, and it can improve performance without modifying the model or expanding the data. Using a sequence-to-sequence model, this paper compares and analyzes the effects and significance of decoding strategies in neural machine translation, from classical greedy decoding to the recent Dynamic Beam Allocation (DBA).

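The abstract's central point, that decoding is a search problem separable from the model, can be made concrete with a small sketch. The following Python snippet is an illustrative assumption, not the paper's code: the step scorer stands in for a real sequence-to-sequence decoder, and the four-token vocabulary is invented. It contrasts greedy decoding (beam size 1) with standard beam search.

```python
import math
import random

VOCAB = ["<eos>", "a", "b", "c"]

def step(prefix):
    """Toy stand-in for an NMT decoder step: returns a log-probability
    for every vocabulary item, conditioned on the tokens emitted so far.
    A real system would run the sequence-to-sequence network here."""
    rng = random.Random(" ".join(prefix))  # deterministic per prefix
    scores = [rng.random() + 1e-6 for _ in VOCAB]
    total = sum(scores)
    return [math.log(s / total) for s in scores]

def greedy_decode(max_len=5):
    """Greedy decoding: commit to the single best token at every step."""
    seq, logp = [], 0.0
    for _ in range(max_len):
        lps = step(seq)
        best = max(range(len(VOCAB)), key=lps.__getitem__)
        logp += lps[best]
        if VOCAB[best] == "<eos>":
            break
        seq.append(VOCAB[best])
    return seq, logp

def beam_decode(beam_size=3, max_len=5):
    """Beam search: keep the beam_size best partial hypotheses per step,
    so a locally sub-optimal token can still lead to the best sentence."""
    beams = [([], 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            for i, lp in enumerate(step(prefix)):
                if VOCAB[i] == "<eos>":
                    finished.append((prefix, logp + lp))
                else:
                    candidates.append((prefix + [VOCAB[i]], logp + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    finished.extend(beams)  # unfinished hypotheses compete too
    return max(finished, key=lambda c: c[1])

print("greedy:", greedy_decode())
print("beam  :", beam_decode())
```

Because beam search delays commitment, it can return a sequence whose total log-probability exceeds the greedy one, which is the gap the paper's comparison quantifies on real models. Constrained variants such as DBA go further and partition the beam so that hypotheses satisfying lexical constraints are always represented, at roughly constant cost in beam size.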

Acknowledgement

"This research was supported by the MSIT(Ministry of Science and ICT), Korea, under the ITRC(Information Technology Research Center) support program(IITP-2018-0-01405) supervised by the IITP(Institute for Information & Communications Technology Planning & Evaluation)" and this research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education(NRF-2021R1A6A1A03045425).

References

  1. P. Koehn, F. J. Och & D. Marcu. (2003). Statistical phrase-based translation. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (pp. 127-133). DOI : 10.3115/1073445.1073462
  2. Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey & J. Dean. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  3. D. Bahdanau, K. H. Cho & Y. Bengio. (2015). Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations, 3, (pp. 1-15).
  4. I. Sutskever, O. Vinyals & Q. V. Le. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems, 27, (pp. 3104-3112). DOI : 10.5555/2969033.2969173
  5. C. Park, S. Eo, H. Moon & H. Lim. (2021, June). Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers (pp. 97-104). DOI : 10.18653/v1/2021.naacl-industry.13
  6. H. Moon, C. Park, S. Eo, J. Park & H. Lim. (2021). Filter-mBART Based Neural Machine Translation Using Parallel Corpus Filtering. Journal of the Korea Convergence Society, 12(5), 1-7. DOI : 10.15207/JKCS.2021.12.5.001
  7. C. Park, Y. Lee, C. Lee & H. Lim. (2020). Quality, not quantity?: Effect of parallel corpus quantity and quality on neural machine translation. In The 32nd Annual Conference on Human Cognitive Language Technology (pp. 363-368).
  8. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez & I. Polosukhin. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008). DOI : 10.5555/3295222.3295349
  9. A. Conneau & G. Lample. (2019). Cross-lingual language model pretraining. Advances in Neural Information Processing Systems, 32, (pp. 7059-7069).
  10. Y. Liu, J. Gu, N. Goyal, X. Li, S. Edunov, M. Ghazvininejad & L. Zettlemoyer. (2020). Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8, (pp. 726-742). DOI : 10.1162/tacl_a_00343
  11. B. T. Lowerre. (1976). The Harpy speech recognition system. PhD thesis, Carnegie Mellon University. DOI : 10.1121/1.2003089
  12. S. Wiseman & A. M. Rush. (2016). Sequence-to-Sequence Learning as Beam-Search Optimization. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 1296-1306). DOI : 10.18653/v1/D16-1137
  13. M. Freitag & Y. Al-Onaizan. (2017). Beam Search Strategies for Neural Machine Translation. In Proceedings of the First Workshop on Neural Machine Translation (pp. 56-60). DOI : 10.18653/v1/W17-3207
  14. X. Hu, W. Li, X. Lan, H. Wu & H. Wang. (2015). Improved beam search with constrained softmax for NMT. Proceedings of MT Summit XV, 297.
  15. J. Li & D. Jurafsky. (2016). Mutual information and diverse decoding improve neural machine translation. arXiv preprint arXiv:1601.00372.
  16. J. Li, W. Monroe & D. Jurafsky. (2017). Learning to decode for future success. arXiv preprint arXiv:1701.06549.
  17. R. Paulus, C. Xiong & R. Socher. (2017). A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.
  18. S. Jean, O. Firat, K. Cho, R. Memisevic & Y. Bengio. (2015). Montreal neural machine translation systems for WMT'15. In Proceedings of the Tenth Workshop on Statistical Machine Translation (pp. 134-140). DOI : 10.18653/v1/W15-3014
  19. P. Koehn & R. Knowles. (2017). Six Challenges for Neural Machine Translation. In Proceedings of the First Workshop on Neural Machine Translation (pp. 28-39). DOI : 10.18653/v1/W17-3204
  20. K. Murray & D. Chiang. (2018). Correcting Length Bias in Neural Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers (pp. 212-223). DOI : 10.18653/v1/W18-6322
  21. W. He, Z. He, H. Wu & H. Wang. (2016, February). Improved neural machine translation with SMT features. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (pp. 151-157). DOI : 10.5555/3015812.3015835
  22. Y. Yang, L. Huang & M. Ma. (2018). Breaking the Beam Search Curse: A Study of (Re-) Scoring Methods and Stopping Criteria for Neural Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 3054-3059). DOI : 10.18653/v1/D18-1342
  23. N. S. Keskar, B. McCann, L. R. Varshney, C. Xiong & R. Socher. (2019). Ctrl: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.
  24. A. Holtzman, J. Buys, M. Forbes, A. Bosselut, D. Golub & Y. Choi. (2018). Learning to Write with Cooperative Discriminators. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1638-1649). DOI : 10.18653/v1/P18-1152
  25. Q. Huang, Z. Gan, A. Celikyilmaz, D. Wu, J. Wang & X. He. (2019). Hierarchically structured reinforcement learning for topically coherent visual story generation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 8465-8472) DOI : 10.1609/aaai.v33i01.33018465
  26. A. Holtzman, J. Buys, L. Du, M. Forbes & Y. Choi. (2020). The Curious Case of Neural Text Degeneration. In International Conference on Learning Representations. arXiv preprint arXiv:1904.09751.
  27. M. Post & D. Vilar. (2018). Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 1314-1324). DOI : 10.18653/v1/N18-1119
  28. J. E. Hu, H. Khayrallah, R. Culkin, P. Xia, T. Chen, M. Post & B. Van Durme. (2019). Improved lexically constrained decoding for translation and monolingual rewriting. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 839-850). DOI : 10.18653/v1/N19-1090
  29. L. Huang, K. Zhao & M. Ma. (2017). When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size). In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2134-2139). DOI : 10.18653/v1/D17-1227
  30. R. Shu & H. Nakayama. (2018). Improving beam search by removing monotonic constraint for neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 339-344). DOI : 10.18653/v1/P18-2054
  31. Y. Shibata, T. Kida, S. Fukamachi, M. Takeda, A. Shinohara, T. Shinohara & S. Arikawa. (1999). Byte Pair Encoding: A text compression scheme that accelerates pattern matching. Technical Report DOI-TR-161, Department of Informatics, Kyushu University.
  32. T. Kudo & J. Richardson. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 66-71). DOI : 10.18653/v1/D18-2012
  33. I. Provilkov, D. Emelianenko & E. Voita. (2020). BPE-Dropout: Simple and Effective Subword Regularization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 1882-1892). DOI : 10.18653/v1/2020.acl-main.170
  34. C. Hokamp & Q. Liu. (2017). Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1535-1546). DOI : 10.18653/v1/P17-1141
  35. C. Park & H. Lim. (2020). A Study on the Performance Improvement of Machine Translation Using Public Korean-English Parallel Corpus. Journal of Digital Convergence, 18(6), 271-277. DOI : 10.14400/JDC.2020.18.6.271
  36. K. Papineni, S. Roukos, T. Ward & W. J. Zhu. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318). DOI : 10.3115/1073083.1073135
  37. C. Park, Y. Yang, K. Park & H. Lim. (2020). Decoding strategies for improving low-resource machine translation. Electronics, 9(10), 1562. DOI : 10.3390/electronics9101562
  38. C. Park, J. Seo, S. Lee, C. Lee, H. Moon, S. Eo, & H. Lim. (2021). BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text. Proceedings of the 8th Workshop on Asian Translation, (pp. 106-116). DOI : 10.18653/v1/2021.wat-1.10