Simultaneous neural machine translation with a reinforced attention mechanism

  • Lee, YoHan (Language Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Shin, JongHun (Language Intelligence Research Section, Electronics and Telecommunications Research Institute) ;
  • Kim, YoungKil (Language Intelligence Research Section, Electronics and Telecommunications Research Institute)
  • Received : 2020.09.17
  • Accepted : 2021.02.25
  • Published : 2021.10.01

Abstract

To translate in real time, a simultaneous translation system must decide when to stop reading source tokens and generate target tokens for the partial source sentence read up to that point. However, conventional attention-based neural machine translation (NMT) models cannot produce translations with adequate latency in online scenarios because they wait until a source sentence is complete before computing the alignment between source and target tokens. To address this issue, we propose a reinforcement learning (RL)-based attention mechanism, the reinforced attention mechanism, which allows a neural translation model to jointly train the stopping criterion and a partial translation model. The proposed attention mechanism comprises two modules, one to ensure translation quality and the other to address latency. Unlike previous RL-based simultaneous translation systems, which learn the stopping criterion from a fixed NMT model, the two modules can be trained jointly with a novel reward function. In our experiments, the proposed model achieves better translation quality than previous models with comparable latency.
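The abstract describes a READ/WRITE style decision process: the model keeps consuming source tokens until a learned stopping criterion fires, then commits target tokens, and the RL reward trades translation quality against latency. The Python sketch below is a minimal illustration of that decision loop under assumed interfaces; the `policy` callable, the `step_reward` weighting, and all names are hypothetical stand-ins, not the authors' actual architecture or reward function.

    def step_reward(quality_gain, tokens_read, latency_weight=0.1):
        # Reward for emitting one target token: quality improvement from the
        # new token minus a penalty proportional to the source prefix consumed.
        return quality_gain - latency_weight * tokens_read

    def simultaneous_decode(policy, source_tokens, max_len=128):
        # Greedy READ/WRITE loop: the policy sees only the source prefix read
        # so far and either waits for more input or commits a target token.
        num_read, outputs, rewards = 1, [], []
        while len(outputs) < max_len:
            # Hypothetical policy interface: returns an action ("READ" or
            # "WRITE"), a candidate target token, and an estimated quality gain.
            action, token, quality_gain = policy(source_tokens[:num_read], outputs)
            if action == "READ" and num_read < len(source_tokens):
                num_read += 1                  # stopping criterion: keep reading
            else:
                outputs.append(token)          # commit a target token now
                rewards.append(step_reward(quality_gain, num_read))
                if token == "<eos>":
                    break
        return outputs, rewards

In a full system, the quality term would typically be derived from a partial-translation score (for example, incremental sentence-level BLEU) and the latency term from a lagging-style metric, with the policy optimized by a policy-gradient method such as REINFORCE.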

Keywords

Acknowledgements

This work was supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (R7119-16-1001, core technology development of the real-time simultaneous speech translation based on knowledge enhancement).
