http://dx.doi.org/10.4218/etrij.2020-0358

Simultaneous neural machine translation with a reinforced attention mechanism  

Lee, YoHan (Language Intelligence Research Section, Electronics and Telecommunications Research Institute)
Shin, JongHun (Language Intelligence Research Section, Electronics and Telecommunications Research Institute)
Kim, YoungKil (Language Intelligence Research Section, Electronics and Telecommunications Research Institute)
Publication Information
ETRI Journal / v.43, no.5, 2021, pp. 775-786
Abstract
To translate in real time, a simultaneous translation system should determine when to stop reading source tokens and generate target tokens corresponding to the partial source sentence read up to that point. However, conventional attention-based neural machine translation (NMT) models cannot produce translations with adequate latency in online scenarios because they wait until a source sentence is complete before computing the alignment between source and target tokens. To address this issue, we propose a reinforcement learning (RL)-based attention mechanism, the reinforced attention mechanism, which allows the stopping criterion and a partial translation model to be trained jointly within a neural translation model. The proposed attention mechanism comprises two modules, one to ensure translation quality and the other to control latency. Unlike previous RL-based simultaneous translation systems, which learn the stopping criterion from a fixed NMT model, the two modules are trained jointly with a novel reward function. In our experiments, the proposed model achieves better translation quality than previous models at comparable latency.
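To make the read/write framing above concrete, the following is a minimal, hypothetical Python sketch of a simultaneous decoding loop. It is not the paper's implementation: the policy, the partial translator, and the reward weighting (policy_decides_write, translate_prefix, lam) are placeholder names and heuristics invented here only to illustrate how a stopping criterion interleaves reading source tokens with emitting target tokens, and how a scalar reward can trade translation quality against latency.

    # Hypothetical sketch of a simultaneous (READ/WRITE) decoding loop.
    # All names and heuristics here are placeholders, not the paper's method.
    from typing import List

    def policy_decides_write(read: List[str], written: List[str]) -> bool:
        # Placeholder stopping criterion: write one target token after every
        # two source tokens read. In the paper this decision would come from
        # a learned RL policy rather than a fixed rule.
        return len(read) >= 2 * (len(written) + 1)

    def translate_prefix(read: List[str], written: List[str]) -> str:
        # Placeholder partial-translation step; a real system would query an
        # NMT decoder conditioned on the source prefix and the target history.
        return f"tgt_{len(written)}"

    def simultaneous_decode(source: List[str]) -> List[str]:
        read, written = [], []
        for token in source:
            read.append(token)                                   # READ action
            while policy_decides_write(read, written):
                written.append(translate_prefix(read, written))  # WRITE action
        # Once the source is exhausted, keep writing until the decoder finishes
        # (here: a fixed placeholder target length of half the source length).
        target_len = (len(source) + 1) // 2
        while len(written) < target_len:
            written.append(translate_prefix(read, written))
        return written

    def reward(quality: float, avg_lag: float, lam: float = 0.1) -> float:
        # Illustrative scalar reward: a quality term (e.g., sentence-level BLEU)
        # minus a latency penalty; 'lam' is an arbitrary trade-off weight.
        return quality - lam * avg_lag

    if __name__ == "__main__":
        hyp = simultaneous_decode("this is a short source sentence".split())
        print(hyp)                                # partial translations emitted online
        print(reward(quality=0.42, avg_lag=2.0))  # quality minus latency penalty

In an RL formulation such as the one summarized in the abstract, the fixed read/write heuristic would be replaced by a learned policy whose parameters are updated from this kind of quality-minus-latency reward, for example with a policy-gradient method.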
Keywords
attention mechanism; neural network; reinforcement learning; simultaneous machine translation;