
Neural Machine Translation Specialized for Coronavirus Disease-19 (COVID-19)


  • Park, Chan-Jun (Department of Computer Science and Engineering, Korea University) ;
  • Kim, Kyeong-Hee (Department of South East Asia, Busan University of Foreign Studies) ;
  • Park, Ki-Nam (Creative Information and Computer Institute, Korea University) ;
  • Lim, Heui-Seok (Department of Computer Science and Engineering, Korea University)
  • Received : 2020.06.25
  • Accepted : 2020.09.20
  • Published : 2020.09.28

Abstract

With the World Health Organization (WHO)'s recent declaration of Coronavirus Disease-19 (COVID-19) as a pandemic, COVID-19 has become a global concern and deaths continue to mount. To overcome this, the need for countries to share COVID-19-related information and countermeasures is increasing. However, language barriers have prevented smooth exchange and sharing of such information. In this paper, we propose a Neural Machine Translation (NMT) model specialized for the COVID-19 domain. Centering on English, Transformer-based bidirectional models were built for French, Spanish, German, Italian, Russian, and Chinese. Experimental results show that, measured by BLEU score, the proposed models significantly outperform commercial systems on all language pairs.
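The BLEU score used for evaluation measures the overlap of word n-grams between a system translation and a human reference, combined with a brevity penalty. The paper does not specify its BLEU implementation; the following is a minimal, self-contained sketch of sentence-level BLEU with uniform weights over 1- to 4-grams, for illustration only (production evaluation would typically use a standard toolkit).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of the given order in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped overlap: each hypothesis n-gram is credited at most
        # as many times as it appears in the reference.
        overlap = sum((hyp_counts & ref_counts).values())
        total = max(sum(hyp_counts.values()), 1)
        if overlap == 0:
            return 0.0  # geometric mean collapses to zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages translations shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(sum(log_precisions) / max_n)

# A perfect match scores 1.0; a partial match scores between 0 and 1.
print(bleu("the cat sat on the mat", "the cat sat on the mat"))
print(bleu("the cat sat on a mat", "the cat sat on the mat"))
```

A corpus-level score, as reported in the paper, would aggregate the clipped counts over all sentences before taking precisions, rather than averaging sentence scores.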

