SciBabel: a system for crowd-sourced validation of automatic translations of scientific texts

  • Received : 2020.03.21
  • Accepted : 2020.05.26
  • Published : 2020.05.28


Scientific research is mostly published in English, regardless of the researcher's nationality. However, this growing practice hinders comprehension for professionals who are not fluent in English yet depend on the results of these studies to provide adequate care for their patients. We suggest that machine translation (MT) can provide useful translations of biomedical articles, even though the translation itself may not be fluent. To guard against mistranslations that could harm a patient, we resort to crowd-sourced validation of translations. We developed a prototype for MT validation and editing, in which users can vote a translation as valid or suggest modifications (i.e., post-edit the MT output). A glossary match system is also included to promote terminology consistency.
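As an illustration only, the validation workflow described above (vote a translation as valid, suggest a post-edit, and check glossary terms) can be sketched in a few lines of Python. This is not the prototype's actual code: the `Segment` class, the `glossary_misses` function, the naive substring matching, and the English-to-Portuguese example pair are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One machine-translated sentence awaiting crowd validation (hypothetical model)."""
    source: str
    translation: str
    valid_votes: int = 0
    suggestions: list = field(default_factory=list)

    def vote_valid(self):
        # A user accepts the machine translation as it stands.
        self.valid_votes += 1

    def suggest(self, edited):
        # A user proposes a post-edited version of the translation.
        self.suggestions.append(edited)


def glossary_misses(segment, glossary):
    """Return (source_term, expected_target) pairs where the source term occurs
    in the source sentence but the expected target term is missing from the
    translation. Matching is a case-insensitive substring test (an assumption)."""
    src = segment.source.lower()
    tgt = segment.translation.lower()
    return [
        (s, t) for s, t in glossary.items()
        if s.lower() in src and t.lower() not in tgt
    ]


# Hypothetical glossary entry and sentence pair, for illustration only.
glossary = {"heart failure": "insuficiência cardíaca"}
seg = Segment("Heart failure is common.", "A falha do coração é comum.")
print(glossary_misses(seg, glossary))
# prints [('heart failure', 'insuficiência cardíaca')]

# A user flags the problem by suggesting a post-edit that uses the glossary term.
seg.suggest("A insuficiência cardíaca é comum.")
```

A real system would likely need tokenization and morphological matching rather than substring tests, since inflected target-language terms would otherwise be reported as misses.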


