Korean Text to Gloss: Self-Supervised Learning approach

  • Thanh-Vu Dang (Department of ICT Convergence System Engineering at Chonnam National University) ;
  • Gwang-hyun Yu (Department of ICT Convergence System Engineering at Chonnam National University) ;
  • Ji-yong Kim (Flight Control System at LIG Nex1) ;
  • Young-hwan Park (Satrec Initiative Company) ;
  • Chil-woo Lee (School of Electronic & Computer Engineering at Chonnam National University) ;
  • Jin-Young Kim (Department of ICT Convergence System Engineering at Chonnam National University)
  • Received : 2022.11.30
  • Accepted : 2023.02.09
  • Published : 2023.02.28

Abstract

Natural Language Processing (NLP) has grown tremendously in recent years. In particular, bilingual and multilingual translation models have been deployed widely in machine translation and have attracted considerable attention from the research community. In contrast, few studies have focused on translation between spoken and sign languages, especially for non-English languages. Prior work on Sign Language Translation (SLT) has shown that a mid-level sign gloss representation enhances translation performance. This study therefore presents a new large-scale Korean sign language dataset, the Museum-Commentary Korean Sign Gloss (MCKSG) dataset, containing 3,828 pairs of Korean sentences and their corresponding sign glosses used in museum-commentary contexts. In addition, we propose a translation framework based on self-supervised learning, in which the pretext task is text-to-text translation from a Korean sentence to its back-translated versions; the pre-trained network is then fine-tuned on the MCKSG dataset. Self-supervised learning helps to overcome the shortage of sign language data. Experimental results show that the proposed model outperforms a baseline BERT model by 6.22%.
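
The abstract describes the training scheme only at a high level. The following is a minimal sketch, not the authors' implementation, of how such a two-stage scheme can be organized: a shared encoder-decoder is first trained on a self-supervised pretext task mapping a Korean sentence to a back-translated paraphrase of itself, and the same network is then fine-tuned on sentence-to-gloss pairs such as those in MCKSG. The model architecture, vocabulary size, optimizer settings, and the random tensors standing in for tokenized data are all illustrative assumptions.

```python
# Sketch of the two-stage scheme described in the abstract (illustrative only):
# (1) pretext task: Korean sentence -> back-translated paraphrase,
# (2) fine-tuning:  Korean sentence -> sign gloss sequence (MCKSG-style pairs).
import torch
import torch.nn as nn

VOCAB_SIZE, D_MODEL, PAD_ID = 8000, 256, 0  # assumed subword vocabulary

class Seq2SeqTransformer(nn.Module):
    """Generic encoder-decoder used for both the pretext and downstream tasks."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD_ID)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4, num_encoder_layers=3,
            num_decoder_layers=3, batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        causal = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids),
                                  tgt_mask=causal)
        return self.out(hidden)  # (batch, tgt_len, vocab)

def train(model, pairs, epochs, lr=1e-4):
    """Teacher-forced cross-entropy training on (source, target) token batches."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_ID)
    for _ in range(epochs):
        for src, tgt in pairs:
            logits = model(src, tgt[:, :-1])            # predict the next token
            loss = loss_fn(logits.reshape(-1, VOCAB_SIZE),
                           tgt[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()

model = Seq2SeqTransformer()

# Stage 1 (pretext): random token ids stand in for tokenized
# (Korean sentence, back-translated paraphrase) pairs.
pretext_pairs = [(torch.randint(1, VOCAB_SIZE, (2, 12)),
                  torch.randint(1, VOCAB_SIZE, (2, 12))) for _ in range(4)]
train(model, pretext_pairs, epochs=1)

# Stage 2 (fine-tune): random token ids stand in for tokenized
# (Korean sentence, sign gloss sequence) pairs from MCKSG.
gloss_pairs = [(torch.randint(1, VOCAB_SIZE, (2, 12)),
                torch.randint(1, VOCAB_SIZE, (2, 8))) for _ in range(4)]
train(model, gloss_pairs, epochs=1)
```

In this reading, the pretext task requires no gloss annotation at all, which is how self-supervision compensates for the scarcity of parallel sign language data before the small MCKSG corpus is used for fine-tuning.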

Acknowledgement

This research is supported by the Ministry of Culture, Sports, and Tourism (MCST) and Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program (R2020060002).

References

  1. W. I. Cho, S. Moon and Y. Song, "Open Korean Corpora: A Practical Report," in Proceedings of the Second Workshop for NLP Open Source Software, 2020.
  2. N. C. Camgoz, S. Hadfield, O. Koller, H. Ney and R. Bowden, "Neural Sign Language Translation," in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
  3. D. Li, C. Rodriguez Opazo, X. Yu and H. Li, "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020.
  4. O. M. Sincan and H. Y. Keles, "AUTSL: A Large Scale Multi-Modal Turkish Sign Language Dataset and Baseline Methods," IEEE Access, vol. 8, pp. 181340-181355, 2020. https://doi.org/10.1109/access.2020.3028072
  5. S.-K. Ko, C. J. Kim, H. Jung and C. Cho, "Neural Sign Language Translation Based on Human Keypoint Estimation," Applied Sciences, vol. 9, no. 13, p. 2683, 2019.
  6. R. Rastgoo, K. Kiani, S. Escalera and M. Sabokrou, "Sign Language Production: A Review," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
  7. S. Z. Gurbuz, A. C. Gurbuz, E. A. Malaia, D. J. Griffin, C. S. Crawford, M. M. Rahman, E. Kurtoglu, R. Aksu, T. Macks and R. Mdrafi, "American Sign Language Recognition Using RF Sensing," IEEE Sensors Journal, vol. 21, no. 3, pp. 3763-3775, 2020.
  8. R. Rastgoo, K. Kiani and S. Escalera, "Sign Language Recognition: A Deep Survey," Expert Systems With Applications, vol. 164, p. 113794, 2021.
  9. N. C. Camgoz, O. Koller, S. Hadfield and R. Bowden, "Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  10. H. Zhou, W. Zhou, Y. Zhou and H. Li, "Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
  11. K. Yin and J. Read, "Better Sign Language Translation with STMC-Transformer," in Proceedings of the 28th International Conference on Computational Linguistics, 2020.
  12. B. Saunders, N. C. Camgoz and R. Bowden, "Progressive Transformers for End-to-End Sign Language Production," in European Conference on Computer Vision, 2020.
  13. L. Ventura, A. Duarte and X. Giro-i-Nieto, "Can Everybody Sign Now? Exploring Sign Language Video Generation from 2D Poses," arXiv preprint arXiv:2012.10941, 2020.
  14. S. Stoll, N. C. Camgoz, S. Hadfield and R. Bowden, "Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks," International Journal of Computer Vision, vol. 128, no. 4, pp. 891-908, 2020. https://doi.org/10.1007/s11263-019-01281-2
  15. J. Zelinka and J. Kanis, "Neural Sign Language Synthesis: Words Are Our Glosses," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020.
  16. C. Chan, S. Ginosar, T. Zhou and A. A. Efros, "Everybody Dance Now," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
  17. Z. Cao, G. Hidalgo Martinez, T. Simon, S.-E. Wei and Y. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  18. E. Park and S. Cho, "KoNLPy: Korean Natural Language Processing in Python," in Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, 2014.
  19. T. Kudo and J. Richardson, "SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing," arXiv preprint arXiv:1808.06226, 2018.
  20. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin, "Attention Is All You Need," in Advances in Neural Information Processing Systems, 2017.
  21. J. Devlin, M.-W. Chang, K. Lee and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.
  22. Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer and V. Stoyanov, "RoBERTa: A Robustly Optimized BERT Pretraining Approach," arXiv preprint arXiv:1907.11692, 2019.
  23. X. Song, A. Salcianu, Y. Song, D. Dopson and D. Zhou, "Fast WordPiece Tokenization," arXiv preprint arXiv:2012.15524, 2020.
  24. S. Lee, H. Jang, Y. Baik, S. Park and H. Shin, "KR-BERT: A Small-Scale Korean-Specific Language Model," arXiv preprint arXiv:2008.03979, 2020.
  25. H. Lee, J. Yoon, B. Hwang, S. Joe, S. Min and Y. Gwon, "KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding".
  26. S. Edunov, M. Ott, M. Auli and D. Grangier, "Understanding Back-Translation at Scale," arXiv preprint arXiv:1808.09381, 2018.
  27. D. T. Vu, G. Yu, C. Lee and J. Kim, "Text Data Augmentation for the Korean Language," Applied Sciences, vol. 12, no. 7, p. 3425, 2022.
  28. Q. Xie, Z. Dai, E. Hovy, M.-T. Luong and Q. V. Le, "Unsupervised Data Augmentation for Consistency Training," Advances in Neural Information Processing Systems, vol. 33, pp. 6256-6268, 2020.
  29. M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen and N. Thorat, "Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation," Transactions of the Association for Computational Linguistics, 2017.
  30. N. Reimers and I. Gurevych, "Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation," arXiv preprint arXiv:2004.09813, 2020.
  31. B. Ban, "A Survey on Awesome Korean NLP Datasets," arXiv preprint arXiv:2112.01624, 2021. https://doi.org/10.1109/ICTC55196.2022.9952930