Korean Text to Gloss: Self-Supervised Learning approach

  • Thanh-Vu Dang (Department of ICT Convergence System Engineering at Chonnam National University) ;
  • Gwang-hyun Yu (Department of ICT Convergence System Engineering at Chonnam National University) ;
  • Ji-yong Kim (Flight Control System at LIG Nex1) ;
  • Young-hwan Park (Satrec Initiative Company) ;
  • Chil-woo Lee (School of Electronic & Computer Engineering at Chonnam National University) ;
  • Jin-Young Kim (Department of ICT Convergence System Engineering at Chonnam National University)
  • Received : 2022.11.30
  • Accepted : 2023.02.09
  • Published : 2023.02.28

Abstract

Natural Language Processing (NLP) has grown tremendously in recent years. In particular, bilingual and multilingual translation models have been deployed widely in machine translation and have attracted considerable attention from the research community. In contrast, few studies have focused on translation between spoken and sign languages, especially for non-English languages. Prior work on Sign Language Translation (SLT) has shown that a mid-level sign gloss representation enhances translation performance. This study therefore presents a new large-scale Korean sign language dataset, the Museum-Commentary Korean Sign Gloss (MCKSG) dataset, containing 3,828 pairs of Korean sentences and their corresponding sign glosses used in museum-commentary contexts. In addition, we propose a translation framework based on self-supervised learning, in which the pretext task is text-to-text translation from a Korean sentence to its back-translated versions; the pre-trained network is then fine-tuned on the MCKSG dataset. Self-supervised learning helps to overcome the shortage of sign language data. Experimental results show that the proposed model outperforms a baseline BERT model by 6.22%.
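
The abstract describes the training scheme only at a high level. The following is a minimal sketch, not the authors' implementation, of how such a two-stage scheme can be organized: a shared encoder-decoder is first trained on a self-supervised pretext task mapping a Korean sentence to a back-translated paraphrase of itself, and the same network is then fine-tuned on sentence-to-gloss pairs such as those in MCKSG. The model architecture, vocabulary size, optimizer settings, and the random tensors standing in for tokenized data are all illustrative assumptions.

```python
# Sketch of the two-stage scheme described in the abstract (illustrative only):
# (1) pretext task: Korean sentence -> back-translated paraphrase,
# (2) fine-tuning:  Korean sentence -> sign gloss sequence (MCKSG-style pairs).
import torch
import torch.nn as nn

VOCAB_SIZE, D_MODEL, PAD_ID = 8000, 256, 0  # assumed subword vocabulary

class Seq2SeqTransformer(nn.Module):
    """Generic encoder-decoder used for both the pretext and downstream tasks."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD_ID)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4, num_encoder_layers=3,
            num_decoder_layers=3, batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        causal = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids),
                                  tgt_mask=causal)
        return self.out(hidden)  # (batch, tgt_len, vocab)

def train(model, pairs, epochs, lr=1e-4):
    """Teacher-forced cross-entropy training on (source, target) token batches."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_ID)
    for _ in range(epochs):
        for src, tgt in pairs:
            logits = model(src, tgt[:, :-1])            # predict the next token
            loss = loss_fn(logits.reshape(-1, VOCAB_SIZE),
                           tgt[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()

model = Seq2SeqTransformer()

# Stage 1 (pretext): random token ids stand in for tokenized
# (Korean sentence, back-translated paraphrase) pairs.
pretext_pairs = [(torch.randint(1, VOCAB_SIZE, (2, 12)),
                  torch.randint(1, VOCAB_SIZE, (2, 12))) for _ in range(4)]
train(model, pretext_pairs, epochs=1)

# Stage 2 (fine-tune): random token ids stand in for tokenized
# (Korean sentence, sign gloss sequence) pairs from MCKSG.
gloss_pairs = [(torch.randint(1, VOCAB_SIZE, (2, 12)),
                torch.randint(1, VOCAB_SIZE, (2, 8))) for _ in range(4)]
train(model, gloss_pairs, epochs=1)
```

In this reading, the pretext task requires no gloss annotation at all, which is how self-supervision compensates for the scarcity of parallel sign language data before the small MCKSG corpus is used for fine-tuning.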

Acknowledgement

This research is supported by the Ministry of Culture, Sports, and Tourism (MCST) and Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program (R2020060002).

References

  1. W. I. Cho, S. Moon and Y. Song, "Open Korean Corpora: A Practical Report," in Proceedings of the Second Workshop for NLP Open Source Software, 2020.
  2. N. C. Camgoz, S. Hadfield, O. Koller, H. Ney and R. Bowden, "Neural Sign Language Translation," in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
  3. D. Li, C. Rodriguez Opazo, X. Yu and H. Li, "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020.
  4. O. M. Sincan and H. Y. Keles, "AUTSL: A Large Scale Multi-Modal Turkish Sign Language Dataset and Baseline Methods," IEEE Access, vol. 8, pp. 181340-181355, 2020. https://doi.org/10.1109/access.2020.3028072
  5. S.-K. Ko, C. J. Kim, H. Jung and C. Cho, "Neural Sign Language Translation Based on Human Keypoint Estimation," Applied Sciences, vol. 9, no. 13, p. 2683, 2019.
  6. R. Rastgoo, K. Kiani, S. Escalera and M. Sabokrou, "Sign Language Production: A Review," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
  7. S. Z. Gurbuz, A. C. Gurbuz, E. A. Malaia, D. J. Griffin, C. S. Crawford, M. M. Rahman, E. Kurtoglu, R. Aksu, T. Macks and R. Mdrafi, "American Sign Language Recognition Using RF Sensing," IEEE Sensors Journal, vol. 21, no. 3, pp. 3763-3775, 2020.
  8. R. Rastgoo, K. Kiani and S. Escalera, "Sign Language Recognition: A Deep Survey," Expert Systems With Applications, vol. 164, p. 113794, 2021.
  9. N. C. Camgoz, O. Koller, S. Hadfield and R. Bowden, "Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  10. H. Zhou, W. Zhou, Y. Zhou and H. Li, "Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
  11. K. Yin and J. Read, "Better Sign Language Translation with STMC-Transformer," in Proceedings of the 28th International Conference on Computational Linguistics, 2020.
  12. B. Saunders, N. C. Camgoz and R. Bowden, "Progressive Transformers for End-to-End Sign Language Production," in European Conference on Computer Vision, 2020.
  13. L. Ventura, A. Duarte and X. Giro-i-Nieto, "Can Everybody Sign Now? Exploring Sign Language Video Generation from 2D Poses," arXiv preprint arXiv:2012.10941, 2020.
  14. S. Stoll, N. C. Camgoz, S. Hadfield and R. Bowden, "Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks," International Journal of Computer Vision, vol. 128, no. 4, pp. 891-908, 2020. https://doi.org/10.1007/s11263-019-01281-2
  15. J. Zelinka and J. Kanis, "Neural Sign Language Synthesis: Words Are Our Glosses," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020.
  16. C. Chan, S. Ginosar, T. Zhou and A. A. Efros, "Everybody Dance Now," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
  17. Z. Cao, G. Hidalgo Martinez, T. Simon, S.-E. Wei and Y. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  18. E. Park and S. Cho, "KoNLPy: Korean Natural Language Processing in Python," in Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, 2014.
  19. T. Kudo and J. Richardson, "SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing," arXiv preprint arXiv:1808.06226, 2018.
  20. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin, "Attention Is All You Need," in Advances in Neural Information Processing Systems, 2017.
  21. J. Devlin, M.-W. Chang, K. Lee and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.
  22. Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer and V. Stoyanov, "RoBERTa: A Robustly Optimized BERT Pretraining Approach," arXiv preprint arXiv:1907.11692, 2019.
  23. X. Song, A. Salcianu, Y. Song, D. Dopson and D. Zhou, "Fast WordPiece Tokenization," arXiv preprint arXiv:2012.15524, 2020.
  24. S. Lee, H. Jang, Y. Baik, S. Park and H. Shin, "KR-BERT: A Small-Scale Korean-Specific Language Model," arXiv preprint arXiv:2008.03979, 2020.
  25. H. Lee, J. Yoon, B. Hwang, S. Joe, S. Min and Y. Gwon, "KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding".
  26. S. Edunov, M. Ott, M. Auli and D. Grangier, "Understanding Back-Translation at Scale," arXiv preprint arXiv:1808.09381, 2018.
  27. D. T. Vu, G. Yu, C. Lee and J. Kim, "Text Data Augmentation for the Korean Language," Applied Sciences, vol. 12, no. 7, p. 3425, 2022.
  28. Q. Xie, Z. Dai, E. Hovy, M.-T. Luong and Q. V. Le, "Unsupervised Data Augmentation for Consistency Training," Advances in Neural Information Processing Systems, vol. 33, pp. 6256-6268, 2020.
  29. M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen and N. Thorat, "Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation," Transactions of the Association for Computational Linguistics, 2017.
  30. N. Reimers and I. Gurevych, "Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation," arXiv preprint arXiv:2004.09813, 2020.
  31. B. Ban, "A Survey on Awesome Korean NLP Datasets," arXiv preprint arXiv:2112.01624, 2021. https://doi.org/10.1109/ICTC55196.2022.9952930