Sign language translation using video captioning and sign language recognition using action recognition

Gi-Duk Kim; Geun-Hoo Lee

Proceedings of the Korean Society of Computer Information Conference (한국컴퓨터정보학회:학술대회논문집)

2024.01a, pp. 317-319, 2024

Korean Society of Computer Information (한국컴퓨터정보학회)

Affiliation: 3IFuture Co., Ltd. (Gi-Duk Kim, Geun-Hoo Lee)

Published: 2024.01.17

Abstract

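This paper proposes a sign language translation algorithm based on video captioning and a sign language recognition algorithm based on action recognition. In the video captioning algorithm, 40 consecutive frames of input data are embedded by a CNN, and the embeddings are fed to a Transformer, which outputs a sentence. In the action recognition algorithm, 40 indices are randomly sampled from each video, the resulting sequence of 40 frames is embedded by a CNN, and an RNN model combining a GRU and a Transformer outputs the recognition result. For sign language translation, the method achieved a BLEU-4 score of 7.85 and a CIDEr score of 53.12; for sign language recognition, it achieved a recognition accuracy of 96.26%.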
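Both models start by turning a variable-length video into a fixed sequence of 40 frames: a consecutive window for captioning and a randomly sampled set of indices for recognition. The Python sketch below illustrates only that input-selection step, under stated assumptions. The function names `consecutive_window`, `random_sorted_indices`, and `gather` are hypothetical; padding short clips by repeating the last frame, sampling with replacement when a video has fewer than 40 frames, and sorting the sampled indices are assumptions, not details given in the abstract. The CNN embedding, GRU, and Transformer stages are not shown.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

WINDOW = 40  # sequence length stated in the abstract (40 frames)


def consecutive_window(num_frames: int, start: int = 0, window: int = WINDOW) -> List[int]:
    """Return `window` consecutive frame indices starting at `start`.

    Hypothetical captioning-side sampler. If the clip runs out of frames, the
    last frame index is repeated as padding (an assumption; the abstract does
    not say how short clips are handled).
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if not 0 <= start < num_frames:
        raise ValueError("start must be a valid frame index")
    indices = list(range(start, min(start + window, num_frames)))
    indices += [num_frames - 1] * (window - len(indices))  # pad to fixed length
    return indices


def random_sorted_indices(num_frames: int, window: int = WINDOW,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Randomly draw `window` frame indices and return them in temporal order.

    Hypothetical recognition-side sampler, one possible reading of the
    abstract's "random sampling ... 40 indices". Draws without replacement
    when the video has at least `window` frames, with replacement otherwise.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    rng = rng or random.Random()
    if num_frames >= window:
        indices = rng.sample(range(num_frames), window)
    else:
        indices = [rng.randrange(num_frames) for _ in range(window)]
    return sorted(indices)  # keep temporal order for the sequence model


def gather(frames: Sequence[T], indices: List[int]) -> List[T]:
    """Select frames (or per-frame feature vectors) by index."""
    return [frames[i] for i in indices]


if __name__ == "__main__":
    frames = [f"frame_{i}" for i in range(100)]  # stand-in for decoded frames
    caption_clip = gather(frames, consecutive_window(len(frames), start=10))
    action_clip = gather(frames, random_sorted_indices(len(frames), rng=random.Random(0)))
    print(len(caption_clip), caption_clip[0], caption_clip[-1])  # prints: 40 frame_10 frame_49
    print(len(action_clip))  # prints: 40
```

In this sketch, sorting the sampled indices keeps the frames in temporal order, which matters because both the GRU and the Transformer consume them as an ordered sequence. Passing a seeded `random.Random` makes the sample reproducible, which is convenient for evaluation; during training a fresh sample per epoch would act as a form of augmentation.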


Proceedings of the Korean Society of Computer Information Conference (한국컴퓨터정보학회:학술대회논문집)

Sign language translation using video captioning and sign language recognition using action recognition

비디오 캡셔닝을 적용한 수어 번역 및 행동 인식을 적용한 수어 인식

Abstract

Keywords

References

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)