Character Recognition and Search for Media Editing

Park, Yong-Suk;Kim, Hyun-Sik;

doi:10.5909/JBE.2022.27.4.519

Journal of Broadcast Engineering (방송공학회논문지)

Volume 27, Issue 4 / Pages 519-526 / 2022 / 1226-7953 (pISSN) / 2287-9137 (eISSN)

The Korean Institute of Broadcast and Media Engineers (한국방송∙미디어공학회)

Park, Yong-Suk (Contents Convergence Research Center, Korea Electronics Technology Institute) ;
Kim, Hyun-Sik (Contents Convergence Research Center, Korea Electronics Technology Institute)

Received : 2022.05.17
Accepted : 2022.07.13
Published : 2022.07.30

Abstract

Identifying and searching for characters that appear in scenes is an arduous, time-consuming part of video editing. Applying artificial intelligence to such labor-intensive media editing tasks can greatly reduce production time and improve the efficiency of the creative process. This paper proposes a method that combines existing artificial intelligence techniques to automate character recognition and search for video editing. Object detection, face detection, and pose estimation are used to localize characters and collect their feature information; face recognition and color space analysis are then applied to that information to extract a unique representation of each character. The feature and identification information is collected for every frame of the video being edited and serves as metadata for frame-level search during editing.
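The idea of per-frame character metadata used for frame-level search can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CharacterRecord` layout, the hue-histogram color signature, the cosine-similarity matching, and the 0.8 threshold are all assumptions made for illustration. The face embeddings and bounding boxes are placeholder values standing in for the output of real detection and recognition models.

```python
import colorsys
import math
from dataclasses import dataclass


@dataclass
class CharacterRecord:
    """Hypothetical per-frame metadata entry for one localized character."""
    frame: int            # frame index in the source video
    bbox: tuple           # (x, y, width, height), as a detector might report
    face_embedding: list  # vector a face recognition model might produce
    hue_histogram: list   # coarse color signature from color space analysis


def hue_histogram(pixels, bins=6):
    """Normalized hue histogram of RGB pixels (channel values 0-255)."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[min(int(h * bins), bins - 1)] += 1
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_frames(records, query_embedding, threshold=0.8):
    """Sorted frame indices whose face embedding matches the query."""
    return sorted({
        rec.frame for rec in records
        if cosine_similarity(rec.face_embedding, query_embedding) >= threshold
    })


# Placeholder records; a real pipeline would fill these from model outputs.
records = [
    CharacterRecord(0, (10, 20, 50, 120), [1.0, 0.0], hue_histogram([(255, 0, 0)])),
    CharacterRecord(1, (12, 22, 50, 120), [0.0, 1.0], hue_histogram([(0, 0, 255)])),
    CharacterRecord(2, (15, 25, 50, 120), [0.9, 0.1], hue_histogram([(250, 10, 10)])),
]
print(find_frames(records, [1.0, 0.0]))  # prints [0, 2]
```

Storing one record per character per frame keeps the search a simple scan over metadata, so the editor never has to rerun the detection models to find the frames in which a given character appears.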

Acknowledgement

This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00804, Media production technology using learning based directing methods).
