Structural similarity based efficient keyframes extraction from multi-view videos

  • Received : 2018.09.19
  • Accepted : 2018.12.13
  • Published : 2018.12.31

Abstract

Extracting salient information from multi-view videos is challenging because of inter-view and intra-view correlations and the computational cost involved. Several techniques have been developed for keyframe extraction from multi-view videos, but they have very high computational complexity. In this paper, we present a keyframe extraction approach for multi-view videos that uses the entropy and complexity information present within each frame. In the first step, we extract representative shots of the whole video from each view, based on the structural similarity index measurement (SSIM) difference value between frames. In the second step, entropy and complexity scores are computed for all frames of the shots in the different views. Finally, the frames with the highest entropy and complexity scores are selected as keyframes. The proposed system is subjectively evaluated on the available Office benchmark dataset, and the results are convincing in terms of accuracy and time complexity.
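The pipeline described in the abstract can be illustrated with a minimal sketch for a single view. It is not the authors' implementation, and several details are assumptions: frames are flat lists of grayscale intensities, SSIM is computed over one global window rather than sliding windows, a shot boundary is declared when SSIM between consecutive frames falls below a hypothetical `threshold`, and only the entropy score is used because the abstract does not define how the complexity score is computed. The function names are illustrative.

```python
import math
from collections import Counter


def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM between two equally sized grayscale frames."""
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Standard SSIM stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def entropy(frame):
    """Shannon entropy, in bits, of the frame's intensity histogram."""
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in Counter(frame).values())


def split_into_shots(frames, threshold=0.5):
    """Group frame indices into shots; low SSIM starts a new shot (assumed rule)."""
    shots = [[0]]
    for i in range(1, len(frames)):
        if global_ssim(frames[i - 1], frames[i]) < threshold:
            shots.append([i])
        else:
            shots[-1].append(i)
    return shots


def pick_keyframes(frames, threshold=0.5):
    """Return, for each shot, the index of its highest-entropy frame."""
    shots = split_into_shots(frames, threshold)
    return [max(shot, key=lambda i: entropy(frames[i])) for shot in shots]


# Toy view: two dark frames followed by two bright frames (4x4 pixels each).
frames = [
    [10] * 16,
    [10, 12, 14, 16] * 4,
    [200] * 16,
    [200, 202, 204, 206] * 4,
]
keyframes = pick_keyframes(frames)  # → [1, 3]
```

In a multi-view setting, the same function could be applied to each view independently before the per-view results are compared; how the paper combines views is not specified in the abstract.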

Acknowledgement

Supported by : National Research Foundation of Korea (NRF)
