Using the fusion of spatial and temporal features for malicious video classification

A Study on Malicious Video Classification Based on the Fusion of Spatial and Temporal Features

  • Jae-Hyun Jeon (Department of Electrical and Electronic Engineering, KAIST) ;
  • Se-Min Kim (Department of Information and Communications Engineering, KAIST) ;
  • Seung-Wan Han (Electronics and Telecommunications Research Institute (ETRI)) ;
  • Yong Man Ro (Department of Electrical and Electronic Engineering, KAIST)
  • Received : 2011.01.31
  • Accepted : 2011.07.05
  • Published : 2011.12.31

Abstract

Recently, malicious video classification and filtering techniques have become of practical interest, as one can easily access malicious multimedia content through the Internet, IPTV, online social networks, etc. Considerable research effort has been devoted to developing malicious video classification and filtering systems. However, malicious video classification and filtering are still not mature in terms of reliable classification/filtering performance. In particular, most conventional approaches have been limited to using only spatial features (such as the ratio of skin regions or a bag of visual words), originally developed for malicious image classification. Hence, previous approaches have fallen short of acceptable classification and filtering performance. To overcome this limitation, we propose a new malicious video classification framework that takes advantage of both spatial and temporal features, which are readily extracted from a sequence of video frames. In particular, we develop effective temporal features based on motion periodicity and temporal correlation. In addition, to identify the best way of combining the spatial and temporal features, representative data fusion approaches are applied within the proposed framework. To demonstrate the effectiveness of our method, we collected 200 sexual intercourse videos and 200 non-sexual intercourse videos. Experimental results show that the proposed method improves the classification accuracy for sexual intercourse videos by 3.75 percentage points (from 92.25% to 96%). Furthermore, based on our experimental results, feature-level fusion of the spatial and temporal features is found to achieve the best classification accuracy.
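As context for the fusion comparison above, here is a minimal sketch of feature-level fusion: per-video spatial descriptors (a skin-region ratio plus a BoVW-style histogram) and temporal descriptors (motion periodicity and temporal correlation) are concatenated into one vector before a single classifier is trained. All feature values below are synthetic, and a simple nearest-centroid classifier stands in for the SVM-style classifiers cited in the references; nothing here reproduces the paper's actual implementation.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Per-class mean vectors; a stand-in for the paper's trained classifier."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    """Assign each row of X to the class with the nearest centroid."""
    classes = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes],
                     axis=1)
    return np.array(classes)[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
n = 50  # videos per class (synthetic stand-ins)

# Hypothetical spatial descriptor: skin-region ratio + 8-bin BoVW histogram (9-dim).
spatial = np.vstack([rng.normal(0.7, 0.1, (n, 9)),    # malicious videos
                     rng.normal(0.2, 0.1, (n, 9))])   # benign videos
# Hypothetical temporal descriptor: motion-periodicity strength + temporal correlation.
temporal = np.vstack([rng.normal(0.8, 0.1, (n, 2)),
                      rng.normal(0.3, 0.1, (n, 2))])
y = np.array([1] * n + [0] * n)  # 1 = malicious, 0 = benign

# Feature-level fusion: concatenate both descriptors into one vector per video,
# then train a single classifier on the fused representation.
fused = np.hstack([spatial, temporal])               # shape (100, 11)
model = nearest_centroid_fit(fused, y)
acc = (nearest_centroid_predict(model, fused) == y).mean()
print(acc)
```

Decision-level fusion would instead train one classifier per feature type and combine their output scores; the abstract reports that the feature-level variant achieved the best accuracy on the authors' dataset.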

Recently, with the diversification of content distribution channels such as the Internet, IPTV/smart TV, and social networks, demand for research on malicious video classification and filtering has been growing; however, research on judging the harmfulness of videos remains scarce. Existing work on malicious image classification relies on spatial features such as the ratio of skin regions in an image or Bag of Visual Words (BoVW). In video, however, harmfulness can additionally be judged using temporal features such as motion periodicity and temporal correlation. Existing studies on malicious video classification use only one of the spatial and temporal feature types, or simply fuse the two at the decision level. In general, decision-level data fusion does not perform as well as feature-level data fusion. In this paper, we propose a method that classifies malicious videos by fusing, at the feature level, the spatial and temporal features used in prior malicious video classification research. Our experiments show how classification performance changes as more features are used and as the data fusion method is varied. Using only spatial features yields a malicious video classification accuracy of 92.25%, whereas additionally using the motion periodicity feature with feature-level data fusion improves the accuracy to 96%.
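The motion periodicity feature is not specified in detail in this abstract. As one plausible illustration, the numpy sketch below locates the dominant frequency in a frame-difference energy signal and reports what share of the spectral power it holds. The function name, the synthetic 2 Hz test clip, and the spectral-peak definition are assumptions for illustration, not the authors' method.

```python
import numpy as np

def motion_periodicity(frames, fps=30.0):
    """Illustrative periodicity feature: dominant frequency of the
    frame-difference energy signal and its share of total spectral power.
    `frames` is a (T, H, W) grayscale array; this API is an assumption."""
    # Mean absolute pixel change between consecutive frames -> 1-D energy signal.
    diff_energy = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
    diff_energy = diff_energy - diff_energy.mean()       # drop the DC component
    spectrum = np.abs(np.fft.rfft(diff_energy)) ** 2
    freqs = np.fft.rfftfreq(len(diff_energy), d=1.0 / fps)
    peak = spectrum[1:].argmax() + 1                     # skip the zero-frequency bin
    return freqs[peak], spectrum[peak] / spectrum.sum()

# Synthetic 5 s clip at 30 fps: brightness oscillates at 2 Hz,
# mimicking strongly repetitive motion.
t = np.arange(151) / 30.0
frames = np.sin(2 * np.pi * 2.0 * t)[:, None, None] * np.ones((151, 8, 8)) * 40 + 128

freq, strength = motion_periodicity(frames, fps=30.0)
# The peak lands at 4 Hz: rectified motion energy oscillates at
# twice the 2 Hz brightness rate.
print(round(float(freq), 2), round(float(strength), 2))
```

Note that rectifying the frame differences doubles the apparent frequency (the 2 Hz oscillation yields a 4 Hz spectral peak); a strongly concentrated peak of either kind still serves as a periodicity indicator for repetitive motion.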

References

  1. N. Rea, G. Lacey, C. Lambe, and R. Dahyot, "Multimodal periodicity analysis for illicit content detection in videos," The 3rd European Conference on Visual Media Production (CVMP 2006), pp.106-114, 2006.
  2. C. Y. Kim, O. J. Kwon, W. G. Kim, and S. R. Choi, "Automatic system for filtering obscene video," The 10th International Conference on Advanced Communication Technology (ICACT 2008), pp.1435-1438, 2008.
  3. Z. Qu, L. Ren, A. Guo, and J. Yu, "Implementation of pornographic videos detection system," 2nd International Congress on Image and Signal Processing (CISP 2009), pp.1-4, 2009.
  4. A. P. B. Lopes, S. E. F. de Avila, A. N. A. Peixoto, R. S. Oliveira, M. de M. Coelho, and A. de A. Araujo, "Nude detection in video using bag-of-visual-features," XXII Brazilian Symposium on Computer Graphics and Image Processing, pp.224-231, 2009.
  5. Z. Y. Qu, Y. Liu, Y. M. Liu, and L. N. Zhang, "A pornographic videos detection method based on optical flow direction's statistical histogram," International Symposium on Computer Network and Multimedia Technology (CNMT 2009), pp.1-4, 2009.
  6. Q. Zhiyi, L. Yanmin, L. Ying, J. Kang, and C. Yong, "A method for reciprocating motion detection in porn video based on motion features," 2nd IEEE International Conference on Broadband Network & Multimedia Technology (IC-BNMT '09), pp.183-187, 2009.
  7. Z. Qu, Y. Liu, Y. Liu, K. Jiu, and Y. Chen, "A porn video detection method based on motion features using HMM," Second International Symposium on Computational Intelligence and Design (ISCID '09), pp.461-464, 2009.
  8. C. Jansohn, A. Ulges, and T. M. Breuel, "Detecting pornographic video content by combining image features with motion information," ACM Multimedia 2009, pp.601-604, 2009.
  9. S. M. Lee, H. G. Lee, and T. K. Nam, "A comparative study of the objectionable video classification approaches using single and group frame features," The 16th International Conference on Artificial Neural Networks (ICANN 2006), pp.613-623, 2006.
  10. H. G. Lee, S. M. Lee, and T. K. Nam, "Implementation of high performance objectionable video classification system," International Conference on Advanced Communication Technology (ICACT 2006), pp.959-962, 2006.
  11. S. M. Lee, W. C. Shim, and S. H. Kim, "Hierarchical system for objectionable video detection," IEEE Transactions on Consumer Electronics, Vol.55, No.2, pp.677-684, May, 2009. https://doi.org/10.1109/TCE.2009.5174439
  12. S. M. Kim, H. S. Min, J. H. Jeon, Y. M. Ro, and S. W. Han, "Malicious content filtering based on semantic features," The ACM International Conference Proceeding 2009, pp.802-806, 2009.
  13. J. H. Jeon, S. M. Kim, J. Y. Choi, H. S. Min, and Y. M. Ro, "Semantic detection of adult image using semantic features," The 4th International Conference on Multimedia and Ubiquitous Engineering (MUE 2010), pp.1-4, 2010.
  14. J. Yang, J. Y. Yang, D. Zhang, and J. F. Lu, "Feature fusion: parallel strategy vs. serial strategy," Pattern Recognition, Vol.36, Issue 6, pp.1369-1381, June, 2003. https://doi.org/10.1016/S0031-3203(02)00262-5
  15. X. Zhou, and B. Bhanu, "Integrating face and gait for human recognition at a distance in video," IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, Vol.37, No.5, pp.1119-1137, Oct., 2007. https://doi.org/10.1109/TSMCB.2006.889612
  16. X. Zhou, and B. Bhanu, "Feature fusion of side face and gait for video-based human identification," Pattern Recognition, Vol.41, Issue 3, pp.778-795, Mar., 2008. https://doi.org/10.1016/j.patcog.2007.06.019
  17. H. T. Lin, C. J. Lin, and R. C. Weng, "A note on Platt's probabilistic outputs for support vector machines," Tech. Rep., Dept. of Computer Science, National Taiwan Univ., 2003. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/papers/plattprob.ps
  18. A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.22, No.12, pp.1349-1380, Dec., 2000. https://doi.org/10.1109/34.895972
  19. S. J. Yang, S. K. Kim, and Y. M. Ro, "Semantic home photo categorization," IEEE Transactions on Circuits and Systems for Video Technology, Vol.17, No.3, pp.324-335, Mar., 2007. https://doi.org/10.1109/TCSVT.2007.890829
  20. B. Li, J. H. Errico, H. Pan, I. Sezan, "Bridging the semantic gap in sports video retrieval and summarization," Journal of Visual Communication and Image Representation, Vol.15, Issue 3, pp.393-424, Sep., 2004. https://doi.org/10.1016/j.jvcir.2004.04.006