Trends on Broadcasting Content Analysis Techniques for Smart Broadcasting Service

  • 손정우 (Smart Media Platform Research Laboratory)
  • 김선중 (Smart Media Platform Research Laboratory)
  • Published : 2016.06.01


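As the use of smart devices such as smart TVs, smartphones, and tablet computers spreads rapidly, the way broadcast content is consumed is changing as well. Viewers no longer sit in front of the TV waiting for a program to air; instead, they actively look for ways to receive recommendations for, or to select, the content that interests them. This shift in consumption patterns is surfacing as a set of requirements for new broadcasting services. A smart broadcasting service is a new form of content delivery that responds to these changed viewing patterns, and realizing it requires developing and applying a range of technologies. This article surveys research trends in broadcast content analysis, one of the technologies needed to provide smart broadcasting services, and introduces the broadcast content analysis technology under development at the Electronics and Telecommunications Research Institute (ETRI).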


