http://dx.doi.org/10.13088/jiis.2015.21.1.83

Annotation Method based on Face Area for Efficient Interactive Video Authoring  

Yoon, Ui Nyoung (Department of Computer Science and Information Engineering, Inha University)
Ga, Myeong Hyeon (Department of Computer Science and Information Engineering, Inha University)
Jo, Geun-Sik (Department of Computer Science and Information Engineering, Inha University)
Publication Information
Journal of Intelligence and Information Systems / v.21, no.1, 2015, pp. 83-98
Abstract
Many TV viewers rely mainly on portal sites to retrieve information related to a broadcast while watching TV. However, finding the desired information takes considerable time because search results include much irrelevant content, so this process cannot satisfy users who want to consume information immediately. Interactive video is being actively investigated to solve this problem. An interactive video provides clickable objects, areas, or hotspots with which users can interact; when users click an object on the video, they instantly see additional information related to it. Authoring an interactive video with an authoring tool involves three basic steps: (1) create an augmented object; (2) set the object's area and the time it is displayed on the video; and (3) set an interactive action linked to pages or hyperlinks. Users of existing authoring tools such as Popcorn Maker and Zentrick spend most of their time on step (2). wireWAX saves much of the time needed to set an object's location and display time because it uses a vision-based annotation method, but its users must instead wait while objects are detected and tracked. It is therefore necessary to reduce the time spent on step (2) by effectively combining the benefits of the manual and vision-based annotation methods.

This paper proposes a novel annotation method that allows an annotator to annotate easily based on face areas. The method consists of two steps: a pre-processing step and an annotation step. Pre-processing, which the system performs so that users can easily find the contents of a video, proceeds as follows: 1) extract shots from the video frames using a color-histogram-based shot boundary detection method; 2) cluster similar shots and align the clusters into shot sequences; and 3) detect and track faces in all shots of each shot sequence and save the results into the shot sequence metadata. After pre-processing, the user annotates an object as follows: 1) the annotator selects a shot sequence and then a keyframe of a shot in that sequence; 2) the annotator places the object at a position relative to the actor's face on the selected keyframe, and the same object is then annotated automatically, anchored to the detected face area, until the end of the shot sequence; and 3) the user assigns additional information to the annotated object. In addition, this paper designs a feedback model to compensate for defects that may occur after object annotation: wrongly aligned shots, wrongly detected faces, and inaccurate object locations. Users can also apply an interpolation method to restore the positions of objects deleted during feedback, and can then save the annotated object data as interactive object metadata.

Finally, this paper presents an interactive video authoring system implemented to verify the performance of the proposed annotation method based on the presented models. The experiments analyze object annotation time and report a user evaluation. On average, the proposed tool annotated objects about two times faster than existing authoring tools, although annotation occasionally took longer than with existing tools when wrong shots were detected during pre-processing.
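The abstract gives no implementation detail for the shot-extraction step; the following is a minimal Python sketch of color-histogram-based shot boundary detection, assuming OpenCV. The HSV color space, the 16 bins per channel, and the 0.6 correlation threshold are illustrative assumptions, not values taken from the paper.

import cv2

def detect_shot_boundaries(video_path, threshold=0.6):
    # Returns frame indices where a new shot is assumed to start.
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 3-D HSV color histogram, normalized so all frames compare equally
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1, 2], None, [16, 16, 16],
                            [0, 180, 0, 256, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # A low correlation between consecutive frames signals a cut
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries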
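In the annotation step, the object is placed at a position relative to the actor's face on the keyframe and then propagated across the shot sequence. The reference list includes a frontal-face Haar cascade and the Viola-Jones detector, so the sketch below uses OpenCV's bundled frontal-face cascade; the relative offset and size parameters are hypothetical.

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def annotate_relative_to_face(keyframe, rel_offset=(1.2, 0.0), rel_size=(1.0, 1.0)):
    # Place an annotation rectangle relative to the first detected face.
    # rel_offset and rel_size are in units of the face box, e.g. an offset
    # of (1.2, 0.0) puts the annotation just to the right of the face.
    gray = cv2.cvtColor(keyframe, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return (int(x + rel_offset[0] * w), int(y + rel_offset[1] * h),
            int(rel_size[0] * w), int(rel_size[1] * h))

Reapplying the same relative offset to the tracked face box in each remaining frame of the shot sequence reproduces the automatic propagation the abstract describes.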
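The interpolation applied after feedback is described in the cited work as a dynamic sampling-based algorithm; the sketch below substitutes plain linear interpolation between the surviving annotation positions to illustrate the idea.

def interpolate_positions(keyed):
    # keyed: dict {frame_index: (x, y)} of annotations that survived feedback.
    # Returns a dict covering every frame between the first and last key.
    if not keyed:
        return {}
    frames = sorted(keyed)
    filled = {frames[-1]: keyed[frames[-1]]}
    for a, b in zip(frames, frames[1:]):
        (x0, y0), (x1, y1) = keyed[a], keyed[b]
        for f in range(a, b):
            t = (f - a) / (b - a)
            filled[f] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return filled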
The usefulness and convenience of the system were measured through a user evaluation with participants experienced in interactive video authoring systems. Nineteen recruited experts answered 11 questions drawn from the CSUQ (Computer System Usability Questionnaire), which was designed by IBM for evaluating systems. The evaluation showed that the proposed tool scored about 10% higher for authoring interactive video than the other interactive video authoring systems.
Keywords
Interactive Video; Authoring Tool; Annotation; Shot Sequence Alignment
Reference
1 Chasanis, V. T., A. C. Likas, and N. P. Galatsanos, "Scene Detection in Videos Using Shot Clustering and Sequence Alignment," IEEE Transactions on Multimedia, Vol.11, No.1 (2009), 89-100.
2 Froba, B., and A. Ernst, "Face Detection with the Modified Census Transform," Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, (2004), 91-96.
3 Lewis, J. R., "IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use," International Journal of Human-Computer Interaction, Vol.7, No.1 (1995), 57-78.
4 Lienhart, R., "Comparison of automatic shot boundary detection algorithms," Proceedings of the SPIE Conference, Vol.3656 (1998), 290-301.
5 Lee, K.-S., A. N. Rosli, I. A. Supandi, and G.-S. Jo, "Dynamic sampling-based interpolation algorithm for representation of clickable moving object in collaborative video annotation," Neurocomputing, Vol.146 (2014), 291-300.
6 Lee, K. A., C. H. You, H. Li, T. Kinnunen, and K. C. Sim, "Using Discrete Probabilities With Bhattacharyya Measure for SVM-Based Speaker Verification," IEEE Transactions on Audio, Speech, and Language Processing, Vol.19, No.4 (2011), 861-870.
7 Lin, T. T. C., "Convergence and Regulation of Multi-Screen Television: The Singapore Experience," Telecommunications Policy, Vol.37, No.8 (2013), 673-685.
8 Lucas, B. D., and T. Kanade, "An iterative image registration technique with an application to stereo vision," Proceedings of the 7th International Joint Conference on Artificial Intelligence, (1981), 674-679.
9 Miller, G., S. Fels, M. Ilich, M. M. Finke, T. Bauer, K. Wong, and S. Mueller, "An End-to-End Framework for Multi-View Video Content: Creating Multiple-Perspective Hypervideo to View On Mobile Platforms," Proceedings of the 10th International Conference on Entertainment Computing, (2011), 337-342.
10 Nielsen, Digital Consumer Report, 2014. Available at http://www.nielsen.com/us/en/reports.html (Accessed 13 November, 2014).
11 Mozilla, Popcorn Maker. Available at https://popcorn.webmaker.org (Accessed 13 November, 2014).
12 Rui, Y., T. S. Huang, and S. Mehrotra, "Constructing Table-of-Content for Videos," Multimedia Systems, Vol.7, No.5 (1999), 359-368.
13 Swiki, Frontal Face Haar Cascade. Available at http://alereimondo.no-ip.org/OpenCV (Accessed 13 November, 2014).
14 Viola, P., and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2001), 511-518.
15 wireWAX. Available at http://wirewax.com (Accessed 13 November, 2014).
16 Yoon, U. N., K. S. Lee, and G. S. Jo, "Interactive Video Annotation System based on Face Area," Korea Computer Congress, (2014), 755-757.
17 Zentrick. Available at https://www.zentrick.com (Accessed 13 November, 2014).