1 |
K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks,” Journal of IEEE Signal Processing Letters, Vol. 23, No. 10, pp. 1499-1503, 2016.
|
2 |
K.T. Kim and J.Y. Choi, “Development of Combined Architecture of Multiple Deep Convolutional Neural Networks for Improving Video Face Identification,” Journal of Korea Multimedia Society, Vol. 22, No. 6, pp. 655-664, 2019.
|
3 |
R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, 2014.
|
4 |
R. Girshick, "Fast R-CNN," Proceeding of IEEE International Conference on Computer Vision, pp. 1440-1448, 2015.
|
5 |
L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, et al., "Describing Videos by Exploiting Temporal Structure," Proceeding of IEEE International Conference on Computer Vision, 2015.
|
6 |
L. Gao, Z. Guo, H. Zhang, X. Xu, and H.T. Shen, “Video Captioning with Attention-Based LSTM and Semantic Consistency,” Journal of IEEE Transactions on Multimedia, Vol. 19, No. 9, pp. 2045-2055, 2017.
|
7 |
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.Y. Fu, et al., "SSD: Single Shot Multibox Detector," Proceeding of European Conference on Computer Vision, pp. 21-37, 2016.
|
8 |
S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks," Proceeding of Conference on Neural Information Processing Systems, pp. 91-99, 2015.
|
9 |
K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," Proceeding of IEEE International Conference on Computer Vision, pp. 2961-2969, 2017.
|
10 |
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.
|
11 |
T.Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal Loss for Dense Object Detection," Proceeding of IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
|
12 |
T.Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, et al., "Microsoft COCO: Common Objects in Context," Proceeding of the European Conference on Computer Vision, pp. 740-755, 2014.
|
13 |
K. Soomro, A.R. Zamir, and M. Shah, "UCF101: A Dataset of 101 Human Actions Classes From Videos in the Wild," arXiv Preprint arXiv:1212.0402, 2012.
|
14 |
J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271, 2017.
|
15 |
J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv Preprint arXiv:1804.02767, 2018.
|
16 |
A. Bochkovskiy, C.Y. Wang, and H.Y.M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv Preprint arXiv:2004.10934, 2020.
|
17 |
Z. Shou, D. Wang, and S.F. Chang, "Temporal Action Localization in Untrimmed Videos via Multistage CNNs," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1049-1058, 2016.
|
18 |
Z. Shou, J. Chan, and S.F. Chang, "CDC: Convolutional De-convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos," arXiv Preprint arXiv:1703.01515, 2017.
|
19 |
S.F. Chang, T. Sikora, and A. Puri, “Overview of the MPEG-7 Standard,” Journal of IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6, pp. 688-695, 2001.
|
20 |
Dublin Core(1995), https://www.dublincore.org/ (accessed July 1, 2020).
|
21 |
B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 Million Image Database for Scene Recognition,” Journal of IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40, No. 6, pp. 1452-1464, 2017.
|
22 |
V. Lombardo, R. Damiano, and A. Pizzo, "Drammar: A Comprehensive Ontological Resource on Drama," Proceeding of International Semantic Web Conference, pp. 103-118, 2018.
|
23 |
OntoMedia(2002), http://www.ontomedia.de/ (accessed July 1, 2020).
|
24 |
S. Ji, W. Xu, M. Yang, and K. Yu, “3D Convolutional Neural Networks for Human Action Recognition,” Journal of IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 1, pp. 221-231, 2013.
|
25 |
J.Y.H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, "Beyond Short Snippets: Deep Networks for Video Classification," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 4694-4702, 2015.
|
26 |
K. Wang, X. Long, R. Li, and L.J. Zhao, “A Discriminative Algorithm for Indoor Place Recognition Based on Clustering of Features and Images,” International Journal of Automation and Computing, Vol. 14, No. 4, pp. 407-419, 2017.
|
27 |
A. Hanni, S. Chickerur, and I. Bidari, "Deep Learning Framework for Scene Based Indoor Location Recognition," Proceeding of IEEE International Conference on Technological Advancements in Power and Energy, pp. 1-8, 2017.
|
28 |
J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-scale Hierarchical Image Database," Proceeding of Conference on Neural Information Processing
|
29 |
S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko, "Sequence to Sequence: Video to Text," Proceeding of IEEE International Conference on Computer Vision, pp. 4534-4542, 2015.
|
30 |
Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui, "Jointly Modeling Embedding and Translation to Bridge Video and Language," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, 2016.
|
31 |
DarkLabel(2017), https://darkpgmr.tistory.com/16 (accessed July 1, 2020).
|
32 |
Advene(2002), http://folk.ntnu.no/heggland/ontolog-crawler/login.php (accessed July 1, 2020).
|
33 |
ELAN(2016), https://archive.mpi.nl/tla/elan (accessed July 1, 2020).
|
34 |
W.Y. Wong and P. Reimann, "Web Based Educational Video Teaching and Learning Platform with Collaborative Annotation," Proceeding of IEEE International Conference on Advanced Learning Technologies, pp. 696-700, 2009.
|
35 |
VoTT(2019), https://github.com/Microsoft/VoTT (accessed July 1, 2020).
|
36 |
VATIC(2012), https://github.com/cvondrick/vatic (accessed July 1, 2020).
|
37 |
Y. Wang, X. Ji, Z. Zhou, H. Wang, and Z. Li, "Detecting Faces Using Region-based Fully Convolutional Networks," arXiv Preprint arXiv:1709.05256, 2017.
|
38 |
S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, and S.Z. Li, "S3FD: Single Shot Scale-invariant Face Detector," Proceeding of IEEE International Conference on Computer Vision, pp. 4203-4212, 2017.
|