[1] J. Mao, J. Huang, A. Toshev, O. Camburu, A. L. Yuille, and K. Murphy, "Generation and Comprehension of Unambiguous Object Descriptions," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11-20, 2016.

[2] R. Luo and G. Shakhnarovich, "Comprehension-Guided Referring Expressions," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[3] V. K. Nagaraja, V. I. Morariu, and L. S. Davis, "Modeling Context Between Objects for Referring Expression Understanding," Proceedings of the European Conference on Computer Vision (ECCV), 2016.

[4] R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell, "Natural Language Object Retrieval," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4555-4564, 2016.

[5] R. Hu, M. Rohrbach, J. Andreas, T. Darrell, and K. Saenko, "Modeling Relationships in Referential Expressions with Compositional Modular Networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1115-1124, 2017.

[6] L. Yu, P. Poirson, S. Yang, A. C. Berg, and T. L. Berg, "Modeling Context in Referring Expressions," Proceedings of the European Conference on Computer Vision (ECCV), pp. 69-85, 2016.

[7] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems (NIPS), pp. 91-99, 2015.

[8] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779-788, 2016.

[9] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," Proceedings of the European Conference on Computer Vision (ECCV), pp. 21-37, 2016.

[10] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," Proceedings of the European Conference on Computer Vision (ECCV), pp. 740-755, 2014.

[11] L. Yu, H. Tan, M. Bansal, and T. L. Berg, "A Joint Speaker-Listener-Reinforcer Model for Referring Expressions," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7282-7290, 2017.

[12] J. Krishnamurthy and T. Kollar, "Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World," Transactions of the Association for Computational Linguistics (TACL), Vol. 1, pp. 193-206, 2013.

[13] J. Pennington, R. Socher, and C. Manning, "GloVe: Global Vectors for Word Representation," Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, 2014.