Natural Photography Generation with Text Guidance from Spherical Panorama Image
Kim, Beomseok (POSTECH); Jung, Jinwoong (POSTECH); Hong, Eunbin (POSTECH); Cho, Sunghyun (DGIST); Lee, Seungyong (POSTECH)