Object Classification based on Weakly Supervised E2LSH and Saliency map Weighting

  • Zhao, Yongwei (China National Digital Switching System Engineering and Technological R&D Center) ;
  • Li, Bicheng (China National Digital Switching System Engineering and Technological R&D Center) ;
  • Liu, Xin (China National Digital Switching System Engineering and Technological R&D Center) ;
  • Ke, Shengcai (China National Digital Switching System Engineering and Technological R&D Center)
  • Received : 2015.07.10
  • Accepted : 2015.10.20
  • Published : 2016.01.31

Abstract

The most popular approach to object classification is the bag-of-visual-words model, which suffers from several fundamental problems that restrict its performance, such as low time efficiency, the synonymy and polysemy of visual words, and the lack of spatial information between visual words. In view of this, an object classification method based on weakly supervised E2LSH and saliency map weighting is proposed. Firstly, E2LSH (Exact Euclidean Locality Sensitive Hashing) is employed to generate a group of weakly randomized visual dictionaries by clustering SIFT features of the training dataset, and the selection of hash functions is effectively supervised, inspired by the random forest idea, to reduce the randomness of E2LSH. Secondly, the graph-based visual saliency (GBVS) algorithm is applied to detect the saliency map of each image and to weight the visual words according to this saliency prior. Finally, a saliency-map-weighted visual language model is applied to accomplish object classification. Experimental results on the PASCAL VOC 2007 and Caltech-256 datasets indicate that the distinguishability of objects is effectively improved and that our method is superior to state-of-the-art object classification methods.
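The two core steps of the abstract — a p-stable E2LSH hash function and a weakly supervised selection of hash functions — can be illustrated with a minimal sketch. The bucket width `w`, the pool size, and the bucket-purity scoring used for selection are illustrative assumptions in the spirit of the random-forest idea the abstract mentions, not the authors' exact procedure.

```python
import math
import random

def make_e2lsh_hash(dim, w=4.0, rng=None):
    """One p-stable (Gaussian) E2LSH hash: h(v) = floor((a . v + b) / w)."""
    rng = rng or random.Random(0)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # 2-stable random projection
    b = rng.uniform(0.0, w)                        # random offset in [0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

def select_hashes(pool, data, labels, k):
    """Weakly supervised selection (illustrative): keep the k hash functions
    whose buckets are purest with respect to the training labels, analogous
    to scoring candidate splits in a random forest."""
    def purity(h):
        buckets = {}
        for v, y in zip(data, labels):
            buckets.setdefault(h(v), []).append(y)
        # fraction of points sharing the majority label of their bucket
        hit = sum(max(ys.count(c) for c in set(ys)) for ys in buckets.values())
        return hit / len(data)
    return sorted(pool, key=purity, reverse=True)[:k]
```

In this sketch, each selected hash function partitions the SIFT descriptor space into buckets that play the role of visual words; supervising the selection biases the random dictionary toward partitions that respect class structure.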
