http://dx.doi.org/10.5351/KJAS.2019.32.1.041

Multi-view learning review: understanding methods and their application  

Bae, Kang Il (Department of Applied Statistics, Chung-Ang University)
Lee, Yung Seop (Department of Statistics, Dongguk University)
Lim, Changwon (Department of Applied Statistics, Chung-Ang University)
Publication Information
The Korean Journal of Applied Statistics / v.32, no.1, 2019, pp. 41-68
Abstract
Multi-view learning considers data from multiple viewpoints and attempts to integrate the diverse information that the views provide. It has been actively studied in recent years and has shown performance superior to models learned from a single view. With the introduction of deep learning techniques, multi-view learning has also achieved good results in fields such as image, text, speech, and video analysis. In this study, we introduce how multi-view learning methods solve problems that arise in human behavior recognition, medical applications, information retrieval, and facial expression recognition. In addition, we review the data integration principles of multi-view learning by classifying traditional multi-view learning methods into data integration, classifier integration, and representation integration. Finally, we examine how CNNs, RNNs, RBMs, autoencoders, and GANs, the most commonly used deep learning methods, are applied to multi-view learning algorithms. We categorize CNN- and RNN-based methods as supervised learning, and RBM-, autoencoder-, and GAN-based methods as unsupervised learning.
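To make the taxonomy concrete, the following is a minimal sketch, not taken from the paper, of the three integration principles on toy two-view data. The variable names, the logistic stand-in classifiers, and the random projections into a shared space are assumptions made purely for illustration.

```python
# Illustrative sketch of the three multi-view integration principles
# on synthetic two-view data (all weights here are random stand-ins).
import numpy as np

rng = np.random.default_rng(0)
view1 = rng.normal(size=(100, 5))   # e.g., image features
view2 = rng.normal(size=(100, 3))   # e.g., text features

# 1. Data (feature-level) integration: concatenate the views into one
#    input and train a single model on the combined representation.
early_fused = np.concatenate([view1, view2], axis=1)       # shape (100, 8)

# 2. Classifier (decision-level) integration: train one model per view
#    and combine their predictions, here by averaging predicted scores.
scores1 = 1 / (1 + np.exp(-view1 @ rng.normal(size=5)))    # stand-in classifier
scores2 = 1 / (1 + np.exp(-view2 @ rng.normal(size=3)))
late_fused = (scores1 + scores2) / 2

# 3. Representation integration: map both views into a shared latent
#    space (as CCA or a shared autoencoder layer would) and fuse there.
W1 = rng.normal(size=(5, 2))
W2 = rng.normal(size=(3, 2))
joint = view1 @ W1 + view2 @ W2                             # shared 2-d embedding
```

In the deep learning methods the review surveys, the same three principles reappear as input-level concatenation, ensembles of per-view networks, and shared hidden layers learned jointly across views.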
Keywords
multi-view learning; multi-modal learning; deep learning; machine learning; data integration