http://dx.doi.org/10.5351/KJAS.2019.32.1.041

Multi-view learning review: understanding methods and their application  

Bae, Kang Il (Department of Applied Statistics, Chung-Ang University)
Lee, Yung Seop (Department of Statistics, Dongguk University)
Lim, Changwon (Department of Applied Statistics, Chung-Ang University)
Publication Information
The Korean Journal of Applied Statistics / v.32, no.1, 2019, pp. 41-68
Abstract
Multi-view learning considers data from multiple viewpoints and attempts to integrate the diverse information that the views provide. It has been actively studied in recent years and has shown performance superior to models learned from a single view. With the introduction of deep learning techniques, multi-view learning has also achieved good results in fields such as image, text, speech, and video analysis. In this study, we introduce how multi-view learning methods solve problems that arise in human behavior recognition, medical applications, information retrieval, and facial expression recognition. In addition, we review the data integration principles of multi-view learning by classifying traditional multi-view learning methods into data integration, classifier integration, and representation integration. Finally, we examine how CNNs, RNNs, RBMs, autoencoders, and GANs, the most commonly used deep learning methods, are applied to multi-view learning algorithms. We categorize CNN- and RNN-based methods as supervised learning, and RBM-, autoencoder-, and GAN-based methods as unsupervised learning.
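To make the taxonomy concrete, the following is a minimal sketch, not taken from the paper, of the three integration principles on toy two-view data. The variable names, the logistic stand-in classifiers, and the random projections into a shared space are assumptions made purely for illustration.

```python
# Illustrative sketch of the three multi-view integration principles
# on synthetic two-view data (all weights here are random stand-ins).
import numpy as np

rng = np.random.default_rng(0)
view1 = rng.normal(size=(100, 5))   # e.g., image features
view2 = rng.normal(size=(100, 3))   # e.g., text features

# 1. Data (feature-level) integration: concatenate the views into one
#    input and train a single model on the combined representation.
early_fused = np.concatenate([view1, view2], axis=1)       # shape (100, 8)

# 2. Classifier (decision-level) integration: train one model per view
#    and combine their predictions, here by averaging predicted scores.
scores1 = 1 / (1 + np.exp(-view1 @ rng.normal(size=5)))    # stand-in classifier
scores2 = 1 / (1 + np.exp(-view2 @ rng.normal(size=3)))
late_fused = (scores1 + scores2) / 2

# 3. Representation integration: map both views into a shared latent
#    space (as CCA or a shared autoencoder layer would) and fuse there.
W1 = rng.normal(size=(5, 2))
W2 = rng.normal(size=(3, 2))
joint = view1 @ W1 + view2 @ W2                             # shared 2-d embedding
```

In the deep learning methods the review surveys, the same three principles reappear as input-level concatenation, ensembles of per-view networks, and shared hidden layers learned jointly across views.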
Keywords
multi-view learning; multi-modal learning; deep learning; machine learning; data integration