http://dx.doi.org/10.6109/jkiice.2014.18.4.992

A Performance Comparison of Multi-Label Classification Methods for Protein Subcellular Localization Prediction  

Chi, Sang-Mun (School of Computer Science and Engineering, Kyungsung University)
Abstract
This paper presents an extensive experimental comparison of multi-label learning methods for accurately predicting the subcellular localization of proteins that exist simultaneously at multiple subcellular locations. We compared several methods from three categories of multi-label classification algorithms: algorithm adaptation, problem transformation, and meta-learning. The experimental results are analyzed with 12 multi-label evaluation measures to assess the behavior of the methods from a variety of viewpoints, and a new summarization measure is used to identify the best-performing method. The results show that the best-performing methods are the power-set method that prunes infrequently occurring label subsets and classifier chains that model relevant labels as additional features. Furthermore, ensembles of many classifiers built from these methods improve performance further. This study suggests that the correlation among subcellular locations is an effective clue for classification, because the subcellular locations of proteins performing a given biological function are not independent but correlated.
Keywords
Multi-label classification; Multi-label evaluation measures; Multiplex proteins; Protein subcellular localization
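To make the pruned power-set idea from the abstract concrete, here is a minimal Python sketch. It is not the implementation evaluated in the paper: the pruning threshold `p`, the reintroduction limit `n`, the subset-ordering rule, the nearest-centroid base learner, the function names, and the toy data are all illustrative assumptions. Each distinct set of locations is treated as one class, sets seen `p` times or fewer are pruned, and the affected proteins are kept by relabeling them with frequent subsets of their locations, so co-occurring locations such as nucleus and cytoplasm are still predicted jointly.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Labelset = FrozenSet[str]


def prune_labelsets(Y: Sequence[Labelset], p: int, n: int = 2) -> List[Tuple[int, Labelset]]:
    """Simplified pruned power-set transformation (hypothetical helper).

    A labelset counts as frequent if it occurs more than p times in Y.
    Examples with a frequent labelset are kept unchanged; every other example
    is reintroduced once per frequent, non-empty proper subset of its labelset,
    keeping at most n subsets (largest first, then most frequent).
    Returns (example index, class labelset) training pairs.
    """
    counts = Counter(Y)
    frequent = {ls for ls, c in counts.items() if c > p}
    pairs: List[Tuple[int, Labelset]] = []
    for i, ls in enumerate(Y):
        if ls in frequent:
            pairs.append((i, ls))
            continue
        # Candidate replacements: frequent labelsets strictly contained in ls.
        subsets = [f for f in frequent if f and f < ls]
        # sorted(f) breaks remaining ties deterministically by label name.
        subsets.sort(key=lambda f: (-len(f), -counts[f], sorted(f)))
        pairs.extend((i, f) for f in subsets[:n])
    return pairs


def train_centroids(X: Sequence[Sequence[float]],
                    pairs: Sequence[Tuple[int, Labelset]]) -> Dict[Labelset, List[float]]:
    """Assumed base learner: one mean feature vector per labelset class."""
    sums: Dict[Labelset, List[float]] = {}
    sizes: Counter = Counter()
    for i, ls in pairs:
        prev = sums.get(ls, [0.0] * len(X[i]))
        sums[ls] = [s + v for s, v in zip(prev, X[i])]
        sizes[ls] += 1
    return {ls: [s / sizes[ls] for s in vec] for ls, vec in sums.items()}


def predict(centroids: Dict[Labelset, List[float]], x: Sequence[float]) -> Labelset:
    """Return the labelset (set of locations) whose centroid is nearest to x."""
    def sq_dist(ls: Labelset) -> float:
        return sum((c - v) ** 2 for c, v in zip(centroids[ls], x))
    # Iterate in a fixed order so ties are broken deterministically.
    return min(sorted(centroids, key=sorted), key=sq_dist)


if __name__ == "__main__":
    # Toy data (hypothetical): 2-D features and the location set of each protein.
    X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9],
         [0.5, 0.5], [0.6, 0.4], [0.5, 0.6]]
    Y = [frozenset(s) for s in (
        ["nucleus"], ["nucleus"], ["cytoplasm"], ["cytoplasm"],
        ["nucleus", "cytoplasm"], ["nucleus", "cytoplasm"],
        ["nucleus", "cytoplasm", "mitochondrion"])]
    pairs = prune_labelsets(Y, p=1, n=2)   # the single 3-location protein is relabeled
    model = train_centroids(X, pairs)
    print(sorted(predict(model, [0.55, 0.5])))  # prints ['cytoplasm', 'nucleus']
```

With `p=0` no labelset is pruned and the transformation reduces to the plain label powerset; a larger `p` shrinks the number of classes the base learner must separate, at the cost of never predicting rare location combinations directly.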