Prediction of Protein Subcellular Localization using Label Power-set Classification and Multi-class Probability Estimates

  • Chi, Sang-Mun (Department of Computer Science and Engineering, Kyungsung University)
  • Received : 2014.08.20
  • Accepted : 2014.09.29
  • Published : 2014.10.31

Abstract

Knowledge of where a protein resides within the cell is an important clue for inferring the function of an uncharacterized protein. Recently, considerable research has addressed predicting the localization of proteins that exist simultaneously at multiple subcellular locations. This paper improves label power-set classification for the accurate prediction of multiple subcellular locations. The label sets predicted by the label power-set classifier are combined according to their prediction probabilities to produce the final multi-label result. To obtain accurate probability estimates for the multiple classes, the paper applies multi-class probability estimation based on the pairwise comparison and error-correcting output code frameworks. Prediction experiments on protein subcellular localization show a significant performance improvement with the proposed method.
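The abstract does not spell out how the predicted label sets are combined with their probabilities, so the following is only a minimal sketch under one plausible assumption: each label's score is the sum of the probabilities of all label sets that contain it, and labels whose score reaches a threshold are kept. The function name `combine_label_sets`, the 0.5 threshold, the location names, and the probability values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Set


def combine_label_sets(
    set_probs: Dict[FrozenSet[str], float],
    threshold: float = 0.5,
) -> Set[str]:
    """Turn label-set probabilities into a final multi-label prediction.

    Assumption (not from the paper): a label's score is the total
    probability of every label set that contains it.
    """
    marginals: Dict[str, float] = {}
    for label_set, prob in set_probs.items():
        for label in label_set:
            marginals[label] = marginals.get(label, 0.0) + prob
    # Keep every label whose accumulated score reaches the threshold.
    return {label for label, score in marginals.items() if score >= threshold}


# Hypothetical class probabilities from a label power-set classifier,
# where each class is one combination of subcellular locations.
set_probs = {
    frozenset({"nucleus"}): 0.40,
    frozenset({"nucleus", "cytoplasm"}): 0.35,
    frozenset({"cytoplasm"}): 0.20,
    frozenset({"mitochondrion"}): 0.05,
}

predicted = combine_label_sets(set_probs)
```

In this toy input, choosing only the single most probable label set would return the nucleus alone, whereas accumulating probabilities across label sets also retains the cytoplasm. This illustrates why well-calibrated multi-class probability estimates matter for such a combination step.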
