http://dx.doi.org/10.5483/BMBRep.2010.43.10.670

A novel method for predicting protein subcellular localization based on pseudo amino acid composition  

Ma, Junwei (School of Control Science and Engineering, Dalian University of Technology)
Gu, Hong (School of Control Science and Engineering, Dalian University of Technology)
Publication Information
BMB Reports / v.43, no.10, 2010, pp. 670-676
Abstract
This paper introduces ELM-PCA, a novel approach to predicting protein subcellular localization. First, protein samples are represented by their pseudo amino acid composition (PseAAC). Second, principal component analysis (PCA) is employed to extract the essential features. Finally, an Elman recurrent neural network (RNN) is used as a classifier to identify the protein sequences. The results demonstrate that the proposed approach is effective and practical.
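As a rough illustration of the first step only (the PseAAC representation), the sketch below follows the general form of Chou's pseudo amino acid composition: 20 normalized residue frequencies plus `lam` sequence-order correlation terms. It is not the authors' implementation. The function name `pseaac`, the parameters `lam` and `weight`, the default `weight=0.05`, and the use of the Kyte-Doolittle hydropathy index as the single physicochemical property are all assumptions made for this example; the abstract does not state which properties or parameter values were used.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy stands in for the physicochemical
# properties; the abstract does not specify which ones the authors used.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(prop):
    """Scale a property table to zero mean and unit variance over the 20 residues."""
    values = list(prop.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in prop.items()}


def pseaac(sequence, lam=3, weight=0.05, props=(HYDROPATHY,)):
    """Return a (20 + lam)-dimensional PseAAC vector for a protein sequence."""
    seq = sequence.upper()
    if any(aa not in HYDROPATHY for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if not 0 <= lam < len(seq):
        raise ValueError("lam must satisfy 0 <= lam < len(sequence)")

    scaled = [standardize(p) for p in props]

    # Conventional amino acid composition: occurrence frequency of each residue.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # Sequence-order correlation factors theta_1 .. theta_lam: mean squared
    # property difference between residues that are j positions apart.
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(len(seq) - j):
            a, b = seq[i], seq[i + j]
            total += sum((p[a] - p[b]) ** 2 for p in scaled) / len(scaled)
        thetas.append(total / (len(seq) - j))

    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


features = pseaac("MKTAYIAKQR", lam=3)  # hypothetical toy sequence
```

Here `features` has 23 components. In a pipeline like the one described, vectors of this kind would then be reduced with PCA before being passed to the classifier.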
Keywords
Elman neural network; Five-fold cross validation; Principal component analysis; Protein subcellular localization; Pseudo amino acid composition
Citations & Related Records
Times Cited By KSCI : 1
Times Cited By Web Of Science : 2
Times Cited By SCOPUS : 3