http://dx.doi.org/10.4014/jmb.1203.03050

Algorithm for Predicting Functionally Equivalent Proteins from BLAST and HMMER Searches  

Yu, Dong Su (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
Lee, Dae-Hee (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
Kim, Seong Keun (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
Lee, Choong Hoon (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
Song, Ju Yeon (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
Kong, Eun Bae (Department of Computer Science and Engineering, Chungnam National University)
Kim, Jihyun F. (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
Publication Information
Journal of Microbiology and Biotechnology, v.22, no.8, 2012, pp. 1054-1058
Abstract
Searching large databases for homologous proteins is a common way to predict biologically significant attributes, such as function, from protein sequences. BLAST and HMMER in particular are widely used across biological fields. However, sequence-homologous proteins identified by BLAST and proteins sharing the same domains predicted by HMMER are not always functionally equivalent, even when their sequences align with high similarity. Accurate assignment of functionally equivalent proteins from aligned sequences therefore remains a challenge in bioinformatics. We have developed the FEP-BH algorithm to predict functionally equivalent proteins from protein-protein pairs identified by BLAST and from protein-domain pairs predicted by HMMER. When examined against domain classes of the Pfam-A seed database, FEP-BH achieved 71.53% accuracy, whereas BLAST and HMMER achieved 57.72% and 36.62%, respectively. We expect that the FEP-BH algorithm will be effective in predicting functionally equivalent proteins from BLAST and HMMER outputs and will be useful to biologists who want to find functionally equivalent proteins among sequence-homologous proteins.
Keywords
Functionally equivalent protein; error backpropagation algorithm; sequence-based method; artificial neural network; bioinformatics
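The abstract and keywords indicate that FEP-BH combines BLAST and HMMER results with an artificial neural network trained by error backpropagation, but they do not say which alignment features are used or how the network is laid out. The following is therefore only an illustrative sketch of the general technique, not the authors' implementation. The three input features (BLAST identity, BLAST alignment coverage, HMMER domain coverage), the hidden-layer size, the learning rate, and the toy data are all assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, hidden=3, epochs=3000, lr=0.5, seed=0):
    """Train a one-hidden-layer network by error backpropagation.

    samples: list of (features, label) pairs, with features scaled to [0, 1]
    and label 1.0 (functionally equivalent) or 0.0 (not equivalent).
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Every weight row carries one trailing bias weight.
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, y in samples:
            # Forward pass.
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
            hb = h + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
            # Backward pass: squared-error loss through sigmoid units.
            d_out = (out - y) * out * (1.0 - out)
            # Hidden deltas use the output weights before they are updated.
            d_hid = [d_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                w_out[j] -= lr * d_out * hb[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    w_hid[j][i] -= lr * d_hid[j] * xb[i]
    return w_hid, w_out


def predict(model, x):
    """Return a score in (0, 1); above 0.5 is read as 'functionally equivalent'."""
    w_hid, w_out = model
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(w_out, hb)))


# Synthetic toy data (NOT from the paper):
# (BLAST identity, BLAST alignment coverage, HMMER domain coverage), label.
TOY = [
    ((0.95, 0.98, 0.95), 1.0),
    ((0.90, 0.92, 0.90), 1.0),
    ((0.85, 0.95, 0.88), 1.0),
    ((0.92, 0.30, 0.20), 0.0),  # high identity over a short local alignment
    ((0.40, 0.90, 0.35), 0.0),
    ((0.35, 0.40, 0.90), 0.0),  # shared domain, otherwise dissimilar
    ((0.30, 0.25, 0.20), 0.0),
    ((0.88, 0.90, 0.30), 0.0),
]

model = train(TOY)
for features, label in TOY:
    print(features, label, round(predict(model, features), 3))
```

The toy data are built so that a pair counts as equivalent only when all three signals are high, which mirrors the abstract's point that a high-similarity BLAST hit or a shared HMMER domain alone does not guarantee functional equivalence. In a real setting the features would be parsed from BLAST and HMMER output and the labels taken from a curated source such as the Pfam-A seed domain classes mentioned in the abstract.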