http://dx.doi.org/10.5483/BMBRep.2013.46.1.159

Partial AUC maximization for essential gene prediction using genetic algorithms  

Hwang, Kyu-Baek (School of Computer Science and Engineering, Soongsil University)
Ha, Beom-Yong (School of Computer Science and Engineering, Soongsil University)
Ju, Sanghun (School of Computer Science and Engineering, Soongsil University)
Kim, Sangsoo (School of Systems Biomedical Science, Soongsil University)
Publication Information
BMB Reports / v.46, no.1, 2013, pp. 41-46
Abstract
Identifying the genes indispensable for an organism's life, together with their characteristics, is one of the central questions in current biological research, so computational approaches to predicting essential genes would be helpful. The performance of a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method that uses genetic algorithms to maximize the partial AUC restricted to a specific interval of low false positive rate (FPR), the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate the over-fitting problem and to weigh each feature's relevance to prediction. We evaluated our method using the proteome of budding yeast. Our implementation of genetic algorithms maximizing the partial AUC below an FPR of 0.05 or 0.10 outperformed other popular classification methods.
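To make the objective concrete, the following is a minimal sketch of a partial AUC computation and a toy genetic algorithm that tunes feature weights to maximize it. It is not the authors' implementation: the linear scoring function, the function names (`roc_points`, `partial_auc`, `evolve_weights`), and all hyperparameters (population size, mutation rate, tournament size) are assumptions chosen for illustration. The partial AUC here is the unnormalized area under the ROC curve for FPR between 0 and `max_fpr`.

```python
import random


def roc_points(labels, scores):
    """Return ROC points (fpr, tpr); tied scores are merged into one step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only at the end of a group of tied scores.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(labels, scores, max_fpr=0.05):
    """Unnormalized area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    points = roc_points(labels, scores)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoidal rule
    return area


def evolve_weights(features, labels, max_fpr=0.1,
                   pop_size=20, generations=30, seed=0):
    """Toy GA (assumed design): evolve linear feature weights for partial AUC."""
    rng = random.Random(seed)
    n_features = len(features[0])

    def fitness(weights):
        scores = [sum(w * x for w, x in zip(weights, row)) for row in features]
        return partial_auc(labels, scores, max_fpr)

    population = [[rng.uniform(-1, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry over the two best individuals
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(ranked, 3), key=fitness)
            b = max(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [w + rng.gauss(0, 0.1) if rng.random() < 0.2 else w
                     for w in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)
```

A predictor that ranks every essential gene above every non-essential one attains the maximum value `max_fpr`, so the unnormalized score can be divided by `max_fpr` when a value in [0, 1] is preferred. In this sketch a weight near zero suggests that a feature contributes little to the ranking, which loosely mirrors the idea of weighing each feature's relevance.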
Keywords
AUC; Classification; Essential genes; Genetic algorithms; Partial AUC;