
Computational Approaches to Gene Prediction  

Do Jin-Hwan (Bio-food and Drug Research Center, Konkuk University)
Choi Dong-Kug (Department of Biotechnology, Konkuk University)
Publication Information
Journal of Microbiology / v.44, no.2, 2006, pp. 137-144
Abstract
The identification of genes and the prediction of gene structure in DNA sequences have attracted increasing attention in recent years, as large-scale sequencing projects have produced an immense amount of genome data. A variety of prediction programs have been developed to address these problems. This paper reviews the computational approaches and gene-finders commonly used for gene prediction in eukaryotic genomes. Two general approaches have been adopted for this purpose: similarity-based and ab initio techniques. The information gleaned from these methods is then combined by a variety of algorithms, including Dynamic Programming (DP) and Hidden Markov Models (HMMs), and used to predict genes in genomic sequences.
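As a rough illustration of how an HMM decoded by dynamic programming can label a sequence, the following minimal sketch runs the Viterbi algorithm over a two-state model (coding versus non-coding). It is not taken from any of the gene-finders reviewed in the paper. The state names, the function name `viterbi`, and every probability in it are assumptions invented for the example, and real gene-finders use far richer models (exon, intron, and splice-site states, trained parameters).

```python
import math

STATES = ("N", "C")  # N = non-coding, C = coding (toy labels, assumed)

# All probabilities below are invented for illustration, not trained values.
START = {"N": 0.5, "C": 0.5}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
# Toy content sensor: the coding state is assumed to be GC-rich.
EMIT = {
    "N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of N/C."""
    if not seq:
        return ""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict per step: best predecessor of each state
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Dynamic programming step: pick the best previous state for s
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full path
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("ATATATAT" + "GCGCGCGCGC" + "ATATATAT"))
```

With these toy parameters, the ten-base GC-rich stretch is long enough to outweigh the cost of two state switches, so it is labelled `C` while the flanking AT-rich bases stay `N`. A two-base GC blip would not be, which shows how transition probabilities smooth the per-base content signal.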
Keywords
gene prediction; signal/content sensors; similarity-based gene-finders; ab initio gene-finders