Computational Approaches to Gene Prediction

Do Jin-Hwan; Choi Dong-Kug

Journal of Microbiology

Volume 44 Issue 2 / Pages 137-144 / 2006 / 1225-8873 (pISSN) / 1976-3794 (eISSN)

The Microbiological Society of Korea (한국미생물학회)

Do Jin-Hwan (Bio-food and Drug Research Center, Konkuk University);
Choi Dong-Kug (Department of Biotechnology, Konkuk University)

Published: 2006.04.01

Abstract

The problems of identifying genes and predicting gene structure in DNA sequences have received increasing attention in recent years, as large-scale sequencing projects have produced an immense amount of genome data. A variety of prediction programs have been developed to address these problems. This paper reviews the computational approaches and gene finders commonly used for gene prediction in eukaryotic genomes. In general, two approaches have been adopted for this purpose: similarity-based techniques, which compare the genomic sequence with known sequences such as cDNAs, ESTs, or proteins, and ab initio techniques, which rely on signals and statistical properties intrinsic to the sequence itself. The information obtained from these methods is then combined by algorithms such as dynamic programming (DP) or hidden Markov models (HMMs) and used to predict genes in genomic sequences.
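To make the combination step more concrete, here is a minimal sketch of Viterbi decoding, the dynamic-programming algorithm that finds the most probable hidden-state path through an HMM. It assumes a hypothetical two-state model ("N" for noncoding, "C" for coding) whose illustrative probabilities simply treat coding DNA as GC-rich, standing in for the content statistics an ab initio method would supply. Real HMM-based gene finders use far richer state spaces (exons, introns, splice signals, codon position) with parameters trained on annotated genomes. The names `viterbi`, `START`, `TRANS`, and `EMIT` are hypothetical, and the input is assumed to contain only A, C, G, and T.

```python
import math

# Hypothetical toy model: two hidden states, "N" (noncoding) and "C" (coding).
# All probabilities are illustrative assumptions, not trained parameters.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
# Toy emission model: "coding" is assumed GC-rich, "noncoding" AT-rich.
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq: str) -> str:
    """Return the most probable state path (one label per base) via Viterbi DP."""
    seq = seq.upper()
    if not seq:
        return ""
    # Work in log space so long sequences do not underflow to 0.0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []  # backptrs[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best previous state from which to enter state s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

For example, `viterbi("A" * 20 + "G" * 20 + "A" * 20)` labels the 20 G's as coding and both A flanks as noncoding. The max-over-predecessors recurrence is what makes the DP practical: it scores every possible labeling in time proportional to the sequence length times the square of the number of states, rather than enumerating exponentially many paths.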
