Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions

  • Yang, Long (Tobacco Laboratory, Shandong Agricultural University)
  • Cho, Hwan-Gue (Graphics Application Laboratory, Department of Computer Science and Engineering, Pusan National University)
  • Received : 2012.02.02
  • Accepted : 2012.02.17
  • Published : 2012.03.31

Abstract

Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the BLAST-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns, with a small number of false-positive introns. Sim4cc was successful at finding the correct introns, with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame, so an intron length falls into one of three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), or a multiple of three bases plus two (3n + 2). It has been widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that in 80% (8/10) of the species, the three phases occurred in similar numbers. The percentage of 3n introns in Ostreococcus lucimarinus was excessive (47.7%), while in Ostreococcus tauri it was deficient (29.1%). This discrepancy could be the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
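
The three-phase evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy intron lengths, and the 4-percentage-point tolerance are all assumptions chosen for illustration, and the abstract does not state what cutoff, if any, was used to call a phase "excessive" or "deficient".

```python
from collections import Counter


def phase_percentages(intron_lengths):
    """Return the percentage of introns in each length phase.

    Phase 0 is 3n, phase 1 is 3n + 1, phase 2 is 3n + 2.
    """
    counts = Counter(length % 3 for length in intron_lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no intron lengths given")
    return {phase: 100.0 * counts.get(phase, 0) / total for phase in (0, 1, 2)}


def flag_skewed(percentages, tolerance=4.0):
    """Return the phases whose share deviates from one third.

    The tolerance (in percentage points) is a hypothetical cutoff,
    not a value taken from the paper.
    """
    expected = 100.0 / 3
    return [phase for phase in (0, 1, 2)
            if abs(percentages[phase] - expected) > tolerance]


# Toy intron lengths (made up for illustration, not real annotation data).
lengths = [90, 91, 92, 93, 96, 99, 102, 105]
pct = phase_percentages(lengths)
print(pct)
print(flag_skewed(pct))
```

In practice the lengths would come from the intron coordinates of an annotation file. A genome whose phases are all close to one third would return an empty list, while an annotation like the ones reported for the two Ostreococcus species would have its 3n phase flagged under this assumed tolerance.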
