Prediction of ORFs in Metagenome by Using Cis-acting Transcriptional and Translational Factors

ORF Prediction Using Conserved Transcriptional and Translational Factors in Metagenome Sequences

  • Cheong, Dea-Eun (Department of Biological Sciences, College of Natural Sciences, Chonnam National University)
  • Kim, Geun-Joong (Department of Biological Sciences, College of Natural Sciences, Chonnam National University)
  • Received : 2010.09.10
  • Accepted : 2010.10.24
  • Published : 2010.10.31

Abstract

As sequencing technologies steadily improve, massive amounts of sequence data have accumulated in public databases. Accordingly, programs based on various algorithms have been developed to mine useful information, such as genes, operons, and regulatory factors, from these sequences. However, despite their usefulness in a wide range of applications, these programs have drawbacks when applied to comprehensive analyses of metagenomes, and can yield inaccurate or overly complex results. Here we propose that signature sequences (cis-acting transcriptional and translational factors in the metagenome) can serve as hallmarks for finding ORFs in metagenomes.

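An estimated $5 \times 10^{30}$ microbial cells exist on Earth, and they contain 350 to 550 Pg (1 Pg = $10^{15}$ g) of carbon, 85 to 130 Pg of nitrogen, and 9 to 14 Pg of phosphorus, a greater quantity of these elements than any other kind of organism on Earth. In addition, the relationships between these microorganisms and the other organisms and inorganic matter that make up ecosystems are continually being uncovered. Because the basic goal of such studies is to identify the factors important for these interactions (most typically genes), searching for and confirming the true ORFs present on chromosomes is the most important basic tool. However, for environmental genomes composed of diverse microorganisms, the fraction that can be searched with existing information cannot be estimated accurately, which poses many difficulties. Searching data with such ill-defined boundaries requires more information (more gene sequences for training or for defining the search space), and additional search methods and techniques will need to be developed. As an alternative, a search method based on the conservation of transcriptional and translational factors present in microbial intergenic sequences could, depending on how it is refined, have a broad range of applications. Even at its current level it has sufficient value for combined searching, that is, when used together with existing methods or as a step that complements them. We confirmed this expectation with results that ranged from no agreement at all with conventional ORF-centered predictions to more than 90% agreement. This is because many of the discordant cases include novel ORFs that are not detected by BLAST searches.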
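
As a rough illustration of the idea described above (using conserved translational signals upstream of a start codon as supporting evidence for an ORF), the following Python sketch may help. It is not the authors' implementation. The Shine-Dalgarno-like motif pattern, the 20-nt upstream window, the minimum ORF length, and all function names are assumptions made for this example, and only the forward strand is scanned.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Assumption: a simplified Shine-Dalgarno-like core motif.
# Real ribosome binding sites vary between organisms.
SD_PATTERN = re.compile(r"AGGAG|GGAGG|GAGG")


def find_orfs(seq, min_codons=30):
    """Yield (start, end) for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon;
    `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons from ATG up to (not including) the stop.
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def has_upstream_signal(seq, orf_start, window=20):
    """Return True if an SD-like motif lies within `window` nt upstream."""
    upstream = seq[max(0, orf_start - window):orf_start].upper()
    return SD_PATTERN.search(upstream) is not None


def orfs_with_signal(seq, min_codons=30, window=20):
    """Keep only candidate ORFs preceded by an SD-like motif."""
    return [
        (start, end)
        for start, end in find_orfs(seq, min_codons)
        if has_upstream_signal(seq, start, window)
    ]


if __name__ == "__main__":
    # Toy sequence: SD-like motif, short spacer, then a small ORF.
    toy = "TTTT" + "AGGAGG" + "TTTTTT" + "ATG" + "GCT" * 5 + "TAA" + "CCCC"
    print(orfs_with_signal(toy, min_codons=3))
```

In a combined search of the kind the abstract mentions, a filter like `orfs_with_signal` would be one source of evidence alongside conventional ORF-centered predictions, rather than a replacement for them.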
