A Method for Identifying Splice Sites and Translation Start Sites in Human Genomic Sequences

  • Kim, Ki-Bong (Information Technology Institute, SmallSoft Co., Ltd.)
  • Park, Kie-Jung (Information Technology Institute, SmallSoft Co., Ltd.)
  • Kong, Eun-Bae (Department of Computer Engineering, Chungnam National University)
  • Published: 2002.09.30

Abstract

We describe a new method for identifying the sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in human mRNA. Using the mandatory keyword ORGANISM and the feature key CDS, we extracted a large set of standard data for each signal site from the ASCII flat file gbpri.seq in GenBank release 108.0. These data were used to generate scoring matrices that summarize the sequence information for each signal site. The scoring matrices take into account the independent nucleotide frequencies of adjacent bases at each position within the signal site regions, and they weight each nucleotide in proportion to its probability in the known signal sites. With a scoring scheme based on these nucleotide scoring matrices, the method achieves high sensitivity and specificity when used to locate signals in uncharacterized human genomic DNA. The matrices are especially effective at distinguishing true sites from false ones.
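
The abstract does not give the exact form of the scoring matrices, so the following is only a minimal sketch of the general idea, assuming a plain position-specific log-odds matrix with a uniform background and a pseudocount. The function names, the threshold, and the toy donor-like sites are hypothetical and are not taken from the paper or from GenBank. Each position is treated independently here, which may be simpler than the authors' scheme.

```python
import math

BASES = "ACGT"


def build_scoring_matrix(sites, pseudocount=1.0):
    """Build per-position log-odds scores from aligned, equal-length sites."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds against a uniform background of 0.25 per base (an assumption).
        matrix.append(
            {base: math.log2((counts[base] / total) / 0.25) for base in BASES}
        )
    return matrix


def score_window(matrix, window):
    """Sum the per-position scores of one window as wide as the matrix."""
    return sum(column[base] for column, base in zip(matrix, window))


def scan(matrix, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(matrix, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy donor-like sites, made up for illustration only.
toy_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
matrix = build_scoring_matrix(toy_sites)
hits = scan(matrix, "TTTAGGTAAGTCCC", threshold=5.0)
for start, score in hits:
    print(f"candidate site at offset {start}, score {score:.2f}")
```

In a sketch like this, the threshold controls the trade-off between sensitivity (true sites recovered) and specificity (false sites rejected): raising it discards more false candidates at the cost of missing weaker true sites.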
