
A Statistical Analysis of SNPs, In-Dels, and Their Flanking Sequences in Human Genomic Regions  

Shin, Seung-Wook (Interdisciplinary Program in Bioinformatics, Seoul National University)
Kim, Young-Joo (Functional Genomics Research Center, Korea Research Institute of Bioscience and Biotechnology)
Kim, Byung-Dong (Interdisciplinary Program in Bioinformatics, Seoul National University)
Abstract
Due to the increasing interest in SNPs and in mutational hot spots associated with disease traits, it is becoming more important to define and understand the relationship between SNPs and their flanking sequences. Studying the effects of flanking sequences on SNPs requires statistical approaches that can assess bias in SNP data. In this study, we mainly applied Markov chain models to SNP sequences, particularly those located in intronic regions, and to in-del data. All of the sequences examined showed a significant tendency to generate particular SNP types. Most sequences flanking SNPs had lower complexity than average sequences, and some of them were associated with microsatellites. Moreover, many Alu repeats were found in the flanking sequences. We observed an elevated frequency of single-base-pair repeat-like sequences, mirror repeats, and palindromes in the SNP flanking sequence data. Alu repeats are hypothesized to be associated with C-to-T transition mutations or A-to-I RNA editing. In particular, the in-del data revealed an association with particular sequence patterns such as palindromes and mirror repeats. These results indicate that the mechanism that induces in-dels is probably very different from the one responsible for other SNPs. From a statistical perspective, frequent DNA lesions in some regions probably affect the occurrence of SNPs.
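The abstract relies on Markov chain models to find biased words in SNP flanking sequences and on counts of palindromes and mirror repeats. The sketch below illustrates both ideas under assumptions that the abstract does not state. It uses the common first-order (M1) estimator for the expected count of a word, and it treats a "palindrome" as a word equal to its own reverse complement and a "mirror repeat" as a word that reads the same backwards on one strand. All function names are hypothetical, and this is not the authors' actual pipeline.

```python
from collections import Counter

# Complement table for the four DNA bases (other symbols are left unchanged).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_words(seq: str, k: int) -> Counter:
    """Count overlapping k-letter words (k-mers) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count_m1(seq: str, word: str) -> float:
    """Expected occurrences of `word` if seq were generated by a first-order
    Markov chain fitted to seq itself (assumed M1 estimator):
        E[N(w)] = N(w1 w2) * ... * N(w_{k-1} w_k) / (N(w2) * ... * N(w_{k-1}))
    Inner-letter counts are taken over the whole sequence, a common approximation.
    """
    if len(word) < 3:
        raise ValueError("the M1 estimator needs words of length >= 3")
    di = count_words(seq, 2)
    mono = count_words(seq, 1)
    numerator = 1.0
    for i in range(len(word) - 1):
        numerator *= di[word[i:i + 2]]
    denominator = 1.0
    for base in word[1:-1]:
        denominator *= mono[base]
    # If an inner base never occurs, every dinucleotide containing it is 0 too.
    return numerator / denominator if denominator else 0.0


def obs_exp_ratio(seq: str, word: str) -> float:
    """Observed/expected ratio; values above 1 suggest over-representation."""
    expected = expected_count_m1(seq, word)
    observed = count_words(seq, len(word))[word]
    return observed / expected if expected else float("nan")


def reverse_complement(s: str) -> str:
    return s.translate(_COMPLEMENT)[::-1]


def is_palindrome(s: str) -> bool:
    """DNA palindrome: identical to its reverse complement (e.g. GAATTC)."""
    return s == reverse_complement(s)


def is_mirror_repeat(s: str) -> bool:
    """Mirror repeat: reads the same backwards on one strand (e.g. GATCCTAG)."""
    return s == s[::-1]


if __name__ == "__main__":
    flank = "ACGTACGTACGT"  # toy sequence, not real SNP flanking data
    print(expected_count_m1(flank, "ACG"))  # 3.0
    print(obs_exp_ratio(flank, "ACG"))      # 1.0
    print(is_palindrome("GAATTC"), is_mirror_repeat("GATCCTAG"))  # True True
```

A raw observed/expected ratio only flags candidate words. A fuller analysis would attach a variance or z-score to each ratio before calling a word significantly over- or under-represented in the flanking sequences.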
Keywords
single nucleotide polymorphisms; SNPs; intron; Markov chain