In Silico Functional Assessment of Sequence Variations: Predicting Phenotypic Functions of Novel Variations

  • Won, Hong-Hee (Samsung Biomedical Research Institute, Samsung Medical Center)
  • Kim, Jong-Won (Department of Laboratory Medicine and Genetics, Sungkyunkwan University School of Medicine, Samsung Medical Center)
  • Published : 2008.12.31

Abstract

Major initiatives, including the Human Variome Project, the 1000 Genomes Project, and the International Cancer Genome Consortium, have revealed a multitude of protein-coding sequence variations (CVs) in the human genome. As predicting the functional effects of CVs and their relevance to disease phenotypes becomes increasingly important, this has naturally led to debate over how to assess those functional consequences accurately. This article surveys and compares variation databases and in silico prediction programs that assess the effects of CVs on protein function. We also introduce a combinatorial approach that uses machine learning algorithms to improve prediction performance.
