Calibrating Thresholds to Improve the Detection Accuracy of Putative Transcription Factor Binding Sites

  • Kim, Young-Jin (Center for Genome Science, National Institute of Health, KCDC)
  • Ryu, Gil-Mi (Center for Genome Science, National Institute of Health, KCDC)
  • Park, Chan (College of Pharmacy, Seoul National University)
  • Kim, Kyu-Won (College of Pharmacy, Seoul National University)
  • Oh, Berm-Seok (Center for Genome Science, National Institute of Health, KCDC)
  • Kim, Young-Youl (Center for Genome Science, National Institute of Health, KCDC)
  • Gu, Man-Bok (School of Life Science & Biotechnology, Korea University)
  • Published: 2007.12.31

Abstract

To understand the mechanism of transcriptional regulation, it is essential to detect promoters and regulatory elements. Various methods have been introduced to improve the prediction accuracy of regulatory elements. Because few regulatory elements have been experimentally validated, previous studies have used criteria based solely on the level of scores over background sequences. However, selecting detection criteria for different prediction methods is not feasible. Here, we studied the calibration of thresholds to improve regulatory element prediction. We predicted regulatory elements using MATCH, a powerful tool for transcription factor binding site (TFBS) detection. To increase the prediction accuracy, we used a regulatory potential (RP) score, which measures the similarity of patterns in alignments to those in known regulatory regions. Next, we calibrated the thresholds to find relevant scores that increase the true positives while decreasing possible false positives. By applying various thresholds, we compared the predicted regulatory elements with validated regulatory elements from the Open Regulatory Annotation (ORegAnno) database. The regulators predicted with the selected threshold were validated through enrichment analysis of muscle-specific gene sets from the Tissue-Specific Transcripts and Genes (T-STAG) database. We found 14 known muscle-specific regulators at a false discovery rate (FDR) of less than 5% in a single TFBS analysis, as well as known transcription factor combinations in our combinatorial TFBS analysis.
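
The threshold calibration described in the abstract can be illustrated with a minimal sketch. It sweeps candidate score thresholds over a set of predicted sites and counts how many of the sites called at each threshold appear in a validated set. Everything in it is an assumption made for illustration: the function names `sweep_thresholds` and `pick_threshold`, the toy site identifiers and scores, and in particular the use of the F1 score to select a threshold, which is a stand-in criterion and not necessarily the one the authors used. Because validated sets are small, a called site missing from the validated set is only a possible false positive.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[float, int, int, float, float]  # threshold, TP, possible FP, sensitivity, precision


def sweep_thresholds(predictions: Iterable[Tuple[str, float]],
                     validated: Set[str],
                     thresholds: Iterable[float]) -> List[Row]:
    """For each threshold, compare the called sites with the validated set."""
    predictions = list(predictions)
    rows = []
    for t in thresholds:
        # A site is "called" when its score reaches the threshold.
        called = {site for site, score in predictions if score >= t}
        tp = len(called & validated)   # called and validated
        fp = len(called - validated)   # called but not validated (possible false positives)
        sensitivity = tp / len(validated) if validated else 0.0
        precision = tp / len(called) if called else 0.0
        rows.append((t, tp, fp, sensitivity, precision))
    return rows


def f1(row: Row) -> float:
    """Harmonic mean of sensitivity and precision (0 when both are 0)."""
    _, _, _, sens, prec = row
    return 2 * sens * prec / (sens + prec) if (sens + prec) > 0 else 0.0


def pick_threshold(rows: List[Row]) -> float:
    """Return the threshold with the highest F1 (assumed selection criterion)."""
    return max(rows, key=f1)[0]


# Hypothetical toy data: site identifiers with made-up scores.
predictions = [("s1", 0.95), ("s2", 0.90), ("s3", 0.85),
               ("s4", 0.80), ("s5", 0.70), ("s6", 0.60)]
validated = {"s1", "s2", "s4"}  # stand-in for a validated set such as ORegAnno entries

rows = sweep_thresholds(predictions, validated, [0.6, 0.7, 0.8, 0.9])
best = pick_threshold(rows)
for t, tp, fp, sens, prec in rows:
    print(f"threshold={t:.2f} TP={tp} possibleFP={fp} "
          f"sensitivity={sens:.2f} precision={prec:.2f}")
print("selected threshold:", best)
```

In this toy example, raising the threshold removes possible false positives but eventually starts to drop validated sites, which is the trade-off the calibration aims to balance.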
