
Calibrating Thresholds to Improve the Detection Accuracy of Putative Transcription Factor Binding Sites  

Kim, Young-Jin (Center for Genome Science, National Institute of Health, KCDC)
Ryu, Gil-Mi (Center for Genome Science, National Institute of Health, KCDC)
Park, Chan (College of Pharmacy, Seoul National University)
Kim, Kyu-Won (College of Pharmacy, Seoul National University)
Oh, Berm-Seok (Center for Genome Science, National Institute of Health, KCDC)
Kim, Young-Youl (Center for Genome Science, National Institute of Health, KCDC)
Gu, Man-Bok (School of Life Science & Biotechnology, Korea University)
Abstract
To understand the mechanism of transcriptional regulation, it is essential to detect promoters and regulatory elements. Various methods have been introduced to improve the prediction accuracy of regulatory elements. Because few regulatory elements have been experimentally validated, previous studies have used detection criteria based solely on score levels relative to background sequences. However, it is not feasible to select such detection criteria separately for each prediction method. Here, we studied the calibration of thresholds to improve regulatory element prediction. We predicted regulatory elements using MATCH, a powerful tool for transcription factor binding site (TFBS) detection. To increase prediction accuracy, we used a regulatory potential (RP) score, which measures the similarity of patterns in alignments to those in known regulatory regions. Next, we calibrated the thresholds to find relevant scores, increasing true positives while decreasing possible false positives. By applying various thresholds, we compared predicted regulatory elements with validated regulatory elements from the Open Regulatory Annotation (ORegAnno) database. The regulators predicted at the selected threshold were validated through enrichment analysis of muscle-specific gene sets from the Tissue-Specific Transcripts and Genes (T-STAG) database. We found 14 known muscle-specific regulators at a false discovery rate (FDR) below 5% in the single-TFBS analysis, as well as known transcription factor combinations in the combinatorial TFBS analysis.
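The abstract describes two computational steps: calibrating score cutoffs against experimentally validated sites, and testing predicted regulators for enrichment in a tissue-specific gene set under FDR control. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The `Site` record, the toy intervals, the cutoff grids, and the use of F1 as the selection criterion are all hypothetical. Enrichment is scored with a one-sided hypergeometric tail, and FDR control uses the Benjamini-Hochberg procedure, a swapped-in choice (the reference list cites the Storey and Tibshirani q-value approach).

```python
from dataclasses import dataclass
from math import comb

Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open


# Hypothetical record for a predicted TFBS: a MATCH-style matrix similarity
# score plus the regulatory potential (RP) score of its aligned region.
@dataclass(frozen=True)
class Site:
    chrom: str
    start: int
    end: int
    match_score: float
    rp_score: float


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def evaluate(sites, validated, match_cut, rp_cut):
    """Keep sites passing both cutoffs.

    Returns (true positives, false positives, validated sites recovered).
    """
    kept = [s for s in sites if s.match_score >= match_cut and s.rp_score >= rp_cut]
    spans = [(s.chrom, s.start, s.end) for s in kept]
    tp = sum(1 for sp in spans if any(overlaps(sp, v) for v in validated))
    recovered = sum(1 for v in validated if any(overlaps(sp, v) for sp in spans))
    return tp, len(kept) - tp, recovered


def calibrate(sites, validated, match_cuts, rp_cuts):
    """Grid-search cutoff pairs; return (match_cut, rp_cut, f1) with the best F1.

    F1 is an assumed selection criterion, used here only for illustration.
    """
    best = None
    for m in match_cuts:
        for r in rp_cuts:
            tp, fp, recovered = evaluate(sites, validated, m, r)
            if tp == 0:
                continue  # no true positives: precision and F1 are zero
            precision = tp / (tp + fp)
            recall = recovered / len(validated)
            f1 = 2 * precision * recall / (precision + recall)
            if best is None or f1 > best[2]:
                best = (m, r, f1)
    return best


def hypergeom_sf(k, total, with_site, drawn):
    """P(X >= k) for X ~ Hypergeometric(total genes, genes with a site, gene-set size)."""
    upper = min(with_site, drawn)
    hits = sum(comb(with_site, i) * comb(total - with_site, drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(total, drawn)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


if __name__ == "__main__":
    # Toy "validated" intervals standing in for ORegAnno records (hypothetical).
    validated = [("chr1", 100, 120), ("chr1", 500, 520)]
    sites = [
        Site("chr1", 105, 115, 0.95, 0.30),
        Site("chr1", 300, 310, 0.90, 0.05),
        Site("chr1", 505, 515, 0.85, 0.25),
        Site("chr1", 700, 710, 0.80, 0.01),
    ]
    print(calibrate(sites, validated, [0.80, 0.90], [0.0, 0.1]))  # → (0.8, 0.1, 1.0)
    # 3 of 4 muscle-specific genes carry the site; 5 of 20 genes overall do.
    print(hypergeom_sf(3, 20, 5, 4))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In this toy run, adding the RP cutoff removes the two unvalidated sites without losing either validated one, which is the kind of trade-off the threshold calibration is meant to expose.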
Keywords
combinatorial TFBS; regulatory conservation score; regulatory element; transcription factor binding site;
Citations & Related Records
Times Cited By KSCI: 1
1 Alkema, W.B., Johansson, O., Lagergren, J., and Wasserman, W.W. (2004). MSCAN: identification of functional clusters of transcription factor binding sites. Nucleic Acids Res. 32(Web Server issue), W195-8.
2 Berman, B.P., Nibu, Y., Pfeiffer, B.D., Tomancak, P., Celniker, S.E., Levine, M., Rubin, G.M., and Eisen, M.B. (2002). Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci USA. 99(2), 757-62.
3 Fickett, J.W. (1996). Quantitative discrimination of MEF2 sites. Mol Cell Biol. 16(1), 437-41.
4 Guo, C.S., Degnin, C., Fiddler, T.A., Stauffer, D., and Thayer, M.J. (2003). Regulation of MyoD activity and muscle cell differentiation by MDM2, pRb, and Sp1. J Biol Chem. 278(25), 22615-22.
5 Kreiman, G. (2004). Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes. Nucleic Acids Res. 32(9), 2889-900.
6 Westhoff, T., Jankowski, J., Schmidt, S., Luo, J., Giebing, G., Schluter, H., Tepel, M., Zidek, W., and van der Giet, M. (2003). Identification and characterization of adenosine 5'-tetraphosphate in human myocardial tissue. J Biol Chem. 278(20), 17735-40.
7 Yu, X., Lin, J., Zack, D.J., and Qian, J. (2006). Computational analysis of tissue-specific combinatorial gene regulation: predicting interaction between transcription factors in human tissues. Nucleic Acids Res. 34(17), 4925-36.
8 Wasserman, W.W., and Fickett, J.W. (1998). Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol. 278(1), 167-81.
9 Kel, A., Konovalova, T., Waleev, T., Cheremushkin, E., Kel-Margoulis, O., and Wingender, E. (2006). Composite Module Analyst: a fitness-based tool for identification of transcription factor binding site combinations. Bioinformatics. 22(10), 1190-7.
10 Halfon, M.S., Grad, Y., Church, G.M., and Michelson, A.M. (2002). Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12(7), 1019-28.
11 Gupta, S., Vingron, M., and Haas, S.A. (2005). T-STAG: resource and web-interface for tissue-specific transcripts and genes. Nucleic Acids Res. 33(Web Server issue), W654-8.
12 Lingbeck, J.M., Trausch-Azar, J.S., Ciechanover, A., and Schwartz, A.L. (2005). E12 and E47 modulate cellular localization and proteasome-mediated degradation of MyoD and Id1. Oncogene. 24(42), 6376-84.
13 Montgomery, S.B., Griffith, O.L., Sleumer, M.C., Bergman, C.M., Bilenky, M., Pleasance, E.D., Prychyna, Y., Zhang, X., and Jones, S.J. (2006). ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics. 22(5), 637-40.
14 Storey, J.D., and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 100(16), 9440-5.
15 Vitelli, L., Condorelli, G., Lulli, V., Hoang, T., Luchetti, L., Croce, C.M., and Peschle, C. (2000). A pentamer transcriptional complex including tal-1 and retinoblastoma protein downmodulates c-kit expression in normal erythroblasts. Mol Cell Biol. 20(14), 5330-42.
16 Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R.K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15(8), 1034-50.
17 Bluthgen, N., Kielbasa, S.M., and Herzel, H. (2005). Inferring combinatorial regulation of transcription in silico. Nucleic Acids Res. 33(1), 272-9.
18 Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., Hillman-Jackson, J., Kuhn, R.M., Pedersen, J.S., Pohl, A., Raney, B.J., Rosenbloom, K.R., Siepel, A., Smith, K.E., Sugnet, C.W., Sultan-Qurraie, A., Thomas, D.J., Trumbower, H., Weber, R.J., Weirauch, M., Zweig, A.S., Haussler, D., and Kent, W.J. (2006). The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34(Database issue), D590-8.
19 Woolfe, A., Goodson, M., Goode, D.K., Snell, P., McEwen, G.K., Vavouri, T., Smith, S.F., North, P., Callaway, H., Kelly, K., Walter, K., Abnizova, I., Gilks, W., Edwards, Y.J., Cooke, J.E., and Elgar, G. (2005). Highly conserved noncoding sequences are associated with vertebrate development. PLoS Biol. 3(1), e7.
20 Conway, K., Pin, C., Kiernan, J.A., and Merrifield, P. (2004). The E protein HEB is preferentially expressed in developing muscle. Differentiation. 72(7), 327-40.
21 Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., Kloos, D.U., Land, S., Lewicki-Potapov, B., Michael, H., Munch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S., and Wingender, E. (2003). TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31(1), 374-8.
22 Dennis, G. Jr., Sherman, B.T., Hosack, D.A., Yang, J., Gao, W., Lane, H.C., and Lempicki, R.A. (2003). DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 4(5), P3.
23 Margulies, E.H., and Green, E.D. (2003). Detecting highly conserved regions of the human genome by multispecies sequence comparisons. Cold Spring Harb Symp Quant Biol. 68, 255-63.
24 Vlieghe, D., Sandelin, A., De Bleser, P.J., Vleminckx, K., Wasserman, W.W., van Roy, F., and Lenhard, B. (2006). A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res. 34(Database issue), D95-7.
25 Morrisey, E.E. (2000). GATA-6: the proliferation stops here: cell proliferation in glomerular mesangial and vascular smooth muscle cells. Circ Res. 87(8), 638-40.
26 Fukami-Kobayashi, K., and Saito, N. (2002). How to make good use of CLUSTALW. Tanpakushitsu Kakusan Koso. 47(9), 1237-9.
27 Knoepfler, P.S., Bergstrom, D.A., Uetsuki, T., Dac-Korytko, I., Sun, Y.H., Wright, W.E., Tapscott, S.J., and Kamps, M.P. (1999). A conserved motif N-terminal to the DNA-binding domains of myogenic bHLH transcription factors mediates cooperative DNA binding with pbx-Meis1/Prep1. Nucleic Acids Res. 27(18), 3752-61.
28 King, D.C., Taylor, J., Elnitski, L., Chiaromonte, F., Miller, W., and Hardison, R.C. (2005). Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res. 15(8), 1051-60.
29 Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E., Favorov, A.V., Frith, M.C., Fu, Y., Kent, W.J., Makeev, V.J., Mironov, A.A., Noble, W.S., Pavesi, G., Pesole, G., Regnier, M., Simonis, N., Sinha, S., Thijs, G., van Helden, J., Vandenbogaert, M., Weng, Z., Workman, C., Ye, C., and Zhu, Z. (2005). Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol. 23(1), 137-44.
30 Ho Sui, S.J., Mortimer, J.R., Arenillas, D.J., Brumm, J., Walsh, C.J., Kennedy, B.P., and Wasserman, W.W. (2005). oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res. 33(10), 3154-64.
31 Pfeifer, M., Begerow, B., and Minne, H.W. (2002). Vitamin D and muscle function. Osteoporos Int. 13(3), 187-94.
32 Kim, S.B., Ryu, G.M., Kim, Y.J., Heo, J.Y., Park, C., Oh, B.S., Kim, H.L., Kimm, K.C., Kim, K.W., and Kim, Y.Y. (2007). FCAnalyzer: A Functional Clustering Analysis Tool for Predicted Transcription Regulatory Elements and Gene Ontology Terms. Genomics & Informatics. 5(1), 10-18.
33 Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhauser, R., Pruss, M., Schacherer, F., Thiele, S., and Urbach, S. (2001). The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29(1), 281-3.
34 Cohen, C.D., Klingenhoff, A., Boucherot, A., Nitsche, A., Henger, A., Brunner, B., Schmid, H., Merkle, M., Saleem, M.A., Koller, K.P., Werner, T., Grone, H.J., Nelson, P.J., and Kretzler, M. (2006). Comparative promoter analysis allows de novo identification of specialized cell junction-associated proteins. Proc Natl Acad Sci USA. 103(15), 5682-7.
35 Dottori, M., Gross, M.K., Labosky, P., and Goulding, M. (2001). The winged-helix transcription factor Foxd3 suppresses interneuron differentiation and promotes neural crest cell fate. Development. 128(21), 4127-38.