Correlation Analysis between Regulatory Sequence Motifs and Expression Profiles by Kernel CCA

  • Rhee, Je-Keun (Graduate Program in Bioinformatics, Seoul National University, Center for Bioinformation Technology, Seoul University)
  • Joung, Je-Gun (Graduate Program in Bioinformatics, Seoul National University, Center for Bioinformation Technology, Seoul University)
  • Chang, Jeong-Ho (School of Computer Science and Engineering, Seoul National University)
  • Zhang, Byoung-Tak (Graduate Program in Bioinformatics, Seoul National University, Center for Bioinformation Technology, Seoul University, School of Computer Science and Engineering, Seoul National University)
  • Published : 2005.09.22

Abstract

Transcription factors regulate gene expression by binding to the upstream regions of genes, and each transcription factor has a specific binding site in the promoter region. Analyzing gene upstream sequences is therefore necessary for understanding the regulatory mechanisms of genes, under the plausible assumption that DNA sequence motif profiles are closely related to the expression behavior of the corresponding genes. Here, we present an effective approach to analyzing the relationship between gene expression profiles and gene upstream sequences based on kernel canonical correlation analysis (kernel CCA). Kernel CCA is a useful method for finding relationships underlying two different data sets. Applied to a yeast cell cycle data set, the method shows that gene upstream sequence profiles are closely related to gene expression patterns in terms of canonical correlation scores. By further analyzing the contributing values, or weights, of sequence motifs in the construction of a pair of sequence motif profiles and expression profiles, we show that the proposed method can identify significant DNA sequence motifs involved in specific gene expression patterns during the yeast cell cycle, including both well-known and putative motifs.
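
To make the idea concrete, the following is a minimal sketch of regularized kernel CCA on synthetic data, not the implementation used in this work. It assumes one common regularized formulation, in which the leading eigenvalue of (Kx + kI)^-1 Ky (Ky + kI)^-1 Kx gives the squared canonical correlation score. The linear kernel on motif counts, the Gaussian kernel on expression profiles, the parameters `gamma` and `reg`, and all function and variable names are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: K <- H K H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def rbf_kernel(Y, gamma=0.1):
    """Gaussian kernel between rows of Y (gamma is an assumed setting)."""
    sq = np.sum(Y ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_cca(Kx, Ky, reg=1.0):
    """First canonical pair of regularized kernel CCA (assumed formulation)."""
    n = Kx.shape[0]
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    I = np.eye(n)
    # Leading eigenpair of (Kx + reg I)^-1 Ky (Ky + reg I)^-1 Kx.
    M = np.linalg.solve(Kx + reg * I, Ky) @ np.linalg.solve(Ky + reg * I, Kx)
    vals, vecs = np.linalg.eig(M)
    top = np.argmax(vals.real)
    rho = float(np.sqrt(np.clip(vals.real[top], 0.0, 1.0)))
    a = vecs[:, top].real                      # dual weights, sequence side
    b = np.linalg.solve(Ky + reg * I, Kx @ a) / max(rho, 1e-12)
    return rho, a, b, Kx, Ky

# Synthetic stand-ins for real data (hypothetical sizes and values).
rng = np.random.default_rng(0)
n_genes, n_motifs, n_times = 80, 6, 10
X = rng.poisson(1.0, size=(n_genes, n_motifs)).astype(float)  # motif counts
phase = np.sin(np.linspace(0.0, 2.0 * np.pi, n_times))
# Expression is driven by motif 0 plus noise, so the two views are related.
Y = np.outer(X[:, 0], phase) + 0.3 * rng.standard_normal((n_genes, n_times))

rho, a, b, Kxc, Kyc = kernel_cca(X @ X.T, rbf_kernel(Y), reg=1.0)
u = Kxc @ a                      # canonical variate of the sequence view
v = Kyc @ b                      # canonical variate of the expression view
w = (X - X.mean(axis=0)).T @ a   # per-motif weights (linear kernel only)

print("regularized canonical correlation score:", rho)
print("motif weights:", w)
```

With a linear kernel on the sequence side, the dual weights `a` map back to one weight per motif (`w`), which is one possible way to read off motif contributions. With a nonlinear kernel on that side, such a direct mapping is not available.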

Keywords