Pattern Search for Transcription Factor Binding Sites in a Promoter Region Using a Genetic Algorithm

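  • Kim Gi-Bong (Information Technology Research Institute, Smallpost Co., Ltd.)
  • Gong Eun-Bae (Department of Computer Engineering, Chungnam National University)
  • Published: 2003.06.01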

Abstract

The promoter region, which acts as a key signal in gene expression, contains specific sites to which transcription factors bind. These binding sites occur at various positions within the promoter and take the form of evolutionarily well-conserved consensus sequence patterns. This paper presents a new method that searches for such patterns with a genetic algorithm while adopting the assumption of the N-occurrence-per-dataset model of the MEME algorithm and the advantage of the Wataru method, which can determine the pattern length. Genome researchers can use the method to predict promoter regions in anonymous DNA sequences or to locate the binding sites of a specific transcription factor.
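The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so the following is only a minimal sketch of how a genetic algorithm could search for a consensus pattern, not the authors' implementation. Everything in it is an assumption made for illustration: a chromosome is a fixed-width consensus string, the fitness is the summed match count of the `n_sites` best-matching windows anywhere in the dataset (a loose reading of the N-occurrence-per-dataset assumption, with overlapping windows allowed), and the pattern width is supplied by the caller rather than determined automatically. The names `fitness`, `search`, `width`, and `n_sites` are hypothetical.

```python
import random

ALPHABET = "ACGT"

def window_scores(pattern, sequences):
    """Number of matching positions between `pattern` and every window."""
    w = len(pattern)
    return [
        sum(a == b for a, b in zip(pattern, seq[i:i + w]))
        for seq in sequences
        for i in range(len(seq) - w + 1)
    ]

def fitness(pattern, sequences, n_sites):
    """Assumed fitness: sum of the n_sites best window scores in the dataset."""
    scores = sorted(window_scores(pattern, sequences), reverse=True)
    return sum(scores[:n_sites])

def mutate(pattern, rate, rng):
    """Replace each base by a random base with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else c for c in pattern
    )

def crossover(a, b, rng):
    """One-point crossover of two equal-length patterns (width >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def search(sequences, width, n_sites, pop_size=60, generations=80,
           rate=0.1, seed=0):
    """Evolve fixed-width consensus strings; return the fittest one found."""
    rng = random.Random(seed)

    def score(p):
        return fitness(p, sequences, n_sites)

    pop = ["".join(rng.choice(ALPHABET) for _ in range(width))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[:2]  # elitism: carry the two best over unchanged
        children = []
        while len(children) < pop_size - len(elite):
            # tournament selection: best of three randomly drawn individuals
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            children.append(mutate(crossover(p1, p2, rng), rate, rng))
        pop = elite + children
    return max(pop, key=score)

if __name__ == "__main__":
    toy = ["GGTATAATCC", "CCCTATAATG", "ATTATAATGG"]  # made-up sequences
    best = search(toy, width=6, n_sites=3)
    print(best, fitness(best, toy, 3))
```

Because the two best individuals are copied unchanged into each new generation, the best fitness in the population never decreases. A fuller treatment would replace the match-count fitness with an information-content or likelihood score and would let the pattern width vary.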

References

  1. E. Snyder and G. Stormo, 'Identification of protein coding regions in genomic DNA', Journal of Molecular Biology, Vol. 248, pp. 1-18, 1995 https://doi.org/10.1006/jmbi.1995.0198
  2. M. Burset and R. Guigo, 'Evaluation of gene structure prediction programs', Genomics, Vol. 34, pp. 353-367, 1996 https://doi.org/10.1006/geno.1996.0298
  3. C. Burge and S. Karlin, 'Prediction of complete gene structures in human genomic DNA', Journal of Molecular Biology, Vol. 268, pp. 78-94, 1997 https://doi.org/10.1006/jmbi.1997.0951
  4. Tim Bailey and William E. Hart, 'Learning Consensus Patterns in Unaligned DNA Sequences Using a Genetic Algorithm', Sandia Laboratories Tech Report SAND95-2293
  5. Pesole G., Prunella N., Liuni S., Attimonelli M., and Saccone C., 'WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences', Nucleic Acids Research, Vol. 20, pp. 2871-2875, 1992 https://doi.org/10.1093/nar/20.11.2871
  6. Lon R. Cardon and Gary D. Stormo, 'Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments', Journal of Molecular Biology, Vol. 223, pp. 159-170, 1992 https://doi.org/10.1016/0022-2836(92)90723-W
  7. Wataru Fujibuchi and Minoru Kanehisa, 'Prediction of Gene Expression Specificity by Promoter Sequence Patterns', DNA Research, Vol. 4, pp. 81-90, 1997 https://doi.org/10.1093/dnares/4.2.81
  8. Dan S. Prestridge, 'Predicting Pol II Promoter Sequences using Transcription Factor Binding Sites', Journal of Molecular Biology, Vol. 249, pp. 923-932, 1995 https://doi.org/10.1006/jmbi.1995.0349
  9. Dan S. Prestridge, 'SIGNAL SCAN: A computer program that scans DNA sequences for eukaryotic transcriptional elements', CABIOS, Vol. 7, pp. 203-206, 1991
  10. James W. Fickett and Artemis G. Hatzigeorgiou, 'Eukaryotic Promoter Recognition', Genome Research, Vol. 7, pp. 861-878, 1997 https://doi.org/10.1101/gr.7.9.861
  11. Thomas D. Schneider, Gary D. Stormo and Larry Gold, 'Information Content of Binding Sites on Nucleotide Sequences', Journal of Molecular Biology, Vol. 188, pp. 415-431, 1986 https://doi.org/10.1016/0022-2836(86)90165-8
  12. Timothy Bailey and Charles Elkan, 'Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization', Machine Learning, Vol. 21, pp. 51-83, 1995
  13. Z. Michalewicz, 'Genetic Algorithms' (Korean edition), Green Publishing (그린출판사), 1996
  14. David Beasley, David R. Bull and Ralph R. Martin, 'An Overview of Genetic Algorithms', University Computing, Vol. 15, No. 2, pp. 58-69, 1993
  15. Ching Zhang and Andrew K. C. Wong, 'A genetic algorithm for multiple molecular sequence alignment', CABIOS, Vol. 13, No. 6, 1997
  16. Cedric Notredame and Desmond G. Higgins, 'SAGA: sequence alignment by genetic algorithm', Nucleic Acids Research, Vol. 24, No. 8, pp. 1515-1524, 1996 https://doi.org/10.1093/nar/24.8.1515
  17. Cavin Perier, R., Junier, T., Bonnard, C. and Bucher, P., 'The Eukaryotic Promoter Database EPD: Recent Developments', Nucleic Acids Research, Vol. 27, pp. 307-309, 1999 https://doi.org/10.1093/nar/27.1.307
  18. Ghosh, D., 'A relational database of transcription factors', Nucleic Acids Research, Vol. 18, pp. 1749-1756, 1990 https://doi.org/10.1093/nar/18.7.1749
  19. Timothy L. Bailey, 'Likelihood vs. Information in Aligning Biopolymer Sequences', UCSD Technical Report CS93-318, 1993