Development of Integrated System for Motif and Domain Search

  • Jung Min-Chul (Department of Microbiology, College of Natural Sciences, Kyungpook National University)
  • Park Wan (Department of Microbiology, College of Natural Sciences, Kyungpook National University)
  • Kim Ki-Bong (Department of Bioinformatics Engineering, College of Engineering, Sangmyung University)
  • Published : 2004.12.01

Abstract

This paper describes an integrated system that helps researchers carry out motif and domain searches effectively and systematically. The system integrates a variety of resources related to motifs, domains, and protein families. These resources, which fall into two groups (databases and search programs), are scattered across the Internet. To build the system, we extracted the core contents of the individual databases needed to analyze protein function in terms of motifs or domains and assembled them into local databases. We also installed on our server the motif and domain search programs that each database uses as its own search tool. Various utilities and CGI (Common Gateway Interface) programs link the databases to the search programs, and a single web-based graphical user interface integrates all components of the system. With this one-stop service, end users can perform accurate motif and domain searches, and therefore protein function analysis, systematically and efficiently, without visiting many Internet sites or spending time merging search results by hand.
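
The one-stop idea in the abstract can be illustrated with a minimal sketch: a query sequence is run against several local pattern collections and the hits are merged into one result list. This is not the authors' implementation. The names `Hit`, `LOCAL_DATABASES`, `prosite_to_regex`, and `search_all` are hypothetical, the `DEMO_DB` entry is invented for illustration, and only a simplified subset of PROSITE pattern syntax is handled. The two entries under `PROSITE` are written in PROSITE pattern syntax.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # local database that produced the hit
    entry: str    # entry name within that database
    start: int    # 1-based start position in the query sequence
    end: int      # 1-based inclusive end position
    matched: str  # matched residues


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern to a Python regex.

    Handles x, [ABC], {ABC} and repeats such as x(2) or x(2,4).
    Anchors (< and >) are deliberately not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Hypothetical local collections: source name -> {entry name: pattern}.
LOCAL_DATABASES = {
    "PROSITE": {
        "ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",
        "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    },
    "DEMO_DB": {  # invented entry, for illustration only
        "DEMO_MOTIF": "T-x(2)-S",
    },
}


def search_all(sequence: str, databases: dict) -> list:
    """Scan the sequence with every pattern and return one merged hit list."""
    hits = []
    for source, entries in databases.items():
        for entry, pattern in entries.items():
            # A lookahead lets overlapping occurrences be reported too.
            regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
            for m in regex.finditer(sequence):
                text = m.group(1)
                hits.append(
                    Hit(source, entry, m.start() + 1, m.start() + len(text), text)
                )
    return sorted(hits, key=lambda h: (h.start, h.source, h.entry))


if __name__ == "__main__":
    for hit in search_all("AANGTAASPRKK", LOCAL_DATABASES):
        print(hit)
```

In a web deployment of the kind the abstract describes, a function like `search_all` would sit behind the CGI layer, so the user submits one sequence and receives one merged report instead of querying each resource separately.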
