A Protein Structure Comparison System based on PSAML

A Protein Structure Comparison System Using PSAML

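  • 김진홍 (School of Computer Engineering and Information Technology, University of Ulsan)
  • 안건태 (School of Computer Engineering and Information Technology, University of Ulsan)
  • 변상희 (School of Computer Engineering and Information Technology, University of Ulsan)
  • 이수현 (Department of Computer Engineering, Changwon National University)
  • 이명준 (School of Computer Engineering and Information Technology, University of Ulsan)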
  • Published : 2005.04.01

Abstract

Understanding the similarities and differences among protein structures is essential to studying the relationship between structure and function, so many protein structure comparison systems have been developed. Unfortunately, each of these systems introduces its own protein data, derived from the PDB (Protein Data Bank), to suit its algorithm for comparing protein structures. In addition, as the PDB grows rapidly, these systems require much more computation to search their databases for common substructures. In this paper, we introduce a protein structure comparison system named WS4E (A Web-Based Searching Substructures of Secondary Structure Elements), built on a PSAML database that stores PSAML documents in the eXist open-source XML DBMS. PSAML (Protein Structure Abstraction Markup Language) is an XML representation of protein data that describes a protein structure in terms of its secondary structures and the relationships among them. Using the PSAML database, WS4E provides web services that search for common substructures among proteins represented in PSAML. To reduce the number of candidate protein structures that must be compared in the PSAML database, we use topology strings, which encode the spatial information of the secondary structures in a protein.

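Because understanding the similarities and distinctive features of protein structures plays an important role in determining protein function, many systems for comparing protein structures are being developed. However, these systems must process the data provided by the PDB to fit their own comparison algorithms. Moreover, as the amount of data stored in the PDB grows, systems that search a large protein structure database for substructures similar to a given protein require ever more computation. This paper introduces WS4E (A Web-Based Searching Substructures of Secondary Structure Elements), a protein structure comparison system based on a PSAML database that serves PSAML documents using eXist, an XML database. PSAML (Protein Structure Abstraction Markup Language) is an XML-based technique for representing protein structures; it describes a protein structure in a formalized way using the protein's secondary structure elements and the relationships among them. Using the constructed PSAML database, WS4E provides a web service that finds similar substructures among protein structures represented in PSAML. In addition, to reduce the number of proteins to be compared in the PSAML database, we use the topology string, a technique that represents a single protein structure using the spatial information of its secondary structures.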
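
The role of topology strings as a prefilter can be illustrated with a minimal sketch. The abstract does not specify how a topology string is encoded or how two strings are matched, so everything concrete here is an assumption: each secondary structure element is reduced to one hypothetical letter ('H' for helix, 'E' for strand), and a candidate is kept when its longest common subsequence with the query covers a chosen fraction of the query. The names `lcs_length`, `prefilter`, and `min_ratio` are illustrative and do not come from WS4E, whose actual strings also carry spatial information that this sketch ignores.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def prefilter(query: str, database: dict[str, str], min_ratio: float = 0.75) -> list[str]:
    """Return the IDs of database entries worth passing to a full structure comparison.

    `database` maps a protein ID to its (hypothetical) one-letter-per-element
    topology string. An entry survives when the common subsequence it shares
    with the query covers at least `min_ratio` of the query's elements.
    """
    if not query:
        return []
    threshold = min_ratio * len(query)
    return [pid for pid, topo in database.items()
            if lcs_length(query, topo) >= threshold]


# Hypothetical data: three proteins and a four-element query pattern.
candidates = prefilter("HEEH", {"p1": "HEEHEH", "p2": "EEEE", "p3": "HHEEHH"})
```

Only the surviving candidates would then go through the more expensive substructure search, which is the reduction in comparison work the abstract describes.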
