• Title/Summary/Keyword: Protein Structure Alignment

A Study of Flexible Protein Structure Alignment Using Three Dimensional Local Similarities (단백질 3차원 구조의 지역적 유사성을 이용한 Flexible 단백질 구조 정렬에 관한 연구)

  • Park, Chan-Yong;Hwang, Chi-Jung
    • The KIPS Transactions:PartB / v.16B no.5 / pp.359-366 / 2009
  • Analysis of three-dimensional (3D) protein structure plays an important role in structural bioinformatics, and protein structure alignment is one of the field's main and most fundamental problems. Proteins are flexible and undergo structural changes as part of their function, yet most existing protein structure comparison methods treat them as rigid bodies, which may lead to incorrect alignments. We present a new method that performs flexible structure alignment by finding SSPs (Similar Substructure Pairs) and the flexible points of the protein. To find SSPs, we encode the coordinates of the protein backbone atoms as RDAs (Relative Direction Angles) using the local similarity of protein structure. We connect the SSPs with the Floyd-Warshall algorithm to build compatible SSPs, then compare the two compatible SSPs to find the optimal flexible point in the protein. In a performance experiment on a 68-item benchmark data set, our method outperforms three widely used methods (DALI, CE, FATCAT) in alignment accuracy.
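
The abstract above names the Floyd-Warshall algorithm but does not say how the graph over SSPs is built. The following is only a generic sketch of all-pairs shortest paths on a small weighted graph; the node meanings and edge weights are hypothetical and are not taken from the paper.

```python
import math

INF = math.inf

def floyd_warshall(weights):
    """Return the all-pairs shortest-path matrix for a square weight matrix.

    weights[i][j] is the cost of the direct edge i -> j (INF if absent).
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):          # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist

# Hypothetical graph: nodes could stand for SSPs, weights for a connection cost.
graph = [
    [0,   2,   10,  INF],
    [INF, 0,   3,   INF],
    [INF, INF, 0,   1],
    [INF, INF, INF, 0],
]
shortest = floyd_warshall(graph)
print(shortest[0][2], shortest[0][3])
```

The triple loop costs O(n^3) time, which is acceptable when the number of candidate substructure pairs is small.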

Protein Backbone Torsion Angle-Based Structure Comparison and Secondary Structure Database Web Server

  • Jung, Sunghoon;Bae, Se-Eun;Ahn, Insung;Son, Hyeon S.
    • Genomics & Informatics / v.11 no.3 / pp.155-160 / 2013
  • Structural information has been a major concern in biological and pharmaceutical studies because of its intimate relationship to protein function. Many published structural information repositories use a three-dimensional representation of the positions of protein atoms. The reliability of the torsional system, which represents the native processes of structural change in structural analysis, was partially demonstrated in previous structural alignment studies. Here, we introduce a web server that provides structural information and analysis based on the backbone torsional representation of a protein structure. The web server offers secondary structure database search, secondary structure calculation, and pairwise protein structure comparison, all based on a backbone torsion angle representation system. Applying the implementation to pairwise structural alignment gave highly accurate results. The information derived from this web server might be further used in ab initio protein structure modeling or protein homology-related analyses.
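
A backbone torsion angle is the dihedral angle defined by four consecutive atoms. As a minimal sketch (not the web server's implementation, which the abstract does not describe), the standard computation from four 3D points looks like this; the helper names are illustrative.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180..180) about the p1 -> p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the axis
    # Project b0 and b2 onto the plane perpendicular to the axis.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not real protein atoms): axis along z, outer atoms in the xy-plane.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

For a protein backbone, phi would use the atoms C(i-1), N(i), CA(i), C(i) and psi the atoms N(i), CA(i), C(i), N(i+1).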

A new method to predict the protein sequence alignment quality (단백질 서열정렬 정확도 예측을 위한 새로운 방법)

  • Lee, Min-Ho;Jeong, Chan-Seok;Kim, Dong-Seop
    • Bioinformatics and Biosystems / v.1 no.1 / pp.82-87 / 2006
  • The most popular protein structure prediction method is comparative modeling. For comparative modeling to be accurate, the sequence alignment between a query protein and a template must be accurate. Although choosing the best template from protein sequence alignments is critical for accurate fold recognition in comparative modeling, the quality of the sequence alignment is even more critical. In contrast to the considerable attention given to methods for choosing the best template, prediction of alignment accuracy has attracted little interest. Here, we develop a method for predicting the shift score, a recently proposed measure of alignment quality. We apply support vector regression (SVR) to predict the shift score. The alignment between a query protein and a template protein of length n in our own library is transformed into an input vector of length n + 2. Structural alignments are assumed to be the best alignments, and the SVR is trained to predict the shift score between the structural alignment and the profile-profile alignment of a query protein to a template protein. Performance is assessed by the Pearson correlation coefficient. The trained SVR predicts the shift score with a correlation of 0.80 between observed and predicted values.
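
The evaluation metric named above, the Pearson correlation coefficient between observed and predicted scores, can be sketched as follows. The sample values are made up for illustration and are not data from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observed and predicted shift scores.
observed = [0.10, 0.35, 0.50, 0.72, 0.90]
predicted = [0.20, 0.30, 0.55, 0.60, 0.85]
r = pearson(observed, predicted)
print(round(r, 3))
```

A value near 1 means the predicted scores rank and scale with the observed ones; it does not by itself show that the predictions are unbiased.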

Protein Structure Alignment Based on Maximum of Residue Pair Distance and Similarity Graph (정렬된 잔기 사이의 최대거리와 유사도 그래프에 기반한 단백질 구조 정렬)

  • Kim, Woo-Cheol;Park, Sang-Hyun;Won, Jung-Im
    • Journal of KIISE:Databases / v.34 no.5 / pp.396-408 / 2007
  • Since the Human Genome Project completed the sequencing of human DNA, interest in protein function has been increasing. Because protein structures are conserved in divergent evolution, protein functions are determined by structure rather than by amino acid sequence. Therefore, if two protein structures are similar, we can expect them to share biological functions. Much research on protein structure alignment has been carried out so far. However, most methods use RMSD (Root Mean Square Deviation) as the similarity measure, which makes it hard to judge intuitively how similar two protein structures are. In addition, they return only the single result with the highest alignment score, which cannot satisfy users with different purposes. To overcome these limitations, we propose a novel protein structure alignment algorithm based on MRPD (Maximum of Residue Pair Distance) and SG (Similarity Graph). MRPD is a more intuitive similarity measure that enables fast filtering of unpromising protein pairs, and SG is a compact representation of multiple alignment results that lets users with different needs choose the most plausible alignment without increasing the time needed to align protein structures.
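
To illustrate why a maximum pair distance can be easier to interpret than RMSD, here is a minimal sketch. It assumes MRPD means the largest distance among aligned residue pairs after superposition; the abstract does not give the exact definition, and the coordinates are invented.

```python
import math

def pair_distances(coords_a, coords_b):
    """Distances between aligned residue pairs (structures already superposed)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    return [math.dist(p, q) for p, q in zip(coords_a, coords_b)]

def rmsd(coords_a, coords_b):
    d = pair_distances(coords_a, coords_b)
    return math.sqrt(sum(x * x for x in d) / len(d))

def max_pair_distance(coords_a, coords_b):
    # Assumed reading of MRPD: the worst-matching aligned pair.
    return max(pair_distances(coords_a, coords_b))

# Toy aligned positions in angstroms: three pairs match exactly, one is 4 apart.
a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
b = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 4)]
print(rmsd(a, b), max_pair_distance(a, b))
```

Here the averaged RMSD is 2.0 while one pair is 4.0 apart; a threshold on the maximum guarantees that every aligned pair lies within the stated distance, which an RMSD threshold cannot.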

An Approach for a Substitution Matrix Based on Protein Blocks and Physicochemical Properties of Amino Acids through PCA

  • You, Youngki;Jang, Inhwan;Lee, Kyungro;Kim, Heonjoo;Lee, Kwanhee
    • Interdisciplinary Bio Central / v.6 no.4 / pp.3.1-3.10 / 2014
  • Amino acid substitution matrices are essential tools for protein sequence analysis, homology search in protein databases, and multiple sequence alignment. The PAM matrix was the first widely used amino acid substitution matrix, and the BLOSUM series succeeded it. Most substitution matrices were developed from the statistical frequency of substitution between amino acids in blocks representing protein families or groups of related proteins. However, amino acid substitution depends on the similarity of the physicochemical properties of the amino acids. In this study, a new approach was used to identify the major physicochemical properties in multiple sequence alignments. The frequencies of amino acid substitution in a multiple sequence alignment database were merged with selected amino acid attributes from a physicochemical properties database, and principal component analysis (PCA) of the merged data revealed the major physicochemical properties. Using factor analysis, the four principal components were interpreted as flexibility of electronic movement, polarity, negative charge, and structural flexibility. Applying these four components, BAPS was constructed and validated for accuracy. In a comparison of receiver operating characteristic ($ROC_{50}$) values, BAPS scored slightly lower than BLOSUM and PAM. However, when accuracy was evaluated by comparing multiple sequence alignment results with the structural alignments of two test data sets with known three-dimensional structures from a homologous structure alignment database, BAPS performed comparably to or better than prior matrices, including the PAM, Gonnet, Identity, and Genetic code matrices.

Reviving GOR method in protein secondary structure prediction: Effective usage of evolutionary information

  • Lee, Byung-Chul;Lee, Chang-Jun;Kim, Dong-Sup
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.133-138 / 2003
  • Protein secondary structure prediction is an important bioinformatics tool and an essential component of template-based protein tertiary structure prediction. Predicted secondary structure information is known to improve both fold recognition performance and alignment accuracy. In this paper, we describe several novel ideas that may improve prediction accuracy. The main idea is motivated by the observation that a protein's structural information, especially when combined with evolutionary information, significantly improves the accuracy of the predicted tertiary structure. From a non-redundant set of protein structures, we derive 'potential' parameters for secondary structure prediction that contain the structural information of proteins, following a procedure similar to the one used to derive the directional information table of the GOR method. These potential parameters are combined with the frequency matrices obtained by running PSI-BLAST to construct the feature vectors used to train support vector machines (SVMs) as secondary structure classifiers. Moreover, the huge model file size, one of the known shortcomings of SVMs, is partially overcome by reducing the size of the training data: redundancy is filtered out not only at the protein level but also at the feature vector level. A preliminary result, measured by the average three-state prediction accuracy, is encouraging.

Identification of Viral Taxon-Specific Genes (VTSG): Application to Caliciviridae

  • Kang, Shinduck;Kim, Young-Chang
    • Genomics & Informatics / v.16 no.4 / pp.23.1-23.5 / 2018
  • Virus taxonomy was initially determined by clinical experiments based on phenotype. With the development of sequence analysis methods, genotype-based classification was also applied. As genome sequence analysis technology advances, there is an increasing demand for virus taxonomy to be extended from in vivo and in vitro to in silico. In this study, we verified the consistency of the current International Committee on Taxonomy of Viruses taxonomy using an in silico approach, aiming to identify the specific sequence for each virus. We applied this approach to norovirus in Caliciviridae, which causes 90% of gastroenteritis cases worldwide. Based on the dogma "protein structure determines its function," we hypothesized that the specific sequence can be identified by the specific structure. First, we extracted the coding region (CDS). Second, the CDS protein sequences of each genus were annotated by a conserved domain database (CDD) search. Finally, the conserved domains of each genus in Caliciviridae were classified by RPS-BLAST with CDD. The analysis shows that Caliciviridae sequences have RNA helicase in common. In Norovirus, the Calicivirus coat protein C-terminal and viral polyprotein N-terminal domains appear as specific domains; they are not found in the other genera of Caliciviridae. If this method is used to detect specific conserved domains, they can serve as classification keywords based on protein functional structure. After the specific protein domains are determined, their sequences can be converted to gene sequences, which could then be reused as viral biomarkers.

A Method for Protein Structure Alignment based on Protein Secondary Structure (단백질 이차 구조에 기반을 둔 단백질 구조 정렬 방법)

  • 김진홍;안건태;윤형석;이수현;이명준
    • Proceedings of the Korean Information Science Society Conference / 2002.04a / pp.700-702 / 2002
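  • Protein structure alignment methods are used to find protein motifs or folds and are useful for classifying functionally or structurally related proteins. This paper describes a protein structure alignment method based on protein secondary structure ($\alpha$-helices and $\beta$-sheets). The proposed method represents a protein structure by its secondary structure elements and the relationships between them (hydrogen bonds, relative positions), and compares two such representations using only those elements and relationships, so it can align structures faster than existing methods.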

Backbone 1H, 15N and 13C Resonance Assignment and Secondary Structure Prediction of HP0062 (O24902_HELPY) from Helicobacter pylori

  • Jang, Sun-Bok;Ma, Chao;Park, Sung-Jean;Kwon, Ae-Ran;Lee, Bong-Jin
    • Journal of the Korean Magnetic Resonance Society / v.13 no.2 / pp.117-125 / 2009
  • HP0062 is an 86-residue hypothetical protein from Helicobacter pylori strain 26695. HP0062 was identified as an ESAT-6/WXG100 superfamily protein based on structure and sequence alignment, and it also contains a leucine zipper domain sequence. Here, we report the sequence-specific backbone resonance assignment of HP0062. About 97.7% of all $^1H_N$, $^{15}N$, $^{13}C_{\alpha}$, $^{13}C_{\beta}$ and $^{13}C=O$ resonances were assigned unambiguously. We predicted the secondary structure of HP0062 by analyzing the deviation of the $^{13}C_{\alpha}$ and $^{13}C_{\beta}$ chemical shifts from their respective random coil values. The secondary structure prediction shows that HP0062 consists of two $\alpha$-helices. This study is a prerequisite for determining the solution structure of HP0062 and can be used to study the interactions between HP0062 and DNA and other Helicobacter pylori proteins.

3D Shape Descriptor with Interatomic Distance for Screening the Molecular Database (분자 데이터베이스 스크리닝을 위한 원자간 거리 기반의 3차원 형상 기술자)

  • Lee, Jae-Ho;Park, Joon-Young
    • Korean Journal of Computational Design and Engineering / v.14 no.6 / pp.404-414 / 2009
  • In computational molecular analysis, 3D structural comparison plays a very important role in protein searching. As protein databases have grown rapidly in size, exhaustive search methods can no longer provide satisfactory performance. Because exhaustive search methods represent a protein structure as a set of spheres converted from its set of atoms, calculating the similarity of two sphere sets is very expensive. The filter-and-refine paradigm instead offers an efficient alternative for database search without compromising the accuracy of the answers. Recently, a very fast algorithm based on inter-atomic distance was suggested by Ballester and Richards. Because it uses the moments of the distribution of inter-atomic distances, which are rotationally invariant, it eliminates the structure alignment and orientation-fixing steps and searches faster than previous methods. In this paper, we propose a new 3D shape descriptor. It has the properties of the general shape distribution as well as properties useful for screening molecular databases. We present experimental results that demonstrate the validity of our method.
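
As a rough sketch of the distance-moment idea mentioned above (not the new descriptor this paper proposes, which the abstract does not specify), the following computes moments of atom distances to four reference points and compares two descriptors. The choice of raw central moments and the sample coordinates are assumptions made for illustration.

```python
import math

def _moments(values):
    """Mean, variance, and third central moment of a list of distances."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, var, third]

def shape_descriptor(atoms):
    """12-number descriptor from distances to four reference points."""
    n = len(atoms)
    centroid = tuple(sum(a[i] for a in atoms) / n for i in range(3))
    closest = min(atoms, key=lambda a: math.dist(a, centroid))
    farthest = max(atoms, key=lambda a: math.dist(a, centroid))
    far_from_far = max(atoms, key=lambda a: math.dist(a, farthest))
    desc = []
    for ref in (centroid, closest, farthest, far_from_far):
        desc.extend(_moments([math.dist(a, ref) for a in atoms]))
    return desc

def similarity(d1, d2):
    """Score in (0, 1]; 1 means identical descriptors."""
    mean_abs_diff = sum(abs(x - y) for x, y in zip(d1, d2)) / len(d1)
    return 1.0 / (1.0 + mean_abs_diff)

# Hypothetical atom coordinates and the same molecule rotated 90 degrees about z.
molecule = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.2, 0.0),
            (3.4, 1.0, 0.8), (4.0, 2.5, 1.1)]
rotated = [(-y, x, z) for (x, y, z) in molecule]

desc = shape_descriptor(molecule)
score = similarity(desc, shape_descriptor(rotated))
print(len(desc), score)
```

Because every entry depends only on distances between points of the molecule, rotating or translating the coordinates leaves the descriptor unchanged up to floating-point error, so no superposition step is needed before comparison.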