• Title/Summary/Keyword: protein function prediction

Genomic Analysis of 13 Putative Active Prophages Located in the Genomes of Walnut Blight Pathogen Xanthomonas arboricola pv. juglandis

  • Cao, Zheng;Cuiying, Du;Benzhong, Fu
    • Microbiology and Biotechnology Letters
    • /
    • v.50 no.4
    • /
    • pp.563-573
    • /
    • 2022
  • Xanthomonas arboricola pv. juglandis (Xaj) is a globally important bacterial pathogen of walnut trees that causes substantial economic losses in commercial walnut production. Although prophages are common in bacterial plant pathogens and play important roles in bacterial diversity and pathogenicity, there has been limited investigation into the distribution and function of prophages in Xaj. In this study, we identified and characterized 13 predicted prophages from the genomes of 12 Xaj isolates from around the globe. These prophages ranged in length from 11.8 kb to 51.9 kb, contained 11 to 75 genes, and had GC contents of 57.82 to 64.15%. The closest relatives of these prophages belong to the Myoviridae and Siphoviridae families of the Caudovirales order. Phylogenetic analysis classified the prophages into five groups. The gene constitution of these predicted prophages was revealed via Roary analysis. Among 126 total protein groups, the most prevalent group was present in only nine prophages, and 22 protein groups were present in only one prophage (singletons). Bioinformatic analysis of the 13 identified prophages also revealed the presence of 431 genes with an average length of 389.7 bp. Prokka annotation of these prophages identified 466 hypothetical proteins, 24 proteins with known function, and six tRNA genes. The proteins with known function mainly comprised prophage integrase IntA, replicative DNA helicase, tyrosine recombinase XerC, and IS3 family transposase. There was no detectable insertion site specificity for these prophages in the Xaj genomes. The identified Xaj prophage genes, particularly those of unknown function, merit future investigation.
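
The GC content reported in the abstract above is the percentage of G and C bases in a sequence. The following is a minimal sketch of that calculation; the function name `gc_content` and the 12-base example fragment are illustrative assumptions and do not come from the paper or its tooling.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted, so characters such
    as N do not distort the percentage.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / (gc + at)


# Hypothetical fragment (not taken from any Xaj prophage): 8 of 12 bases are G or C.
example = "ATGCGCGGCCTA"
print(round(gc_content(example), 2))  # prints 66.67
```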

Comparative Genomics of T-complex protein 10 like in Humans and Chimpanzees

  • Kim, Il-Chul;Kim, Dae-Soo;Kim, Dae-Won;Choi, Sang-Haeng;Choi, Han-Ho;Chae, Sung-Hwa;Park, Hong-Seog
    • Genomics & Informatics
    • /
    • v.3 no.2
    • /
    • pp.61-65
    • /
    • 2005
  • Comparing 231 genes on chimpanzee chromosome 22 with their orthologs on human chromosome 21, we found that 15 orthologs have indels within their coding sequences. It was rather surprising that a significant number of genes have been changed by indels despite the short time since the two species diverged, which led us to hypothesize that indels and structural changes may represent one of the major mechanisms of proteome evolution in higher primates. Human T-complex protein 10 like (TCP10L) is a representative gene with an indel within its coding sequence. Comparison of the gene structure of human TCP10L with that of chimpanzee TCP10L showed a 16 base pair difference in genomic DNA. As a result of the indel, a frameshift mutation occurs in the coding sequence (CDS), and human TCP10L expresses a polypeptide 21 amino acid residues longer than that of chimpanzee. Our prediction indicates that the indel may cause a dramatic change in protein secondary structure between human and chimpanzee TCP10L. In particular, the structural changes in the C-terminal region of the TCP10L protein may affect its potential to interact with other proteins rather than its DNA-binding function. Through these changes, TCP10L might influence gene expression profiles in liver and testis and subsequently influence the physiological changes required in primate evolution.

A Performance Comparison of Multi-Label Classification Methods for Protein Subcellular Localization Prediction (단백질의 세포내 위치 예측을 위한 다중레이블 분류 방법의 성능 비교)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.4
    • /
    • pp.992-999
    • /
    • 2014
  • This paper presents an extensive experimental comparison of a variety of multi-label learning methods for accurately predicting the subcellular localization of proteins that exist simultaneously at multiple subcellular locations. We compared several methods from three categories of multi-label classification algorithms: algorithm adaptation, problem transformation, and meta learning. Experimental results are analyzed using 12 multi-label evaluation measures to assess the behavior of the methods from a variety of viewpoints. We also use a new summarization measure to find the best performing method. Experimental results show that the best performing methods are the power-set method, which prunes infrequently occurring subsets of labels, and classifier chains, which model relevant labels as additional features. Furthermore, ensembles of many classifiers built with these methods enhance performance further. The recommendation from this study is that the correlation among subcellular locations is an effective clue for classification, because the subcellular locations of proteins performing a given biological function are not independent but correlated.
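
The pruned power-set idea named in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's implementation: each distinct label set becomes one class, and label sets seen fewer than `min_count` times are dropped. The function name, the threshold, and the toy localization labels are all assumptions, and published pruned-set methods may additionally reintroduce pruned examples with frequent label subsets, which this sketch omits.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple


def pruned_label_powerset(
    label_sets: Sequence[FrozenSet[str]], min_count: int = 2
) -> Tuple[List[int], List[int], List[FrozenSet[str]]]:
    """Turn a multi-label problem into a single-label one.

    Returns (kept_indices, class_ids, classes), where classes[class_id]
    is the label set that the class stands for. Examples whose label set
    occurs fewer than min_count times are pruned.
    """
    counts = Counter(label_sets)
    # Sort frequent label sets for a deterministic class numbering.
    classes = sorted(
        (s for s, n in counts.items() if n >= min_count),
        key=lambda s: sorted(s),
    )
    class_of = {s: i for i, s in enumerate(classes)}
    kept = [i for i, s in enumerate(label_sets) if s in class_of]
    class_ids = [class_of[label_sets[i]] for i in kept]
    return kept, class_ids, classes


# Toy data (hypothetical): one label set per protein.
proteins = [
    frozenset({"nucleus"}),
    frozenset({"nucleus", "cytoplasm"}),
    frozenset({"nucleus", "cytoplasm"}),
    frozenset({"nucleus"}),
    frozenset({"membrane", "golgi"}),  # occurs once, so it is pruned
]
kept, class_ids, classes = pruned_label_powerset(proteins, min_count=2)
print(kept, class_ids)
```

Any ordinary multi-class classifier can then be trained on the kept examples with `class_ids` as targets, and a predicted class is mapped back to its label set through `classes`.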

A Method for Protein Functional Flow Configuration and Validation (단백질 기능 흐름 모델 구성 및 평가 기법)

  • Jang, Woo-Hyuk;Jung, Suk-Hoon;Han, Dong-Soo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.4
    • /
    • pp.284-288
    • /
    • 2009
  • With the explosive growth of protein-protein interaction (PPI) databases, computational approaches to predicting and constructing PPI networks have become a major stream in bioinformatics. Recent research increasingly considers the physicochemical properties of proteins and delivers high-resolution results by integrating experimental data. Given this trend, a complete PPI network for each organism is likely to be available in the near future. However, applying a PPI network directly to real problems is complicated, because a PPI network is only a set of co-expressed proteins or gene products, and a link in the network means simple physical binding rather than in-depth knowledge of a biological process. In this paper, we propose a protein functional flow model, a directed network based on the relations among protein functions in signal transduction pathways. A vertex of the proposed model is a molecular function annotated by Gene Ontology, and the relations among vertices are treated as edges. Thus, it is easy to trace the transition of a specific function, and the model can serve as a constraint for extracting a meaningful sub-path from the whole PPI network. To evaluate the model, 11 functional flow models of Homo sapiens were built from KEGG, and Cronbach's alpha values were measured (alpha=0.67). Among 1023 functional flows, 765 showed alpha values of 0.6 or higher.
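
For readers unfamiliar with the evaluation statistic mentioned above, Cronbach's alpha for k items is k/(k-1) * (1 - sum of item variances / variance of the summed scores). The sketch below implements that standard formula; how the paper maps functional flows onto "items" and "observations" is not stated in the abstract, so the data layout and the example scores here are assumptions.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Compute Cronbach's alpha.

    items[i][j] is the score of item i for observation j. Sample
    variances (n - 1 denominator) are used throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("items need equal length and at least two observations")
    item_var_sum = sum(variance(item) for item in items)
    # Total score of each observation across all items.
    totals = [sum(item[j] for item in items) for j in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores: three items measured on four observations.
scores = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 4))  # prints 0.9808
```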

Evaluation of Information Representation Goodness-of-fit According to Protein Visualization Pattern (단백질 가시화 형태에 따른 정보표현적합도 평가)

  • Byeon, Jaehee;Choi, Yoo-Joo;Suh, Jung-Keun
    • Journal of Internet Computing and Services
    • /
    • v.16 no.2
    • /
    • pp.117-125
    • /
    • 2015
  • Information about protein structure gives clues to protein function and is needed to improve the efficacy and speed the development of protein drugs. Accordingly, studies on effective visualization of protein structure are increasing. Most visualization studies focus on protein structure prediction or on improving rendering speed, whereas studies of how information delivery depends on the form of protein visualization are very limited. The major objective of this study is to analyze the information representation goodness-of-fit of hybrid visualization patterns that combine the primary and secondary structures of proteins. These hybrid visualizations included patterns that updated the current representative visualization services Chimera, PDB, and Cn3D. Based on the results of a subject analysis, the information factors used to analyze information representation goodness-of-fit were classified into protein primary structure, protein secondary structure, the location of amino acids, and ratio information about protein secondary structure. The subjects were a group of experts who had been involved in protein drug development for more than 5 years. The results show a meaningful difference in information representation goodness-of-fit among the hybrid visualization patterns, demonstrating that the information conveyed differs by visualization pattern.

Backbone 1H, 15N, and 13C Resonances Assignment and Secondary Structure Prediction of SAV0506 from Staphylococcus aureus

  • Lee, In Gyun;Lee, Ki-Young;Kim, Ji-Hun;Chae, Susanna;Lee, Bong-Jin
    • Journal of the Korean Magnetic Resonance Society
    • /
    • v.17 no.1
    • /
    • pp.54-58
    • /
    • 2013
  • SAV0506 is an 87-residue hypothetical protein from Staphylococcus aureus strain Mu50 that is predicted to have a function similar to that of the ribosome-associated heat shock protein Hsp15. Hsp15 is thought to be involved in the repair of erroneously produced 50S ribosomal subunits. In this report, we present the sequence-specific backbone resonance assignment of SAV0506. About 82.5% of all resonances could be assigned unambiguously. By analyzing deviations of the $C_{\alpha}$ and $C_{\beta}$ chemical shift values, we could predict the secondary structure of SAV0506. This study is an essential step towards the structural characterization of SAV0506.

Predicting tissue-specific expressions based on sequence characteristics

  • Paik, Hyo-Jung;Ryu, Tae-Woo;Heo, Hyoung-Sam;Seo, Seung-Won;Lee, Do-Heon;Hur, Cheol-Goo
    • BMB Reports
    • /
    • v.44 no.4
    • /
    • pp.250-255
    • /
    • 2011
  • In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach that uses sequence features generated from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcription factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns in line with biological intuition: TFBSs regulate gene expression, and domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.

Search method of Domain for prediction of protein function (단백질의 기능 예측을 위한 도메인 검색 방법)

  • 허미영;김홍기;최진성
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.239-242
    • /
    • 2003
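  • All living organisms sustain life through diverse proteins, the final products of genes, each of which performs its own complex function while interacting closely with the others. A domain is a functional unit of a protein; a single protein can contain up to several dozen domains, and information about these domains can help predict the protein's function. In this paper, we take the amino acid sequences of proteins with tumor-suppressing function, of proteins presumed to have such function, and of uncharacterized proteins, and compare them by search against known domain sequences. The domains that match help studies aimed at identifying the function of target proteins, and searching for domains in amino acid sequences of unknown function allows new functions to be predicted. We propose this as an effective method that saves time and cost compared with other experimental methods.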

An Information-based Model for an Interactive Web Service with Agricultural Biotechnology

  • Kim, Chang-Kug;Seo, Young-Joo;Park, Dong-Suk;Hahn, Jang-Ho
    • Genomics & Informatics
    • /
    • v.9 no.2
    • /
    • pp.85-88
    • /
    • 2011
  • The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a biological information-based database. The major functions of the NABIC are focused on biotechnological developments for agricultural bioinformatics and providing a web-based service to construct bioinformatics workflows easily, such as protein function prediction and genome systems biology programs. The NABIC has concentrated on the functional genomics of major crops, building an integrated biotechnology database for agro-biotech information that focuses on the proteomics of major agricultural resources, such as rice, Chinese cabbage, rice Ds-tagging lines, and microorganisms.

A Hybrid Protein Function Prediction System Using Sequence Similarity and Feature-based Classification (서열 유사도와 특징 기반 분류를 융합시킨 단백질 기능 예측 시스템)

  • Moon, Ji Hwan;Kim, Yoo-Sung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.11a
    • /
    • pp.197-200
    • /
    • 2010
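  • As the amount of protein sequence and function information has grown, predicting protein function through computational experiments has become possible, and various studies have attempted to develop highly accurate prediction systems. A representative approach predicts function on the basis of sequence similarity. However, some proteins have similar sequences but different functions, and others have different sequences but the same function, so it is difficult to predict protein function from sequence similarity alone. To overcome this drawback, methods that classify proteins using features extracted from their sequences have also been proposed. In this paper, to gain the advantages of both existing approaches, we propose a protein function prediction system that combines the sequence-similarity method with the feature-based method, and we conducted experiments to analyze its prediction accuracy. According to the experimental results, the proposed hybrid system achieved better prediction accuracy than either the method using sequence similarity alone or the feature-based method.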