• Title/Summary/Keyword: protein annotation

SFannotation: A Simple and Fast Protein Function Annotation System

  • Yu, Dong Su;Kim, Byung Kwon
    • Genomics & Informatics / v.12 no.2 / pp.76-78 / 2014
  • Owing to the generation of vast amounts of sequencing data by using cost-effective, high-throughput sequencing technologies with improved computational approaches, many putative proteins have been discovered after assembly and structural annotation. Putative proteins are typically annotated using a functional annotation system that uses extant databases, but the expansive size of these databases often causes a bottleneck for rapid functional annotation. We developed SFannotation, a simple and fast functional annotation system that rapidly annotates putative proteins against four extant databases, Swiss-Prot, TIGRFAMs, Pfam, and the non-redundant sequence database, by using a best-hit approach with BLASTP and HMMSEARCH.
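
The best-hit idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not SFannotation's actual code: it assumes BLASTP results in the standard 12-column tabular format (`-outfmt 6`), and the function name `best_hits`, the E-value cutoff `max_evalue`, and the sequence identifiers in the sample data are all hypothetical.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query_id: (subject_id, evalue, bitscore)} for the top hit per query.

    The cutoff max_evalue is an assumed default, not a value from the paper.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(BLAST_COLUMNS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # hit is too weak to be used for annotation
        current = best.get(rec["qseqid"])
        # Keep only the highest-scoring hit seen so far for this query.
        if current is None or bitscore > current[2]:
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return best


# Invented sample rows: two hits for query1, one weak hit for query2.
example = "\n".join([
    "query1\tsubjectA\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t400",
    "query1\tsubjectB\t60.0\t180\t72\t2\t5\t184\t3\t182\t1e-40\t150",
    "query2\tsubjectC\t30.0\t50\t35\t1\t1\t50\t1\t50\t0.5\t25.0",
])
hits = best_hits(example)
```

Here `query1` keeps only `subjectA`, and `query2` receives no annotation because its single hit fails the E-value cutoff. A real pipeline would then look up the description of each best-hit subject in the source database.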

In-silico characterization and structure-based functional annotation of a hypothetical protein from Campylobacter jejuni involved in propionate catabolism

  • Mazumder, Lincon;Hasan, Mehedi;Rus’d, Ahmed Abu;Islam, Mohammad Ariful
    • Genomics & Informatics / v.19 no.4 / pp.43.1-43.12 / 2021
  • Campylobacter jejuni is one of the most prevalent organisms associated with foodborne illness across the globe, causing campylobacteriosis and gastritis. Many proteins of C. jejuni are still unidentified. The purpose of this study was to determine the structure and function of a non-annotated hypothetical protein (HP) from C. jejuni. A number of properties, such as physicochemical characteristics, 3D structure, and functional annotation of the HP (accession No. CAG2129885.1), were predicted using various bioinformatics tools, followed by further validation and quality assessment. Moreover, the protein-protein interactions and active site were obtained from the STRING and CASTp servers, respectively. The hypothetical protein possesses various characteristics including an acidic pH, thermal stability, water solubility, and cytoplasmic distribution. Alpha-helices and random coils are the most prominent structural components of this protein. Along with expected quality, the 3D model has been found to be novel. This study has identified the potential role of the HP in the 2-methylcitric acid cycle and propionate catabolism. Furthermore, protein-protein interactions revealed several significant functional partners. The in-silico characterization of this protein will help to better understand its molecular mechanism of action. The methodology of this study could also serve as the basis for additional research into proteomic and genomic data for functional potential identification.

Development of Web-Based Assistant System for Protein-Protein Interaction and Function Analysis (웹 기반의 단백질 상호작용 및 기능분석을 위한 보조 시스템 개발)

  • Jung Min-Chul;Park Wan;Kim Ki-Bong
    • Journal of Life Science / v.14 no.6 s.67 / pp.997-1002 / 2004
  • This paper describes WASPIFA (Web-based Assistant System for Protein-protein Interaction and Function Analysis), a system that provides comprehensive information on protein-protein interactions and protein function. Unlike existing systems for protein function and protein-protein interaction analysis, which provide fragmentary information restricted to a specific field, our system furnishes end users with comprehensive, integrated information on the input sequence to be analyzed, including function and annotation information, domain information, and interaction relationship information. The integrated information, which our system holds in local databases, has been extracted from many resources related to function, annotation, motifs, and domains through various pre-processing steps. Using our system, end users can evaluate the integrated results to carry out protein interaction and function analysis effectively. In addition, the WASPIFA system is equipped with automatic system management and data update functions that help the system manager maintain it efficiently.

Hypothetical protein predicted to be tumor suppressor: a protein functional analysis

  • Kader, Md. Abdul;Ahammed, Akash;Khan, Md. Sharif;Ashik, Sheikh Abdullah Al;Islam, Md. Shariful;Hossain, Mohammad Uzzal
    • Genomics & Informatics / v.20 no.1 / pp.6.1-6.15 / 2022
  • Litorilituus sediminis, a Gram-negative, aerobic, novel bacterium in the family Colwelliaceae, has a notable hypothetical protein containing a von Hippel-Lindau domain, which has significant tumor suppressor activity. Therefore, this study was designed to elucidate the structure and function of the biologically important hypothetical protein EMK97_00595 (QBG34344.1) using several bioinformatics tools. The functional annotation revealed that the hypothetical protein is an extracellular, secretory, soluble signal peptide and contains the von Hippel-Lindau (VHL; VHL beta) domain, which has a significant role in tumor suppression. This domain is conserved throughout evolution, as its homologs are found in various types of organisms such as mammals, insects, and nematodes. The gene product of VHL has a critical regulatory activity in the ubiquitous oxygen-sensing pathway. This domain has a significant role in inhibiting cell proliferation, angiogenesis progression, kidney cancer, breast cancer, and colon cancer. Finally, the current study shows that the annotated hypothetical protein is linked with tumor suppressor activity, which might be of great interest to future research in higher organisms.

Draft Genome of Toxocara canis, a Pathogen Responsible for Visceral Larva Migrans

  • Kong, Jinhwa;Won, Jungim;Yoon, Jeehee;Lee, UnJoo;Kim, Jong-Il;Huh, Sun
    • Parasites, Hosts and Diseases / v.54 no.6 / pp.751-758 / 2016
  • This study aimed to construct a draft genome of the adult female worm Toxocara canis using next-generation sequencing (NGS) and de novo assembly, and to find new genes after annotation using functional genomics tools. Using an NGS machine, we produced DNA read data of T. canis. The de novo assembly of the read data was performed using SOAPdenovo. RNA read data were assembled using Trinity. Structural annotation, homology search, functional annotation, classification of protein domains, and KEGG pathway analysis were carried out. In addition, recently developed tools such as MAKER, PASA, Evidence Modeler, and Blast2GO were used. The scaffold DNA was obtained; its N50 was 108,950 bp, and the overall length was 341,776,187 bp. The N50 of the transcriptome was 940 bp, and its length was 53,046,952 bp. The GC content of the entire genome was 39.3%. The total number of genes was 20,178, and the total number of protein sequences was 22,358. Of the 22,358 protein sequences, 4,992 were newly observed in T. canis. The following previously unknown proteins were found: E3 ubiquitin-protein ligase cbl-b and antigen T-cell receptor, zeta chain for T-cell and B-cell regulation; endoprotease bli-4 for cuticle metabolism; mucin 12Ea and polymorphic mucin variant C6/1/40r2.1 for mucin production; tropomodulin-family protein and ryanodine receptor calcium release channels for muscle movement. We were able to find new hypothetical polypeptide sequences unique to T. canis, and the findings of this study can serve as a basis for extending our biological understanding of T. canis.
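
For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below computes it under the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative only and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total assembly length is the N50.
        if running >= half_total:
            return length


# Toy assembly: total 290 bp, half is 145 bp; 80 + 70 = 150 reaches it.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
```

With these toy lengths the result is 70, because the two longest sequences (80 and 70) already cover more than half of the 290 bp total.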

Identification of long non-coding RNA-mRNA interactions and genome-wide lncRNA annotation in animal transcriptome profiling

  • Yoon-Been Park;Jun-Mo Kim
    • Journal of Animal Science and Technology / v.65 no.2 / pp.293-310 / 2023
  • Analysis of protein-coding mRNA has been used extensively to determine the function of various traits in animals. Non-coding RNA (ncRNA), which was once thought to be non-functional because it does not encode a protein, has been re-examined as studies showed that it does in fact have functions. One type of ncRNA, long non-coding RNA (lncRNA), is known to regulate mRNA expression, and its importance is emerging. Therefore, lncRNAs are currently being used to understand the traits of various animals as well as human diseases. However, studies on lncRNA annotation and its functions are still lacking in most animals except humans and mice. lncRNAs have unique characteristics and interact with mRNA through various mechanisms. To annotate lncRNAs in animals in the future, it is essential to understand the characteristics of lncRNAs and the mechanisms by which they function. In addition, this will allow lncRNAs to be used for a wider variety of traits in a wider range of animals, and it is expected that integrated analysis using other biological information will be possible.

Chromosome-Centric Human Proteome Study of Chromosome 11 Team

  • Hwang, Heeyoun;Kim, Jin Young;Yoo, Jong Shin
    • Mass Spectrometry Letters / v.12 no.3 / pp.60-65 / 2021
  • As a part of the Chromosome-centric Human Proteome Project (C-HPP), we have developed a few algorithms for accurate identification of missing proteins, alternative splicing variants (ASVs), and single amino acid variants (SAAVs), and for characterization of functionally unannotated proteins. We have found missing proteins, novel and known ASVs, and SAAVs using LC-MS/MS data from human brain and olfactory epithelial tissue, and we validated their existence using synthetic peptides. According to the neXtProt database, the number of missing proteins in chromosome 11 shows a decreasing pattern. The development of genomic and transcriptomic sequencing techniques has tremendously increased the number of protein variants in chromosome 11. We developed a web solution named SAAvpedia for identification and function annotation of SAAVs, and the SAAV information is automatically transformed into the neXtProt web page using a REST API service. For the 73 uPE1 proteins in chromosome 11, we have studied the function annotation of CCDC90B (NX_Q9GZT6), SMAP (NX_O00193), and C11orf52 (NX_Q96A22).

Molecular characterization and functional annotation of a hypothetical protein (SCO0618) of Streptomyces coelicolor A3(2)

  • Ferdous, Nadim;Reza, Mahjerin Nasrin;Emon, Md. Tabassum Hossain;Islam, Md. Shariful;Mohiuddin, A.K.M.;Hossain, Mohammad Uzzal
    • Genomics & Informatics / v.18 no.3 / pp.28.1-28.9 / 2020
  • Streptomyces coelicolor is a Gram-positive soil bacterium that is well known for the production of several antibiotics used in various biotechnological applications. However, numerous proteins from its genome are considered hypothetical proteins. Therefore, the present study aimed to reveal the functions of a hypothetical protein from the genome of S. coelicolor. Several bioinformatics tools were employed to predict the structure and function of this protein. Sequence similarity searches were run against the available bioinformatics databases to find homologous proteins. The secondary and tertiary structures were predicted and further validated with quality assessment tools. Furthermore, the active site and the interacting proteins were explored using the CASTp and STRING servers. The hypothetical protein showed important biological activity, with two functional domains: the POD-like_MBL-fold and the rhodanese homology domain. The functional annotation revealed that the selected hypothetical protein could show hydrolase activity. Furthermore, protein-protein interactions of the selected hypothetical protein revealed several functional partners that have a significant role in bacterial survival. Finally, the current study shows that the annotated hypothetical protein is linked with hydrolase activity, which might be of great interest to further research in bacterial genetics.

CaGe: A Web-Based Cancer Gene Annotation System for Cancer Genomics

  • Park, Young-Kyu;Kang, Tae-Wook;Baek, Su-Jin;Kim, Kwon-Il;Kim, Seon-Young;Lee, Do-Heon;Kim, Yong-Sung
    • Genomics & Informatics / v.10 no.1 / pp.33-39 / 2012
  • High-throughput genomic technologies (HGTs), including next-generation DNA sequencing (NGS), microarray, and serial analysis of gene expression (SAGE), have become effective experimental tools for cancer genomics to identify cancer-associated somatic genomic alterations and genes. The main hurdle in cancer genomics is to identify the real causative mutations or genes out of many candidates from an HGT-based cancer genomic analysis. One useful approach is to refer to known cancer genes and associated information. The list of known cancer genes can be used to determine candidates of cancer driver mutations, while cancer gene-related information, including gene expression, protein-protein interaction, and pathways, can be useful for scoring novel candidates. Some cancer gene or mutation databases exist for this purpose, but few specialized tools exist for an automated analysis of a long gene list from an HGT-based cancer genomic analysis. This report presents a new web-accessible bioinformatic tool, called CaGe, a cancer genome annotation system for the assessment of candidates of cancer genes from HGT-based cancer genomics. The tool provides users with information on cancer-related genes, mutations, pathways, and associated annotations through annotation and browsing functions. With this tool, researchers can classify their candidate genes from cancer genome studies into either previously reported or novel categories of cancer genes and gain insight into underlying carcinogenic mechanisms through a pathway analysis. We show the usefulness of CaGe by assessing its performance in annotating somatic mutations from a published small cell lung cancer study.
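
The classification step described in this abstract, sorting candidate genes into previously reported or novel categories, can be illustrated with a small sketch. It is not CaGe's implementation: the function name `classify_candidates`, the case-insensitive matching, and the three-gene reference list are assumptions made for the example, and `GENE_X` is a placeholder symbol.

```python
def classify_candidates(candidates, known_cancer_genes):
    """Split candidate gene symbols into previously reported and novel groups.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    known = {gene.upper() for gene in known_cancer_genes}
    reported, novel = [], []
    for gene in candidates:
        if gene.upper() in known:
            reported.append(gene)
        else:
            novel.append(gene)
    return {"previously_reported": reported, "novel": novel}


# Toy reference list; a real analysis would load a curated cancer gene database.
known_genes = {"TP53", "RB1", "KRAS"}
result = classify_candidates(["TP53", "rb1", "GENE_X"], known_genes)
```

In this toy run, `TP53` and `rb1` fall into the previously reported group and `GENE_X` into the novel group. A fuller tool would attach mutation, expression, and pathway annotations to each gene rather than a simple label.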

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics / v.17 no.2 / pp.17.1-17.3 / 2019
  • Text mining has become an important research method in biology, with its original purpose being to extract biological entities, such as genes, proteins, and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.