• Title/Summary/Keyword: genome annotation

Functional Genomic Approaches Using the Nematode Caenorhabditis elegans as a Model System

  • Lee, Jun-Ho;Nam, Seung-Hee;Hwang, Soon-Baek;Hong, Min-Gi;Kwon, Jae-Young;Joeng, Kyu-Sang;Im, Seol-Hee;Shim, Ji-Won;Park, Moon-Cheol
    • BMB Reports
    • /
    • v.37 no.1
    • /
    • pp.107-113
    • /
    • 2004
  • Since the completion of the genome project of the nematode C. elegans in 1998, functional genomic approaches have been applied to elucidate the gene and protein networks in this model organism. The recent completion of the whole genome of C. briggsae, a close sister species of C. elegans, now makes it possible to employ comparative genomic approaches to identify regulatory mechanisms that are conserved in these species and to annotate the predicted genes more precisely. RNA interference (RNAi) screens in C. elegans have been performed across the whole genome to find genes whose mutations give rise to specific phenotypes of interest. RNAi screens can also be used to identify genes that act genetically together with a gene of interest. Microarray experiments have been very useful in identifying genes that exhibit co-regulated expression profiles in given genetic or environmental conditions. Proteomic approaches can also be applied to the nematode, just as in other species whose genomes are known. Even with all these functional genomic tools, genetics will remain an important tool for gene function studies in the post-genome era. New breakthroughs in C. elegans biology, such as establishing a feasible gene knockout method, immortalized cell lines, or identifying viruses that can be used as vectors for introducing exogenous gene constructs into the worms, will augment the usage of this small organism for genome-wide biology.

Comparative Genomic Analysis and BTEX Degradation Pathways of a Thermotolerant Cupriavidus cauae PHS1

  • Chandran Sathesh-Prabu;Jihoon Woo;Yuchan Kim;Suk Min Kim;Sun Bok Lee;Che Ok Jeon;Donghyuk Kim;Sung Kuk Lee
    • Journal of Microbiology and Biotechnology
    • /
    • v.33 no.7
    • /
    • pp.875-885
    • /
    • 2023
  • Volatile organic compounds such as benzene, toluene, ethylbenzene, and isomers of xylenes (BTEX) constitute a group of monoaromatic compounds that are found in petroleum and have been classified as priority pollutants. In this study, based on its newly sequenced genome, we reclassified the previously identified BTEX-degrading thermotolerant strain Ralstonia sp. PHS1 as Cupriavidus cauae PHS1. Also presented are the complete genome sequence of C. cauae PHS1, its annotation, species delineation, and a comparative analysis of the BTEX-degrading gene cluster. Moreover, we cloned and characterized the BTEX-degrading pathway genes in C. cauae PHS1, the BTEX-degrading gene cluster of which consists of two monooxygenases and meta-cleavage genes. A genome-wide investigation of the PHS1 coding sequence and the experimentally confirmed regioselectivity of the toluene monooxygenases and catechol 2,3-dioxygenase allowed us to reconstruct the BTEX degradation pathway. The degradation of BTEX begins with aromatic ring hydroxylation, followed by ring cleavage, and eventually enters the core carbon metabolism. The information provided here on the genome and BTEX-degrading pathway of the thermotolerant strain C. cauae PHS1 could be useful in constructing an efficient production host.

Status of Philippine Mango Genomics: Enriching Molecular Genomics Towards a Globally Competitive Philippine Mango Industry

  • Eureka Teresa M. Ocampo;Cris Q. Cortaga;Jhun Laurence S. Rasco;John Albert P. Lachica;Darlon V. Lantican
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • 2022.10a
    • /
    • pp.28-28
    • /
    • 2022
  • This paper presents the first genome assemblies of Philippine mangoes, which provide a valuable reference for varietal improvement and genomic studies on mango and related fruit crops. We sequenced the whole genomes of three species: Mangifera odorata (Huani), Mangifera altissima (Paho), and Mangifera indica 'Carabao' (Sweet Elena). 'Carabao' is the major export variety of the Philippines; Paho is identified as vulnerable by the IUCN Red List of Threatened Species; Huani has acrid fruit sap, which is its primary defense mechanism against insects and birds. We used Falcon, a diploid-aware de novo assembler, to assemble SMRT-generated long-read sequences. Falcon-unzip was employed to phase the output assembly, producing larger contig sets (primary contigs) and shorter contigs corresponding to haplotypes (haplotigs). Assembly statistics were generated by comparing each assembly to a reference genome, 'Tommy Atkins', using the Quality Assessment Tool (QUAST). Moreover, the extent of duplication and the completeness of gene content were measured using Benchmarking Universal Single-Copy Orthologs (BUSCO). Draft assemblies with high duplication were processed using Purge Haplotigs and Purge Dups to reduce duplication with minimal impact on genome completeness. De novo assemblies of Huani, Paho and 'Carabao' were then generated with primary contig sizes of 463.64 Mb, 508.95 Mb and 401.51 Mb, respectively. These draft assemblies of Huani, Paho and 'Carabao' showed 96.90%, 95.17% and 99.07% complete BUSCOs, respectively, which is comparable to the 'Tommy Atkins' genome (98.6%). Using two mango transcriptome datasets (pooled RNA-seq from different mango varieties and tissues), 91-96% of reads, or 24-30 million reads, were successfully mapped back to each generated assembly, indicating a high degree of completeness. The results demonstrate highly contiguous, phased, and near-complete genome assemblies of three Philippine mango species for structural and functional annotation of gene units, especially those with economic importance.

OrCanome: a Comprehensive Resource for Oral Cancer

  • Bhartiya, Deeksha;Kumar, Amit;Singh, Harpreet;Sharma, Amitesh;Kaushik, Anita;Kumari, Suchitra;Mehrotra, Ravi
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.17 no.3
    • /
    • pp.1333-1336
    • /
    • 2016
  • Oral cancer is one of the most prevalent cancers in India, but its underlying mechanisms remain poorly understood. Cancer research has benefited immensely from genome-scale high-throughput studies, which have contributed to expanding the volume of data. Such datasets also exist for oral cancer genes, but there has been no consolidated approach to integrating the data to reveal meaningful biological information. OrCanome is one of the largest and most comprehensive user-friendly databases of oral cancer. It features a compilation of over 900 genes dysregulated in oral cancer and provides detailed annotations of the genes, transcripts and proteins, along with additional information encompassing expression, inhibitors, epitopes and pathways. The resource has been envisioned as a one-stop solution for genomic, transcriptomic and proteomic annotation of these genes, and the integrated approach will facilitate the identification of potential biomarkers and therapeutic targets.

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics
    • /
    • v.17 no.2
    • /
    • pp.17.1-17.3
    • /
    • 2019
  • Text mining has become an important research method in biology, with its original purpose being to extract biological entities, such as genes, proteins and phenotypic traits, and so extend knowledge from scientific papers. However, few thorough studies on text mining and application development have been performed for plant molecular biology data, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extracting information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.

COVID-19 recommender system based on an annotated multilingual corpus

  • Barros, Marcia;Ruas, Pedro;Sousa, Diana;Bangash, Ali Haider;Couto, Francisco M.
    • Genomics & Informatics
    • /
    • v.19 no.3
    • /
    • pp.24.1-24.7
    • /
    • 2021
  • Tracking the most recent advances in coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, with the pace of publication speeding up, researchers and clinicians require automatic approaches to keep up with the incoming information regarding this disease. A solution to this problem requires the development of text mining pipelines, the efficiency of which strongly depends on the availability of curated corpora. However, there is a lack of COVID-19-related corpora, even more so when considering languages other than English. This project's main contribution was the annotation of a multilingual parallel corpus and the generation of a recommendation dataset (EN-PT and EN-ES) regarding relevant entities, their relations, and recommendations, providing this resource to the community to improve text mining research on COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Hypothetical protein predicted to be tumor suppressor: a protein functional analysis

  • Kader, Md. Abdul;Ahammed, Akash;Khan, Md. Sharif;Ashik, Sheikh Abdullah Al;Islam, Md. Shariful;Hossain, Mohammad Uzzal
    • Genomics & Informatics
    • /
    • v.20 no.1
    • /
    • pp.6.1-6.15
    • /
    • 2022
  • Litorilituus sediminis is a Gram-negative, aerobic, novel bacterium in the family Colwelliaceae. It has a notable hypothetical protein containing a von Hippel-Lindau domain, which has significant tumor suppressor activity. Therefore, this study was designed to elucidate the structure and function of the biologically important hypothetical protein EMK97_00595 (QBG34344.1) using several bioinformatics tools. The functional annotation revealed that the hypothetical protein is an extracellular secretory soluble signal peptide and contains the von Hippel-Lindau (VHL; VHL beta) domain, which has a significant role in tumor suppression. This domain is conserved throughout evolution, as its homologs are found in various types of organisms such as mammals, insects, and nematodes. The gene product of VHL has a critical regulatory activity in the ubiquitous oxygen-sensing pathway. This domain has a significant role in inhibiting cell proliferation, angiogenesis progression, kidney cancer, breast cancer, and colon cancer. Finally, the current study indicates that the annotated hypothetical protein is linked with tumor suppressor activity, which might be of great interest to future research in higher organisms.

FCAnalyzer: A Functional Clustering Analysis Tool for Predicted Transcription Regulatory Elements and Gene Ontology Terms

  • Kim, Sang-Bae;Ryu, Gil-Mi;Kim, Young-Jin;Heo, Jee-Yeon;Park, Chan;Oh, Berm-Seok;Kim, Hyung-Lae;Kimm, Ku-Chan;Kim, Kyu-Won;Kim, Young-Youl
    • Genomics & Informatics
    • /
    • v.5 no.1
    • /
    • pp.10-18
    • /
    • 2007
  • Numerous studies have reported that genes with similar expression patterns are co-regulated. From gene expression data, we have assumed that genes having similar expression patterns would share similar transcription factor binding sites (TFBSs). These function as the binding regions for transcription factors (TFs) and thereby regulate gene expression. In this context, various analysis tools have been developed. However, they have shortcomings in the combined analysis of expression patterns and significant TFBSs and in the functional analysis of target genes of significantly overrepresented putative regulators. In this study, we present FCAnalyzer, a web-based Functional Clustering Analysis Tool for Predicted Transcription Regulatory Elements and Gene Ontology Terms. This system integrates microarray clustering data with similar expression patterns and TFBS data in each cluster. FCAnalyzer is designed to perform two independent clustering procedures. The first process clusters gene expression profiles using the K-means clustering method, and the second process clusters predicted TFBSs in the upstream region of previously clustered genes using the hierarchical biclustering method for simultaneous grouping of genes and samples. This system offers retrieved information for predicted TFBSs in each cluster using Match™ in the TRANSFAC database. We used gene ontology term analysis for functional annotation of genes in the same cluster. We also provide the user with a combinatorial TFBS analysis of TFBS pairs. The enrichment in the TFBS analysis and GO term analysis is assessed statistically by the calculation of P values based on Fisher’s exact test, the hypergeometric distribution and Bonferroni correction. FCAnalyzer is a web-based, user-friendly functional clustering analysis system that facilitates the transcriptional regulatory analysis of co-expressed genes. This system presents the analyses of clustered genes, significant TFBSs, significantly enriched TFBS combinations, their target genes and TFBS-TF pairs.
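
The enrichment statistics named in this abstract can be illustrated with a minimal sketch. It assumes a one-sided test computed from the upper tail of the hypergeometric distribution (which is what a one-sided Fisher's exact test evaluates) followed by Bonferroni correction. The function names and the example counts are hypothetical and are not taken from FCAnalyzer itself.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): probability of seeing at least k genes with a given TFBS
    in a cluster of n genes, when K of the N background genes carry it."""
    total = comb(N, n)
    upper = min(K, n)
    # comb() returns 0 for impossible draws, so no lower-bound check is needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical counts: 8 of 40 cluster genes carry a TFBS that is
    # present in 200 of 10,000 background genes.
    p = hypergeom_upper_tail(8, 200, 40, 10_000)
    print(f"raw P = {p:.3e}")
    # Suppose 50 TFBSs were tested in this cluster (also hypothetical).
    print(f"Bonferroni-adjusted P = {min(1.0, p * 50):.3e}")
```

Bonferroni correction is conservative when many TFBSs or GO terms are tested together, which is presumably why it is paired here with an exact test rather than an approximation.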

Mining the Proteome of Fusobacterium nucleatum subsp. nucleatum ATCC 25586 for Potential Therapeutics Discovery: An In Silico Approach

  • Habib, Abdul Musaweer;Islam, Md. Saiful;Sohel, Md.;Mazumder, Md. Habibul Hasan;Sikder, Mohd. Omar Faruk;Shahik, Shah Md.
    • Genomics & Informatics
    • /
    • v.14 no.4
    • /
    • pp.255-264
    • /
    • 2016
  • The plethora of bacterial genome sequence information in recent times has ushered in many novel strategies for antibacterial drug discovery and enabled medical science to take up the challenge of the increasing resistance of pathogenic bacteria to current antibiotics. In this study, we adopted a subtractive genomics approach to analyze the whole genome sequence of Fusobacterium nucleatum, a human oral pathogen associated with colorectal cancer. Our study revealed 1,499 proteins of F. nucleatum that have no homologs in the human genome. These proteins were subjected to further screening using the Database of Essential Genes (DEG), which resulted in the identification of 32 vitally important proteins for the bacterium. Subsequent analysis of the identified pivotal proteins using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Automated Annotation Server (KAAS) resulted in sorting 3 key enzymes of F. nucleatum that may be good candidates as potential drug targets, since they are unique to the bacterium and absent in humans. In addition, we have demonstrated the three-dimensional structures of these three proteins. Finally, determination of ligand binding sites of the 2 key proteins, as well as screening for functional inhibitors that best fitted the ligand sites, was conducted to discover effective novel therapeutic compounds against F. nucleatum.
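
The subtractive filtering described in this abstract amounts to two set operations, sketched below under simplifying assumptions. The protein identifiers are made up, and the two hit lists stand in for the results of a homology search against the human proteome and a search against DEG; neither the function name nor the data come from the study.

```python
def subtractive_filter(pathogen_proteins, human_homolog_hits, essential_hits):
    """Keep proteins with no human homolog that also match an essential gene.

    All three arguments are iterables of protein identifiers; the two hit
    lists are assumed to have been produced by separate similarity searches.
    """
    non_homologous = set(pathogen_proteins) - set(human_homolog_hits)
    return sorted(non_homologous & set(essential_hits))


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    proteome = ["P1", "P2", "P3", "P4", "P5"]
    has_human_homolog = ["P2", "P5"]
    essential = ["P1", "P2", "P4"]
    print(subtractive_filter(proteome, has_human_homolog, essential))
```

In a real pipeline the candidates returned here would then go on to pathway annotation, as the abstract does with KAAS.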

Genomic Analysis of the Moderately Haloalkaliphilic Bacterium Oceanobacillus kimchii Strain X50T with Improved High-Quality Draft Genome Sequences

  • Hyun, Dong-Wook;Whon, Tae Woong;Kim, Joon-Yong;Kim, Pil Soo;Shin, Na-Ri;Kim, Min-Soo;Bae, Jin-Woo
    • Journal of Microbiology and Biotechnology
    • /
    • v.25 no.12
    • /
    • pp.1971-1976
    • /
    • 2015
  • Oceanobacillus kimchii is a member of the genus Oceanobacillus within the family Bacillaceae. Species of Oceanobacillus possess moderately haloalkaliphilic features and originate from various alkaline or salty environments. The haloalkaliphilic characteristics of Oceanobacillus suggest that they may have uses in biotechnological and industrial applications, such as alkaline enzyme production and biodegradation. This study presents the draft genome sequence of O. kimchii X50T and its annotation. Furthermore, comparative genomic analysis of O. kimchii X50T was performed with two previously reported Oceanobacillus genome sequences. The 3,822,411 base-pair genome contains 3,792 protein-coding genes and 80 RNA genes, with an average G+C content of 35.18 mol%. The strain carries 67 and 13 predicted genes annotated with transport system and osmoregulation functions, respectively, which supports the tolerance phenotype of the strain in high-alkali and high-salt environments.
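
A G+C content figure like the one reported in this abstract is the share of G and C bases among all unambiguous bases. The sketch below shows that arithmetic on a short made-up sequence; the function name is hypothetical, and ambiguous symbols such as N are assumed to be excluded from the denominator, which may differ from the convention the authors used.

```python
def gc_content(seq):
    """Return G+C content as a percentage of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    # Guard against sequences with no unambiguous bases at all.
    return 100.0 * gc / acgt if acgt else 0.0


if __name__ == "__main__":
    # Made-up fragment, not from the O. kimchii genome.
    print(f"{gc_content('ATGCGATTAANNTA'):.2f}%")
```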