• Title/Summary/Keyword: genome annotation

A semi-automatic cell type annotation method for single-cell RNA sequencing dataset

  • Kim, Wan;Yoon, Sung Min;Kim, Sangsoo
    • Genomics & Informatics / v.18 no.3 / pp.26.1-26.6 / 2020
  • Single-cell RNA sequencing (scRNA-seq) has been widely applied to provide insights into cell-by-cell expression differences within a given bulk sample. Accordingly, numerous analysis methods have been developed. Because scRNA-seq involves simultaneous analysis of many cells and genes, the efficiency of these methods is crucial. The conventional cell type annotation method is laborious and subjective. Here we propose a semi-automatic method that calculates a normalized score for each cell type based on a user-supplied list of cell type-specific marker genes. The method was applied to a publicly available scRNA-seq dataset of the mouse cardiac non-myocyte cell pool. Annotating the 35 t-distributed stochastic neighbor embedding (t-SNE) clusters into 12 cell types was straightforward, and the accuracy of the annotation was evaluated by constructing a co-expression network for each cell type. Gene Ontology analysis was congruent with the annotated cell types, and the corollary regulatory network analysis identified upstream transcription factors that are well supported by evidence in the literature. The source code is available as an R script upon request.
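
The marker-based scoring idea in this abstract can be illustrated with a minimal Python sketch. This is not the authors' R script: the normalization used here (min-max scaling of each gene across clusters, then averaging over the markers of a cell type) is an assumption, as are the function names and the toy expression values.

```python
from statistics import mean


def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; constant lists map to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_clusters(cluster_means, markers):
    """Return {cluster: {cell_type: score}}.

    cluster_means: {cluster: {gene: mean expression}} (hypothetical input)
    markers:       {cell_type: [marker genes]} supplied by the user
    Assumed normalization: each gene is min-max scaled across clusters,
    and a cell type's score is the mean scaled value of its markers.
    """
    clusters = sorted(cluster_means)
    marker_genes = {g for genes in markers.values() for g in genes}
    scaled = {}
    for gene in marker_genes:
        # Skip markers that were not measured in every cluster.
        if all(gene in cluster_means[c] for c in clusters):
            vals = min_max_scale([cluster_means[c][gene] for c in clusters])
            scaled[gene] = dict(zip(clusters, vals))
    scores = {}
    for c in clusters:
        scores[c] = {}
        for cell_type, genes in markers.items():
            present = [scaled[g][c] for g in genes if g in scaled]
            scores[c][cell_type] = mean(present) if present else 0.0
    return scores


def annotate(scores):
    """Label each cluster with its highest-scoring cell type."""
    return {c: max(s, key=s.get) for c, s in scores.items()}


# Toy data, invented for illustration only.
cluster_means = {
    "c0": {"Pecam1": 5.0, "Col1a1": 0.5},
    "c1": {"Pecam1": 0.2, "Col1a1": 4.0},
}
markers = {"endothelial": ["Pecam1"], "fibroblast": ["Col1a1"]}
labels = annotate(score_clusters(cluster_means, markers))
print(labels)
```

The step stays semi-automatic in the sense that the user still supplies the marker lists and reviews the resulting labels.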

Complete Genome Sequence of Enterococcus faecalis CAUM157 Isolated from Raw Cow's Milk

  • Elnar, Arxel G.;Lim, Sang-Dong;Kim, Geun-Bae
    • Journal of Dairy Science and Biotechnology / v.38 no.3 / pp.142-145 / 2020
  • Enterococcus faecalis CAUM157, isolated from raw cow's milk, is a Gram-positive, facultatively anaerobic, and non-spore-forming bacterium capable of inhabiting a wide range of environmental niches. E. faecalis CAUM157 was observed to produce a two-peptide bacteriocin that had a wide range of activity against several pathogens, including Listeria monocytogenes, Staphylococcus aureus, and periodontitis-causing bacteria. The whole genome of E. faecalis CAUM157 was sequenced using the PacBio RS II platform, revealing a genome size of 2,972,812 bp with a G+C ratio of 37.44%, assembled into two contigs. Annotation analysis revealed 2,830 coding sequences, 12 rRNAs, and 61 tRNAs. Further, in silico analysis of the genome identified a single bacteriocin gene cluster.
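
As a worked illustration of how a genome-wide G+C percentage such as the one reported above is computed, here is a small Python sketch. It assumes the contigs are already loaded as plain strings; the function name and the toy sequences are hypothetical and unrelated to the CAUM157 assembly.

```python
def gc_content(contigs):
    """Return the G+C percentage over all contigs combined."""
    total = sum(len(seq) for seq in contigs)
    if total == 0:
        raise ValueError("no sequence data")
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    return 100.0 * gc / total


# Toy two-contig example (invented sequences).
print(gc_content(["GGCC", "ATAT"]))
```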

RGISS: Rice (Oryza sativa L. ssp. japonica) Genome Information Service System

  • Lee, Dae-Sang;Seo, Hwa-Jung;Hahn, Jang-Ho;Kong, Eun-Bae;Park, Kie-Jung
    • Genomics & Informatics / v.5 no.4 / pp.194-195 / 2007
  • We have constructed the Rice Genome Information Service System (RGISS), an information service system for the Oryza sativa L. ssp. japonica (rice) genome, using the released rice Build 3.0 pseudomolecules and the Ensembl architecture. The nonredundant library, composed of 3,360 BAC, PAC, and fosmid clones, was used to construct supercontigs. RGISS contains 50,717 annotated genes from GenBank, 56,161 genes predicted by FgeneSH, and information on 9,587 markers, including STS, SSR, and EST-based RFLP markers. The 20,180 ESTs sequenced by the Korea National Institute of Agricultural Biotechnology (NIAB) were aligned and mapped to 168,792 exons. Gene ontology analysis classified 6,158, 4,531, and 12,364 proteins in the rice genome under molecular function, cellular component, and biological process, respectively.

Complete genome sequence of Streptococcus hyointestinalis B19, a strain producing bacteriocin, isolated from chicken feces

  • Lee, Ju-Eun;Heo, Sunhak;Kim, Geun-Bae
    • Journal of Animal Science and Technology / v.62 no.3 / pp.420-422 / 2020
  • Streptococcus hyointestinalis B19 was isolated from chicken feces collected from a local farm in Anseong, Korea. S. hyointestinalis B19 was shown to produce bacteriocin-like compounds with inhibitory activity against several pathogens, including strains of Clostridium perfringens and Listeria monocytogenes. The whole genome of S. hyointestinalis B19 was sequenced using the PacBio RS II platform. The genome comprised four contigs with a total size of 2,217,061 bp. The DNA G + C content was 42.95 mol%. Annotation revealed 2,266 coding sequences (CDSs), 18 rRNAs, and 61 tRNA genes. Genome analysis showed that strain B19 possesses various genes associated with bacteriocin synthesis, modification, and transport.

Complete genome sequence of Bacillus coagulans CACC834 isolated from canine

  • Kim, Jung-Ae;Kim, Dae-Hyuk;Kim, Yangseon
    • Journal of Animal Science and Technology / v.63 no.6 / pp.1464-1467 / 2021
  • Bacillus coagulans CACC 834 was isolated from canine feces, and its potential probiotic properties were characterized by functional genome analysis. Whole-genome sequencing of B. coagulans CACC 834 was performed using the PacBio RSII platform. The complete genome assembly consisted of one circular chromosome (3.1 Mb) with a guanine (G) + cytosine (C) content of 47.1%. Annotation revealed 3,181 protein-coding sequences (CDSs), 30 rRNAs, and 83 tRNAs. Of the genes, 11% were involved in replication, recombination, and repair. We also annotated various stress-related, acid resistance, bile salt resistance, and adhesion-related domains in this strain, which likely support its probiotic action by enabling survival in the gastrointestinal tract. These results add to our understanding of B. coagulans and suggest potential mammal-related industrial applications.

Sequencing and annotation of the complete mitochondrial genome of a threatened labeonine fish, Cirrhinus reba

  • Islam, Mohammad Nazrul;Sultana, Shirin;Alam, Md. Jobaidul
    • Genomics & Informatics / v.18 no.3 / pp.32.1-32.7 / 2020
  • The mitochondrial genome of a species is an essential resource for its effective conservation and for phylogenetic studies. In this article, we present the sequencing and characterization of the complete mitochondrial genome of a threatened labeonine fish, Cirrhinus reba, collected from the Khulna region of Bangladesh. The complete mitochondrial genome was 16,597 bp in size and formed a circular double-stranded DNA molecule containing a total of 37 mitochondrial genes (13 protein-coding genes, 2 ribosomal RNA genes, and 22 transfer RNA genes) and two non-coding regions, an origin of light strand replication (OL) and a displacement loop (D-loop), a structure similar to that of other teleost fishes. The phylogenetic tree demonstrated its close relationship with labeonine fishes. The complete mitogenome of Cirrhinus reba (GenBank no. MN862482) showed 99.96% identity to another haplotype of Cirrhinus reba (AP013325), followed by 90.18% identity with Labeo bata (AP011198).

Genome Sequencing and Genome-Wide Identification of Carbohydrate-Active Enzymes (CAZymes) in the White Rot Fungus Flammulina fennae

  • Lee, Chang-Soo;Kong, Won-Sik;Park, Young-Jin
    • Microbiology and Biotechnology Letters / v.46 no.3 / pp.300-312 / 2018
  • Whole-genome sequencing of the wood-rotting fungus Flammulina fennae was carried out to identify carbohydrate-active enzymes (CAZymes). De novo genome assembly (k-mer size 31) of short reads from next-generation sequencing yielded a total genome length of 32,423,623 base pairs (39% GC). A total of 11,591 gene models were predicted in the assembled F. fennae genome sequence by ab initio gene prediction using the AUGUSTUS tool. In a genome-wide comparison, 6,715 orthologous groups shared at least one gene with F. fennae, and 10,667 (92%) of the 11,591 F. fennae genes had orthologs among the Dikarya. Additionally, F. fennae contained 23 species-specific genes, of which 16 were paralogous. CAZyme identification and annotation revealed 513 CAZymes, including 82 auxiliary activities, 220 glycoside hydrolases, 85 glycosyltransferases, 20 polysaccharide lyases, 57 carbohydrate esterases, and 45 carbohydrate-binding modules in the F. fennae genome. The genome information of F. fennae increases the understanding of this basidiomycete fungus. The CAZyme gene information will be useful for detailed studies of lignocellulosic biomass degradation for biotechnological and industrial applications.

LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19

  • Ouyang, Sizhuo;Wang, Yuxing;Zhou, Kaiyin;Xia, Jingbo
    • Genomics & Informatics / v.19 no.3 / pp.23.1-23.7 / 2021
  • The coronavirus disease 2019 (COVID-19) literature has been growing dramatically, and the increasing volume of text makes large-scale text mining and knowledge discovery possible. Curation of these texts has therefore become a crucial issue for the Biomedical Natural Language Processing (BioNLP) community in retrieving important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the integration of multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. This corpus carries 12 labels over 50,018 COVID-19 abstracts in LitCovid: Mutation, Species, Gene, and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg, and Reg from AGAC. The corpus contains abundant information that may help unveil hidden knowledge about the pathological mechanism of COVID-19.

A biomedically oriented automatically annotated Twitter COVID-19 dataset

  • Hernandez, Luis Alberto Robles;Callahan, Tiffany J.;Banda, Juan M.
    • Genomics & Informatics / v.19 no.3 / pp.21.1-21.5 / 2021
  • The use of social media data, such as Twitter data, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time and to study the societal implications of interventions as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by because of the high cost of manual annotation and the effort needed to identify the correct texts. When datasets are available, they are usually very small, and their annotations don't generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. After selecting the best method for automatic annotation, we annotated 120 million tweets and released them publicly for future downstream use within the biomedical domain.

Development of an Analysis Program of Type I Polyketide Synthase Gene Clusters Using Homology Search and Profile Hidden Markov Model

  • Tae, Hong-Seok;Sohng, Jae-Kyung;Park, Kie-Jung
    • Journal of Microbiology and Biotechnology / v.19 no.2 / pp.140-146 / 2009
  • MAPSI (Management and Analysis for Polyketide Synthase Type I) has been developed to offer computational analysis methods for detecting type I polyketide synthase (PKS) gene clusters in genome sequences. MAPSI provides a genome analysis component, which detects PKS gene clusters by identifying domains in the proteins of a genome. MAPSI also contains databases of polyketides and genome annotation data, as well as analytic components such as new PKS assembly and domain analysis. The polyketide data and analysis components are accessible through Web interfaces and are displayed with diverse information. MAPSI, which was developed to aid researchers studying type I polyketides, provides diverse components for accessing and analyzing polyketide information and should become a very powerful computational tool for polyketide research. The system can be extended through further studies of factors related to the biological activities of polyketides.
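
The domain-based detection described in this abstract can be sketched roughly as follows. This is not MAPSI's implementation: it assumes domain hits per protein have already been produced by some homology or profile HMM search, treats a protein carrying KS, AT, and ACP domains as a type I PKS candidate, and groups candidates that lie close together in genome order. The function names, the `max_gap` parameter, and the toy input are all hypothetical.

```python
# Assumption: a minimal type I PKS module carries these three domains.
REQUIRED_DOMAINS = {"KS", "AT", "ACP"}


def is_pks_candidate(domains):
    """True if the protein's domain set includes KS, AT, and ACP."""
    return REQUIRED_DOMAINS.issubset(domains)


def find_pks_clusters(proteins, max_gap=1):
    """Group candidate proteins that are near each other in genome order.

    proteins: list of (protein_id, [domain names]) sorted by genome position.
    max_gap:  number of consecutive non-candidate proteins tolerated inside
              one cluster (hypothetical parameter).
    """
    clusters, current, gap = [], [], 0
    for protein_id, domains in proteins:
        if is_pks_candidate(set(domains)):
            current.append(protein_id)
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:
                # Too many non-candidates in a row: close the open cluster.
                clusters.append(current)
                current, gap = [], 0
    if current:
        clusters.append(current)
    return clusters


# Toy input, invented for illustration only.
proteins = [
    ("p1", ["KS", "AT", "ACP"]),
    ("p2", ["KR"]),
    ("p3", ["KS", "AT", "DH", "ACP"]),
    ("p4", []),
    ("p5", []),
    ("p6", ["KS", "AT", "ACP"]),
]
clusters = find_pks_clusters(proteins)
print(clusters)
```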