• Title/Summary/Keyword: Gene annotation

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • 한국균학회소식:학술대회논문집 / 2018.05a / pp.22-22 / 2018
  • Fungal genome sequencing and assembly have become routine. Genome analysis relies on high-quality gene prediction and annotation, and an automatic annotation pipeline is essential for handling the exponentially accumulating genomic sequence data. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, the pipeline employs multiple gene prediction programs and combines ab initio, evidence-based, and homology-based evaluation. FunGAP evaluates all predicted genes and filters the gene models. To guide the removal of false-positive genes, we used a scoring function that seeks a consensus by scoring each gene model on its homology to known proteins or domains. FunGAP is freely available for non-commercial users on GitHub (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).
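
The filtering idea in this abstract can be illustrated with a minimal sketch. This is not FunGAP's actual scoring function, which the abstract does not specify. It assumes each candidate gene model carries two hypothetical evidence values (`blast_score` and `domain_score`), sums them, and greedily keeps the best-scoring model at each locus while discarding overlapping lower-scoring candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    name: str
    contig: str
    start: int            # 1-based, inclusive
    end: int              # 1-based, inclusive
    blast_score: float    # hypothetical homology evidence (e.g. a bit score)
    domain_score: float   # hypothetical protein-domain evidence


def evidence_score(model):
    # Assumed scoring rule for illustration: a plain sum of the two evidence values.
    return model.blast_score + model.domain_score


def overlaps(a, b):
    # Two models overlap if they share a contig and their intervals intersect.
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def filter_gene_models(models, min_score=0.0):
    """Greedily keep the highest-scoring, mutually non-overlapping models."""
    kept = []
    for model in sorted(models, key=evidence_score, reverse=True):
        if evidence_score(model) <= min_score:
            continue  # no supporting evidence: treat as a likely false positive
        if any(overlaps(model, other) for other in kept):
            continue  # a better-supported model already covers this locus
        kept.append(model)
    return sorted(kept, key=lambda m: (m.contig, m.start))
```

A real pipeline would weight the evidence types and handle strands and isoforms; the greedy pass is only the simplest way to turn per-model scores into one consensus gene set.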

CaGe: A Web-Based Cancer Gene Annotation System for Cancer Genomics

  • Park, Young-Kyu;Kang, Tae-Wook;Baek, Su-Jin;Kim, Kwon-Il;Kim, Seon-Young;Lee, Do-Heon;Kim, Yong-Sung
    • Genomics & Informatics / v.10 no.1 / pp.33-39 / 2012
  • High-throughput genomic technologies (HGTs), including next-generation DNA sequencing (NGS), microarrays, and serial analysis of gene expression (SAGE), have become effective experimental tools for cancer genomics to identify cancer-associated somatic genomic alterations and genes. The main hurdle in cancer genomics is to identify the real causative mutations or genes among the many candidates from an HGT-based cancer genomic analysis. One useful approach is to refer to known cancer genes and associated information. The list of known cancer genes can be used to determine candidate cancer driver mutations, while cancer gene-related information, including gene expression, protein-protein interactions, and pathways, can be useful for scoring novel candidates. Some cancer gene or mutation databases exist for this purpose, but few specialized tools exist for the automated analysis of a long gene list from an HGT-based cancer genomic analysis. This report presents a new web-accessible bioinformatic tool, called CaGe, a cancer genome annotation system for assessing candidate cancer genes from HGT-based cancer genomics. The tool provides users with information on cancer-related genes, mutations, pathways, and associated annotations through annotation and browsing functions. With this tool, researchers can classify their candidate genes from cancer genome studies into either previously reported or novel categories of cancer genes and gain insight into underlying carcinogenic mechanisms through pathway analysis. We show the usefulness of CaGe by assessing its performance in annotating somatic mutations from a published small cell lung cancer study.

Multi-tissue observation of the long non-coding RNA effects on sexually biased gene expression in cattle

  • Yoon, Joon;Kim, Heebal
    • Asian-Australasian Journal of Animal Sciences / v.32 no.7 / pp.1044-1051 / 2019
  • Objective: Recent studies have implied that gene expression is highly tissue-specific, so it is essential to investigate gene expression in a variety of tissues when performing transcriptomic analysis. In addition, the gradual growth of long non-coding RNA (lncRNA) annotation databases has increased the importance and proportion of mapped reads accordingly. Methods: We employed simple statistical models to detect sexually biased/dimorphic genes and their conjugate lncRNAs in 40 RNA-seq samples across two factors: sex and tissue. We employed two quantification pipelines: mRNA annotation only and mRNA+lncRNA annotation. Results: The tissue-specific sexually dimorphic genes are affected by the addition of lncRNA annotation at a non-negligible level. In addition, many lncRNAs are expressed in a more tissue-specific fashion and with greater variation between tissues than protein-coding genes. Because of lncRNAs in genic regions, the differentially expressed gene list changes, which causes certain sexually biased genes to become ambiguous across tissues. Conclusion: A past study reported that tissue-specific patterns can be seen throughout the genes differentially expressed between sexes in cattle. Using the same dataset, this study used a more recent reference and added conjugate lncRNA information, which revealed alterations of the differentially expressed gene lists that result in an apparent distinction in downstream analysis and interpretation. We firmly believe such misquantification of genic lncRNAs can be vital in both future and past studies.

SFannotation: A Simple and Fast Protein Function Annotation System

  • Yu, Dong Su;Kim, Byung Kwon
    • Genomics & Informatics / v.12 no.2 / pp.76-78 / 2014
  • Owing to the generation of vast amounts of sequencing data by using cost-effective, high-throughput sequencing technologies with improved computational approaches, many putative proteins have been discovered after assembly and structural annotation. Putative proteins are typically annotated using a functional annotation system that uses extant databases, but the expansive size of these databases often causes a bottleneck for rapid functional annotation. We developed SFannotation, a simple and fast functional annotation system that rapidly annotates putative proteins against four extant databases, Swiss-Prot, TIGRFAMs, Pfam, and the non-redundant sequence database, by using a best-hit approach with BLASTP and HMMSEARCH.
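
A best-hit approach of the kind mentioned here can be sketched as follows. This is an illustration only, not SFannotation's code. It assumes BLASTP results in the standard 12-column tabular format (`-outfmt 6`), where column 11 is the E-value and column 12 the bit score. The `max_evalue` cutoff is an assumed parameter, not one taken from the paper.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Return {query_id: (subject_id, bit_score)} with the top-scoring hit per query.

    Expects BLAST tabular lines with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bit_score = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit too weak to transfer an annotation from
        if query not in best or bit_score > best[query][1]:
            best[query] = (subject, bit_score)
    return best
```

The function of each putative protein would then be copied from the description of its best-hit subject; queries with no hit under the cutoff stay unannotated.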

Implementation of Gene Information System (유전자 정보시스템 설계 및 구현)

  • Choi, Nak-Joong;Choi, Han Suk;Kim, Dong-Wook
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.549-550 / 2018
  • We have developed a web server for high-throughput gene annotation. The system processes entire data sets with an automated pipeline of 13 analytic services, deposits the results in a MySQL database, and transforms them into three kinds of reports: preprocessing, assembly, and annotation.

PromoterWizard: An Integrated Promoter Prediction Program Using Hybrid Methods

  • Park, Kie-Jung;Kim, Ki-Bong
    • Genomics & Informatics / v.9 no.4 / pp.194-196 / 2011
  • Promoter prediction is an important problem that is closely related to central problems of bioinformatics such as the construction of gene regulatory networks and gene function annotation. In this context, we developed PromoterWizard, an integrated promoter prediction program using hybrid methods, which can be employed to detect the core promoter region and the transcription start site (TSS) in vertebrate genomic DNA sequences, an issue of obvious importance for genome annotation efforts. PromoterWizard consists of three main modules and two auxiliary modules. The three main modules are the CDRM (Composite Dependency Reflecting Model) module, the SVM (Support Vector Machine) module, and the ICM (Interpolated Context Model) module. The two auxiliary modules, CpG Island Detector and GCPlot, may help improve the predictive accuracy of the three main modules and help human curators decide on the final annotation.
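
The abstract does not say how PromoterWizard's CpG Island Detector works. As an illustration of what such a module typically computes, the sketch below applies the widely used Gardiner-Garden and Frommer criteria to a single window: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6, where the ratio is (number of CpG × window length) / (number of C × number of G). The thresholds are exposed as parameters because the tool's actual settings are unknown.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n if n else 0.0
    if c_count == 0 or g_count == 0:
        return gc_fraction, 0.0  # expected CpG count is zero; define the ratio as 0
    obs_exp = (cpg_count * n) / (c_count * g_count)
    return gc_fraction, obs_exp


def is_cpg_island_window(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Check one window against the (assumed) CpG island thresholds."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

A full detector would slide this window along the sequence and merge adjacent windows that pass.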

A semi-automatic cell type annotation method for single-cell RNA sequencing dataset

  • Kim, Wan;Yoon, Sung Min;Kim, Sangsoo
    • Genomics & Informatics / v.18 no.3 / pp.26.1-26.6 / 2020
  • Single-cell RNA sequencing (scRNA-seq) has been widely applied to provide insights into cell-by-cell expression differences in a given bulk sample. Accordingly, numerous analysis methods have been developed. Because the analysis involves many cells and genes simultaneously, the efficiency of the methods is crucial. The conventional cell type annotation method is laborious and subjective. Here we propose a semi-automatic method that calculates a normalized score for each cell type based on a user-supplied list of cell type-specific marker genes. The method was applied to a publicly available scRNA-seq dataset of a mouse cardiac non-myocyte cell pool. Annotating the 35 t-distributed stochastic neighbor embedding clusters into 12 cell types was straightforward, and the accuracy was evaluated by constructing a co-expression network for each cell type. Gene Ontology analysis was congruent with the annotated cell types, and the corollary regulatory network analysis showed upstream transcription factors that are well supported by evidence in the literature. The source code is available as an R script upon request.
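
The marker-based scoring described above might look like the following sketch. The authors' implementation is an R script available on request, and the abstract does not give their normalization, so this is an assumed scheme: mean expression per cluster is z-scored gene by gene across clusters, and each cell type's score for a cluster is the mean z-score of its marker genes. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_across_clusters(cluster_means):
    """cluster_means: {cluster: {gene: mean expression}} -> same shape, z-scored per gene."""
    clusters = list(cluster_means)
    genes = set().union(*(cluster_means[c].keys() for c in clusters))
    z = {c: {} for c in clusters}
    for gene in genes:
        values = [cluster_means[c].get(gene, 0.0) for c in clusters]
        mu, sd = mean(values), pstdev(values)
        for cluster, value in zip(clusters, values):
            z[cluster][gene] = (value - mu) / sd if sd > 0 else 0.0
    return z


def annotate_clusters(cluster_means, markers):
    """Assign each cluster the cell type whose markers have the highest mean z-score.

    markers: {cell_type: [marker genes]} supplied by the user.
    """
    z = zscore_across_clusters(cluster_means)
    labels = {}
    for cluster, gene_z in z.items():
        scores = {}
        for cell_type, genes in markers.items():
            found = [gene_z[g] for g in genes if g in gene_z]
            # Cell types with no measured markers can never win.
            scores[cell_type] = mean(found) if found else float("-inf")
        labels[cluster] = max(scores, key=scores.get)
    return labels
```

Because the scores are only as good as the marker list, the output is a suggestion for a curator to confirm, which is what makes the approach semi-automatic.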

Loss of Heterozygosity at the Calcium Regulation Gene Locus on Chromosome 10q in Human Pancreatic Cancer

  • Long, Jin;Zhang, Zhong-Bo;Liu, Zhe;Xu, Yuan-Hong;Ge, Chun-Lin
    • Asian Pacific Journal of Cancer Prevention / v.16 no.6 / pp.2489-2493 / 2015
  • Background: Loss of heterozygosity (LOH) in chromosomal regions is crucial in tumor progression, and this study aimed to identify genome-wide LOH in pancreatic cancer. Materials and Methods: Single-nucleotide polymorphism (SNP) profiling data (GSE32682) from human pancreatic samples snap-frozen during surgery were downloaded from the Gene Expression Omnibus database. Genotype console software was used to perform data processing. Candidate genes with LOH were screened based on the genotype calls, the SNP loci of LOH, and the dbSNP database. Gene annotation was performed to identify the functions of the candidate genes using the NCBI (National Center for Biotechnology Information) database, followed by Gene Ontology, INTERPRO, PFAM, and SMART annotation with DAVID (the Database for Annotation, Visualization and Integrated Discovery) and UCSC Genome Browser tracks for the unannotated genes. Results: The candidate genes with LOH identified in this study were MCU, MICU1, and OIT3 on chromosome 10. MCU was found to encode a calcium transporter, and MICU1 could encode an essential regulator of mitochondrial $Ca^{2+}$ uptake. OIT3 was possibly associated with calcium binding, as revealed by the annotation analyses, and was regulated by a large number of transcription factors including STAT, SOX9, CREB, NF-kB, PPARG, and p53. Conclusions: Global genomic analysis of SNPs identified MICU1, MCU, and OIT3 with LOH on chromosome 10, implying involvement of these genes in the progression of pancreatic cancer.

Rough Computational Annotation and Hierarchical Conserved Area Viewing Tool for Genomes Using Multiple Relation Graph. (다중 관계 그래프를 이용한 유전체 보존영역의 계층적 시각화와 개략적 전사 annotation 도구)

  • Lee, Do-Hoon
    • Journal of Life Science / v.18 no.4 / pp.565-571 / 2008
  • With the rapid development of bioinformatics technologies, a wide variety of biological data has been produced in silico, and complicated, large-scale biodata are now used to meet researchers' requirements. Developing visualization and annotation tools for these data is still an active issue, although such tools have been studied for a decade. However, the diversity of users and their requirements makes it hard to develop a general-purpose tool. In this paper, I propose a novel system, Genome Viewer and Annotation tool (GenoVA), to annotate and visualize relationships among genomes using known information and a multiple relation graph. Several multiple alignment tools exist, but they lose conserved areas because of the complexity of their constraints. GenoVA extracts all associated information between all pairs of genomes by extending pairwise alignment. Conserved areas with high frequency and high BLAST scores form the block nodes of the relation graph. To represent the multiple relation graph, the system connects associated block nodes. The system also shows the known information, COG, gene, and hierarchical path of each block node. The system can thus annotate missed areas and unknown genes by navigating the clustering of a specific block node. I tested the system on ten bacterial genomes, extracting features to visualize and annotate them. GenoVA also supports simple, rough computational annotation of a new genome.

Xperanto: A Web-Based Integrated System for DNA Microarray Data Management and Analysis

  • Park, Ji Yeon;Park, Yu Rang;Park, Chan Hee;Kim, Ji Hoon;Kim, Ju Ha
    • Genomics & Informatics / v.3 no.1 / pp.39-42 / 2005
  • The DNA microarray is a high-throughput biomedical technology that monitors the expression of thousands of genes in parallel. The abundance and complexity of gene expression data have created a need for systematic management and analysis to support the many laboratories performing microarray research. To meet these demands, we developed Xperanto for integrated data management and analysis through a user-friendly web-based interface. Xperanto provides an integrated environment for management and analysis by linking computational tools and rich sources of biological annotation. Given the growing need for data sharing, it is designed to be compliant with MGED (Microarray Gene Expression Data) standards for microarray data annotation and exchange. Xperanto enables fast and efficient management of vast amounts of data and serves as a communication channel among multiple researchers within an emerging interdisciplinary field.