• Title/Abstract/Keywords: gene annotation

Search results: 184 items, processing time 0.028 seconds

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • Korean Society of Mycology News: Conference Proceedings
    • /
    • Korean Society of Mycology 2018 Spring Conference and Extraordinary General Meeting
    • /
    • pp.22-22
    • /
    • 2018
  • Fungal genome sequencing and assembly have become routine. Genome analysis relies on high-quality gene prediction and annotation, and an automatic fungal genome annotation pipeline is essential for handling the exponentially accumulating genomic sequence data. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, the pipeline employs multiple gene prediction programs encompassing ab initio, evidence-based, and homology-based evaluation. FunGAP evaluates all predicted genes and filters the gene models. To guide the removal of false-positive genes, we used a scoring function that seeks a consensus by scoring each gene model on its homology to known proteins or domains. FunGAP is freely available for non-commercial users on GitHub (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).

CaGe: A Web-Based Cancer Gene Annotation System for Cancer Genomics

  • Park, Young-Kyu;Kang, Tae-Wook;Baek, Su-Jin;Kim, Kwon-Il;Kim, Seon-Young;Lee, Do-Heon;Kim, Yong-Sung
    • Genomics & Informatics
    • /
    • Vol. 10, No. 1
    • /
    • pp.33-39
    • /
    • 2012
  • High-throughput genomic technologies (HGTs), including next-generation DNA sequencing (NGS), microarray, and serial analysis of gene expression (SAGE), have become effective experimental tools for cancer genomics to identify cancer-associated somatic genomic alterations and genes. The main hurdle in cancer genomics is to identify the real causative mutations or genes out of many candidates from an HGT-based cancer genomic analysis. One useful approach is to refer to known cancer genes and associated information. The list of known cancer genes can be used to determine candidates of cancer driver mutations, while cancer gene-related information, including gene expression, protein-protein interaction, and pathways, can be useful for scoring novel candidates. Some cancer gene or mutation databases exist for this purpose, but few specialized tools exist for an automated analysis of a long gene list from an HGT-based cancer genomic analysis. This report presents a new web-accessible bioinformatic tool, called CaGe, a cancer genome annotation system for the assessment of candidates of cancer genes from HGT-based cancer genomics. The tool provides users with information on cancer-related genes, mutations, pathways, and associated annotations through annotation and browsing functions. With this tool, researchers can classify their candidate genes from cancer genome studies into either previously reported or novel categories of cancer genes and gain insight into underlying carcinogenic mechanisms through a pathway analysis. We show the usefulness of CaGe by assessing its performance in annotating somatic mutations from a published small cell lung cancer study.

Multi-tissue observation of the long non-coding RNA effects on sexually biased gene expression in cattle

  • Yoon, Joon;Kim, Heebal
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 32, No. 7
    • /
    • pp.1044-1051
    • /
    • 2019
  • Objective: Recent studies have implied that gene expression is highly tissue-specific, so it is essential to investigate gene expression in a variety of tissues when performing transcriptomic analysis. In addition, the gradual growth of long non-coding RNA (lncRNA) annotation databases has correspondingly increased the importance and proportion of mapped reads. Methods: We employed simple statistical models to detect sexually biased/dimorphic genes and their conjugate lncRNAs in 40 RNA-seq samples across two factors: sex and tissue. We employed two quantification pipelines: mRNA annotation only and mRNA+lncRNA annotation. Results: The tissue-specific sexually dimorphic genes are affected by the addition of lncRNA annotation at a non-negligible level. In addition, many lncRNAs are expressed in a more tissue-specific fashion and with greater variation between tissues than protein-coding genes. Because of lncRNAs in genic regions, the differentially expressed gene list changes, which causes certain sexually biased genes to become ambiguous across tissues. Conclusion: A past study reported that tissue-specific patterns can be seen throughout the differentially expressed genes between sexes in cattle. Using the same dataset, this study used a more recent reference and added conjugate lncRNA information, which revealed alterations in the differentially expressed gene lists that lead to an apparent distinction in downstream analysis and interpretation. We firmly believe such misquantification of genic lncRNAs can be vital in both future and past studies.

SFannotation: A Simple and Fast Protein Function Annotation System

  • Yu, Dong Su;Kim, Byung Kwon
    • Genomics & Informatics
    • /
    • Vol. 12, No. 2
    • /
    • pp.76-78
    • /
    • 2014
  • Owing to the generation of vast amounts of sequencing data by using cost-effective, high-throughput sequencing technologies with improved computational approaches, many putative proteins have been discovered after assembly and structural annotation. Putative proteins are typically annotated using a functional annotation system that uses extant databases, but the expansive size of these databases often causes a bottleneck for rapid functional annotation. We developed SFannotation, a simple and fast functional annotation system that rapidly annotates putative proteins against four extant databases, Swiss-Prot, TIGRFAMs, Pfam, and the non-redundant sequence database, by using a best-hit approach with BLASTP and HMMSEARCH.
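
The best-hit approach mentioned in the abstract above can be illustrated with a minimal sketch. It assumes BLASTP was run with the standard 12-column tabular output (`-outfmt 6`) and that "best" means the highest bit score per query; the abstract does not state SFannotation's actual selection criterion, and the function name `best_hits` and the example records are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Column order of BLAST tabular output (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Tuple[str, float, float]]:
    """Return, for each query, the (subject, evalue, bitscore) of its top-scoring hit.

    Assumption: the best hit is the one with the highest bit score.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example records, not taken from any real search.
example = [
    "prot1\tsp|P12345|X\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-120\t410",
    "prot1\tsp|Q99999|Y\t60.0\t180\t72\t2\t5\t184\t3\t182\t1e-40\t150",
    "prot2\tsp|P00001|Z\t75.0\t100\t25\t0\t1\t100\t1\t100\t2e-50\t180",
]
hits = best_hits(example)
for query, (subject, evalue, bitscore) in sorted(hits.items()):
    print(query, subject, evalue, bitscore)
```

Keeping only one hit per query is what makes the approach fast to post-process, at the cost of discarding weaker hits that may carry complementary annotation.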

Implementation of Gene Information System

  • 최낙중;최한석;김동욱
    • The Korea Contents Association: Conference Proceedings
    • /
    • The Korea Contents Association 2018 Spring Conference Proceedings
    • /
    • pp.549-550
    • /
    • 2018
  • We have developed a web server for the high-throughput annotation of genes. The system processes entire data sets with an automated pipeline of 13 analytic services, deposits the data into a MySQL database, and transforms it into three kinds of reports: preprocessing, assembly, and annotation.

PromoterWizard: An Integrated Promoter Prediction Program Using Hybrid Methods

  • Park, Kie-Jung;Kim, Ki-Bong
    • Genomics & Informatics
    • /
    • Vol. 9, No. 4
    • /
    • pp.194-196
    • /
    • 2011
  • Promoter prediction is an important problem that is closely related to central problems of bioinformatics such as the construction of gene regulatory networks and gene function annotation. In this context, we developed PromoterWizard, an integrated promoter prediction program using hybrid methods, which can be employed to detect the core promoter region and the transcription start site (TSS) in vertebrate genomic DNA sequences, an issue of obvious importance for genome annotation efforts. PromoterWizard consists of three main modules and two auxiliary modules. The three main modules are the CDRM (Composite Dependency Reflecting Model) module, the SVM (Support Vector Machine) module, and the ICM (Interpolated Context Model) module. The two auxiliary modules are CpG Island Detector and GCPlot, which may help improve the predictive accuracy of the three main modules and help human curators decide on the final annotation.
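
The abstract above does not describe how the CpG Island Detector module works. As a rough illustration of the general idea only, the sketch below applies the widely used Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) to a single sequence window. The function names and thresholds are assumptions for illustration and are not taken from PromoterWizard.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence.

    Observed/expected CpG = (number of CpG * length) / (number of C * number of G).
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / n if n else 0.0
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0  # ratio is undefined without both C and G; treat as 0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed thresholds to one window of sequence."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


# Synthetic windows, chosen only to exercise both outcomes.
cpg_rich = "CG" * 100
at_rich = "AT" * 100
print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
print(cpg_stats(at_rich), looks_like_cpg_island(at_rich))
```

A real detector would slide such a window along the genome and merge adjacent qualifying windows; that step is omitted here.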

A semi-automatic cell type annotation method for single-cell RNA sequencing dataset

  • Kim, Wan;Yoon, Sung Min;Kim, Sangsoo
    • Genomics & Informatics
    • /
    • Vol. 18, No. 3
    • /
    • pp.26.1-26.6
    • /
    • 2020
  • Single-cell RNA sequencing (scRNA-seq) has been widely applied to provide insights into cell-by-cell expression differences in a given bulk sample. Accordingly, numerous analysis methods have been developed. As it involves simultaneous analyses of many cells and genes, the efficiency of the methods is crucial. The conventional cell type annotation method is laborious and subjective. Here we propose a semi-automatic method that calculates a normalized score for each cell type based on a user-supplied list of cell type-specific marker genes. The method was applied to publicly available scRNA-seq data of a mouse cardiac non-myocyte cell pool. Annotating the 35 t-distributed stochastic neighbor embedding clusters into 12 cell types was straightforward, and its accuracy was evaluated by constructing a co-expression network for each cell type. Gene Ontology analysis was congruent with the annotated cell types, and the corollary regulatory network analysis showed upstream transcription factors that are well supported by evidence in the literature. The source code is available as an R script upon request.
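
The abstract above does not give the scoring formula, and the authors' code is an R script available only on request. The following Python sketch therefore shows just one plausible form of marker-based scoring: each marker gene's mean expression is scaled by its maximum across clusters, the scaled values are averaged per cell type, and each cluster receives the highest-scoring label. The function name, the scaling choice, and the toy expression values are all assumptions for illustration.

```python
from typing import Dict, List


def annotate_clusters(cluster_means: Dict[str, Dict[str, float]],
                      markers: Dict[str, List[str]]) -> Dict[str, str]:
    """Assign each cluster the cell type with the highest normalized marker score.

    cluster_means: cluster id -> {gene: mean expression in that cluster}
    markers:       cell type  -> list of marker genes (user-supplied)
    """
    genes = {g for gene_list in markers.values() for g in gene_list}
    # Per-gene maximum across clusters, used to scale every gene to [0, 1].
    gene_max = {g: max(m.get(g, 0.0) for m in cluster_means.values()) for g in genes}

    labels: Dict[str, str] = {}
    for cluster, means in cluster_means.items():
        scores: Dict[str, float] = {}
        for cell_type, gene_list in markers.items():
            # Markers absent from the data (max of 0) are ignored.
            scaled = [means.get(g, 0.0) / gene_max[g]
                      for g in gene_list if gene_max[g] > 0]
            scores[cell_type] = sum(scaled) / len(scaled) if scaled else 0.0
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Toy values for illustration only; not taken from the cited dataset.
cluster_means = {
    "c0": {"Pecam1": 8.0, "Col1a1": 0.5},
    "c1": {"Pecam1": 1.0, "Col1a1": 9.0},
}
markers = {
    "endothelial": ["Pecam1", "Cdh5"],
    "fibroblast": ["Col1a1", "Pdgfra"],
}
labels = annotate_clusters(cluster_means, markers)
print(labels)
```

Scaling each gene before averaging keeps a single highly expressed marker from dominating the score, which is one common reason to normalize in this setting.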

Loss of Heterozygosity at the Calcium Regulation Gene Locus on Chromosome 10q in Human Pancreatic Cancer

  • Long, Jin;Zhang, Zhong-Bo;Liu, Zhe;Xu, Yuan-Hong;Ge, Chun-Lin
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 16, No. 6
    • /
    • pp.2489-2493
    • /
    • 2015
  • Background: Loss of heterozygosity (LOH) in chromosomal regions is crucial in tumor progression, and this study aimed to identify genome-wide LOH in pancreatic cancer. Materials and Methods: Single-nucleotide polymorphism (SNP) profiling data GSE32682 of human pancreatic samples snap-frozen during surgery were downloaded from the Gene Expression Omnibus database. Genotype console software was used to perform data processing. Candidate genes with LOH were screened based on the genotype calls, SNP loci of LOH and the dbSNP database. Gene annotation was performed to identify the functions of candidate genes using the NCBI (National Center for Biotechnology Information) database, followed by Gene Ontology, INTERPRO, PFAM and SMART annotation and UCSC Genome Browser track to the unannotated genes using DAVID (the Database for Annotation, Visualization and Integrated Discovery). Results: The candidate genes with LOH identified in this study were MCU, MICU1 and OIT3 on chromosome 10. MCU was found to encode a calcium transporter, and MICU1 could encode an essential regulator of mitochondrial $Ca^{2+}$ uptake. OIT3 was possibly associated with calcium binding, as revealed by the annotation analyses, and was regulated by a large number of transcription factors including STAT, SOX9, CREB, NF-κB, PPARG and p53. Conclusions: Global genomic analysis of SNPs identified MICU1, MCU and OIT3 with LOH on chromosome 10, implying involvement of these genes in the progression of pancreatic cancer.

Rough Computational Annotation and Hierarchical Conserved Area Viewing Tool for Genomes Using Multiple Relation Graph

  • 이도훈
    • Journal of Life Science
    • /
    • Vol. 18, No. 4
    • /
    • pp.565-571
    • /
    • 2008
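  • With the advance of bioinformatics, diverse kinds of biological information are being mass-produced by computer programs. Rather than simple sequence comparisons or small data sets, the field now handles diversified, large-scale biological information. Within this area, tool development for visualization and annotation has been an active research topic over the past ten years. Even so, developing a general-purpose tool is very difficult because of the diversity of biological information and of user requirements. This paper proposes GenoVA, a system that annotates and visualizes genomes using known information across genomes and a multiple relation graph. Several programs exist for multiple alignment, but these methods lose much information because of the complexity within sequences. The proposed method therefore extends pairwise alignment and derives relationships by comparing all genomes with one another. Regions conserved between genomes that have a high frequency and a high BLAST score are defined as block nodes, and the relationships among them are represented as a multiple relation graph. GenoVA also visualizes known information, COGs, and genes, and hierarchically displays the clustered paths centered on a region of the multiple relation graph. In doing so, missing or unknown genes and other annotation information can be extracted. Ten bacterial genomes were used in the experiments and served as data for visualization and annotation. GenoVA provides rough computational annotation of new genomes in an intuitive and convenient way.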

Xperanto: A Web-Based Integrated System for DNA Microarray Data Management and Analysis

  • Park, Ji Yeon;Park, Yu Rang;Park, Chan Hee;Kim, Ji Hoon;Kim, Ju Ha
    • Genomics & Informatics
    • /
    • Vol. 3, No. 1
    • /
    • pp.39-42
    • /
    • 2005
  • The DNA microarray is a high-throughput biomedical technology that monitors gene expression for thousands of genes in parallel. The abundance and complexity of gene expression data have created a need for systematic management and analysis to support the many laboratories performing microarray research. To meet these demands, we developed Xperanto for integrated data management and analysis through a user-friendly web-based interface. Xperanto provides an integrated environment for management and analysis by linking computational tools and rich sources of biological annotation. With the growing need for data sharing, it is designed to be compliant with MGED (Microarray Gene Expression Data) standards for microarray data annotation and exchange. Xperanto enables fast and efficient management of vast amounts of data, and serves as a communication channel among multiple researchers within an emerging interdisciplinary field.