• 제목/요약/키워드: genome annotation

검색결과 179건 처리시간 0.018초

Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review

  • Nam, Hee-Jo;Yamada, Ryota;Park, Hyun-Seok
    • Genomics & Informatics
    • /
    • 제18권2호
    • /
    • pp.13.1-13.6
    • /
    • 2020
  • The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.

CaGe: A Web-Based Cancer Gene Annotation System for Cancer Genomics

  • Park, Young-Kyu;Kang, Tae-Wook;Baek, Su-Jin;Kim, Kwon-Il;Kim, Seon-Young;Lee, Do-Heon;Kim, Yong-Sung
    • Genomics & Informatics
    • /
    • 제10권1호
    • /
    • pp.33-39
    • /
    • 2012
  • High-throughput genomic technologies (HGTs), including next-generation DNA sequencing (NGS), microarray, and serial analysis of gene expression (SAGE), have become effective experimental tools for cancer genomics to identify cancer-associated somatic genomic alterations and genes. The main hurdle in cancer genomics is to identify the real causative mutations or genes out of many candidates from an HGT-based cancer genomic analysis. One useful approach is to refer to known cancer genes and associated information. The list of known cancer genes can be used to determine candidates of cancer driver mutations, while cancer gene-related information, including gene expression, protein-protein interaction, and pathways, can be useful for scoring novel candidates. Some cancer gene or mutation databases exist for this purpose, but few specialized tools exist for an automated analysis of a long gene list from an HGT-based cancer genomic analysis. This report presents a new web-accessible bioinformatic tool, called CaGe, a cancer genome annotation system for the assessment of candidates of cancer genes from HGT-based cancer genomics. The tool provides users with information on cancer-related genes, mutations, pathways, and associated annotations through annotation and browsing functions. With this tool, researchers can classify their candidate genes from cancer genome studies into either previously reported or novel categories of cancer genes and gain insight into underlying carcinogenic mechanisms through a pathway analysis. We show the usefulness of CaGe by assessing its performance in annotating somatic mutations from a published small cell lung cancer study.

PrimateDB: Development of Primate Genome DB and Web Service

  • Woo, Taeha;Shin, Gwangsik;Kang, Taewook;Kim, Byoungchul;Seo, Jungmin;Kim, Sang Soo;Kim, Chang-Bae
    • Genomics & Informatics
    • /
    • 제3권2호
    • /
    • pp.73-76
    • /
    • 2005
  • The comparative analysis of the human and primate genomes including the chimpanzee can reveal unique types of information impossible to obtain from comparing the human genome with the genomes of other vertebrates. PrimateDB is an open depository server that provides primate genome information for the comparative genome research. The database also provides an easy access to variable information within/between the primate genomes and supports analyzed information, such as annotation and retroelements and phylogeny. The comparative analyses of more primate genomes are also being included as the long-term objective.

Functional Annotation and Analysis of Korean Patented Biological Sequences Using Bioinformatics

  • Lee, Byung Wook;Kim, Tae Hyung;Kim, Seon Kyu;Kim, Sang Soo;Ryu, Gee Chan;Bhak, Jong
    • Molecules and Cells
    • /
    • 제21권2호
    • /
    • pp.269-275
    • /
    • 2006
  • A recent report of the Korean Intellectual Property Office(KIPO) showed that the number of biological sequence-based patents is rapidly increasing in Korea. We present biological features of Korean patented sequences though bioinformatic analysis. The analysis is divided into two steps. The first is an annotation step in which the patented sequences were annotated with the Reference Sequence (RefSeq) database. The second is an association step in which the patented sequences were linked to genes, diseases, pathway, and biological functions. We used Entrez Gene, Online Mendelian Inheritance in Man (OMIM), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases. Through the association analysis, we found that nearly 2.6% of human genes were associated with Korean patenting, compared to 20% of human genes in the U.S. patent. The association between the biological functions and the patented sequences indicated that genes whose products act as hormones on defense responses in the extra-cellular environments were the most highly targeted for patenting. The analysis data are available at http://www.patome.net

EST Knowledge Integrated Systems (EKIS): An Integrated Database of EST Information for Research Application

  • Kim, Dae-Won;Jung, Tae-Sung;Choi, Young-Sang;Nam, Seong-Hyeuk;Kwon, Hyuk-Ryul;Kim, Dong-Wook;Choi, Han-Suk;Choi, Sang-Heang;Park, Hong-Seog
    • Genomics & Informatics
    • /
    • 제7권1호
    • /
    • pp.38-40
    • /
    • 2009
  • The EST Knowledge Integrated System, EKIS (http://ekis.kribb.re.kr), was established as a part of Korea's Ministry of Education, Science and Technology initiative for genome sequencing and application research of the biological model organisms (GEAR) project. The goals of the EKIS are to collect EST information from GEAR projects and make an integrated database to provide transcriptomic and metabolomic information for biological scientists. The EKIS constitutes five independent categories and several retrieval systems in each category for incorporating massive EST data from high-throughput sequencing of 65 different species. Through the EKIS database, scientists can freely access information including BLAST functional annotation as well as Genechip and pathway information for KEGG. By integrating complex data into a framework of existing EST knowledge information, the EKIS provides new insights into specialized metabolic pathway information for an applied industrial material.

XPERNATO-TOX: an Integrated Toxicogenomics Knowledgebase

  • Woo Jung-Hoon;Kim Hyeoun-Eui;Kong Gu;Kim Ju-Han
    • Genomics & Informatics
    • /
    • 제4권1호
    • /
    • pp.40-44
    • /
    • 2006
  • Toxicogenomics combines transcriptome, proteome and metabolome profiling with conventional toxicology to investigate the interaction between biological molecules and toxicant or environmental stress in disease caution. Toxicogenomics faces the problems of comparison and integration across different sources of data. Cause of unusual characteristics of toxicogenomic data, researcher should be assisted by data analysis and annotation for getting meaningful information. There are already existing repositories which claim to stand for toxicogenomics database. However, those just contain limited abilities for toxicogenomic research. For supporting toxicologist who comes up against toxicogenomic data flood, now we propose novel toxicogenomics knowledgebase system, XPERANTO-TOX. XPERANTO-TOX is an integrated system for toxicogenomic data management and analysis. It is composed of three distinct but closely connected parts. Firstly, Data Storage System is for reposit many kinds of '-omics' data and conventional toxicology data. Secondly, Data Analysis System consists of analytical modules for integrated toxicogenomics data. At last, Data Annotation System is for giving extensive insight of data to researcher.

Genomic Analysis of a Freshwater Actinobacterium, "Candidatus Limnosphaera aquatica" Strain IMCC26207, Isolated from Lake Soyang

  • Kim, Suhyun;Kang, Ilnam;Cho, Jang-Cheon
    • Journal of Microbiology and Biotechnology
    • /
    • 제27권4호
    • /
    • pp.825-833
    • /
    • 2017
  • Strain IMCC26207 was isolated from the surface layer of Lake Soyang in Korea by the dilutionto-extinction culturing method, using a liquid medium prepared with filtered and autoclaved lake water. The strain could neither be maintained in a synthetic medium other than natural freshwater medium nor grown on solid agar plates. Phylogenetic analysis of 16S rRNA gene sequences indicated that strain IMCC26207 formed a distinct lineage in the order Acidimicrobiales of the phylum Actinobacteria. The closest relative among the previously identified bacterial taxa was "Candidatus Microthrix parvicella" with 16S rRNA gene sequence similarity of 91.7%. Here, the draft genome sequence of strain IMCC26207, a freshwater actinobacterium, is reported with the description of the genome properties and annotation summary. The draft genome consisted of 10 contigs with a total size of 3,316,799 bp and an average G+C content of 57.3%. The IMCC26207 genome was predicted to contain 2,975 protein-coding genes and 51 non-coding RNA genes, including 45 tRNA genes. Approximately 76.8% of the protein coding genes could be assigned with a specific function. Annotation of the IMCC26207 genome showed several traits of adaptation to living in oligotrophic freshwater environments, such as phosphorus-limited condition. Comparative genomic analysis revealed that the genome of strain IMCC26207 was distinct from that of "Candidatus Microthrix" strains; therefore, we propose the name "Candidatus Limnosphaera aquatica" for this bacterium.

Xperanto: A Web-Based Integrated System for DNA Microarray Data Management and Analysis

  • Park, Ji Yeon;Park, Yu Rang;Park, Chan Hee;Kim, Ji Hoon;Kim, Ju Ha
    • Genomics & Informatics
    • /
    • 제3권1호
    • /
    • pp.39-42
    • /
    • 2005
  • DNA microarray is a high-throughput biomedical technology that monitors gene expression for thousands of genes in parallel. The abundance and complexity of the gene expression data have given rise to a requirement for their systematic management and analysis to support many laboratories performing microarray research. On these demands, we developed Xperanto for integrated data management and analysis using user-friendly web-based interface. Xperanto provides an integrated environment for management and analysis by linking the computational tools and rich sources of biological annotation. With the growing needs of data sharing, it is designed to be compliant to MGED (Microarray Gene Expression Data) standards for microarray data annotation and exchange. Xperanto enables a fast and efficient management of vast amounts of data, and serves as a communication channel among multiple researchers within an emerging interdisciplinary field.