• Title/Summary/Keyword: genome annotation


Functional annotation of lung cancer-associated genetic variants by cell type-specific epigenome and long-range chromatin interactome

  • Lee, Andrew J.;Jung, Inkyung
    • Genomics & Informatics / v.19 no.1 / pp.3.1-3.12 / 2021
  • Functional interpretation of noncoding genetic variants associated with complex human diseases and traits remains a challenge. To improve our understanding of common germline variants associated with lung cancer, we categorize regulatory elements based on eight major cell types of human lung tissue. Our results show that 21.68% of lung cancer-associated risk variants are linked to noncoding regulatory elements, nearly half of which are cell type-specific. Integrative analysis of high-resolution long-range chromatin interactome maps and single-cell RNA-sequencing data of lung tumors uncovers a number of putative target genes of these variants and functionally relevant cell types, which display a potential biological link to cancer susceptibility. The present study greatly expands the scope of functional annotation of lung cancer-associated genetic risk factors and indicates probable cell types involved in lung carcinogenesis.

Complete genome sequence of Paenibacillus konkukensis sp. nov. SK3146 as a potential probiotic strain

  • Jung, Hae-In;Park, Sungkwon;Niu, Kai-Min;Lee, Sang-Won;Kothari, Damini;Yi, Kwon Jung;Kim, Soo-Ki
    • Journal of Animal Science and Technology / v.63 no.3 / pp.666-670 / 2021
  • Paenibacillus konkukensis sp. nov. SK3146 is a novel strain isolated from pig feed. Here, we present the complete genome sequence of SK3146. The genome consists of a single circular chromosome of 7,968,964 bp with an average guanine + cytosine (G+C) content of 53.4%. Genomic annotation revealed that the strain encodes 151 proteins related to hydrolases (EC3), more than are found in Bacillus subtilis and Escherichia coli. Diverse hydrolases, including galactosidase, glucosidase, cellulase, lipase, xylanase, and protease, were found in the genome of SK3146, together with one bacteriocin-encoding gene. The complete genome sequence of P. konkukensis SK3146 indicates the strain's considerable probiotic potential, with functions in nutrient digestibility and antimicrobial activity.

Gene annotation by the "interactome" analysis in KEGG

  • Kanehisa, Minoru
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.56-58 / 2000
  • Post-genomics may be defined in different ways depending on how one views the challenges after the genome. A popular view is to follow the concept of the central dogma in molecular biology, namely from genome to transcriptome to proteome. Projects are under way to analyze gene expression profiles at both the mRNA and protein levels and to catalog protein 3D structure families, which will no doubt help the understanding of information in the genome. However complete, such catalogs of genes, RNAs, and proteins only tell us about the building blocks of life. They do not tell us much about the wiring (interaction) of building blocks, which is essential for uncovering systemic functional behaviors of the cell or the organism. Thus, an alternative view of post-genomics is to go up from the molecular level to the cellular level, and to understand what I call the "interactome", or a complete picture of molecular interactions in the cell. KEGG (http://www.genome.ad.jp/kegg/) is our attempt to computerize current knowledge on various cellular processes as a collection of "generalized" protein-protein interaction networks, to develop new graph-based algorithms for predicting such networks from the genome information, and to actually reconstruct the interactomes for all the completely sequenced genomes and some partial genomes. During the reconstruction process, it becomes readily apparent that certain pathways and molecular complexes are present or absent in each organism, indicating modular structures of the interactome. In addition, the reconstruction uncovers missing components in an otherwise complete pathway or complex, which may result from misannotation of the genome or misrepresentation of the KEGG pathway. When combined with additional experimental data on protein-protein interactions, such as data from yeast two-hybrid systems, the reconstruction may uncover unknown partners for a particular pathway or complex. Thus, the reconstruction is tightly coupled with the annotation of individual genes, which is maintained in the GENES database in KEGG. We are also trying to expand our literature survey to include the most up-to-date information about gene functions in the GENES database.

Mouse phenogenomics, toolbox for functional annotation of human genome

  • Kim, Il-Yong;Shin, Jae-Hoon;Seong, Je-Kyung
    • BMB Reports / v.43 no.2 / pp.79-90 / 2010
  • Mouse models are crucial for the functional annotation of the human genome. Gene modification techniques in the mouse, including gene targeting and gene trapping, have provided powerful tools in the form of genetically engineered mice (GEM) for understanding the molecular pathogenesis of human diseases. Several international consortia and programs are under way to deliver mutations in every gene in the mouse genome. The information from studying these GEM can be shared through international collaboration. However, their utility is limited in many ways because not all human genes have been knocked out in the mouse, and the existing GEM have not yet been phenotypically characterized in the standardized ways required for sharing and evaluating GEM data. Recent improvements in mouse genetics have moved the bottleneck in mouse functional genomics from the production of GEM to their systematic phenotype analysis. Enhanced, reproducible, and comprehensive mouse phenotype analysis has thus emerged as a prerequisite for effectively addressing the phenotyping bottleneck. This review provides current information on systematic mouse phenotype analysis together with an issue-oriented perspective.

Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions

  • Yang, Long;Cho, Hwan-Gue
    • Genomics & Informatics / v.10 no.1 / pp.58-64 / 2012
  • Intron prediction is an important problem in genome annotation, which is constantly being updated. Using two model plant genomes (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns with a small number of false-positive introns. Sim4cc was successful at finding the correct introns, with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame, so intron lengths fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It has been widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that in 80% (8/10) of the species, the three phases occurred in similar numbers. The percentage of 3n introns in Ostreococcus lucimarinus was excessive (47.7%), while in Ostreococcus tauri it was deficient (29.1%). This discrepancy could have been the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
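
The three-phase check described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the 0.04 tolerance are hypothetical choices made here for illustration, and the input is assumed to be a plain list of intron lengths in bases.

```python
from collections import Counter


def intron_phase_fractions(intron_lengths):
    """Return the fraction of introns whose length is 3n, 3n + 1, and 3n + 2.

    Keys of the result are 0, 1, and 2 (the length modulo 3).
    """
    counts = Counter(length % 3 for length in intron_lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no intron lengths given")
    return {phase: counts.get(phase, 0) / total for phase in (0, 1, 2)}


def is_3n_skewed(fraction_3n, tolerance=0.04):
    """Flag a 3n fraction that deviates from the expected 1/3.

    The tolerance is an arbitrary illustrative threshold, not a value
    taken from the paper.
    """
    return abs(fraction_3n - 1 / 3) > tolerance


if __name__ == "__main__":
    fractions = intron_phase_fractions([90, 91, 92, 93, 94, 95])
    print(fractions)
    # The two 3n percentages reported in the abstract: 47.7% and 29.1%.
    print(is_3n_skewed(0.477), is_3n_skewed(0.291))
```

With this threshold, both reported Ostreococcus values would be flagged, while a genome with roughly one third of its introns in each phase would not.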

Draft Genome of Toxocara canis, a Pathogen Responsible for Visceral Larva Migrans

  • Kong, Jinhwa;Won, Jungim;Yoon, Jeehee;Lee, UnJoo;Kim, Jong-Il;Huh, Sun
    • Parasites, Hosts and Diseases / v.54 no.6 / pp.751-758 / 2016
  • This study aimed to construct a draft genome of the adult female worm Toxocara canis using next-generation sequencing (NGS) and de novo assembly, and to find new genes after annotation using functional genomics tools. Using an NGS machine, we produced DNA read data of T. canis. The de novo assembly of the read data was performed using SOAPdenovo. RNA read data were assembled using Trinity. Structural annotation, homology search, functional annotation, classification of protein domains, and KEGG pathway analysis were carried out. In addition, recently developed tools such as MAKER, PASA, Evidence Modeler, and Blast2GO were used. The resulting DNA scaffolds had an N50 of 108,950 bp and an overall length of 341,776,187 bp. The N50 of the transcriptome was 940 bp, and its length was 53,046,952 bp. The GC content of the entire genome was 39.3%. The total number of genes was 20,178, and the total number of protein sequences was 22,358. Of the 22,358 protein sequences, 4,992 were newly observed in T. canis. The following previously unknown proteins were found: E3 ubiquitin-protein ligase cbl-b and antigen T-cell receptor, zeta chain for T-cell and B-cell regulation; endoprotease bli-4 for cuticle metabolism; mucin 12Ea and polymorphic mucin variant C6/1/40r2.1 for mucin production; tropomodulin-family protein and ryanodine receptor calcium release channels for muscle movement. We were able to find new hypothetical polypeptide sequences unique to T. canis, and the findings of this study can serve as a basis for extending our biological understanding of T. canis.
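
The N50 statistic reported above can be computed from a list of scaffold or transcript lengths. The following Python sketch uses the common definition (the length L such that sequences of length L or longer cover at least half of the total assembly); the function name is hypothetical, and the abstract does not say which tool the authors used to compute it.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy lengths, not data from the study: total 32, half 16,
    # and the two longest sequences (8 + 8) already reach 16.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```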

Genome Analysis and Optimization of Caproic Acid Production of Clostridium butyricum GD1-1 Isolated from the Pit Mud of Nongxiangxing Baijiu

  • Min Li;Tao Li;Jia Zheng;Zongwei Qiao;Kaizheng Zhang;Huibo Luo;Wei Zou
    • Journal of Microbiology and Biotechnology / v.33 no.10 / pp.1337-1350 / 2023
  • Caproic acid is a precursor for the synthesis of ethyl caproate, the main flavor substance of nongxiangxing baijiu liquor. In this study, Clostridium butyricum GD1-1, a strain yielding a high caproic acid concentration (3.86 g/l), was isolated from the storage pit mud of nongxiangxing baijiu for sequencing and analysis. The strain's genome was 3,840,048 bp in length with 4,050 open reading frames. In addition, virulence factor annotation analysis showed C. butyricum GD1-1 to be safe at the genetic level. However, annotation with the Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server predicted a deficiency in the strain's synthesis of alanine, methionine, and biotin. These results were confirmed by essential nutrient factor validation experiments. Furthermore, the optimized medium composition for caproic acid production by strain GD1-1 was (g/l): glucose 30, NaCl 5, yeast extract 10, peptone 10, beef paste 10, sodium acetate 11, L-cysteine 0.6, biotin 0.004, starch 2, and 2.0% ethanol. The optimized fermentation conditions for caproic acid production by C. butyricum GD1-1 on a single-factor basis were: 5% inoculum volume, 35℃, pH 7, and 90% loading volume. Under optimal conditions, the caproic acid concentration of strain GD1-1 reached 5.42 g/l, which was 1.40 times the initial concentration. C. butyricum GD1-1 could be further used in caproic acid production, NXXB pit mud strengthening and maintenance, and artificial pit mud preparation.

Genome data mining for everyone

  • Lee, Gir-Won;Kim, Sang-Soo
    • BMB Reports / v.41 no.11 / pp.757-764 / 2008
  • The genomic sequences of a huge number of species have been determined. Typically, these genome sequences and the associated annotation data are accessed through Internet-based genome browsers that offer a user-friendly interface. Intelligent use of the data should expedite biological knowledge discovery. Such activity is collectively called data mining and involves queries that can be simple, complex, and even combinational. Various tools have been developed to make genome data mining available to computational and experimental biologists alike. In this mini-review, some tools that have proven successful will be introduced along with examples taken from published reports.

Loss of Heterozygosity at the Calcium Regulation Gene Locus on Chromosome 10q in Human Pancreatic Cancer

  • Long, Jin;Zhang, Zhong-Bo;Liu, Zhe;Xu, Yuan-Hong;Ge, Chun-Lin
    • Asian Pacific Journal of Cancer Prevention / v.16 no.6 / pp.2489-2493 / 2015
  • Background: Loss of heterozygosity (LOH) on chromosomal regions is crucial in tumor progression, and this study aimed to identify genome-wide LOH in pancreatic cancer. Materials and Methods: Single-nucleotide polymorphism (SNP) profiling data GSE32682 of human pancreatic samples snap-frozen during surgery were downloaded from the Gene Expression Omnibus database. Genotype console software was used to perform data processing. Candidate genes with LOH were screened based on the genotype calls, SNP loci of LOH, and the dbSNP database. Gene annotation was performed to identify the functions of candidate genes using the NCBI (National Center for Biotechnology Information) database, followed by Gene Ontology, INTERPRO, PFAM and SMART annotation and UCSC Genome Browser track to the unannotated genes using DAVID (the Database for Annotation, Visualization and Integrated Discovery). Results: The candidate genes with LOH identified in this study were MCU, MICU1 and OIT3 on chromosome 10. MCU was found to encode a calcium transporter, and MICU1 could encode an essential regulator of mitochondrial Ca2+ uptake. OIT3 was possibly associated with calcium binding, as revealed by the annotation analyses, and was regulated by a large number of transcription factors including STAT, SOX9, CREB, NF-kB, PPARG and p53. Conclusions: Global genomic analysis of SNPs identified MICU1, MCU and OIT3 with LOH on chromosome 10, implying involvement of these genes in the progression of pancreatic cancer.

A Study on Construction of Integrated Prokaryotes Gene Prediction System

  • Chang Jong-won;Ryoo Yoon-kyu;Ku Ja-hyo;Yoon Young-woo
    • Journal of the Institute of Convergence Signal Processing / v.6 no.1 / pp.27-32 / 2005
  • As genome sequencing has progressed at a surprising speed over a short period, an automatic genome annotation process has become a prerequisite. The most difficult step in genome annotation is finding the protein-coding genes within a genome. The two main subjects of gene prediction are eukaryotes and prokaryotes; their genes have different structures, so their gene prediction methods also differ. Of the 231 species whose genomes have been sequenced so far, 200 are prokaryotes, so prokaryotes may be more appropriate than eukaryotes for biotechnology studies based on comparative genomics. Moreover, prokaryotes do not have the gene structure called an intron, which makes gene prediction easier. Earlier prokaryotic gene prediction methods have shown 80% to 90% accuracy, and recent studies aim at 100% accuracy. In this paper, for the E. coli K-12 and S. typhi genomes, the gene prediction accuracy was 98.5% and 98.7%, respectively, outperforming the earlier GLIMMER.
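
Because prokaryotic genes lack introns, a first approximation to gene finding is a scan for open reading frames (ORFs). The Python sketch below shows that baseline idea only; it is not the integrated system described in the abstract and not GLIMMER's algorithm. The function names are hypothetical, and the sketch assumes ATG as the only start codon and an upper-case ACGT alphabet, which simplifies real prokaryotic annotation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """List ORFs in all six reading frames.

    Each ORF is (strand, start, end, sequence), running from the first ATG
    in a frame to the next in-frame stop codon. min_codons counts the stop
    codon. Coordinates on the "-" strand refer to the reverse-complemented
    sequence, not to the original one.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence, not data from the paper.
    print(find_orfs("CCATGAAATAGCC"))
```

Practical gene finders add statistical models of coding sequence on top of such a scan, which is where the accuracy differences discussed in the abstract arise.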
