• Title/Summary/Keyword: Pan-genome analysis

Bioinformatics services for analyzing massive genomic datasets

  • Ko, Gunhwan;Kim, Pan-Gyu;Cho, Youngbum;Jeong, Seongmun;Kim, Jae-Yoon;Kim, Kyoung Hyoun;Lee, Ho-Yeon;Han, Jiyeon;Yu, Namhee;Ham, Seokjin;Jang, Insoon;Kang, Byunghee;Shin, Sunguk;Kim, Lian;Lee, Seung-Won;Nam, Dougu;Kim, Jihyun F.;Kim, Namshin;Kim, Seon-Young;Lee, Sanghyuk;Roh, Tae-Young;Lee, Byungwook
    • Genomics & Informatics / v.18 no.1 / pp.8.1-8.10 / 2020
  • The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

Genomic Insight into the Salt Tolerance of Enterococcus faecium, Enterococcus faecalis and Tetragenococcus halophilus

  • Heo, Sojeong;Lee, Jungmin;Lee, Jong-Hoon;Jeong, Do-Won
    • Journal of Microbiology and Biotechnology / v.29 no.10 / pp.1591-1602 / 2019
  • To shed light on the genetic basis of salt tolerance in Enterococcus faecium, Enterococcus faecalis, and Tetragenococcus halophilus, we performed comparative genome analysis of 10 E. faecalis, 11 E. faecium, and three T. halophilus strains. Factors involved in salt tolerance that could be used to distinguish the species were identified. Overall, T. halophilus contained a greater number of potassium transport and osmoprotectant synthesis genes compared with the other two species. In particular, our findings suggested that T. halophilus may be the only one among the three species capable of synthesizing glycine betaine from choline, cardiolipin from glycerol and proline from citrate. These molecules are well-known osmoprotectants; thus, we propose that these genes confer the salt tolerance of T. halophilus.

Sequence Analysis of Mitochondrial Genome of Toxascaris leonina from a South China Tiger

  • Li, Kangxin;Yang, Fang;Abdullahi, A.Y.;Song, Meiran;Shi, Xianli;Wang, Minwei;Fu, Yeqi;Pan, Weida;Shan, Fang;Chen, Wu;Li, Guoqing
    • Parasites, Hosts and Diseases / v.54 no.6 / pp.803-807 / 2016
  • Toxascaris leonina is a common parasitic nematode of wild mammals and has significant impacts on the protection of rare wild animals. To analyze population genetic characteristics of T. leonina from a South China tiger, its mitochondrial (mt) genome was sequenced. Its complete circular mt genome was 14,277 bp in length, including 12 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 2 non-coding regions. The nucleotide composition was biased toward A and T. The most common start codon and stop codon were TTG and TAG, and 4 genes ended with an incomplete stop codon. There were 13 intergenic regions ranging from 1 to 10 bp in size. Phylogenetically, T. leonina from a South China tiger was close to canine T. leonina. This study reports for the first time a complete mt genome sequence of T. leonina from the South China tiger, and provides a scientific basis for studying the genetic diversity of nematodes between different hosts.

Deep Learning in Genomic and Medical Image Data Analysis: Challenges and Approaches

  • Yu, Ning;Yu, Zeng;Gu, Feng;Li, Tianrui;Tian, Xinmin;Pan, Yi
    • Journal of Information Processing Systems / v.13 no.2 / pp.204-214 / 2017
  • Artificial intelligence, especially deep learning technology, is penetrating the majority of research areas, including the field of bioinformatics. However, deep learning has some limitations, such as the complexity of parameter tuning and architecture design. In this study, we analyze these issues and challenges with regard to its applications in bioinformatics, particularly genomic analysis and medical image analytics, and give the corresponding approaches and solutions. Although these solutions are mostly rules of thumb, they can effectively handle the issues connected to training learning machines. As such, we explore the tendency of deep learning technology by examining several directions, such as automation, scalability, individuality, mobility, integration, and intelligence warehousing.

New Approach to Predict microRNA Gene by using data Compression technique

  • Kim, Dae-Won;Yang, Joshua SungWoo;Kim, Pan-Jun;Chu, In-Sun;Jeong, Ha-Woong;Park, Hong-Seog
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.361-365 / 2005
  • Over the past few years, the complex and subtle roles of microRNA (miRNA) in gene regulation have been increasingly appreciated. Computational approaches have played an important role in identifying miRNAs from plants and animals, as well as in predicting their putative gene targets. We present a new approach that combines comprehensive analysis of evolutionarily conserved element scores with a data compression technique to detect putative miRNA genes. We used the evolutionarily conserved elements [19] (see the methods and materials for more detail) to calculate scores base by base along the candidate pre-miRNA gene region by detecting common conserved patterns in the target sequence. We applied the data compression technique [20] to detect unknown miRNA genes. This zipping method provides, without loss of generality with respect to the nature of the character strings, a measure of the similarity between the strings under consideration [20]. Our experience with this new computational method for miRNA gene identification (or miRNA gene prediction) has been satisfactory, and we were able to find 28 putative miRNA genes.
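
The idea of measuring string similarity through a compressor can be sketched as follows. This is a minimal illustration using the normalized compression distance (NCD) with Python's standard `zlib` module; it is a related, commonly used measure and not necessarily the one defined in [20]. The function names and the toy sequences are assumptions made for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    Values near 0 suggest the strings share most of their structure;
    values near 1 suggest they share little. This is an illustrative
    stand-in, not the exact measure of reference [20].
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy sequences, for illustration only.
seq_a = "UGAGGUAGUAGGUUGUAUAGUU" * 20
seq_b = "ACCGUUAGCCGAUUACGGCAUA" * 20
print(ncd(seq_a, seq_a), ncd(seq_a, seq_b))
```

Short inputs compress poorly, so a compressor-based distance of this kind is only meaningful for sequences long enough that the compressor's fixed overhead is negligible.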

Pan-Genomics of Lactobacillus plantarum Revealed Group-Specific Genomic Profiles without Habitat Association

  • Choi, Sukjung;Jin, Gwi-Deuk;Park, Jongbin;You, Inhwan;Kim, Eun Bae
    • Journal of Microbiology and Biotechnology / v.28 no.8 / pp.1352-1359 / 2018
  • Lactobacillus plantarum is a lactic acid bacterium that promotes animal intestinal health as a probiotic and is found in a wide variety of habitats. Here, we investigated the genomic features of different clusters of L. plantarum strains via pan-genomic analysis. We compared the genomes of 108 L. plantarum strains that were available from the NCBI GenBank database. These genomes were 2.9-3.7 Mbp in size and 44-45% in G+C content. A total of 8,847 orthologs were collected, and 1,709 genes were identified to be shared as core genes by all the strains analyzed. On the basis of SNPs from the core genes, 108 strains were clustered into five major groups (G1-G5) that are different from previous reports and are not clearly associated with habitats. Analysis of group-specific enriched or depleted genes revealed that G1 and G2 were rich in genes for carbohydrate utilization (L-arabinose, L-rhamnose, and fructooligosaccharides) and that G3, G4, and G5 possessed more genes for the restriction-modification system and MazEF toxin-antitoxin. These results indicate that there are critical differences in gene content and survival strategies among genetically clustered L. plantarum strains, regardless of habitats.
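
The distinction between the pan-genome (all orthologs collected) and the core genome (orthologs shared by every strain) can be sketched with plain set operations. This is a minimal illustration that assumes ortholog assignment has already been done; the strain names, ortholog identifiers, and the function name are hypothetical and do not come from the study.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(
    gene_sets: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split ortholog IDs into pan, core, and accessory sets.

    gene_sets maps a strain name to the set of ortholog IDs found in it.
    """
    if not gene_sets:
        raise ValueError("at least one strain is required")
    per_strain = list(gene_sets.values())
    pan = set().union(*per_strain)          # present in at least one strain
    core = set.intersection(*per_strain)    # present in every strain
    accessory = pan - core                  # present in some but not all
    return pan, core, accessory


# Hypothetical toy input, for illustration only.
toy = {
    "strain_1": {"og1", "og2", "og3"},
    "strain_2": {"og1", "og2", "og4"},
    "strain_3": {"og1", "og2", "og5"},
}
pan, core, accessory = split_pan_genome(toy)
print(len(pan), len(core), len(accessory))
```

In the abstract's terms, the 8,847 orthologs correspond to the pan set and the 1,709 shared genes to the core set.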

Unraveling the hypoxia modulating potential of VEGF family genes in pan-cancer

  • Bae, So-Hyun;Hwang, Taewon;Han, Mi-Ryung
    • Genomics & Informatics / v.21 no.4 / pp.44.1-44.10 / 2023
  • Tumor hypoxia, a state of oxygen deprivation, occurs in most cancers and promotes angiogenesis, enhancing the potential for metastasis. The vascular endothelial growth factor (VEGF) family genes play crucial roles in tumorigenesis by promoting angiogenesis. To investigate the malignant processes triggered by hypoxia-induced angiogenesis across pan-cancers, we comprehensively analyzed the relationships between the expression of VEGF family genes and the hypoxic microenvironment based on integrated bioinformatics methods. Our results suggest that the expression of VEGF family genes differs significantly among various cancers, highlighting their heterogeneous effects on human cancers. Across the 33 cancers, VEGFB and VEGFD showed the highest and lowest expression levels, respectively. The survival analysis showed that VEGFA and placental growth factor (PGF) were correlated with poor prognosis in many cancers, including kidney renal cell and liver hepatocellular carcinoma. VEGFC expression was positively correlated with glioma and stomach cancer. VEGFA and PGF showed distinct positive correlations with hypoxia scores in most cancers, indicating a potential correlation with tumor aggressiveness. The expression of miRNAs targeting VEGF family genes, including hsa-miR-130b-5p and hsa-miR-940, was positively correlated with hypoxia. In immune subtype analysis, VEGFC was highly expressed in C3 (inflammatory) and C6 (transforming growth factor β dominant) across various cancers, indicating its potential role as a tumor promoter. VEGFC expression exhibited positive correlations with immune infiltration scores, suggesting low tumor purity. High expression of VEGFA and VEGFC showed favorable responses to various drugs, including BLU-667, which abrogates RET signaling, an oncogenic driver in liver and thyroid cancers. Our findings suggest potential roles of VEGF family genes in malignant processes related to hypoxia-induced angiogenesis.

Gateway RFP-Fusion Vectors for High Throughput Functional Analysis of Genes

  • Park, Jae-Yong;Hwang, Eun Mi;Park, Nammi;Kim, Eunju;Kim, Dong-Gyu;Kang, Dawon;Han, Jaehee;Choi, Wan Sung;Ryu, Pan-Dong;Hong, Seong-Geun
    • Molecules and Cells / v.23 no.3 / pp.357-362 / 2007
  • There is an increasing demand for high throughput (HTP) methods for gene analysis on a genome-wide scale. However, the current repertoire of HTP detection methodologies allows only a limited range of cellular phenotypes to be studied. We have constructed two HTP-optimized expression vectors generated from the red fluorescent reporter protein (RFP) gene. These vectors produce RFP-tagged target proteins in a multiple expression system using gateway cloning technology (GCT). The RFP tag was fused with the cloned genes, thereby allowing us to localize the expressed proteins in mammalian cells. The effectiveness of the vectors was evaluated using an HTP-screening system. Sixty representative human C2 domains were tagged with RFP and overexpressed in HiB5 neuronal progenitor cells, and we studied in detail two C2 domains that promoted the neuronal differentiation of HiB5 cells. Our results show that the two vectors developed in this study are useful for functional gene analysis using an HTP-screening system on a genome-wide scale.

Methylation-sensitive high-resolution melting analysis of the USP44 promoter can detect early-stage hepatocellular carcinoma in blood samples

  • Kim, Si-Cho;Kim, Jiwon;Kim, Da-Won;Choi, Yanghee;Park, Kyunghyun;Cho, Eun Ju;Yu, Su Jong;Kim-Ha, Jeongsil;Kim, Young-Joon
    • BMB Reports / v.55 no.11 / pp.553-558 / 2022
  • Hepatocellular carcinoma (HCC) is a dangerous cancer that often evades early detection because it is asymptomatic and an effective detection method is lacking. For people with chronic liver inflammation who are at high risk of developing HCC, a sensitive detection method for HCC is needed. In a meta-analysis of The Cancer Genome Atlas pan-cancer methylation database, we identified a CpG island in the USP44 promoter that is methylated specifically in HCC. We developed methylation-sensitive high-resolution melting (MS-HRM) analysis to measure the methylation levels of the USP44 promoter in cell-free DNA isolated from patients. Our MS-HRM assay correctly identified 40% of patients with early-stage HCC, whereas the α-fetoprotein test, which is currently used to detect HCC, correctly identified only 25% of early-stage HCC patients. These results demonstrate that USP44 MS-HRM analysis is suitable for HCC surveillance.

Comparative Genomic and Genetic Functional Analysis of Industrial L-Leucine- and L-Valine-Producing Corynebacterium glutamicum Strains

  • Ma, Yuechao;Chen, Qixin;Cui, Yi;Du, Lihong;Shi, Tuo;Xu, Qingyang;Ma, Qian;Xie, Xixian;Chen, Ning
    • Journal of Microbiology and Biotechnology / v.28 no.11 / pp.1916-1927 / 2018
  • Corynebacterium glutamicum is an excellent platform for the production of amino acids, and is widely used in the fermentation industry. Most industrial strains are traditionally obtained by repeated processes of random mutation and selection, but the genotype of these strains is often unclear owing to the absence of genomic information. As such, it is difficult to improve the growth and amino acid production of these strains via metabolic engineering. In this study, we generated a complete genome map of an industrial L-valine-producing strain, C. glutamicum XV. In order to establish the relationship between genotypes and physiological characteristics, a comparative genomic analysis was performed to explore the core genome, structural variations, and gene mutations with reference to an industrial L-leucine-producing strain, C. glutamicum CP, and the widely used C. glutamicum ATCC 13032. The results indicate that a 36,349 bp repeat sequence in the CP genome contained an additional copy each of the lrp and brnFE genes, which benefited the export of L-leucine. In XV, however, the kgd and panB genes were disrupted by nucleotide insertion, which increased the availability of precursors to synthesize L-valine. Moreover, the specific amino acid substitutions in key enzymes increased their activities. Additionally, a novel strategy is proposed to remodel central carbon metabolism and reduce pyruvate consumption without having a negative impact on cell growth by introducing the CP-derived mutant H⁺/citrate symporter. These results further our understanding of the metabolic networks in these strains and help to elucidate the influence of different genotypes on these processes.