• Title/Summary/Keyword: genome-mining


Genome data mining for everyone

  • Lee, Gir-Won;Kim, Sang-Soo
    • BMB Reports / v.41 no.11 / pp.757-764 / 2008
  • The genomic sequences of a huge number of species have been determined. Typically, these genome sequences and the associated annotation data are accessed through Internet-based genome browsers that offer a user-friendly interface. Intelligent use of the data should expedite biological knowledge discovery. Such activity is collectively called data mining and involves queries that can be simple, complex, and even combinational. Various tools have been developed to make genome data mining available to computational and experimental biologists alike. In this mini-review, some tools that have proven successful will be introduced along with examples taken from published reports.

MediScore: MEDLINE-based Interactive Scoring of Gene and Disease Associations

  • Cho, Hye-Young;Oh, Bermseok;Lee, Jong-Keuk;Kim, Kuchan;Koh, InSong
    • Genomics & Informatics / v.2 no.3 / pp.131-133 / 2004
  • MediScore is an information retrieval system that helps to search for the set of genes associated with a specific disease or the set of diseases associated with a specific gene. Despite recent improvements in natural language processing (NLP) and other text mining approaches to finding disease-associated genes, many false positive results arise from the diversity of exceptional cases and from ambiguities in gene names. To overcome these weak points of current text mining approaches, MediScore introduces statistical normalization based on the normal approximation to the binomial distribution, which corrects inaccurate scores caused by common words that do not represent genes, and interactive rescoring by the user to remove false positive results. Interactive rescoring includes individual alias scoring for each gene to remove false gene synonyms, reference to MEDLINE abstracts, and cross-referencing between OMIM and other related information.
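
The abstract does not give MediScore's scoring formula. As a rough illustration only, the textbook normal approximation to a binomial count can be sketched as follows; the function name, its parameters, and the idea of scoring a gene alias by its co-occurrence count are assumptions made for this example, not the published method.

```python
import math


def binomial_z_score(k: int, n: int, p: float) -> float:
    """Return the z-score of k successes in n trials under background rate p.

    Hypothetical reading: n is the number of abstracts mentioning a disease,
    k is how many of them also mention a gene alias, and p is the alias's
    background frequency across the whole corpus.
    """
    if n <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need n > 0 and 0 < p < 1")
    mean = n * p                        # expected count under the binomial model
    sd = math.sqrt(n * p * (1.0 - p))   # binomial standard deviation
    return (k - mean) / sd              # normal approximation to the binomial


# A common English word used as a gene alias has a high background rate p,
# so the same raw count k yields a much lower score than for a rare alias.
common = binomial_z_score(k=60, n=100, p=0.5)
rare = binomial_z_score(k=60, n=100, p=0.1)
```

Under these assumptions, an alias that is also an ordinary word is penalized automatically, which matches the correction the abstract describes in general terms.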

GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

  • Oh, So-Yeon;Kim, Ji-Hyeon;Kim, Seo-Jin;Nam, Hee-Jo;Park, Hyun-Seok
    • Genomics & Informatics / v.16 no.3 / pp.75-77 / 2018
  • Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal annotated with various levels of linguistic information would be a valuable resource, as the process of information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, called GNI Corpus version 1.0, extracted and annotated from full texts of Genomics & Informatics with an NLTK (Natural Language Toolkit)-based text mining script. The preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions in future biomedical text mining.

Evaluation and Genome Mining of Bacillus stercoris Isolate B.PNR1 as Potential Agent for Fusarium Wilt Control and Growth Promotion of Tomato

  • Rattana Pengproh;Thanwanit Thanyasiriwat;Kusavadee Sangdee;Juthaporn Saengprajak;Praphat Kawicha;Aphidech Sangdee
    • The Plant Pathology Journal / v.39 no.5 / pp.430-448 / 2023
  • Recent strategies for controlling Fusarium oxysporum f. sp. lycopersici (Fol), the causal agent of Fusarium wilt of tomato, focus on using effective biocontrol agents. In this study, the biocontrol and plant growth promoting (PGP) attributes of 11 isolates of loamy soil Bacillus spp. were analyzed. Among them, the isolates B.PNR1 and B.PNR2 inhibited the mycelial growth of Fol by inducing abnormal fungal cell wall structures and cell wall collapse. Moreover, broad-spectrum activity against four other plant pathogenic fungi, F. oxysporum f. sp. cubense race 1 (Foc), Sclerotium rolfsii, Colletotrichum musae, and C. gloeosporioides, was noted for these isolates. These two Bacillus isolates produced indole acetic acid, phosphate solubilization enzymes, and amylolytic and cellulolytic enzymes. In the pot experiment, the culture filtrate from B.PNR1 showed greater inhibition of the fungal pathogens and promoted the growth of tomato plants significantly more than the other treatments. Isolate B.PNR1, the best biocontrol and PGP isolate, was identified as Bacillus stercoris by its 16S rRNA gene sequence and whole genome sequencing (WGS) analysis. The WGS, through genome mining, confirmed that the B.PNR1 genome contained genes/gene clusters of nonribosomal peptide synthetases/polyketide synthases, such as fengycin, surfactin, bacillaene, subtilosin A, bacilysin, and bacillibactin, which are involved in antagonistic and PGP activities. Therefore, our findings demonstrate the effectiveness of B. stercoris strain B.PNR1 as an antagonist and for plant growth promotion, highlighting the use of this microorganism as a biocontrol agent against the Fusarium wilt pathogen and its PGP abilities in tomatoes.

Association Analysis of Reactive Oxygen Species-Hypertension Genes Discovered by Literature Mining

  • Lim, Ji Eun;Hong, Kyung-Won;Jin, Hyun-Seok;Oh, Bermseok
    • Genomics & Informatics / v.10 no.4 / pp.244-248 / 2012
  • Oxidative stress, which results in an excessive production of reactive oxygen species (ROS), is one of the fundamental mechanisms of the development of hypertension. In the vascular system, ROS have physical and pathophysiological roles in vascular remodeling and endothelial dysfunction. In this study, ROS-hypertension-related genes were collected with biological literature-mining tools, such as SciMiner and gene2pubmed, in order to identify the genes that would cause hypertension through ROS. Further, single nucleotide polymorphisms (SNPs) located within these gene regions were examined statistically for their association with hypertension in 6,419 Korean individuals, and pathway enrichment analysis using the associated genes was performed. The 2,945 SNPs of 237 ROS-hypertension genes were analyzed, and 68 genes were significantly associated with hypertension (p < 0.05). The most significant SNP was rs2889611 within MAPK8 (p = $2.70 \times 10^{-5}$; odds ratio, 0.82; confidence interval, 0.75 to 0.90). This study demonstrates that a text mining approach combined with association analysis may be useful for identifying candidate genes that cause hypertension through ROS or oxidative stress.
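
The abstract reports an odds ratio with a confidence interval but not how they were computed; the study presumably used its own association model. Purely to illustrate what such numbers mean, here is a minimal sketch of the textbook log-odds (Woolf) interval on a 2x2 table. The function name and the counts are made up for the example and are not taken from the study.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio_with_ci(a=10, b=20, c=30, d=40)
```

An interval that lies entirely below 1, as in the reported 0.75 to 0.90, indicates that the allele is associated with lower odds of the outcome at the chosen confidence level.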

Genomic Analysis of the Xanthoria elegans and Polyketide Synthase Gene Mining Based on the Whole Genome

  • Xiaolong Yuan;Yunqing Li;Ting Luo;Wei Bi;Jiaojun Yu;Yi Wang
    • Mycobiology / v.51 no.1 / pp.36-48 / 2023
  • Xanthoria elegans is a lichen symbiosis that inhabits extreme environments and can absorb UV-B. We report the de novo sequencing and assembly of the X. elegans genome. The whole genome was approximately 44.63 Mb, with a GC content of 40.69%. Genome assembly generated 207 scaffolds with an N50 length of 563,100 bp and an N90 length of 122,672 bp. The genome comprised 9,581 genes, some of which encode enzymes involved in the secondary metabolism of compounds such as terpenes and polyketides. To further understand the UV-B absorption of X. elegans and its mechanisms of adaptation to extreme environments, we searched the genome for secondary metabolite genes and gene clusters using genome mining and bioinformatics analysis. The results revealed 7 NR-PKSs, 12 HR-PKSs, and 2 hybrid PKS-PKSs in X. elegans, all belonging to Type I PKS (T1PKS) according to their domain architecture. Phylogenetic analysis and BGC comparison linked putative products to two NR-PKSs and three HR-PKSs: the putative products of the two NR-PKSs were emodin xanthrone (most likely parietin) and mycophelonic acid, and those of the three HR-PKSs were soppilines, (+)-asperlin, and macrolactone brefeldin A, respectively. For these 5 PKSs, domain architecture, phylogenetic analysis, and BGC comparison establish a correlation between the SM carbon skeleton and the PKS genes. Although the function of the other 16 PKSs remains unclear, the findings emphasize that the genes from X. elegans represent an unexploited source of novel polyketides and support the utilization of lichen gene resources.
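
For readers unfamiliar with the N50 and N90 statistics quoted above, the standard definition can be sketched as follows: sort the scaffolds from longest to shortest and report the length at which the running total first reaches the given fraction of the assembly. The function name and the toy lengths are assumptions for this example; they are not the scaffold lengths of the X. elegans assembly.

```python
from typing import Iterable


def nxx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nxx length (N50 for fraction=0.5, N90 for fraction=0.9)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered or not 0.0 < fraction <= 1.0:
        raise ValueError("need at least one length and 0 < fraction <= 1")
    threshold = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        # First scaffold at which the cumulative length reaches the threshold.
        if running >= threshold:
            return length
    return ordered[-1]  # not reached in practice; kept for completeness


# Toy scaffold lengths in bp (hypothetical), total 54.
toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = nxx(toy, 0.5)
n90 = nxx(toy, 0.9)
```

N90 is always less than or equal to N50 for the same assembly, which is consistent with the two values reported in the abstract.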

Overview of frequent pattern mining

  • Jurg Ott;Taesung Park
    • Genomics & Informatics / v.20 no.4 / pp.39.1-39.9 / 2022
  • Various methods of frequent pattern mining have been applied to genetic problems, specifically, to the combined association of two genotypes (a genotype pattern, or diplotype) at different DNA variants with disease. These methods have the ability to come up with a selection of genotype patterns that are more common in affected than unaffected individuals, and the assessment of statistical significance for these selected patterns poses some unique problems, which are briefly outlined here.
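
To make the notion of a genotype pattern concrete, the following is a minimal brute-force sketch that counts every pair of genotypes at two different variants and keeps the pairs that are more frequent in affected than in unaffected individuals. It is not one of the algorithms surveyed in the review; the function names, the support threshold, and the variant identifiers are hypothetical, and no significance test is attempted.

```python
from collections import Counter
from itertools import combinations


def pattern_counts(individuals):
    """Count each two-variant genotype pattern across individuals.

    Each individual is a dict mapping a variant ID to a genotype string.
    """
    counts = Counter()
    for genotypes in individuals:
        items = sorted(genotypes.items())  # fixed order so patterns compare equal
        for pair in combinations(items, 2):
            counts[pair] += 1
    return counts


def enriched_patterns(cases, controls, min_case_support=2):
    """Patterns with enough support in cases and higher frequency than in controls."""
    case_counts = pattern_counts(cases)
    control_counts = pattern_counts(controls)
    found = []
    for pattern, n_case in case_counts.items():
        case_freq = n_case / len(cases)
        control_freq = control_counts[pattern] / len(controls)
        if n_case >= min_case_support and case_freq > control_freq:
            found.append((pattern, case_freq, control_freq))
    # Largest frequency difference first.
    return sorted(found, key=lambda row: row[1] - row[2], reverse=True)


# Hypothetical toy data: variant IDs and genotypes are made up.
cases = [
    {"rs1": "AA", "rs2": "GG"},
    {"rs1": "AA", "rs2": "GG"},
    {"rs1": "AG", "rs2": "GG"},
]
controls = [
    {"rs1": "AA", "rs2": "GT"},
    {"rs1": "AG", "rs2": "GG"},
]
result = enriched_patterns(cases, controls)
```

Because the patterns are selected after looking at the data, a naive p-value computed for the top pattern would be too optimistic; this selection effect is the kind of problem the abstract refers to.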

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics / v.2 no.2 / pp.99-106 / 2004
  • In this paper we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts the interactions described in a document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from related public databases, to infer more interactions from the original ones. Both the inferred interactions and the original interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

ManBIF: a Program for Mining and Managing Biobank Impact Factor Data

  • Yu, Ki-Jin;Nam, Jung-Min;Her, Yun;Chu, Min-Seock;Seo, Hyung-Seok;Kim, Jun-Woo;Jeon, Jae-Pil;Park, Hye-Kyung;Park, Kie-Jung
    • Genomics & Informatics / v.9 no.1 / pp.37-38 / 2011
  • Biobank Impact Factor (BIF), which is a very effective criterion to evaluate the activity of biobanks, can be estimated by the citation information of biobanks from scientific papers. We have developed a program, ManBIF, to investigate the citation information from PDF files in the literature. The program manages a dictionary for expressions to represent biobanks and their resources, mines the citation information by converting PDF files to text files and searching with a dictionary, and produces a statistical report file. It can be used as an important tool by biobanks.
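
The dictionary search step described above can be illustrated with a small sketch that works on text already extracted from a PDF. This is not ManBIF's code: the function name, the dictionary layout, and the sample sentence are assumptions for the example, and PDF-to-text conversion is left out.

```python
import re
from collections import Counter


def count_mentions(text: str, dictionary: dict) -> Counter:
    """Count case-insensitive, whole-word matches of each entry's expressions.

    dictionary maps a canonical biobank name to a list of expressions
    (full names, abbreviations) that may refer to it.
    """
    counts = Counter()
    for canonical, expressions in dictionary.items():
        for expression in expressions:
            pattern = r"\b" + re.escape(expression) + r"\b"
            counts[canonical] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical dictionary and text, for illustration only.
biobank_dictionary = {
    "Example Biobank": ["Example Biobank", "EXB"],
    "Other Biobank": ["Other Biobank"],
}
sample_text = (
    "Samples were provided by the Example Biobank (EXB). "
    "EXB resources were used under an approved protocol."
)
mentions = count_mentions(sample_text, biobank_dictionary)
```

A report step could then aggregate such counts per paper; how ManBIF weights them into a BIF value is not stated in the abstract.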