• Title/Summary/Keyword: biological dataset

NGSEA: Network-Based Gene Set Enrichment Analysis for Interpreting Gene Expression Phenotypes with Functional Gene Sets

  • Han, Heonjong;Lee, Sangyoung;Lee, Insuk
    • Molecules and Cells / v.42 no.8 / pp.579-588 / 2019
  • Gene set enrichment analysis (GSEA) is a popular tool to identify underlying biological processes in clinical samples using their gene expression phenotypes. GSEA measures the enrichment of annotated gene sets that represent biological processes for differentially expressed genes (DEGs) in clinical samples. However, GSEA may be suboptimal for functional gene sets, because DEGs from the expression dataset may not be functional genes per se but dysregulated genes perturbed by bona fide functional genes. To overcome this shortcoming, we developed network-based GSEA (NGSEA), which measures the enrichment score of functional gene sets using the expression difference of not only individual genes but also their neighbors in the functional network. We found that NGSEA outperformed GSEA in identifying pathway gene sets for matched gene expression phenotypes. We also observed that NGSEA substantially improved the ability to retrieve known anti-cancer drugs from patient-derived gene expression data using drug-target gene sets compared with another method, Connectivity Map. We also repurposed FDA-approved drugs using NGSEA and experimentally validated budesonide as a chemical with anti-cancer effects for colorectal cancer. We therefore expect that NGSEA will facilitate both pathway interpretation of gene expression phenotypes and anti-cancer drug repositioning. NGSEA is freely available at www.inetbio.org/ngsea.
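
The idea of scoring a gene by its own expression difference together with that of its network neighbors can be illustrated with a minimal sketch. The additive form used here (own absolute log ratio plus the mean absolute log ratio of neighbors), the function name, and the toy data are all assumptions made for illustration; the abstract does not give the exact formula used by NGSEA.

```python
from statistics import mean


def network_adjusted_scores(log_ratio, neighbors):
    """Combine each gene's own |log ratio| with the mean |log ratio| of its
    network neighbors (assumed additive form, for illustration only)."""
    scores = {}
    for gene, value in log_ratio.items():
        # Only neighbors that were actually measured contribute.
        nbrs = [n for n in neighbors.get(gene, ()) if n in log_ratio]
        nbr_term = mean(abs(log_ratio[n]) for n in nbrs) if nbrs else 0.0
        scores[gene] = abs(value) + nbr_term
    return scores


# Made-up expression differences and a made-up functional network.
log_ratio = {"A": 0.1, "B": 2.0, "C": -1.0, "D": 0.0}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}

scores = network_adjusted_scores(log_ratio, neighbors)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy example, gene A has almost no expression change of its own but ranks above C because both of its neighbors are strongly perturbed, which is the kind of gene a purely DEG-based ranking would place near the bottom.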

Insight into the species identification and distribution of Grateloupiaceae (Halymeniales, Rhodophyta) having Grateloupia filicina-like morphology in the Northwest Pacific

  • Su Yeon Kim;Sung Min Boo;Hawn Su Yoon;Myung Sook Kim
    • ALGAE / v.38 no.1 / pp.23-38 / 2023
  • Accurately identifying species is the basis of all biological studies. There has been much confusion in the identification of Grateloupiacean species, which have finely pinnate gross morphology similar to Grateloupia filicina (the type species of the family). The objective of this study was to comprehensively investigate the species identification and distribution of G. filicina-like species in the Northwest Pacific, based on rbcL sequences. A total of 118 specimens from 78 sites in Korea and Japan were collected from 2001 to 2021 and analyzed for their rbcL sequences. An additional 341 sequences downloaded from GenBank were included in our comprehensive dataset. Based on these sequences, we documented the nomenclatural history and geographical distribution of the species, and commented on the application of species names. G. asiatica was the most abundant G. filicina-like species in the Northwest Pacific, and its high degree of morphological variation caused many misidentifications. In particular, G. dalianensis, G. serra, and G. variata require reconsideration of their conspecificity with G. asiatica using more specimens from China. By contrast, G. oligoclora was presumed to be a heterotypic synonym of G. subpectinata. The reported occurrence of G. acuminata, G. ramosissima, and G. livida in Korea resulted from misidentifications of other species.

Splice Site Detection Using a Combination of Markov Model and Neural Network

  • M Abdul Baten, A.K.;Halgamuge, Saman K.;Wickramarachchi, Nalin;Rajapakse, Jagath C.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.167-172 / 2005
  • This paper introduces a method that improves the identification of splice sites in the genomic DNA sequences of eukaryotes. The method combines a low-order Markov model in series with a neural network to predict splice sites. The low-order Markov model incorporates the biological knowledge surrounding the splice sites as probabilistic parameters. The neural network takes the Markov-encoded parameters as inputs and produces the prediction. Two types of neural networks are used for comparison. The method reduces computational complexity and shows encouraging accuracy in predicting splice sites when applied to several standard splice site datasets.
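
The Markov-encoding step can be sketched as follows, assuming a position-specific first-order Markov model trained on aligned true-site windows. The model order, the pseudocount, the function names, and the toy sequences are assumptions for illustration; the abstract does not specify them, and the downstream neural network is omitted.

```python
BASES = "ACGT"


def train_first_order(sequences, pseudocount=1.0):
    """Estimate position-specific P(base_i | base_{i-1}) from aligned,
    equal-length sequences, with additive smoothing."""
    length = len(sequences[0])
    model = []
    for i in range(1, length):
        counts = {(p, c): pseudocount for p in BASES for c in BASES}
        for seq in sequences:
            counts[(seq[i - 1], seq[i])] += 1
        probs = {}
        for p in BASES:
            total = sum(counts[(p, c)] for c in BASES)
            for c in BASES:
                probs[(p, c)] = counts[(p, c)] / total
        model.append(probs)
    return model


def encode(seq, model):
    """Map a window to its vector of conditional probabilities, which could
    then serve as the input features of a neural network classifier."""
    return [model[i - 1][(seq[i - 1], seq[i])] for i in range(1, len(seq))]


# Made-up donor-like windows standing in for a training set of true sites.
true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
model = train_first_order(true_sites)

features = encode("AGGTAAGT", model)
background = encode("TTTTTTTT", model)
print(features)
print(background)
```

A window that resembles the training sites receives higher conditional probabilities at most positions than an unrelated window, so the encoded vector already carries much of the signal and the network has only a short numeric input to learn from.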

A Method for Protein Identification Based on MS/MS using Probabilistic Graphical Models (확률그래프모델을 이용한 MS/MS 기반 단백질 동정 기법)

  • Li, Hong-Lan;Hwang, Kyu-Baek
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.426-428 / 2012
  • To identify the proteins present in biological samples, the samples are separated and analyzed in the following sequence: protein purification and digestion, fragmentation of peptides by tandem mass spectrometry (MS/MS), peptide identification, and protein identification. Widely used methods for protein identification, such as ProteinProphet and BaysPro, are based on probabilistic approaches. However, they do not consider how peptide identification probabilities differ with peptide length. Here, we propose a probabilistic graphical model-based approach to protein identification from MS/MS data that considers peptide identification probabilities, the number of sibling peptides, and peptide length. We compared our approach with ProteinProphet using a yeast MS/MS dataset. Our model identified 27 more proteins than ProteinProphet at a 1% false discovery rate (FDR), confirming the importance of peptide length information in protein identification.
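
As a point of reference for what a probabilistic protein score looks like, the sketch below combines peptide identification probabilities under an independence assumption and estimates the FDR of an accepted list from those probabilities. This is a generic baseline, not the graphical model proposed in the paper: it ignores sibling peptides and peptide length, and the function names and numbers are hypothetical.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that at least one peptide identification is correct,
    assuming the identifications are independent."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


def expected_fdr(probabilities, threshold):
    """Expected fraction of false identifications among the items whose
    probability is at or above the threshold."""
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Made-up peptide probabilities for one protein.
print(protein_probability([0.9, 0.5]))

# Made-up protein probabilities; accept those at or above 0.9.
print(expected_fdr([0.99, 0.95, 0.5], threshold=0.9))
```

A model that also accounts for peptide length would adjust the individual peptide probabilities before a combination step of this kind.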

Visualization of Medical Images Using Visualization Toolkit (VTK를 이용한 의료영상의 가시화)

  • Choi, H.G.;Tack, G.R.
    • Proceedings of the KOSOMBE Conference / v.1998 no.11 / pp.113-114 / 1998
  • In this paper, Visible Human data provided by the National Library of Medicine (NLM) are visualized using VTK™. A computed tomography dataset (axial, $587\times341\times256$, with 1 mm between slices) is used throughout the study. Before the actual visualization routine, an 8-bit reader class for VTK is developed to convert the CT data into a VTK dataset. The visualization procedures are then carried out to display a 3D image on a PC. VTK is freeware rather than commercial software. VTK produces relatively good image quality but is slower than commercial software packages such as IAP, IDL, and AVS. Thus, if processing time is not a critical factor, VTK is worth using for the visualization of medical images.

Effect of Normalization on Detection of Differentially-Expressed Genes with Moderate Effects

  • Cho, Seo-Ae;Lee, Eun-Jee;Kim, Young-Chul;Park, Tae-Sung
    • Genomics & Informatics / v.5 no.3 / pp.118-123 / 2007
  • The existing literature offers little guidance on how to decide which method to use to analyze one-channel microarray measurements when dealing with large, grouped samples. Most previous methods have focused on two-channel data; therefore, they cannot be easily applied to one-channel microarray data. Thus, a more reliable method is required to determine an appropriate combination of individual basic processing steps for a given dataset in order to improve the validity of one-channel expression data analysis. We address key issues in evaluating the effectiveness of basic statistical processing steps of microarray data that can affect the final outcome of gene expression analysis, without focusing on the intrinsic data underlying biological interpretation.

Identification of inhibitors against ROS1 targeting NSCLC by In-Silico approach

  • Bavya, Chandrasekhar
    • Journal of Integrative Natural Science / v.15 no.4 / pp.171-177 / 2022
  • ROS1 (c-ros oncogene) is one of the genes mutated in NSCLC (non-small cell lung cancer). Increased expression of ROS1 leads to increased cell proliferation, migration, and survival. Crizotinib and Entrectinib are FDA-approved drugs against the ROS1 protein, but patients have recently started to develop resistance to Crizotinib, so a new drug that acts effectively against ROS1 in NSCLC is needed. In this study, we performed virtual screening, in which compounds were taken from the Zinc 15 dataset and subjected to molecular docking. The top compounds were selected based on their binding affinity and their interactions with the residues. The stability and chemical reactivity of the compounds were also studied using density functional theory and their properties. Further study of these compounds could reveal the required information about the ROS1-inhibitor complex and aid the discovery of potent inhibitors.

New Finding of Golovinomyces salviae Powdery Mildew on Glechoma longituba (Lamiaceae), Besides Its Original Host Salvia spp.

  • In-Young Choi;Lamiya Abasova;Joon-Ho Choi;Young-Joon Choi;Hyeon-Dong Shin
    • The Korean Journal of Mycology / v.51 no.3 / pp.239-243 / 2023
  • The Golovinomyces biocellatus complex is known to consist of powdery mildews of the genus Golovinomyces that are associated with host plants of the family Lamiaceae. Recent molecular phylogenetic analyses have resolved the taxonomic composition of this complex, and Golovinomyces biocellatus sensu stricto is considered to be a pathogen of Glechoma species globally. However, this paper presents a new finding of Golovinomyces salviae on Glechoma longituba, in addition to its original Salvia hosts. This finding was inferred from molecular phylogenetic analyses of a multi-locus nucleotide sequence dataset comprising the intergenic spacer (IGS), internal transcribed spacer (ITS), and large subunit (LSU) of rDNA and the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene. Further, the asexual morphology of this fungus is described and illustrated.

Tolerance Range Analysis of Fish on Chemical Water Quality in Aquatic Ecosystems

  • Kim, Jeong-Kyu;Han, Jeong-Ho;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.43 no.4 / pp.459-470 / 2010
  • In this study, we analyzed fish tolerance guilds in the mainstems and tributaries of 65 streams and rivers and their relations to water quality, using a dataset sampled from April to November 2009. Water quality parameters including biochemical oxygen demand (BOD), electrical conductivity (EC), total nitrogen (TN), total phosphorus (TP), ammonia nitrogen ($NH_3$-N), nitrate nitrogen ($NO_3$-N), and phosphate phosphorus ($PO_4$-P) were analyzed in the laboratory, and tolerance ranges were analyzed for highly abundant fishes in three categories: sensitive, intermediate, and tolerant species. According to the fish guild analysis, tolerant species made up 58% of the total community and omnivore species 63% of the total, indicating degradation of habitats and water quality. Water quality showed typical longitudinal gradients from the headwaters to the downriver reaches; TN and TP increased downriver except in the large point-source area, and ionic content, based on electrical conductivity, showed the same pattern. Tolerance guild analysis of 9 major species with high abundance indicated that sensitive groups had narrower tolerance ranges in water quality than the intermediate and tolerant groups. In contrast, tolerant groups including Zacco platypus, Carassius auratus, and Opsarichthys uncirostris amurensis had wider tolerance ranges than the sensitive and intermediate groups. Thus, the groups were clearly segregated by tolerance level. Principal component analysis (PCA) of the relations between water quality and the fish species in each group suggested that water quality had the highest eigenvalues with fish species on the first PCA axis, and that nitrogen (TN, $NH_3$-N, $NO_3$-N) and phosphorus (TP) were the key components differentiating the sensitive, intermediate, and tolerant guilds.

Performance Comparison of Two Gene Set Analysis Methods for Genome-wide Association Study Results: GSA-SNP vs i-GSEA4GWAS

  • Kwon, Ji-Sun;Kim, Ji-Hye;Nam, Doug-U;Kim, Sang-Soo
    • Genomics & Informatics / v.10 no.2 / pp.123-127 / 2012
  • Gene set analysis (GSA) is useful in interpreting a genome-wide association study (GWAS) result in terms of biological mechanism. We compared the performance of two GSA implementations that accept GWAS p-values of single nucleotide polymorphisms (SNPs) or gene-by-gene summaries thereof, GSA-SNP and i-GSEA4GWAS, under the same settings of inputs and parameters. GSA runs were made with two sets of p-values from a Korean type 2 diabetes mellitus GWAS: 259,188 and 1,152,947 SNPs of the original and imputed genotype datasets, respectively. When Gene Ontology terms were used as gene sets, i-GSEA4GWAS produced 283 and 1,070 hits for the unimputed and imputed datasets, respectively. On the other hand, GSA-SNP reported 94 and 38 hits for the unimputed and imputed datasets, respectively. Similar trends, though to a lesser degree, were observed with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets. The huge number of hits by i-GSEA4GWAS for the imputed dataset was probably an artifact of the scaling step in the algorithm. The decrease in hits by GSA-SNP for the imputed dataset may be because it relies on Z-statistics, which are sensitive to variations in the background level of associations. Judicious evaluation of GSA outcomes, perhaps based on multiple programs, is recommended.
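
Why a Z-statistic depends on the background level of association can be seen in a minimal sketch. It assumes each gene is summarized by a single score (for example, the negative log10 of a representative SNP p-value) and that a gene set is compared against the mean and standard deviation of all genes. The scoring scheme, function name, and numbers are assumptions for illustration and are not taken from the GSA-SNP implementation.

```python
from math import sqrt
from statistics import mean, pstdev


def gene_set_z(gene_scores, gene_set):
    """Z-statistic of a gene set's mean score against the background of all
    scored genes."""
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    if not members:
        raise ValueError("gene set has no scored members")
    mu = mean(gene_scores.values())
    sigma = pstdev(gene_scores.values())
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (mean(members) - mu) / (sigma / sqrt(len(members)))


# Made-up gene-level scores, e.g. -log10(p) of a representative SNP per gene.
scores = {"g1": 5.0, "g2": 4.0, "g3": 1.0, "g4": 0.5, "g5": 0.5, "g6": 1.0}
z_original = gene_set_z(scores, {"g1", "g2"})

# Raise the background (genes outside the set) and recompute for the same set.
shifted = {g: (s if g in {"g1", "g2"} else s + 2.0) for g, s in scores.items()}
z_shifted = gene_set_z(shifted, {"g1", "g2"})
print(z_original, z_shifted)
```

The set's own scores are unchanged in the second run, yet its Z-statistic falls because the background mean rose, which is consistent with the sensitivity to background association levels noted in the abstract.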