• Title/Summary/Keyword: gene expression data

Deep learning for stage prediction in neuroblastoma using gene expression data

  • Park, Aron;Nam, Seungyoon
    • Genomics & Informatics / v.17 no.3 / pp.30.1-30.4 / 2019
  • Neuroblastoma is a major cause of cancer death in early childhood, and its timely and correct diagnosis is critical. Gene expression datasets have recently been considered as a powerful tool for cancer diagnosis and subtype classification. However, no attempts have yet been made to apply deep learning using gene expression to neuroblastoma classification, although deep learning has been applied to cancer diagnosis using image data. Taking the International Neuroblastoma Staging System stages as multiple classes, we designed a deep neural network using the gene expression patterns and stages of neuroblastoma patients. Despite a small patient population (n = 280), stage 1 and 4 patients were well distinguished. If it is possible to replicate this approach in a larger population, deep learning could play an important role in neuroblastoma staging.

A Pattern Consistency Index for Detecting Heterogeneous Time Series in Clustering Time Course Gene Expression Data (시간경로 유전자 발현자료의 군집분석에서 이질적인 시계열의 탐지를 위한 패턴일치지수)

  • Son, Young-Sook;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.371-379 / 2005
  • In this paper, we propose a pattern consistency index for detecting heterogeneous time series that deviate from the representative pattern of each cluster in clustering time course gene expression data using the Pearson correlation coefficient. We examine its usefulness by applying this index to serum time course gene expression data from microarrays.
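The abstract does not define the index itself, so the following is only a minimal sketch of the general idea: each series is scored by its Pearson correlation with its cluster's mean profile, and low scores are flagged. The function names, the use of the cluster mean as the "representative pattern", and the 0.5 threshold are all assumptions made for illustration, not the authors' definition.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation; treat it as 0
    return cov / (sx * sy)

def flag_heterogeneous(cluster, threshold=0.5):
    """Score each series against the cluster mean and flag those below threshold.

    The threshold is a hypothetical cutoff chosen for this example.
    """
    n_time = len(cluster[0])
    mean_profile = [sum(s[t] for s in cluster) / len(cluster) for t in range(n_time)]
    scores = [pearson(s, mean_profile) for s in cluster]
    flagged = [i for i, r in enumerate(scores) if r < threshold]
    return scores, flagged

# Toy cluster: three rising profiles and one falling profile.
cluster = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.8, 3.2, 3.9],
    [4.0, 3.0, 2.0, 1.0],  # runs against the rising trend
]
scores, flagged = flag_heterogeneous(cluster)
print(flagged)
```

Because the outlier also contributes to the mean profile, a more careful variant might recompute the mean without the series being scored.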

High-throughput identification of chrysanthemum gene function and expression: An overview and an effective proposition

  • Nguyen, Toan Khac;Lim, Jin Hee
    • Journal of Plant Biotechnology / v.48 no.3 / pp.139-147 / 2021
  • Now that the whole-genome duplication (WGD) of diploid Chrysanthemum nankingense has been characterized and a de novo whole-genome assembly of C. seticuspe has been obtained, it has become possible to study evolutionary diversity and pursue gene discovery for chrysanthemum breeding research. Robust tools for high-throughput identification and analysis of gene function and expression are therefore of great importance in chrysanthemum genomics. However, the very large genome size and heterozygosity remain major obstacles to chrysanthemum breeding practice and functional genomics analysis. Nonetheless, some contemporary technologies offer efficient and promising solutions that reduce these drawbacks and point to highly proficient methods for obtaining extensive phenotyping data and for system development in the future. This review gives a broad overview of strategies for high-throughput identification and molecular analysis of gene function and expression in chrysanthemum. We also propose specific protocols for studying chrysanthemum genes. Looking ahead, high-throughput identification will continue to advance rapidly and support the next generation of chrysanthemum breeding.

Gene Selection using Principal Component Analysis for Molecular classification (Principal Component Analysis를 이용한 Gene Selection)

  • Lim Soo-Hong;Sohn Kirack;Hong Sung-Yong
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.259-261 / 2005
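  • DNA microarray studies, which generate thousands of gene expression measurements, collect diagnostically useful gene expression information from tissue and cell samples. New methods based on the support vector machine (SVM) have been studied for analyzing this kind of data. In this paper, we describe an algorithm that uses the eigenvectors of gene expression data to improve SVM performance and to find genes that are useful for disease diagnosis. Genes are selectively included in SVM learning by means of the eigenvectors, and the classification results reveal how much each added gene influences the diagnosis, which can be used to understand the role of genes in disease.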
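The abstract does not say how the eigenvectors drive the selection, so the sketch below shows one common reading, offered as an assumption: rank genes by the absolute value of their loading on the first principal component, approximated here by power iteration. The function names and toy data are hypothetical, and the SVM training step is left out.

```python
import random

def first_pc_loadings(samples, n_iter=200, seed=0):
    """Approximate the leading eigenvector of the gene covariance matrix.

    samples: list of rows (one per sample), each a list of gene values.
    Uses power iteration on X^T X of the centered data.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(n_iter):
        # w = X^T (X v), computed without forming the p x p matrix
        xv = [sum(r[j] * v[j] for j in range(p)) for r in centered]
        w = [sum(centered[i][j] * xv[i] for i in range(n)) for j in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            return [0.0] * p  # all genes constant: no principal direction
        v = [c / norm for c in w]
    return v

def top_genes(samples, k):
    """Return indices of the k genes with the largest absolute loading."""
    loadings = first_pc_loadings(samples)
    order = sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)
    return order[:k]

# Toy data: 6 samples x 4 genes; gene 2 separates the two groups most strongly.
samples = [
    [1.0, 5.0, 0.0, 2.0],
    [1.2, 5.1, 0.5, 2.1],
    [0.9, 4.9, 0.2, 1.9],
    [2.0, 5.0, 9.8, 2.0],
    [2.1, 5.1, 10.1, 2.1],
    [1.9, 4.9, 10.3, 1.9],
]
selected = top_genes(samples, 2)
print(selected)
```

On unscaled data this ranking favors high-variance genes, so standardizing each gene first is a reasonable alternative.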

Rank-based Multiclass Gene Selection for Cancer Classification with Naive Bayes Classifiers based on Gene Expression Profiles (나이브 베이스 분류기를 이용한 유전발현 데이타기반 암 분류를 위한 순위기반 다중클래스 유전자 선택)

  • Hong, Jin-Hyuk;Cho, Sung-Bae
    • Journal of KIISE:Computer Systems and Theory / v.35 no.8 / pp.372-377 / 2008
  • Multiclass cancer classification has been actively investigated based on gene expression profiles, where it determines the type of cancer by analyzing the large amount of gene expression data collected by the DNA microarray technology. Since gene expression data include many genes not related to a target cancer, it is required to select informative genes in order to obtain highly accurate classification. Conventional rank-based gene selection methods often use ideal marker genes basically devised for binary classification, so it is difficult to directly apply them to multiclass classification. In this paper, we propose a novel method for multiclass gene selection, which does not use ideal marker genes but directly analyzes the distribution of gene expression. It measures the class-discriminability by discretizing gene expression levels into several regions and analyzing the frequency of training samples for each region, and then classifies samples by using the naive Bayes classifier. We have demonstrated the usefulness of the proposed method for various representative benchmark datasets of multiclass cancer classification.
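The abstract outlines the procedure (discretize expression levels into regions, count training samples per region, then classify with naive Bayes) but not the exact scoring formula. The sketch below fills that gap with a hypothetical "purity" score, the share of samples falling in a region where their class is the majority, and uses equal-width regions. Both choices are assumptions, not the authors' measure.

```python
from collections import Counter, defaultdict
from math import log

def make_binner(values, n_bins=3):
    """Build a function mapping an expression level to an equal-width region index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant genes
    def to_bin(x):
        return min(n_bins - 1, max(0, int((x - lo) / width)))
    return to_bin

def purity_score(values, labels, n_bins=3):
    """Hypothetical discriminability: fraction of samples in their region's majority class."""
    to_bin = make_binner(values, n_bins)
    counts = defaultdict(Counter)
    for x, y in zip(values, labels):
        counts[to_bin(x)][y] += 1
    return sum(max(c.values()) for c in counts.values()) / len(values)

def train_nb(X, y, genes, n_bins=3):
    """Train a naive Bayes classifier on the discretized levels of the chosen genes."""
    binners = {g: make_binner([row[g] for row in X], n_bins) for g in genes}
    class_counts = Counter(y)
    bin_counts = defaultdict(Counter)  # (gene, class) -> region counts
    for row, label in zip(X, y):
        for g in genes:
            bin_counts[(g, label)][binners[g](row[g])] += 1
    def predict(row):
        best, best_lp = None, float("-inf")
        for c, nc in class_counts.items():
            lp = log(nc / len(y))
            for g in genes:
                # Laplace smoothing keeps unseen regions from zeroing the posterior
                lp += log((bin_counts[(g, c)][binners[g](row[g])] + 1) / (nc + n_bins))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
    return predict

# Toy data: 6 samples x 3 genes, three classes; only gene 0 tracks the class.
X = [
    [0.1, 5.0, 1.0], [0.2, 1.0, 1.1],
    [1.5, 4.8, 0.9], [1.6, 1.2, 1.0],
    [2.9, 5.1, 1.1], [3.0, 0.9, 0.9],
]
y = ["A", "A", "B", "B", "C", "C"]
ranked = sorted(range(3), key=lambda g: purity_score([r[g] for r in X], y), reverse=True)
predict = train_nb(X, y, genes=ranked[:1])
print(ranked[0], predict([1.4, 3.0, 1.0]))
```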

Gene Expression Signatures for Compound Response in Cancers

  • He, Ningning;Yoon, Suk-Joon
    • Genomics & Informatics / v.9 no.4 / pp.173-180 / 2011
  • Recent trends in generating multiple, large-scale datasets pose new challenges for relating different types of data, such as gene expression and drug response data. Integrative analysis of compound response and gene expression datasets offers an opportunity to capture the possible mechanisms of compounds by using signature genes across diverse types of cancer cell lines. Here, we integrated datasets of compound response and gene expression profiles on NCI60 cell lines and constructed a network revealing the relationships among 801 compounds and 341 gene probes. As an example, obtusol, which shows exclusive sensitivity in a small number of colon cell lines, is related to a set of gene probes that are uniquely overexpressed in colon cell lines. We also found that the SLC7A11 gene, a direct target of miR-26b, might be a key element in understanding the action of many diverse classes of anticancer compounds. We demonstrated that this network might be useful for studying the mechanisms of varied compound response on diverse cancer cell lines.

Correlation between Expression Level of Gene and Codon Usage

  • Hwang, Da-Jung;Han, Joon-Hee;Raghava, G P S
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.138-149 / 2004
  • In this study, we analyzed the gene expression data of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between the expression level and the nucleotide sequence of a gene. First, the correlation between gene expression and the percent composition of each type of nucleotide was computed. Nucleotides 'G' and 'C' show a positive correlation (r $\geq$ 0.15), 'A' shows a negative correlation (r $\approx$ -0.21) and 'T' shows no correlation (r $\approx$ 0.00) with gene expression. 'G+C' rich genes are also expressed more highly than 'A+T' rich genes. We observed an inverse correlation between the composition of a nucleotide at the genome level and the level of gene expression. We then computed the correlation between dinucleotide (e.g. AA, AT, GC) composition and gene expression and observed a wide variation in correlation (from r = -0.45 for AT to r = 0.35 for GT). The dinucleotides that contain 'T' have a wide range of correlation with gene expression. For example, GT and CT have a high positive correlation and AT has a high negative correlation. We also computed the correlation between trinucleotide (or codon) composition and gene expression and again observed a wide range of correlation (from r = -0.45 for ATA to r = 0.45 for GGT). The major codons of a large number of amino acids show a positive correlation with expression level, but there are a few amino acids whose major codons show a negative correlation with expression level. These observations clearly indicate a relationship between nucleotide composition and expression level. We also demonstrate that codon composition can be used to predict the expression of a gene in a given condition. Software has been developed for calculating the correlation between gene expression and codon usage.
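The calculation described above can be sketched as follows: compute the frequency of a nucleotide or dinucleotide in each gene, then correlate that frequency with expression across genes. The sequences and expression values here are toy data invented for illustration, and the function names are hypothetical. This is not the software mentioned in the abstract.

```python
from math import sqrt

def fraction(seq, motif):
    """Share of window positions in seq where motif starts (overlaps counted)."""
    k = len(motif)
    windows = len(seq) - k + 1
    if windows <= 0:
        return 0.0
    hits = sum(1 for i in range(windows) if seq[i:i + k] == motif)
    return hits / windows

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Toy genes: (sequence, expression level). Invented values, not yeast data.
genes = [
    ("ATATATATAT", 1.0),
    ("ATGCATATAT", 2.0),
    ("GCGCATATGC", 4.0),
    ("GCGCGCGCAT", 6.0),
    ("GCGGCCGCGC", 8.0),
]
seqs = [s for s, _ in genes]
expr = [e for _, e in genes]

r_g = pearson([fraction(s, "G") for s in seqs], expr)    # single nucleotide
r_at = pearson([fraction(s, "AT") for s in seqs], expr)  # dinucleotide
print(round(r_g, 2), round(r_at, 2))
```

Passing a three-letter motif to `fraction` counts it in every reading frame; codon usage proper would need in-frame counting in steps of three.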

A review of gene selection methods based on machine learning approaches (기계학습 접근법에 기반한 유전자 선택 방법들에 대한 리뷰)

  • Lee, Hajoung;Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.667-684 / 2022
  • Gene expression data present the level of mRNA abundance of each gene, and analyses of gene expression have provided key ideas for understanding the mechanisms of diseases and developing new drugs and therapies. High-throughput technologies such as DNA microarrays and RNA sequencing now enable the simultaneous measurement of thousands of gene expression levels, giving rise to a characteristic of gene expression data known as high dimensionality. Because of this high dimensionality, learning models that analyze gene expression data are prone to overfitting, and dimension reduction or feature selection techniques are commonly used as a preprocessing step to address this issue. In particular, gene selection methods applied in the preprocessing step can remove irrelevant and redundant genes and identify important ones. Various gene selection methods have been developed in the context of machine learning so far. In this paper, we review in depth recent work on gene selection methods that use machine learning approaches. In addition, the underlying difficulties with current gene selection methods as well as future research directions are discussed.

Performance Comparison of Classification Methods with the Combinations of the Imputation and Gene Selection Methods

  • Kim, Dong-Uk;Nam, Jin-Hyun;Hong, Kyung-Ha
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1103-1113 / 2011
  • Gene expression data are obtained through many experimental stages, and errors produced during the process may cause missing values. Because the data have the so-called 'small n, large p' structure, genes have to be selected before statistical analyses such as classification. For this reason, imputation and gene selection are important in microarray data analysis. In the literature, imputation, gene selection and classification have each been studied separately. In practice, however, they are applied sequentially. We therefore compare the performance of classification methods after imputation and gene selection methods have been applied to microarray data. Numerical simulations are carried out to evaluate the classification methods under various combinations of imputation and gene selection methods.
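The sequential structure described above (impute, then select genes, then classify) can be sketched as a three-stage pipeline. The abstract does not name the methods compared, so the choices here are assumptions made for illustration: gene-wise mean imputation, a t-like two-class score for selection, and a nearest-centroid classifier. All function names and data are hypothetical.

```python
def impute_mean(X):
    """Replace None entries with the gene-wise mean of the observed values."""
    out = [row[:] for row in X]
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        for row in out:
            if row[j] is None:
                row[j] = fill
    return out

def select_genes(X, y, k):
    """Rank genes by |difference of class means| / pooled standard deviation (two classes)."""
    c0, c1 = sorted(set(y))
    a = [r for r, l in zip(X, y) if l == c0]
    b = [r for r, l in zip(X, y) if l == c1]
    def score(j):
        ma = sum(r[j] for r in a) / len(a)
        mb = sum(r[j] for r in b) / len(b)
        ss = sum((r[j] - ma) ** 2 for r in a) + sum((r[j] - mb) ** 2 for r in b)
        sd = (ss / (len(X) - 2)) ** 0.5
        return abs(ma - mb) / (sd + 1e-9)  # small constant guards against sd == 0
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def nearest_centroid(X, y, genes):
    """Return a classifier assigning a sample to the closest class centroid."""
    centroids = {}
    for c in set(y):
        rows = [r for r, l in zip(X, y) if l == c]
        centroids[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((row[g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )
    return predict

# Toy microarray: 6 samples x 3 genes with missing values (None).
X = [
    [1.0, 7.0, None], [1.2, None, 3.0], [0.8, 6.5, 3.2],
    [5.0, 6.8, 2.9], [None, 7.1, 3.1], [5.2, 6.9, 3.0],
]
y = ["normal"] * 3 + ["tumor"] * 3

X_full = impute_mean(X)                       # stage 1: imputation
genes = select_genes(X_full, y, k=1)          # stage 2: gene selection
predict = nearest_centroid(X_full, y, genes)  # stage 3: classification
print(genes, predict([4.8, 7.0, 3.0]))
```

When estimating accuracy, imputation and gene selection should be fitted on the training portion only; running them on the full dataset before a train/test split leaks information into the evaluation.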

Characteristics of Oncolytic Adenovirus Replication and Gene Expression in Hypoxic Condition

  • Kim, Hong-Sung
    • Biomedical Science Letters / v.17 no.3 / pp.185-190 / 2011
  • Adenovirus type 5 (Ad5) vectors have been used for gene transfer to a wide variety of cell types in vivo and in vitro. The advantages of adenovirus vectors include the high titer of virus readily obtained in large-scale preparations, their ability to transduce dividing and non-dividing cells, and the high level of transgene expression. Since adenovirus vectors do not integrate into host cell DNA, insertional mutagenesis does not occur. However, many human tumor cells lack expression of the adenovirus 5 receptors and contain areas of hypoxia. To identify the pattern of replication and gene expression of oncolytic adenovirus under hypoxic conditions, several fiber-modified Ads (Ad5F/S11, Ad5F/S35, Ad5F/K7, Ad5F/K21, and Ad5F/RGD) were compared. The replication of all fiber-modified adenoviruses was inhibited under hypoxic conditions in HEK 293 cells, but gene expression varied with the tumor cell line and the level of coxsackievirus and adenovirus receptor (CAR) expression. These data suggest that the CAR expression pattern and the hypoxic condition of the tumor should be considered for optimal oncolytic adenovirus application.