• Title/Abstract/Keywords: gene ranking

22 search results (processing time: 0.027 s)

Applying a modified AUC to gene ranking

  • Yu, Wenbao;Chang, Yuan-Chin Ivan;Park, Eunsik
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 25, No. 3
    • /
    • pp.307-319
    • /
    • 2018
  • High-throughput technologies enable the simultaneous evaluation of thousands of genes that could discriminate different subclasses of complex diseases. Ranking genes according to differential expression is an important screening step for follow-up analysis. Many statistical measures have been proposed for this purpose. A good ranked list should provide a stable rank (at least for the top-ranked genes), and the top-ranked genes should have high power in differentiating between disease statuses. However, the literature places little emphasis on ranking genes based on these two criteria simultaneously. To meet both criteria, we propose applying a previously reported metric, the modified area under the receiver operating characteristic curve, to gene ranking. The proposed ranking method is found to be promising in leading to a stable ranking list and good prediction performance for the top-ranked genes. The findings are illustrated through studies on both synthesized data and real microarray gene expression data. The proposed method is recommended for ranking genes or other biomarkers in high-dimensional omics studies.
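
As a rough illustration of AUC-based gene ranking in general, the sketch below scores each gene with the ordinary empirical AUC (the Mann-Whitney form) and sorts genes by that score. It does not implement the modified AUC from the paper, whose definition is not given in the abstract; the function names and the toy data are hypothetical.

```python
def empirical_auc(cases, controls):
    """Fraction of (case, control) pairs with case > control; ties count 0.5."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def rank_genes_by_auc(expr, labels):
    """Rank genes by direction-free AUC, max(AUC, 1 - AUC), best first.

    expr   : dict mapping gene name -> list of expression values (one per sample)
    labels : list of 0/1 class labels aligned with the samples
    """
    scored = []
    for gene, values in expr.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        a = empirical_auc(cases, controls)
        scored.append((gene, max(a, 1.0 - a)))
    # Sort by score (descending), then by name so ties are deterministic.
    return sorted(scored, key=lambda t: (-t[1], t[0]))


# Hypothetical toy data: three controls (0) followed by three cases (1).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_expr = {
    "g1": [1, 2, 3, 4, 5, 6],  # separates the classes perfectly (higher in cases)
    "g2": [5, 1, 4, 2, 6, 3],  # close to uninformative
    "g3": [6, 5, 4, 1, 2, 3],  # separates the classes perfectly (lower in cases)
}
ranking = rank_genes_by_auc(toy_expr, toy_labels)
print(ranking)
```

Taking max(AUC, 1 - AUC) is one common way to treat up- and down-regulated genes alike; whether the paper does the same is not stated in the abstract.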

Identifying Statistically Significant Gene-Sets by Gene Set Enrichment Analysis Using Fisher Criterion

  • 김재영;신미영
    • 전자공학회논문지CI
    • /
    • Vol. 45, No. 4
    • /
    • pp.19-26
    • /
    • 2008
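  • Gene set enrichment analysis (GSEA) is a method for analyzing two-class microarray data that extracts, from among various gene sets constructed on the basis of biological characteristics, the significant gene sets whose expression values differ in a statistically significant way between the two classes. In particular, it uses gene annotation databases that carry diverse biological information about genes (Cytogenetic Band, KEGG pathway, Gene Ontology, etc.) to group genes with a particular function, out of all the genes used in the microarray experiment, into various gene sets. It then determines the significant genes within each gene set by referring to the difference in expression values between the two classes and, on that basis, detects the statistically significant gene sets. In this paper, in place of the signal-to-noise ratio based gene ranking method that is currently the main choice in the GSEA procedure, we propose applying a gene ranking method based on the Fisher criterion, in order to extract new, biologically meaningful significant gene sets that the existing GSEA method fails to extract. To examine the performance of the proposed method, we applied it to publicly available leukemia-related microarray data and sought to verify its usefulness by comparing the results with previously known results.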
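
To make the two ranking statistics concrete, here is a minimal sketch using their commonly cited textbook forms: signal-to-noise ratio (μ₁ − μ₂)/(σ₁ + σ₂) and Fisher criterion (μ₁ − μ₂)²/(σ₁² + σ₂²). The abstract does not give the exact formulas used in the paper, so these definitions, the function names, and the toy values are assumptions.

```python
from statistics import mean, stdev, variance


def signal_to_noise(a, b):
    """Signed signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b).

    Assumes each class has at least two samples and non-zero spread.
    """
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def fisher_criterion(a, b):
    """Fisher criterion: (mean_a - mean_b)^2 / (var_a + var_b). Always >= 0."""
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))


# Hypothetical expression values of one gene in the two classes.
class_a = [1, 2, 3]
class_b = [4, 5, 6]
snr_value = signal_to_noise(class_a, class_b)      # (2 - 5) / (1 + 1)
fisher_value = fisher_criterion(class_a, class_b)  # (2 - 5)^2 / (1 + 1)
print(snr_value, fisher_value)
```

The two scores can order genes differently when the class variances are unequal, because one denominator adds standard deviations and the other adds variances. Note also that the Fisher criterion in this form is unsigned, so the direction of change has to be tracked separately if it matters.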

Evaluation of reference genes for RT-qPCR study in abalone Haliotis discus hannai during heavy metal overload stress

  • Lee, Sang Yoon;Nam, Yoon Kwon
    • Fisheries and Aquatic Sciences
    • /
    • Vol. 19, No. 4
    • /
    • pp.21.1-21.11
    • /
    • 2016
  • Background: The evaluation of suitable reference genes as normalization controls is a prerequisite for launching a quantitative reverse transcription-PCR (RT-qPCR)-based expression study. In order to select stable reference genes in abalone Haliotis discus hannai tissues (gill and hepatopancreas) under heavy metal exposure conditions (Cu, Zn, and Cd), 12 candidate housekeeping genes were evaluated for expression stability based on a comprehensive ranking that integrates four different statistical algorithms (geNorm, NormFinder, BestKeeper, and the ΔCT method). Results: Expression stability in the gill subset was determined as RPL7 > RPL8 > ACTB > RPL3 > PPIB > RPL7A > EF1A > RPL4 > GAPDH > RPL5 > UBE2 > B-TU. On the other hand, the ranking in the hepatopancreas subset was RPL7 > RPL3 > RPL8 > ACTB > RPL4 > EF1A > RPL5 > RPL7A > B-TU > UBE2 > PPIB > GAPDH. The pairwise variation assessed by the geNorm program indicates that two reference genes could be sufficient for accurate normalization in both gill and hepatopancreas subsets. Overall, both gill and hepatopancreas subsets recommended ribosomal protein genes (particularly RPL7) as stable references, whereas traditional housekeepers such as β-tubulin (B-TU) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) genes were ranked as unstable genes. The reference gene selection was validated with a quantitative assay of MT transcripts. Conclusions: The present analysis showed the importance of validating reference genes with multiple algorithmic approaches to select genes that are truly stable. Our results indicate that the expression stability of a given reference gene is not always consistent across tissue types. The data from this study could be a good guide for the future design of RT-qPCR studies with respect to metal regulation/detoxification and other related physiologies in this abalone species.

Comparative Statistic Module (CSM) for Significant Gene Selection

  • Kim, Young-Jin;Kim, Hyo-Mi;Kim, Sang-Bae;Park, Chan;Kimm, Kuchan;Koh, InSong
    • Genomics & Informatics
    • /
    • Vol. 2, No. 4
    • /
    • pp.180-183
    • /
    • 2004
  • Comparative Statistic Module (CSM) provides a more reliable list of significant genes to genomics researchers by offering the commonly selected genes and a method of choice, calculating the rank of each statistical test based on the average ranking of common genes across five statistical methods: t-test, Kruskal-Wallis (Wilcoxon signed rank) test, SAM, two-sample multiple test, and empirical Bayesian test. The module is implemented in Perl and R.

Prediction of hub genes of Alzheimer's disease using a protein interaction network and functional enrichment analysis

  • Wee, Jia Jin;Kumar, Suresh
    • Genomics & Informatics
    • /
    • Vol. 18, No. 4
    • /
    • pp.39.1-39.8
    • /
    • 2020
  • Alzheimer's disease (AD) is a chronic, progressive brain disorder that slowly destroys affected individuals' memory and reasoning faculties, and consequently, their ability to perform the simplest tasks. This study investigated the hub genes of AD. Proteins interact with other proteins and non-protein molecules, and these interactions play an important role in understanding protein function. Computational methods are useful for understanding biological problems, in particular, network analyses of protein-protein interactions. Through a protein network analysis, we identified the following top 10 hub genes associated with AD: PTGER3, C3AR1, NPY, ADCY2, CXCL12, CCR5, MTNR1A, CNR2, GRM2, and CXCL8. Through gene enrichment, it was identified that most gene functions could be classified as integral to the plasma membrane, G-protein coupled receptor activity, and cell communication under gene ontology, as well as involvement in signal transduction pathways. Based on the convergent functional genomics ranking, the prioritized genes were NPY, CXCL12, CCR5, and CNR2.

Validation of housekeeping genes as candidate internal references for quantitative expression studies in healthy and nervous necrosis virus-infected seven-band grouper (Hyporthodus septemfasciatus)

  • Krishnan, Rahul;Qadiri, Syed Shariq Nazir;Kim, Jong-Oh;Kim, Jae-Ok;Oh, Myung-Joo
    • Fisheries and Aquatic Sciences
    • /
    • Vol. 22, No. 12
    • /
    • pp.28.1-28.8
    • /
    • 2019
  • Background: In the present study, we evaluated four commonly used housekeeping genes, viz., actin-β, elongation factor-1α (EF1α), acidic ribosomal protein (ARP), and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as internal references for quantitative analysis of immune genes in nervous necrosis virus (NNV)-infected seven-band grouper, Hyporthodus septemfasciatus. Methods: Expression profiles of the four genes were estimated in 12 tissues of healthy and infected seven-band grouper. Expression stability of the genes was calculated using the delta Ct method, BestKeeper, NormFinder, and geNorm algorithms. Consensus ranking was performed using RefFinder, and statistical analysis was done using GraphPad Prism 5.0. Results: Tissue-specific variations were observed in the four tested housekeeping genes of healthy and NNV-infected seven-band grouper. Fold change calculation for interferon-1 and Mx expression using the four housekeeping genes as internal references presented varied profiles for each tissue. EF1α and actin-β were the most stably expressed genes in tissues of healthy and NNV-infected seven-band grouper, respectively. Consensus ranking using RefFinder suggested EF1α as the least variable and most stable gene in both healthy and infected animals. Conclusions: These results suggest that EF1α is a better internal reference than the other genes tested in this study during the NNV infection process. This is a pilot study on the validation of reference genes in Hyporthodus septemfasciatus in the context of NNV infection.

Screening and Clustering for Time-course Yeast Microarray Gene Expression Data using Gaussian Process Regression

  • 김재희;김태훈
    • 응용통계연구
    • /
    • Vol. 26, No. 3
    • /
    • pp.389-399
    • /
    • 2013
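  • In this study, we introduce Gaussian process regression and present a case study that applies it to time-course microarray gene expression data. Through a simulation study of a gene screening method that fits a Gaussian process regression and uses the log marginal likelihood ratio, we computed sensitivity, specificity, and the false discovery rate, and showed the method's usefulness for screening. For real yeast cell-cycle data, we fitted Gaussian process regression with a squared exponential covariance function, screened differentially expressed genes using the log marginal likelihood ratio, applied Gaussian model-based clustering to the selected genes, and demonstrated cluster validity with silhouette values.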

Re-Ranking Retrieval Model Using Similarity Transformation Based on Gene Algorithm

  • 이재훈;이성주
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 2005 Fall Conference Proceedings, Vol. 15, No. 2
    • /
    • pp.331-334
    • /
    • 2005
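  • With the development of information and communication science, vast amounts of information are being generated in many fields. As a result, retrieval models that return indiscriminate responses to user requests have also emerged. This paper studies a model that transforms the similarity between pieces of information and re-adjusts their ranking so that more relevant information appears at higher ranks, allowing users to obtain information that better matches their needs.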

Assessment of Suitable Reference Genes for RT-qPCR Normalization with Developmental Samples in Pacific Abalone Haliotis discus hannai

  • Lee, Sang Yoon;Park, Choul-Ji;Nam, Yoon Kwon
    • 한국동물생명공학회지
    • /
    • Vol. 34, No. 4
    • /
    • pp.280-291
    • /
    • 2019
  • The potential utility of 14 candidate housekeeping genes as normalization references for RT-qPCR analysis with developmental samples (fertilized eggs to late veliger larvae) in Pacific abalone Haliotis discus hannai was evaluated using four different statistical algorithms (geNorm, NormFinder, BestKeeper and comparative ΔCT method). Different algorithms identified different genes as the best candidates, and the geometric mean-based final ranking from the most to the least stable expression was as follows: RPL5, RPL4, RPS18, RPL8, RPL7, UBE2, RPL7A, GAPDH, RPL36, PPIB, EF1A, ACTB and B-TU. The findings were further validated via relative quantification of metallothionein (MT) transcripts using the stable and unstable reference genes, and expression levels of MT were greatly influenced by the choice of reference genes. Overall, our data suggest that RPL5 and RPS18, either singly or in combination, are appropriate for normalizing gene expression in developmental samples of this abalone species, whereas ACTB, B-TU and EF1A are less stable and not recommended. In addition, our findings suggest that the standard deviations in geometric ranking, as well as the geometric mean itself, should be taken into account for the final selection of reference gene(s). This study could be a useful basis to facilitate the generation of accurate and reliable RT-qPCR data with developmental samples in this abalone species.
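
The geometric mean-based final ranking described above can be sketched as follows. This is a minimal illustration that assumes each algorithm has already assigned every gene a rank (1 = most stable). Here the "standard deviation" is taken to be the sample standard deviation of a gene's ranks across algorithms, which is one reading of the abstract rather than the authors' stated procedure. The gene names and rank values are placeholders, not results from the paper.

```python
from math import prod
from statistics import stdev


def consensus_ranking(ranks_by_algorithm):
    """Combine per-algorithm ranks into a consensus ordering.

    ranks_by_algorithm : dict algorithm -> dict gene -> rank (1 = most stable)
    Returns a list of (gene, geometric_mean_rank, sd_of_ranks), sorted from
    most to least stable; the SD breaks ties between equal geometric means.
    """
    genes = list(next(iter(ranks_by_algorithm.values())))
    summary = []
    for gene in genes:
        ranks = [table[gene] for table in ranks_by_algorithm.values()]
        geo_mean = prod(ranks) ** (1.0 / len(ranks))
        summary.append((gene, geo_mean, stdev(ranks)))
    return sorted(summary, key=lambda row: (row[1], row[2]))


# Placeholder ranks for three hypothetical genes under the four algorithms.
toy_ranks = {
    "geNorm":     {"geneA": 1, "geneB": 2, "geneC": 3},
    "NormFinder": {"geneA": 2, "geneB": 1, "geneC": 3},
    "BestKeeper": {"geneA": 1, "geneB": 3, "geneC": 2},
    "deltaCT":    {"geneA": 1, "geneB": 2, "geneC": 3},
}
consensus = consensus_ranking(toy_ranks)
for gene, geo_mean, sd in consensus:
    print(f"{gene}: geometric mean rank {geo_mean:.3f}, SD {sd:.3f}")
```

Reporting the spread next to the geometric mean shows when a gene earns a good consensus rank only because the algorithms disagree strongly about it.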

Ranking Candidate Genes for the Biomarker Development in a Cancer Diagnostics

  • Kim, In-Young;Lee, Sun-Ho;Rha, Sun-Young;Kim, Byung-Soo
    • 한국생물정보학회:학술대회논문집
    • /
    • 한국생물정보시스템생물학회 2004, The 3rd Annual Conference for The Korean Society for Bioinformatics Association of Asian Societies for Bioinformatics 2004 Symposium
    • /
    • pp.272-278
    • /
    • 2004
  • Recently, Pepe et al. (2003) employed the receiver operating characteristic (ROC) approach to rank candidate genes from a microarray experiment that can be used for biomarker development, with the ultimate purpose of population screening for a cancer. In a cancer microarray experiment based on n patients, the researcher often wants to compare the tumor tissue with the normal tissue within the same individual using a common reference RNA. This design is referred to as a reference design or an indirect design. Ideally, this experiment produces n pairs of microarray data, where each pair consists of two sets of microarray data resulting from reference versus normal tissue and reference versus tumor tissue hybridizations. However, for certain individuals either the normal tissue or the tumor tissue is not large enough for the experimenter to extract enough RNA for the microarray experiment, so there are missing values in either the normal or the tumor tissue data. In practice, we have $n_1$ pairs of complete observations, $n_2$ 'normal only' and $n_3$ 'tumor only' data for the microarray experiment with n patients, where $n = n_1 + n_2 + n_3$. We refer to this data set as a mixed data set, as it contains a mix of fully observed and partially observed pair data. Such a mixed data set was actually observed in a microarray experiment based on human tissues obtained during surgical operations on cancer patients. Pepe et al. (2003) provide the rationale for using the ROC approach based on two independent samples for ranking candidate genes instead of using t or Mann-Whitney statistics. We first modify the ROC approach to ranking genes for a paired data set and then extend it to a mixed data set by taking a weighted average of two ROC values obtained from the paired data set and the two independent data sets.
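
The idea of combining a paired and an unpaired ROC value for a mixed data set can be sketched roughly as below. The abstract specifies neither the paired statistic nor the weights, so both are assumptions here. The paired score is the fraction of patients whose tumor value exceeds their own normal value, the unpaired score is the ordinary empirical AUC between the 'tumor only' and 'normal only' samples, and the weight is the share of complete pairs, $n_1/(n_1 + n_2 + n_3)$. All names and numbers are hypothetical.

```python
def paired_score(pairs):
    """Fraction of (normal, tumor) pairs with tumor > normal; ties count 0.5."""
    total = 0.0
    for normal, tumor in pairs:
        if tumor > normal:
            total += 1.0
        elif tumor == normal:
            total += 0.5
    return total / len(pairs)


def unpaired_auc(tumor_only, normal_only):
    """Empirical AUC between two independent samples; ties count 0.5."""
    total = 0.0
    for t in tumor_only:
        for n in normal_only:
            if t > n:
                total += 1.0
            elif t == n:
                total += 0.5
    return total / (len(tumor_only) * len(normal_only))


def mixed_score(pairs, normal_only, tumor_only):
    """Weighted average of the paired and unpaired scores.

    The weight n1 / (n1 + n2 + n3) is an assumed choice, not the paper's.
    """
    n1, n2, n3 = len(pairs), len(normal_only), len(tumor_only)
    weight = n1 / (n1 + n2 + n3)
    return (weight * paired_score(pairs)
            + (1.0 - weight) * unpaired_auc(tumor_only, normal_only))


# Hypothetical log-ratios for one gene: 4 complete pairs, 2 normal-only, 2 tumor-only.
toy_pairs = [(1.0, 2.0), (1.5, 1.2), (0.8, 1.9), (1.1, 1.6)]  # (normal, tumor)
toy_normal_only = [1.0, 1.3]
toy_tumor_only = [1.8, 0.9]
gene_score = mixed_score(toy_pairs, toy_normal_only, toy_tumor_only)
print(gene_score)
```

Genes would then be ranked by this score, or by its distance from 0.5 if both directions of change are of interest.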