• Title/Summary/Keyword: outlier cells

Effect of Genetic Correlations on the P Values from Randomization Test and Detection of Significant Gene Groups (유전자 연관성이 랜덤검정 P값과 유의 유전자군의 탐색에 미치는 영향)

  • Yi, Mi-Sung; Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.781-792 / 2009
  • At an early stage of genomic investigation, a small sample of microarrays is used in gene expression experiments to identify small subsets of candidate genes for further, more accurate investigation. Unlike the statistical methods suited to large samples of microarrays, an appropriate method for identifying such small subsets is a randomization test, which provides exact P values. These exact P values from a randomization test on a small sample of microarrays are discrete. The possible existence of differentially expressed genes in the full set of genes can be tested against the null hypothesis of a uniform distribution of P values. Subsets with smaller P values are of prime interest for further investigation, and these outlier cells in a multinomial distribution of P values can be identified with the M test of Fuchs et al. (1980). Importantly, genome-wide gene expression levels in microarrays are correlated, yet most statistical methods in microarray analysis assume that genes are independent and ignore possible correlations among expression levels. Using simulation studies, we investigated the effect that correlated gene expression levels can have on the randomization test and M test results, and found that the effects are often not negligible.
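
The discreteness of exact P values mentioned in this abstract can be illustrated with a minimal sketch of a two-group randomization test. The function name, the choice of test statistic (absolute difference in group means), and the toy expression values are assumptions made for illustration; the abstract does not specify them, and the M test itself is not reproduced here.

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact P value for the absolute difference in group means.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated, so the result is exact rather than simulated.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(indices_a):
        sum_a = sum(pooled[i] for i in indices_a)
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(range(n_a))
    n_splits = 0
    n_extreme = 0
    for indices_a in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(indices_a) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


# Hypothetical expression values for one gene on 3 + 3 arrays.
p_value = exact_permutation_pvalue([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

With three arrays per group there are only 20 possible splits, so every P value is a multiple of 1/20 and the smallest attainable two-sided value here is 2/20. This is why P values from small samples fall into a handful of discrete cells.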

Creating Subnetworks from Transcriptomic Data on Central Nervous System Diseases Informed by a Massive Transcriptomic Network

  • Feng, Yaping; Syrkin-Nikolau, Judith A.; Wurtele, Eve S.
    • Interdisciplinary Bio Central / v.5 no.1 / pp.1.1-1.8 / 2013
  • High-quality, publicly available transcriptomic data representing relationships in gene expression across a diverse set of biological conditions are used as a context network to explore the transcriptomics of the CNS. The context network, 18367Hu-matrix, contains pairwise Pearson correlations for 22,215 human genes across 18,637 human tissue samples. To do this, we compute a network derived from biological samples from CNS cells and tissues, calculate clusters of co-expressed genes from this network, and compare the significance of these to clusters derived from the larger 18367Hu-matrix network. Sorting and visualization use the publicly available software MetaOmGraph (http://www.metnetdb.org/MetNet_MetaOm-Graph.htm). This identifies genes that characterize particular disease conditions. Specifically, differences in gene expression within and between two designations of glial cancer, astrocytoma and glioblastoma, are evaluated in the context of the broader network. Such gene groups, which we term outlier-networks, tease out abnormally expressed genes and the samples in which this expression occurs. This approach distinguishes 48 subnetworks of outlier genes associated with astrocytoma and glioblastoma. As a case study, we investigate the relationships among the genes of a small astrocytoma-only subnetwork. This astrocytoma-only subnetwork consists of SVEP1, IGF1, CHRNA3, and SPAG6. All of these genes are highly coexpressed in a single sample of anaplastic astrocytoma tumor (grade III) and a sample of juvenile pilocytic astrocytoma. Three of these genes are also associated with nicotine. These data lead us to formulate a testable hypothesis that this astrocytoma outlier-network provides a link between some gliomas/astrocytomas and nicotine.
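
A pairwise Pearson correlation network of the kind this abstract describes can be sketched on a toy scale as follows. The gene labels and expression values are hypothetical, and the helper names `pearson` and `correlation_matrix` are introduced only for this illustration; they are not part of MetaOmGraph.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(expression):
    """Map every ordered gene pair to its correlation across samples."""
    genes = list(expression)
    return {(g, h): pearson(expression[g], expression[h])
            for g in genes for h in genes}


# Hypothetical expression levels of three genes across four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}
corr = correlation_matrix(toy_expression)
```

For tens of thousands of genes a vectorized routine would be needed in practice; the nested loop above is only meant to make the definition concrete.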

HQSAR Study of Tricyclic Azepine Derivatives as an EGFR (Epidermal Growth Factor Receptor) Inhibitors

  • Chung, Hwan-Won; Lee, Kyu-Whan; Oh, Jung-Soo; Cho, Seung-Joo
    • Molecular & Cellular Toxicology / v.3 no.3 / pp.159-164 / 2007
  • Stimulation of the epidermal growth factor receptor (EGFR) is essential in the signaling pathways of tumor cells. Thus, EGFR has been intensely studied as an anticancer target. We developed hologram quantitative structure-activity relationship (HQSAR) models for a data set consisting of tricyclic azepine derivatives that show inhibitory activity against EGFR. The optimal HQSAR model was generated with a fragment size of 6 to 7 while differentiating fragments by atom type and connectivity. The model showed a cross-validated $q^2$ value of 0.61 and a non-cross-validated $r^2$ value of 0.93. When the model was validated with an external set excluding one outlier, it gave a predictive $r^2$ value of 0.43. The contribution maps generated from this model were used to interpret the contribution of each atom to the overall inhibitory activity. This can be used to find more efficient EGFR inhibitors.
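
The difference between the non-cross-validated $r^2$ and the cross-validated $q^2$ reported in this abstract can be made concrete with a small sketch. It swaps in a one-descriptor least-squares line for the HQSAR model, which is an assumption made purely for illustration, and it uses the common leave-one-out definition $q^2 = 1 - \mathrm{PRESS}/SS_{tot}$ with the overall mean in $SS_{tot}$; the abstract does not state which cross-validation scheme was used. The descriptor and activity values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(y):
    mean_y = sum(y) / len(y)
    return sum((b - mean_y) ** 2 for b in y)


def r_squared(x, y):
    """Fit on all compounds and score on the same compounds."""
    slope, intercept = fit_line(x, y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / total_sum_of_squares(y)


def q_squared_loo(x, y):
    """Leave-one-out: each compound is predicted by a model fit without it."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1 - press / total_sum_of_squares(y)


# Hypothetical descriptor values and activities for five compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1]
r2 = r_squared(descriptor, activity)
q2 = q_squared_loo(descriptor, activity)
```

For a least-squares fit, each held-out residual is at least as large in magnitude as the corresponding fitted residual, so $q^2$ never exceeds $r^2$. A large gap between the two, as between 0.93 and 0.61 here, is usually read as a sign that the fit is more optimistic than the predictive performance.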

Accelerated Evolution of the Regulatory Sequences of Brain Development in the Human Genome

  • Lee, Kang Seon; Bang, Hyoeun; Choi, Jung Kyoon; Kim, Kwoneel
    • Molecules and Cells / v.43 no.4 / pp.331-339 / 2020
  • Genetic modifications in noncoding regulatory regions are likely critical to human evolution. Human-accelerated noncoding elements are highly conserved noncoding regions among vertebrates but have large differences across humans, which implies human-specific regulatory potential. In this study, we found that human-accelerated noncoding elements were frequently coupled with DNase I hypersensitive sites (DHSs), together with monomethylated and trimethylated histone H3 lysine 4, which are active regulatory markers. This coupling was particularly pronounced in fetal brains relative to adult brains, non-brain fetal tissues, and embryonic stem cells. However, fetal brain DHSs were also specifically enriched in deeply conserved sequences, implying coexistence of universal maintenance and human-specific fitness in human brain development. We assessed whether this coexisting pattern was a general one by quantitatively measuring evolutionary rates of DHSs. As a result, fetal brain DHSs showed a mixed but distinct signature of regional conservation and outlier point acceleration as compared to other DHSs. This finding suggests that brain developmental sequences are selectively constrained in general, whereas specific nucleotides are under positive selection or constraint relaxation simultaneously. Hence, we hypothesize that human- or primate-specific changes to universally conserved regulatory codes of brain development may drive the accelerated, and most likely adaptive, evolution of the regulatory network of the human brain.

DNA Microarray Analysis of the Gene Expression Profile of Activated Human Umbilical Vein Endothelial Cells (올리고 마이크로어래이를 이용한 활성화된 인간 제대 정맥 내피세포의 유전자 발현 조사)

  • 김선용; 오호균; 이수영; 남석우; 이정용; 안현영; 신종철; 홍용길; 조영애
    • Journal of Life Science / v.14 no.5 / pp.874-881 / 2004
  • Angiogenesis has been implicated in the progression of inflammation, arthritis, psoriasis, and atherosclerosis, as well as in tumor growth and metastasis. Intensive studies have been carried out to develop strategies for cancer treatment by blocking angiogenesis. During angiogenesis, endothelial proliferation and migration occur upon activation. In this study, we compared the expression profiles of human umbilical vein endothelial cells (HUVECs) activated by in vitro incubation in a rich medium containing several growth factors with those of non-activated cells. cDNA targets derived from total RNA of HUVECs that, after reaching 70-80% confluency, were either activated for 13 h in M199 medium containing endothelial cell growth supplement, 20% fetal bovine serum, and heparin, or left non-activated, were hybridized onto oligonucleotide microarrays containing 18,864 genetic elements. Unsupervised hierarchical clustering analysis produced two subgroups on the dendrogram, corresponding to activated and non-activated HUVECs. We then extracted 122 outlier genes that were up- or down-regulated at least 2-fold in activated HUVECs. Among these, 32 annotated genes were up-regulated and 38 were down-regulated in activated HUVECs. Interestingly, genes involved in cell proliferation, motility, and inflammation/immune response were up-regulated in activated HUVECs, whereas genes for cell adhesion or vessel morphogenesis/function were down-regulated. Unexpectedly, the expression of genes well characterized as angiogenesis markers did not change, except for Eph-B4, which was down-regulated about 4-fold. In addition, 52 unknown genes were up- or down-regulated. These results could therefore provide an opportunity to target new vascular molecules for the development of anti-angiogenic agents.
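
The 2-fold criterion used here to extract outlier genes amounts to thresholding the log2 expression ratio at plus or minus 1. A minimal sketch, assuming one intensity value per gene and condition and using hypothetical gene labels and intensities (none of these values come from the study):

```python
from math import log2


def fold_change_outliers(activated, control, min_fold=2.0):
    """Split genes into up- and down-regulated lists by fold change."""
    threshold = log2(min_fold)
    up, down = [], []
    for gene in activated:
        log_ratio = log2(activated[gene] / control[gene])
        if log_ratio >= threshold:
            up.append(gene)
        elif log_ratio <= -threshold:
            down.append(gene)
    return up, down


# Hypothetical signal intensities; gene labels are placeholders.
activated = {"g1": 400.0, "g2": 100.0, "g3": 150.0, "g4": 200.0}
control = {"g1": 100.0, "g2": 400.0, "g3": 100.0, "g4": 100.0}
up_genes, down_genes = fold_change_outliers(activated, control)
```

Working on the log scale makes the criterion symmetric: a 4-fold decrease and a 4-fold increase sit at the same distance from zero, which a raw ratio threshold does not give.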

An Enhanced Density and Grid based Spatial Clustering Algorithm for Large Spatial Database (대용량 공간데이터베이스를 위한 확장된 밀도-격자 기반의 공간 클러스터링 알고리즘)

  • Gao, Song; Kim, Ho-Seok; Xia, Ying; Kim, Gyoung-Bae; Bae, Hae-Young
    • The KIPS Transactions:PartD / v.13D no.5 s.108 / pp.633-640 / 2006
  • Spatial clustering, which groups similar objects based on their distance, connectivity, or relative density in space, is an important component of spatial data mining. Density-based and grid-based clustering are the two main approaches. The former is known for its ability to discover clusters of various shapes and to eliminate noise, while the latter is known for its high speed. Clustering large data sets has always been a serious challenge for clustering algorithms, because a huge data set makes the clustering process extremely costly. In this paper, we propose an enhanced density-grid based clustering algorithm for large spatial databases that sets a default number of intervals and removes outliers effectively with the help of a proper measure for identifying areas of high density in the input data space. We use a density threshold DT to recognize dense cells before neighboring dense cells are combined to form clusters. When the proposed algorithm is run on a large dataset, a proper granularity for each dimension of the data space and a proper density threshold for recognizing dense areas can improve its performance. We combine grid-based and density-based methods not only to increase efficiency but also to find clusters of arbitrary shape. Experimental evaluation on synthetic datasets shows that the proposed method achieves high performance and accuracy.
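
The two steps named in this abstract, recognizing dense cells with a density threshold DT and then combining neighboring dense cells into clusters, can be sketched generically as below. This is not the authors' algorithm: the fixed `cell_size` parameter stands in for their default number of intervals, the neighborhood is assumed to include diagonal cells, and the sample points are hypothetical.

```python
from collections import Counter, deque
from itertools import product


def grid_density_clusters(points, cell_size, density_threshold):
    """Label dense grid cells; adjacent dense cells share a cluster label."""
    # Step 1: count points per cell and keep cells at or above the threshold.
    counts = Counter(tuple(int(c // cell_size) for c in p) for p in points)
    dense = {cell for cell, n in counts.items() if n >= density_threshold}

    # Step 2: breadth-first search merges neighboring dense cells.
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return labels


# Hypothetical 2-D points: two dense regions and one isolated point.
sample_points = [
    (0.1, 0.1), (0.2, 0.5), (0.8, 0.9),   # cell (0, 0)
    (1.1, 0.2), (1.5, 0.5), (1.9, 0.7),   # cell (1, 0), adjacent to (0, 0)
    (5.1, 5.1), (5.5, 5.5), (5.9, 5.2),   # cell (5, 5)
    (9.5, 0.5),                           # sparse cell, treated as an outlier
]
cell_labels = grid_density_clusters(sample_points, cell_size=1.0,
                                    density_threshold=3)
```

Because the work after binning depends on the number of occupied cells rather than the number of points, this style of method scales well to large inputs. The cost is sensitivity to the cell size and threshold, which is the granularity issue the abstract raises.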