Bioinformatics Resources of the Korean Bioinformation Center (KOBIC)

  • Lee, Byung-Wook;Chu, In-Sun;Kim, Nam-Shin;Lee, Jin-Hyuk;Kim, Seon-Yong;Kim, Wan-Kyu;Lee, Sang-Hyuk
    • Genomics & Informatics / v.8 no.4 / pp.165-169 / 2010
  • The Korean Bioinformation Center (KOBIC) is a national bioinformatics research center in Korea. We have developed many bioinformatics algorithms and applications to facilitate the biological interpretation of OMICS data. Here we introduce the major databases and tools developed at KOBIC. These resources are classified into three main fields: genome, proteome, and literature. Among the genomic resources, we constructed several pipelines for next-generation sequencing (NGS) data processing and developed analysis algorithms and web-based database servers, including miRGator, ESTpass, and CleanEST. We also built integrated databases and servers for microarray expression data, such as MDCDP. For proteome data, the VnD database and the WDAC, Localizome, and CHARMM_HM web servers are available for various purposes. In the literature field, we constructed the IntoPub server and the Patome database. We continue to construct and maintain the bioinformatics infrastructure and to develop algorithms.

An integrated Bayesian network framework for reconstructing representative genetic regulatory networks.

  • Lee, Phil-Hyoun;Lee, Do-Heon;Lee, Kwang-Hyung
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.164-169 / 2003
  • In this paper, we propose an integrated Bayesian network framework to reconstruct genetic regulatory networks from genome expression data. The proposed model overcomes the dimensionality problem of multivariate analysis by building coherent sub-networks from confined gene clusters and combining these networks via intermediary points. The Gene Shaving algorithm is used to cluster genes that share a common function or co-regulation. The retrieved clusters are extended with other related genes using prior biological knowledge such as Gene Ontology, pathway, and protein-protein interaction information. From this extended gene list, the system builds genetic sub-networks using a Bayesian network with the MDL score and the Sparse Candidate algorithm. Functional modules of genes are thus identified not only from the microarray data itself but also from well-established biological knowledge. This integrated approach can improve the reliability of a network, in that false relations due to the lack of data can be reduced. Another advantage is the decreased computational complexity due to the constrained gene sets. To evaluate the proposed system, it is applied to S. cerevisiae cell cycle data [1]. The analysis of the results presents new hypotheses about novel genetic interactions as well as typical relationships known from previous research [2].

CONSTRUCTING GENE REGULATORY NETWORK USING FREQUENT GENE EXPRESSION PATTERN MINING AND CHAIN RULES

  • Park, Hong-Kyu;Lee, Heon-Gyu;Cho, Kyung-Hwan;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.623-626 / 2006
  • Groups of genes control the functioning of a cell through complex interactions. These interacting gene groups are called Gene Regulatory Networks (GRNs). Two earlier data mining approaches, clustering and classification, have been used to analyze gene expression data. While these mining tools are useful for determining the membership of genes by homology, they do not identify the regulatory relationships among genes found in the same class of molecular actions. Furthermore, we need to understand how genes relate to and regulate one another. To detect regulatory relationships among genes from time-series microarray data, we propose a novel approach using frequent pattern mining and chain rules. In this approach, we propose a method for transforming gene expression data to make it suitable for frequent pattern mining, and we detect gene expression patterns by applying the FP-growth algorithm. We then construct a gene regulatory network from the frequent gene patterns using chain rules. Finally, we validate the proposed method by showing that our experimental results are consistent with published results.

Non-negligible Occurrence of Errors in Gender Description in Public Data Sets

  • Kim, Jong Hwan;Park, Jong-Luyl;Kim, Seon-Young
    • Genomics & Informatics / v.14 no.1 / pp.34-40 / 2016
  • Due to advances in omics technologies, numerous genome-wide studies on human samples have been published, and most of the omics data with the associated clinical information are available in public repositories, such as Gene Expression Omnibus and ArrayExpress. While analyzing several public datasets, we observed that errors in gender information occur quite often. When we analyzed the gender description and the methylation patterns of gender-specific probes (glucose-6-phosphate dehydrogenase [G6PD], ephrin-B1 [EFNB1], and testis specific protein, Y-linked 2 [TSPY2]) in 5,611 samples produced using Infinium 450K HumanMethylation arrays, we found that 19 samples from 7 datasets were erroneously described. We also analyzed 1,819 samples produced using the Affymetrix U133Plus2 array using several gender-specific genes (X (inactive)-specific transcript [XIST], eukaryotic translation initiation factor 1A, Y-linked [EIF1AY], and DEAD [Asp-Glu-Ala-Asp] box polypeptide 3, Y-linked [DDX3Y]) and found that 40 samples from 3 datasets were erroneously described. We suggest that users of public datasets should not expect the data to be error-free and, whenever possible, should check the consistency of the data.
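
The consistency check described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' procedure: the function names, the sample layout, the averaging of the two Y-linked genes, and the fixed expression cutoffs are all assumptions, and the values are toy data.

```python
def infer_sex(xist, y_linked, xist_cut, y_cut):
    """Infer sex from XIST and Y-linked expression; cutoffs are hypothetical."""
    if xist >= xist_cut and y_linked < y_cut:
        return "F"  # high XIST, low Y-linked signal
    if xist < xist_cut and y_linked >= y_cut:
        return "M"  # low XIST, high Y-linked signal
    return None  # ambiguous: leave for manual inspection


def flag_mismatches(samples, xist_cut, y_cut):
    """Return (id, annotated, inferred) for samples whose annotation disagrees."""
    flagged = []
    for s in samples:
        # Assumption: summarize the Y-linked genes by their mean expression.
        y_linked = (s["EIF1AY"] + s["DDX3Y"]) / 2
        inferred = infer_sex(s["XIST"], y_linked, xist_cut, y_cut)
        if inferred is not None and inferred != s["sex"]:
            flagged.append((s["id"], s["sex"], inferred))
    return flagged


# Toy log-scale expression values, invented for illustration only.
samples = [
    {"id": "S1", "sex": "F", "XIST": 10.2, "EIF1AY": 3.1, "DDX3Y": 3.4},
    {"id": "S2", "sex": "M", "XIST": 3.0, "EIF1AY": 9.8, "DDX3Y": 9.5},
    {"id": "S3", "sex": "F", "XIST": 2.9, "EIF1AY": 10.1, "DDX3Y": 9.9},
    {"id": "S4", "sex": "M", "XIST": 6.1, "EIF1AY": 6.0, "DDX3Y": 6.2},
]

print(flag_mismatches(samples, xist_cut=6.0, y_cut=6.0))
```

In this sketch, samples with an ambiguous signal (such as S4) are not flagged, so only clear contradictions between annotation and expression are reported.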

Meta-analysis of Gene Expression Data Identifies Causal Genes for Prostate Cancer

  • Wang, Xiang-Yang;Hao, Jian-Wei;Zhou, Rui-Jin;Zhang, Xiang-Sheng;Yan, Tian-Zhong;Ding, De-Gang;Shan, Lei
    • Asian Pacific Journal of Cancer Prevention / v.14 no.1 / pp.457-461 / 2013
  • Prostate cancer is a leading cause of death in male populations across the globe. With the advent of gene expression arrays, many microarray studies have been conducted in prostate cancer, but the results have varied across different studies. To better understand the genetic and biologic mechanisms of prostate cancer, we conducted a meta-analysis of two studies on prostate cancer. Eight key genes were identified to be differentially expressed with progression. After gene co-expression analysis based on data from the GEO database, we obtained a co-expressed gene list which included 725 genes. Gene Ontology analysis revealed that these genes are involved in actin filament-based processes, locomotion and cell morphogenesis. Further analysis of the gene list should provide important clues for developing new prognostic markers and therapeutic targets.

Class prediction of an independent sample using a set of gene modules consisting of gene-pairs which were condition(Tumor, Normal) specific (조건(암, 정상)에 따라 특이적 관계를 나타내는 유전자 쌍으로 구성된 유전자 모듈을 이용한 독립샘플의 클래스예측)

  • Jeong, Hyeon-Iee;Yoon, Young-Mi
    • Journal of the Korea Society of Computer and Information / v.15 no.12 / pp.197-207 / 2010
  • Using a variety of data-mining methods on high-throughput cDNA microarray data, the level of gene expression in two different tissues can be compared, and differentially expressed genes (DEGs) between normal and tumor cells can be detected. A diagnosis can be made with these genes, and a treatment strategy can be determined according to the cancer stage. Existing cancer classification methods using machine learning select marker genes that are differentially expressed between normal and tumor samples and build a classifier using those marker genes. However, in addition to differences in gene expression levels, differences in gene-gene correlations between two conditions could be a good marker for disease diagnosis. In this study, we identify gene pairs with a large correlation difference between the two sets of samples and build gene classification modules from these gene pairs. This cancer classification method using gene modules achieves higher accuracy than current methods. Because the number of genes in a classification module is small, implementation as a clinical kit can be considered. In future work, the authors plan to identify novel cancer-related genes through functional analysis of the genes in a classification module using GO (Gene Ontology) enrichment validation, and to extend the classification module into gene regulatory networks.
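
The idea of selecting gene pairs whose correlation differs strongly between two conditions can be illustrated with a minimal sketch. The abstract does not specify the correlation measure or the threshold, so the Pearson coefficient, the `min_diff` parameter, the function names, and the toy values below are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def differential_pairs(normal, tumor, min_diff):
    """Return (gene1, gene2, |r_normal - r_tumor|) for pairs above min_diff.

    normal, tumor: dict mapping gene name -> list of expression values.
    """
    pairs = []
    for g1, g2 in combinations(sorted(normal), 2):
        diff = abs(pearson(normal[g1], normal[g2]) - pearson(tumor[g1], tumor[g2]))
        if diff >= min_diff:
            pairs.append((g1, g2, diff))
    # Largest correlation change first.
    return sorted(pairs, key=lambda p: -p[2])


# Toy expression values, invented for illustration only.
normal = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
tumor = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [1, 3, 2, 4]}

for g1, g2, diff in differential_pairs(normal, tumor, min_diff=1.0):
    print(g1, g2, round(diff, 2))
```

Here gene B reverses its relationship with A and C in the tumor samples, so the pairs involving B are selected, while the A-C pair, whose correlation is unchanged, is not.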

Comparison of recently developed classification tools in microarray data analysis (마이크로어레이자료분석에서의 최신 분류방법들의 비교연구)

  • Lee, Jae-Won;Lee, Jeong-Bok;Park, Mi-Ra
    • Proceedings of the Korean Statistical Society Conference / 2002.05a / pp.99-104 / 2002
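  • Classification methods based on cDNA microarray data can monitor the expression of a large number of genes simultaneously, and are therefore expected to be far more reliable than conventional classification methods for understanding the molecular-biological variation between specific diseases. Recently, Dudoit et al. (2001) compared and evaluated several classical discriminant methods and recently developed methods for classification in the analysis of gene expression data from cDNA microarrays. In this paper, we carry out a broader comparative study that includes many recent methods not covered by Dudoit et al. (2001) and applies them not only to human tumor data but also to animal and plant data, including crops.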

Spatial pattern and temporal mode analysis of microarray time-series data by independent component analysis (독립성분분석에 의한 유전자 발현 시계열 데이터의 공간적 패턴과 시간적 모드 분석)

  • Sookjeong, Kim;Seungjin, Choi
    • Proceedings of the Korean Information Science Society Conference / 2004.10b / pp.250-252 / 2004
  • In this paper we apply several variations of independent component analysis (ICA) methods, such as spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to yeast cell cycle datasets, and compare their performance in finding components that result in gene clusters coherent with annotations and in extracting meaningful temporal modes. It turns out that the results of tICA are superior to those of PCA, sICA, and stICA in terms of gene clustering, and that the temporal modes extracted by stICA highlight particular cellular processes.

Development of Correlation Based Feature Selection Method by Predicting the Markov Blanket for Gene Selection Analysis

  • Adi, Made;Yun, Zhen;Keong, Kwoh-Chee
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.183-187 / 2005
  • In this paper, we propose a heuristic method to select features using a Two-Phase Markov Blanket-based (TPMB) algorithm. The first phase of the TPMB algorithm, the filtering phase, removes the obviously redundant features. A non-linear correlation method based on information theory is used as a metric to measure the redundancy of a feature [1]. In the second phase, the approximating phase, the Markov Blanket (MB) of a system is estimated by employing the concept of cross entropy. We perform experiments on microarray data and report results on two popular datasets, AML-ALL [3] and colon tumor [4]. The experimental results show that the TPMB algorithm can significantly reduce the number of features while maintaining the accuracy of the classifiers.
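
A redundancy filter of the kind described for the first phase might look roughly like the sketch below. The abstract does not name its information-theoretic metric, so symmetric uncertainty is swapped in here as an assumption; the greedy keep/drop rule, the `threshold` parameter, the function names, and the toy data are likewise hypothetical. The second (Markov Blanket approximation) phase is not covered.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2 * (hx + hy - hxy) / (hx + hy)


def filter_redundant(features, labels, threshold):
    """Greedy filter: rank features by relevance to the labels, then drop any
    feature whose SU with an already kept feature reaches the threshold."""
    ranked = sorted(
        features,
        key=lambda f: symmetric_uncertainty(features[f], labels),
        reverse=True,
    )
    kept = []
    for f in ranked:
        if all(symmetric_uncertainty(features[f], features[k]) < threshold for k in kept):
            kept.append(f)
    return kept


# Toy discretized expression values, invented for illustration only.
labels = [0, 0, 1, 1]
features = {
    "g1": [0, 0, 1, 1],
    "g2": [1, 1, 0, 0],  # carries the same information as g1
    "g3": [0, 1, 0, 1],
}

print(filter_redundant(features, labels, threshold=0.9))
```

In this toy case g2 is dropped because it is fully determined by g1, while g3 survives the redundancy filter even though it is uninformative about the labels; removing such features would be left to a later relevance step.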

Genomic Applications of Biochip Informatics (유전체 발현의 정보학적 분석과 응용)

  • Kim, Ju-Han
    • KOGO NEWS / v.5 no.4 / pp.9-16 / 2005
  • Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic expression data transforms the challenges in biomedical research into ones in bioinformatics. Clinical informatics has long developed technologies to improve biomedical research by integrating experimental and clinical information systems. Biomedical informatics, powered by high-throughput techniques, genomic-scale databases, and advanced clinical information systems, is likely to transform our biomedical understanding forever, much the same way that biochemistry did to biology a generation ago. The emergence of healthcare and biomedical informatics, revolutionizing both bioinformatics and clinical informatics, will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics.
