• Title/Abstract/Keywords: Bioinformatics data

645 search results (processing time: 0.028 s)

Information Technology Infrastructure for Agriculture Genotyping Studies

  • Pardamean, Bens;Baurley, James W.;Perbangsa, Anzaludin S.;Utami, Dwinita;Rijzaani, Habib;Satyawan, Dani
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 3
    • /
    • pp.655-665
    • /
    • 2018
  • In efforts to increase agricultural productivity, the Indonesian Center for Agricultural Biotechnology and Genetic Resources Research and Development has conducted a variety of genomic studies using high-throughput DNA genotyping and sequencing. The large quantity of data (big data) produced by these biotechnologies requires a high-performance data management system to store, back up, and secure data. Additionally, these genetic studies are computationally demanding, requiring high-performance processors and memory for data processing and analysis. Reliable, high-bandwidth network connectivity for data transfer is essential, as are database applications and statistical tools that support cleaning, quality control, querying based on specific criteria, and exporting to various formats; these are important for generating high-yield crop varieties and improving future agricultural strategies. This manuscript presents a reliable, secure, and scalable information technology infrastructure tailored to Indonesian agriculture genotyping studies.

Short Reads Phasing to Construct Haplotypes in Genomic Regions That Are Associated with Body Mass Index in Korean Individuals

  • Lee, Kichan;Han, Seonggyun;Tark, Yeonjeong;Kim, Sangsoo
    • Genomics & Informatics
    • /
    • Vol. 12, No. 4
    • /
    • pp.165-170
    • /
    • 2014
  • Genome-wide association (GWA) studies have found many important genetic variants that affect various traits. Since these studies are useful for investigating untyped but causal variants using linkage disequilibrium (LD), it would be useful to explore the haplotypes of single-nucleotide polymorphisms (SNPs) within the same LD block as significant associations, based on high-density variants from population references. Here, we tried to make a catalog of haplotypes affecting body mass index (BMI) through an integrative analysis of previously published whole-genome next-generation sequencing (NGS) data from 7 representative Korean individuals and previously known Korean GWA signals. We selected 435 SNPs that were significantly associated with BMI in the GWA analysis and searched 53 LD ranges near those SNPs. With the NGS data, the haplotypes were phased within the LD ranges. A total of 44 possible haplotype blocks for Korean BMI were cataloged. Although the current result is based on little data, this study provides new insights that may help to identify important haplotypes for traits and low variants near significant SNPs. Furthermore, we can build a more comprehensive catalog as a larger dataset becomes available.

Currents in Integrative Biochip Informatics

  • Kim, Ju-Han
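    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2001 2nd Bioinformatics Workshop (DNA Chip Bioinformatics)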
    • /
    • pp.1-9
    • /
    • 2001
  • scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational sciences and information technology. The informatics revolutions in both clinical informatics and bioinformatics will change the current paradigm of biomedical sciences and the practice of clinical medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high-throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding forever, much the same way that biochemistry did a generation ago. In this talk, I will describe how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics. Basic data preprocessing with normalization and filtering, primary pattern analysis, and machine learning algorithms will be presented. Issues of integrated biochip informatics technologies, including multivariate data projection, gene-metabolic pathway mapping, automated biomolecular annotation, text mining of factual and literature databases, and integrated management of biomolecular databases, will be discussed. Each step will be illustrated with real examples from ongoing research activities in the context of clinical relevance. Issues of linking molecular genotype and clinical phenotype information will be discussed.
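The abstract names normalization and filtering as basic preprocessing steps without specifying them. As a rough illustration only, the sketch below assumes one common choice for expression data: log2 transformation with per-array median centering, followed by a variance filter across arrays. The function names, the data layout, and the threshold are hypothetical and are not taken from the talk.

```python
import math
from statistics import median, pvariance


def log2_median_center(array_values):
    """Log2-transform one array's positive intensities and subtract the array median."""
    logged = [math.log2(v) for v in array_values]
    center = median(logged)
    return [x - center for x in logged]


def filter_by_variance(profiles, min_variance):
    """Keep genes whose normalized values vary enough across arrays.

    profiles: dict mapping a gene name to its list of values, one per array.
    min_variance: assumed cutoff; a real analysis would choose it from the data.
    """
    return {
        gene: values
        for gene, values in profiles.items()
        if pvariance(values) >= min_variance
    }
```

Genes with nearly constant values carry little information for pattern analysis, which is why a filter of this kind is often applied before clustering or classification.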

BINGO: Biological Interpretation Through Statistically and Graph-theoretically Navigating Gene Ontology™

  • Lee, Sung-Geun;Yang, Jae-Seong;Chung, Il-Kyung;Kim, Yang-Seok
    • Molecular & Cellular Toxicology
    • /
    • Vol. 1, No. 4
    • /
    • pp.281-283
    • /
    • 2005
  • Extraction of biologically meaningful data and their validation are very important for toxicogenomics studies because they deal with huge amounts of heterogeneous data. BINGO is an annotation mining tool for biological interpretation of gene groups. Several statistical modeling approaches using Gene Ontology (GO) have been employed in many programs for that purpose. The statistical methodologies are useful in investigating the most significant GO attributes in a gene group, but the coherence of the resultant GO attributes over the entire group is rarely assessed. BINGO complements the statistical methods with graph-theoretic measures using the GO directed acyclic graph (DAG) structure. In addition, BINGO visualizes the consistency of a gene group more intuitively with a group-based GO subgraph. The input group can be any list of genes or gene products of interest, regardless of how it was generated, provided the group is built under a functional congruency hypothesis, such as gene clusters from DNA microarray analysis.
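The abstract does not say which statistical test BINGO or the other programs use. A widely used option for finding significant GO attributes in a gene group is a hypergeometric over-representation test, sketched here under that assumption. The function name and argument names are hypothetical.

```python
from math import comb


def go_enrichment_pvalue(population, annotated, group, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    group:      number of genes in the group of interest
    hits:       group genes carrying the GO term
    """
    upper = min(annotated, group)
    total = comb(population, group)
    # Sum the probability of observing `hits` or more annotated genes by chance.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, group - i)
        for i in range(hits, upper + 1)
    )
    return tail / total
```

For example, `go_enrichment_pvalue(10, 5, 5, 5)` evaluates to 1/252, the chance of drawing all five annotated genes in a group of five from ten. A test like this scores each GO term independently, which is the limitation the abstract points to when it notes that coherence over the whole group is rarely assessed.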

Computational Challenges for Integrative Genomics

  • Kim, Junhyong;Magwene, Paul
    • Genomics & Informatics
    • /
    • Vol. 2, No. 1
    • /
    • pp.7-18
    • /
    • 2004
  • Integrated genomics refers to the use of large-scale, systematically collected data from various sources to address biological and biomedical problems. A critical ingredient to a successful research program in integrated genomics is the establishment of an effective computational infrastructure. In this review, we suggest that the computational infrastructure challenges include developing tools for heterogeneous data organization and access, innovating techniques for combining the results of different analyses, and establishing a theoretical framework for integrating biological and quantitative models. For each of the three areas - data integration, analyses integration, and model integration - we review some of the current progress and suggest new topics of research. We argue that the primary computational challenges lie in developing sound theoretical foundations for understanding the genome rather than simply the development of algorithms and programs.

Whole-genome sequence analysis through online web interfaces: a review

  • Gunasekara, A.W.A.C.W.R.;Rajapaksha, L.G.T.G.;Tung, T.L.
    • Genomics & Informatics
    • /
    • Vol. 20, No. 1
    • /
    • pp.3.1-3.10
    • /
    • 2022
  • The recent development of whole-genome sequencing technologies paved the way for understanding the genomes of microorganisms. Every whole-genome sequencing (WGS) project requires considerable cost and massive effort to address the questions at hand. The final step of WGS is data analysis. The analysis of whole-genome sequences depends on highly sophisticated bioinformatics tools that research personnel have to buy. However, many laboratories and research institutions do not have the bioinformatics capabilities to analyze the genomic data and are therefore unable to take maximum advantage of whole-genome sequencing. In this respect, this study provides a guide for research personnel to a set of bioinformatics tools available online that can be used to analyze whole-genome sequence data of bacterial genomes. The web interfaces described here have many advantages and, in most cases, eliminate the need for costly analysis tools and intensive computing resources.

Graphical Models for DNA Microarray Data Mining

  • 양진산;장병탁
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2002 1st Workshop
    • /
    • pp.49-61
    • /
    • 2002
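  • With advances in modern experimental methods and genetic engineering, the volume of biological data has grown dramatically in recent years. Analyzing these data with machine learning can shorten experiment time and reduce experimental costs for traditional biological experiments, which demand considerable expense and time. This paper introduces machine learning methods based on graphical models, with a particular focus on the analysis of microarray data. Among these, GTM is a method with especially strong visualization capabilities; we introduce the general characteristics of GTM as a graphical-model-based method and examine in detail the results of applying it to the analysis of yeast data. (**Presentation file received and archived)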

Discovering information from biological data

  • Wong, Lim-Soon
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2000 International Symposium on Bioinformatics
    • /
    • pp.39-40
    • /
    • 2000
  • Knowledge discovery has attracted increased attention in the biomedical industry in recent years, due to the increased availability of huge amounts of biomedical data and the imminent need to turn such data into useful information and knowledge. In this talk, we discuss knowledge discovery techniques for gene expression analysis and MHC-peptide binding prediction in the context of discovering protein antigens and hot spots in these antigens.

An Intelligent Data Management System for Toxicogenomics Research (TEST DB: The intelligent data management system for Toxicogenomics)

  • Lee, Wan-Seon;Jeon, Ki-Seon;Um, Chan-Hwi;Hwang, Seung-Young;Jung, Jin-Wook;Kim, Seung-Jun;Kang, Kyung-Sun;Park, Joon-Suk;Hwang, Jae-Woong;Kang, Jong-Soo;Lee, Gyoung-Jae;Chon, Kum-Jin;Kim, Yang-Suk
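    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2003 2nd Annual Conference Proceedings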
    • /
    • pp.66-72
    • /
    • 2003
  • Toxicogenomics is now emerging as one of the most important genomics applications because toxicity tests based on gene expression profiles are expected to be more precise and efficient than the current histopathological approach in the pre-clinical phase. One of the challenges in toxicogenomics is the construction of an intelligent database management system that can deal with very heterogeneous and complex data from many different experimental and information sources. Here we present a new toxicogenomics database developed as part of the 'Toxicogenomics for Efficient Safety Test (TEST) project'. The TEST database focuses especially on the connectivity of heterogeneous data and on an intelligent query system that enables users to draw inspiration from the complex data sets. The database deals with four kinds of information: compound information, histopathological information, gene expression information, and annotation information. Currently, the TEST database holds toxicogenomics information for 12 molecules in 4 efficacy classes: anti-cancer, antibiotic, hypotension, and gastric ulcer. Users can easily access all kinds of detailed information about these compounds and, at the same time, check the confidence of the retrieved information by browsing the quality of the experimental data and the toxicity grade of each gene generated by our toxicology annotation system. The intelligent query system is designed for multiple comparisons of experimental data, because comparing experimental data by histopathological toxicity, compound, efficacy, and individual variation is crucial for finding common genetic characteristics. The presented system can be a good information source for studying toxicology mechanisms at the genome-wide level and can also be used for the design of toxicity test chips.

Benchmarking of BioPerl, Perl, BioJava, Java, BioPython, and Python for Primitive Bioinformatics Tasks and Choosing a Suitable Language

  • Ryu, Tae-Wan
    • International Journal of Contents
    • /
    • Vol. 5, No. 2
    • /
    • pp.6-15
    • /
    • 2009
  • Recently many different programming languages have emerged for the development of bioinformatics applications. In addition to the traditional languages, languages from open source projects such as BioPerl, BioPython, and BioJava have become popular because they provide special tools for biological data processing and are easy to use. However, it is not well-studied which of these programming languages will be most suitable for a given bioinformatics task and which factors should be considered in choosing a language for a project. Like many other application projects, bioinformatics projects also require various types of tasks. Accordingly, it will be a challenge to characterize all the aspects of a project in order to choose a language. However, most projects require some common and primitive tasks such as file I/O, text processing, and basic computation for counting, translation, statistics, etc. This paper presents the benchmarking results of six popular languages, Perl, BioPerl, Python, BioPython, Java, and BioJava, for several common and simple bioinformatics tasks. The experimental results of each language are compared through quantitative evaluation metrics such as execution time, memory usage, and size of the source code. Other qualitative factors, including writeability, readability, portability, scalability, and maintainability, that affect the success of a project are also discussed. The results of this research can be useful for developers in choosing an appropriate language for the development of bioinformatics applications.
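To make the "common and primitive tasks" concrete, here is a minimal plain-Python sketch of counting, GC content, and translation, with a simple wall-clock timer. It is not the paper's benchmark code: the function names, the test sequence, and the best-of-N timing method are assumptions for illustration. The codon table is the standard genetic code (NCBI translation table 1).

```python
import time
from collections import Counter

# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_bases(seq):
    """Count each character in the sequence, ignoring case."""
    return Counter(seq.upper())


def gc_content(seq):
    """Fraction of G and C among the A, C, G, T characters."""
    counts = count_bases(seq)
    total = sum(counts[b] for b in "ACGT")
    return (counts["G"] + counts["C"]) / total if total else 0.0


def translate(seq):
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def time_task(func, arg, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    dna = "ATGGCCTAA" * 10_000  # hypothetical test sequence
    for task in (count_bases, gc_content, translate):
        print(f"{task.__name__}: {time_task(task, dna):.6f} s")
```

Execution time is only one of the quantitative metrics the abstract lists. Memory usage and source code size would need separate measurement, and timings from a sketch like this depend heavily on the machine and interpreter version.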