• Title/Abstract/Keywords: Bioinformatics data

Search results: 645 items (processing time 0.023 s)

Building an Integrated Protein Data Management System Using the XPath Query Process

  • Cha Hyo Soung;Jung Kwang Su;Jung Young Jin;Ryu Keun Ho
    • Proceedings of the Korean Society of Remote Sensing Conference / Proceedings of ISRS 2004 / pp. 99-102 / 2004
  • With recent advances in bioinformatics techniques, a great deal of research now deals with large volumes of biological data, and a variety of files and databases are used to manage these data efficiently. However, because of the lack of standardization, it is difficult to manage the data and to convert between heterogeneous formats. We are interested in integrating, saving, and managing gene and protein sequence data generated through sequencing. Accordingly, the goal of this paper is to implement a system that manages sequence data and converts one sequence file format into another. To satisfy these requirements, we adopt BSML (Bioinformatics Sequence Markup Language) as the standard for managing bioinformatics data, and we integrate and store the heterogeneous flat file formats using a DTD based on the BSML schema. We also developed a system that applies the characteristics of an object-oriented database and processes XPath queries, an efficient type of structural query, so that XML documents can be saved and managed easily.

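The idea in the abstract above, storing sequences as XML and retrieving them with XPath before converting to another format, can be sketched as follows. This is a minimal illustration, not the authors' system: the document is a simplified, BSML-like fragment whose element and attribute names are assumptions, and the helper names `find_sequences` and `to_fasta` are hypothetical. It uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Simplified, BSML-like document. Element and attribute names here are
# illustrative assumptions, not a faithful copy of the BSML DTD.
BSML_DOC = """
<Bsml>
  <Definitions>
    <Sequences>
      <Sequence id="seq1" title="example protein" molecule="aa">
        <Seq-data>MKTAYIAKQR</Seq-data>
      </Sequence>
      <Sequence id="seq2" title="example gene" molecule="dna">
        <Seq-data>ATGGCGTACG</Seq-data>
      </Sequence>
    </Sequences>
  </Definitions>
</Bsml>
"""


def find_sequences(xml_text, molecule):
    """Return {id: sequence} for all Sequence elements of one molecule type."""
    root = ET.fromstring(xml_text.strip())
    # XPath: any Sequence descendant whose 'molecule' attribute matches.
    path = f".//Sequence[@molecule='{molecule}']"
    return {seq.get("id"): seq.findtext("Seq-data").strip()
            for seq in root.findall(path)}


def to_fasta(xml_text, molecule):
    """Convert the matching sequences to FASTA text (format conversion)."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for seq in root.findall(f".//Sequence[@molecule='{molecule}']"):
        header = f">{seq.get('id')} {seq.get('title')}"
        records.append(f"{header}\n{seq.findtext('Seq-data').strip()}")
    return "\n".join(records)


if __name__ == "__main__":
    print(find_sequences(BSML_DOC, "aa"))
    print(to_fasta(BSML_DOC, "dna"))
```

A full XPath 1.0 engine would be needed for more complex structural queries; the standard-library subset is enough to show the attribute-predicate pattern.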

Considerations on gene chip data analysis

  • Lee, Jae-K.
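    • Proceedings of the Korean Society for Bioinformatics Conference / Korean Society for Bioinformatics and Systems Biology, 2nd International Symposium on Bioinformatics (2001) / pp. 77-102 / 2001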
  • Different high-throughput chip technologies are available for genome-wide gene expression studies. Quality control and prescreening analysis are important for rigorous analysis on each type of gene expression data. Statistical significance evaluation of differential expression patterns is needed. Major genome institutes develop database and analysis systems for information sharing of precious expression data.


Data-processing pipeline and database design for integrated analysis of mycoviruses

  • Je, Mikyung;Son, Hyeon Seok;Kim, Hayeon
    • International journal of advanced smart convergence / Vol. 8, No. 3 / pp. 115-122 / 2019
  • Recent and ongoing discoveries of mycoviruses with new properties demand the development of an appropriate research infrastructure to analyze their evolution and classification. In particular, the discovery of negative-sense single-stranded mycoviruses is notable, because double-stranded RNA viruses and positive-sense single-stranded RNA viruses had been the predominant genome types. In addition, some genomic properties of mycoviruses are of particular interest because they have been reported to resemble those of pathogenic virus families that infect humans and animals. Genetic information on mycoviruses continues to accumulate in public repositories; however, these databases have difficulty reflecting the latest taxonomic information and providing specialized data for mycoviruses. Therefore, in this study, we developed a bioinformatics-based pipeline to efficiently utilize this genetic information. We also designed a schema for data processing and database construction and an algorithm to keep taxonomic information of mycoviruses up to date. The pipeline and database (termed 'mycoVDB') presented in this study are expected to serve as useful foundations for improving the accuracy and efficiency of future research on mycoviruses.

EXTENDED ONLINE DIVISIVE AGGLOMERATIVE CLUSTERING

  • Musa, Ibrahim Musa Ishag;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the Korean Society of Remote Sensing Conference / International Symposium on Remote Sensing 2008 / pp. 406-409 / 2008
  • Clustering data streams is important in many applications, such as sensor networks. Existing hierarchical methods follow a semi-fuzzy clustering approach that yields duplicate clusters. To solve this problem, we propose an extended online divisive agglomerative clustering method for data streams. It builds a tree-like, top-down hierarchy of clusters that evolves with the data stream, using a geometric time frame for snapshots. It enhances Online Divisive Agglomerative Clustering (ODAC) with a pruning strategy that avoids duplicate clusters. Its main feature is that update time and memory use are independent of the number of examples in the data stream. It can be used for clustering sensor data and for network monitoring, as well as for web click streams.


Comparative Statistic Module (CSM) for Significant Gene Selection

  • Kim, Young-Jin;Kim, Hyo-Mi;Kim, Sang-Bae;Park, Chan;Kimm, Kuchan;Koh, InSong
    • Genomics & Informatics / Vol. 2, No. 4 / pp. 180-183 / 2004
  • Comparative Statistic Module (CSM) provides a more reliable list of significant genes to genomics researchers by offering the commonly selected genes and a method of choice, calculating the rank of each statistical test based on the average ranking of common genes across the five statistical methods, i.e., t-test, Kruskal-Wallis (Wilcoxon signed rank) test, SAM, two-sample multiple test, and Empirical Bayesian test. This statistical analysis module is implemented in Perl and R.
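The abstract does not spell out the ranking procedure, so the following is only one possible reading, sketched under stated assumptions: each statistical method yields a ranked list of top genes, the "commonly selected genes" are those present in every list, and each common gene gets the average of its ranks across methods. The function names, method labels, and gene identifiers are all hypothetical, and this is not the published Perl/R implementation.

```python
def common_genes(top_lists):
    """Genes that appear in every method's top list."""
    gene_sets = [set(genes) for genes in top_lists.values()]
    return set.intersection(*gene_sets)


def average_ranks(top_lists):
    """Average rank (1 = most significant) of each common gene across methods.

    Returns a dict ordered from best (lowest) to worst average rank.
    """
    averages = {}
    for gene in common_genes(top_lists):
        ranks = [genes.index(gene) + 1 for genes in top_lists.values()]
        averages[gene] = sum(ranks) / len(ranks)
    # Sort by average rank, breaking ties by gene name for determinism.
    return dict(sorted(averages.items(), key=lambda kv: (kv[1], kv[0])))


if __name__ == "__main__":
    # Hypothetical ranked gene lists from three of the methods.
    top_lists = {
        "t_test": ["g1", "g2", "g3", "g4"],
        "sam": ["g2", "g1", "g5", "g3"],
        "empirical_bayes": ["g1", "g3", "g2", "g6"],
    }
    for gene, rank in average_ranks(top_lists).items():
        print(gene, round(rank, 2))
```

Genes selected by only some methods (here `g4`, `g5`, `g6`) drop out, which is what makes the common list more conservative than any single test.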

DESIGN OF A CONTEXT ANALYSIS MODEL ON USN ENVIRONMENT

  • Jin, Cheng-Hao;Lee, Yong-Mi;Nam, Kwang-Woo;Lee, Jun-Wook;Ryu, Keun-Ho
    • Proceedings of the Korean Society of Remote Sensing Conference / International Symposium on Remote Sensing 2008 / pp. 122-125 / 2008
  • Sensors used in many USN (Ubiquitous Sensor Network) applications generate large amounts of sensor stream data. The volume is too large to store in full, and the data arrive too fast to handle each item individually. To provide rapid and reliable context analysis over sensor stream data, we propose a WHEN-DO context analysis model that supports sliding windows. The model works as follows: if the sensor stream data satisfy the condition in the 'WHEN' clause, the actions in the 'DO' clause are executed. The proposed model can be applied to many USN applications, such as monitoring the status of a building and taking action when the corresponding context condition holds.

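The WHEN-DO idea described above can be sketched as a rule that keeps a fixed-size sliding window over the stream and runs its action whenever the window satisfies the condition. This is a minimal illustration under assumptions: the class name `WhenDoRule`, the count-based window, and the temperature example are hypothetical and not taken from the paper.

```python
from collections import deque


class WhenDoRule:
    """Evaluate a WHEN condition over a sliding window; run DO when it holds."""

    def __init__(self, window_size, when, do):
        # deque with maxlen drops the oldest reading automatically,
        # so memory use stays bounded regardless of stream length.
        self.window = deque(maxlen=window_size)
        self.when = when  # callable: list of readings -> bool
        self.do = do      # callable: list of readings -> action result

    def push(self, reading):
        """Add one reading; return the DO result if WHEN holds, else None."""
        self.window.append(reading)
        snapshot = list(self.window)
        # Only evaluate once the window is full.
        if len(snapshot) == self.window.maxlen and self.when(snapshot):
            return self.do(snapshot)
        return None


def run_demo():
    # WHEN the mean of the last 3 temperatures exceeds 30, DO "ventilate".
    rule = WhenDoRule(
        window_size=3,
        when=lambda w: sum(w) / len(w) > 30,
        do=lambda w: "ventilate",
    )
    return [rule.push(t) for t in [25, 29, 31, 33, 35]]


if __name__ == "__main__":
    print(run_demo())
```

A time-based window (for example, the last 10 seconds of readings) would follow the same pattern with timestamps used for eviction instead of a fixed count.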

Genome data mining for everyone

  • Lee, Gir-Won;Kim, Sang-Soo
    • BMB Reports / Vol. 41, No. 11 / pp. 757-764 / 2008
  • The genomic sequences of a huge number of species have been determined. Typically, these genome sequences and the associated annotation data are accessed through Internet-based genome browsers that offer a user-friendly interface. Intelligent use of the data should expedite biological knowledge discovery. Such activity is collectively called data mining and involves queries that can be simple, complex, and even combinational. Various tools have been developed to make genome data mining available to computational and experimental biologists alike. In this mini-review, some tools that have proven successful will be introduced along with examples taken from published reports.

IVAG: An Integrative Visualization Application for Various Types of Genomic Data Based on R-Shiny and the Docker Platform

  • Lee, Tae-Rim;Ahn, Jin Mo;Kim, Gyuhee;Kim, Sangsoo
    • Genomics & Informatics / Vol. 15, No. 4 / pp. 178-182 / 2017
  • Next-generation sequencing (NGS) technology has become a trend in the genomics research area. There are many software programs and automated pipelines to analyze NGS data, which can ease the pain for traditional scientists who are not familiar with computer programming. However, downstream analyses, such as finding differentially expressed genes or visualizing linkage disequilibrium maps and genome-wide association study (GWAS) data, still remain a challenge. Here, we introduce a dockerized web application written in R using the Shiny platform to visualize pre-analyzed RNA sequencing and GWAS data. In addition, we have integrated a genome browser based on the JBrowse platform and an automated intermediate parsing process required for custom track construction, so that users can easily build and navigate their personal genome tracks with in-house datasets. This application will help scientists perform a series of downstream analyses and obtain a more integrative understanding of various types of genomic data by interactively visualizing them with customizable options.

Next Generation Sequencing and Bioinformatics

  • 김기봉
    • Journal of Life Science / Vol. 25, No. 3 / pp. 357-367 / 2015
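  • Thanks to rapidly advancing next-generation sequencing (NGS) platforms and the latest bioinformatics analysis tools, the ultimate goal of sequencing a human genome for under 1,000 dollars looks likely to be achieved soon. Rapid technical progress in next-generation sequencing is steadily increasing the demand for statistical methods and bioinformatics tools for analyzing and managing NGS data. Since the early days when NGS platforms were first commercialized, many applications and tools for analyzing, interpreting, or visualizing NGS data have been developed and used. However, the enormous flood of NGS data has brought to the fore many problems still to be solved in data storage, analysis, and management. NGS data analysis essentially includes alignment of reads against a reference sequence, base calling, polymorphism detection, assembly using paired-end or unpaired reads, structural variant detection, and genome browsing. This paper gives an overview of the major next-generation sequencing technologies and the bioinformatics tools for NGS data analysis.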

Computational Chemistry as a Key to Structural Bioinformatics

  • Kang, Young-Kee
    • Proceedings of the Korean Society for Bioinformatics Conference / Korean Society for Bioinformatics and Systems Biology, International Symposium on Bioinformatics (2000) / pp. 32-34 / 2000
  • Computational chemistry is a discipline that uses computational methods to calculate molecular structures, properties, and reactions, or to simulate molecular behavior. Relating and turning the complexity of data from genomics, high-throughput screening, combinatorial chemical synthesis, gene-expression investigations, pharmacogenomics, and proteomics into useful information and knowledge is the primary goal of bioinformatics. In particular, structure-based molecular design is one of the essential fields in bioinformatics and can be called structural bioinformatics. Therefore, conformational analysis of proteins and peptides using the techniques of computational chemistry is expected to play a role in structural bioinformatics. There are two major computational methods for conformational analysis of proteins and peptides: the molecular orbital (MO) method and the force field (or empirical potential function) method. The MO method can be classified into ab initio and semiempirical methods, which have been applied to relatively small and large molecules, respectively. However, improvements in computer hardware and software enable us to use the ab initio MO method for relatively larger biomolecules with up to ∼100 atoms or ∼800 basis functions. To show how computational chemistry can be used in structural bioinformatics, I will present on (1) cis-trans isomerization of proline dipeptide and its derivatives, (2) positional preference of proline in α-helices, and (3) conformations and activities of Arg-Gly-Asp-containing tetrapeptides.
