• Title/Abstract/Keywords: Bioinformatics data

An Efficient Algorithm for Mining Frequent Sequences In Spatiotemporal Data

  • 지정희;류근호
    • Korea Spatial Information System Society: Conference Proceedings / Korea Spatial Information System Society 2005 Fall Conference / pp.61-66 / 2005
  • Spatiotemporal data mining represents the confluence of several fields, including spatiotemporal databases, machine learning, statistics, geographic visualization, and information theory. Spatial data mining and temporal data mining have each received much attention, independently, in the knowledge discovery in databases and data mining research communities. In this paper, we introduce an algorithm, Max_MOP, for discovering moving sequences in a mobile environment. Max_MOP mines only maximal frequent moving patterns. We exploit a characteristic of the problem domain, the spatiotemporal proximity between activities, to partition the spatiotemporal space. Finding moving sequences requires considering all temporally ordered combinations of associations, which is computationally intensive; exploiting spatiotemporal proximity makes this task more computationally feasible. The proposed technique is applicable to location-based services such as traffic, tourist, and location-aware advertising services.

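The abstract says Max_MOP keeps only maximal frequent moving patterns. As a rough illustration of what "maximal" means, the following Python sketch filters an already-mined list of frequent patterns down to those not contained in any longer frequent pattern. It assumes patterns are tuples of hypothetical spatial cell IDs; it is not Max_MOP itself and omits the spatiotemporal partitioning step.

```python
def is_subsequence(short, long):
    """True if `short` appears in `long` in order (not necessarily contiguously)."""
    it = iter(long)
    # Each `any` consumes the shared iterator, so matches must occur in order.
    return all(any(x == y for y in it) for x in short)

def maximal_patterns(frequent):
    """Keep only patterns that are not a proper subsequence of another frequent pattern."""
    return [p for p in frequent
            if not any(p != q and is_subsequence(p, q) for q in frequent)]

# Hypothetical frequent moving patterns over spatial cells "A".."D".
frequent = [("A",), ("B",), ("A", "B"), ("A", "B", "C"), ("D",)]
print(maximal_patterns(frequent))  # [('A', 'B', 'C'), ('D',)]
```

Reporting only maximal patterns shrinks the output, since every frequent subpattern is implied by some maximal one.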

Inference of Gene Regulatory Networks via Boolean Networks Using Regression Coefficients

  • Kim, Ha-Seong;Choi, Ho-Sik;Lee, Jae-K.;Park, Tae-Sung
    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, BIOINFO 2005 / pp.339-343 / 2005
  • Boolean network (BN) construction is one of the commonly used methods for building gene networks from time-series microarray data. However, BNs have two major drawbacks. First, they require heavy computing time. Second, the binary transformation of the microarray data may cause a loss of information. This paper proposes two methods that use linear regression to construct gene regulatory networks. The first is a regression-based BN variable selection method, which significantly reduces the computing time of BN construction. The second is a regression-based network method that can flexibly incorporate gene interactions using continuous gene expression data. We construct network structures from simulated data to compare the computing times of Boolean networks and the proposed method. The regression-based network method is evaluated using cell-cycle microarray data from Caulobacter crescentus.

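One way to picture regression-based variable selection is a screen that ranks candidate regulators of a target gene by how well each one, at time t, linearly predicts the target at time t+1. This sketch assumes a one-step lag, single-predictor least squares scored by R², and a hypothetical cutoff `k`; the paper's actual regression procedure may differ.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (0.0 if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def select_regulators(expr, target, k=2):
    """Rank candidate regulators of `target` by R^2 of target(t+1) ~ regulator(t)."""
    y = expr[target][1:]                      # target expression at time t+1
    scores = {g: r_squared(series[:-1], y)    # candidate expression at time t
              for g, series in expr.items() if g != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical time-series expression values for three genes.
expr = {"g1": [1, 2, 3, 4, 5], "g2": [5, 3, 4, 1, 2], "g3": [0, 1, 2, 3, 4]}
print(select_regulators(expr, "g3", k=1))  # ['g1']
```

The retained regulators would then limit the input combinations a Boolean network search has to try, which is where the time saving comes from.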

DESIGN AND IMPLEMENTATION OF MULTIMEDIA METADATA MANAGEMENT SYSTEM FOR HETEROGENEOUS SOURCES

  • Park, Seong-Kyu;Lee, Yang-Koo;Chai, Duck-Jin;Kim, Hi-Seok;Ryu, Keun-Ho
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing 2008 International Symposium on Remote Sensing / pp.398-401 / 2008
  • With advances in the internet and computer processing techniques, users can easily access and use multimedia content such as pictures, video, and audio, and they demand more convenient and accurate multimedia services. In this environment, integrating and managing metadata is difficult because multimedia applications use different metadata standards depending on the type of service and data format. In this paper, we design and implement a multimedia metadata management system that integrates metadata from heterogeneous sources. Our system manages heterogeneous metadata by mapping it to a unified schema using a mapping table. With the proposed system, users can search multimedia data easily regardless of the variety of application services.

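The mapping-table idea can be sketched as a lookup from (source standard, source field) to a unified field name. The standards and field names in `MAPPING` are illustrative assumptions, not the paper's schema, and unmapped fields are simply dropped.

```python
# Hypothetical mapping table: (source standard, source field) -> unified field.
MAPPING = {
    ("DublinCore", "title"): "title",
    ("DublinCore", "creator"): "author",
    ("MPEG-7", "Title"): "title",
    ("MPEG-7", "Creator"): "author",
    ("ID3", "TIT2"): "title",
    ("ID3", "TPE1"): "author",
}

def to_unified(standard, record):
    """Translate one metadata record into the unified schema, dropping unmapped fields."""
    unified = {}
    for field, value in record.items():
        target = MAPPING.get((standard, field))
        if target is not None:
            unified[target] = value
    return unified

def search(records, field, keyword):
    """Case-insensitive substring search over already-unified records."""
    return [r for r in records if keyword.lower() in str(r.get(field, "")).lower()]

# Hypothetical records from two different metadata standards.
records = [
    to_unified("ID3", {"TIT2": "Arirang", "TPE1": "Kim"}),
    to_unified("DublinCore", {"title": "Seoul at Night", "creator": "Lee", "rights": "CC-BY"}),
]
print(search(records, "title", "seoul"))  # [{'title': 'Seoul at Night', 'author': 'Lee'}]
```

Because search runs only over the unified fields, a query never needs to know which standard a record came from.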

Bioinformatics and Genomic Medicine

  • 김주한
    • Journal of Preventive Medicine and Public Health / Vol. 35, No. 2 / pp.83-91 / 2002
  • Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational science. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolutions in both bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high-throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding in much the same way that biochemistry did a generation ago. The paper describes how these technologies will affect biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization, primary pattern analysis, and machine learning algorithms are presented. The use of integrated biochip informatics technologies, text mining of factual and literature databases, and integrated management of biomolecular databases are discussed. Each step is illustrated with real examples in a clinically relevant context. Issues in linking molecular genotype and clinical phenotype information are also discussed.

Accuracy of Imputation of Microsatellite Markers from BovineSNP50 and BovineHD BeadChip in Hanwoo Population of Korea

  • Sharma, Aditi;Park, Jong-Eun;Park, Byungho;Park, Mi-Na;Roh, Seung-Hee;Jung, Woo-Young;Lee, Seung-Hwan;Chai, Han-Ha;Chang, Gul-Won;Cho, Yong-Min;Lim, Dajeong
    • Genomics & Informatics / Vol. 16, No. 1 / pp.10-13 / 2018
  • Until now, microsatellite (MS) markers have been a popular choice for parentage verification. Recently, many countries have moved, or are in the process of moving, from MS markers to single nucleotide polymorphism (SNP) markers for parentage testing. FAO-ISAG has also proposed a panel of 200 SNPs to replace MS markers in parentage verification. However, in many countries most animals have so far been genotyped only with MS markers, and a sudden shift to SNP markers would render those animals' data useless. As the National Institute of Animal Science in South Korea plans to move from the standard ISAG-recommended MS markers to SNPs, it faces the dilemma of excluding older animals that were genotyped with MS markers. This study was performed to facilitate the shift from MS markers to SNPs while keeping animals with existing MS data usable for parentage verification. We imputed MS markers from the SNPs within a 500-kb region on either side of each MS marker. This method gives laboratories an easy way to combine data from old and current sets of animals, and it is a cost-efficient alternative to genotyping the additional markers. We used 1,480 Hanwoo animals with both MS and SNP data to impute genotypes in the validation animals. We also compared imputation accuracy between the BovineSNP50 and BovineHD BeadChips. Genotype concordance of 40% and 43% was observed for the BovineSNP50 and BovineHD BeadChips, respectively.
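Genotype concordance, the accuracy measure reported above, can be computed as the share of genotypes whose imputed alleles match the observed ones. This sketch assumes genotypes are pairs of MS allele sizes, treats allele order as irrelevant, and skips missing calls; the study may define concordance differently (for example, per allele). The data are hypothetical.

```python
def genotype_concordance(observed, imputed):
    """Fraction of non-missing genotype calls where imputed equals observed (unordered alleles)."""
    pairs = [(o, i) for o, i in zip(observed, imputed) if o is not None and i is not None]
    if not pairs:
        return float("nan")  # no comparable calls
    matches = sum(sorted(o) == sorted(i) for o, i in pairs)
    return matches / len(pairs)

# Hypothetical MS genotypes (allele sizes in bp); None marks a missing observed call.
observed = [(152, 160), (148, 152), (160, 160), (152, 156), None]
imputed = [(160, 152), (148, 156), (160, 160), (152, 156), (148, 148)]
print(genotype_concordance(observed, imputed))  # 0.75
```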

CONSTRUCTING GENE REGULATORY NETWORK USING FREQUENT GENE EXPRESSION PATTERN MINING AND CHAIN RULES

  • Park, Hong-Kyu;Lee, Heon-Gyu;Cho, Kyung-Hwan;Ryu, Keun-Ho
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing 2006, Proceedings of ISRS 2006 PORSEC Volume II / pp.623-626 / 2006
  • Groups of genes control the functioning of a cell through complex interactions. These interacting gene groups are called gene regulatory networks (GRNs). Two data mining approaches, clustering and classification, have previously been used to analyze gene expression data. While these tools are useful for determining gene membership by homology, they do not identify the regulatory relationships among genes found in the same class of molecular actions. We also need to understand how genes relate to and regulate one another. To detect regulatory relationships among genes from time-series microarray data, we propose a novel approach using frequent pattern mining and chain rules. We propose a method for transforming gene expression data to make it suitable for frequent pattern mining, and we detect gene expression patterns by applying the FP-growth algorithm. We then construct a gene regulatory network from the frequent gene patterns using chain rules. Finally, we validate the proposed method by showing that our experimental results are consistent with published results.

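A minimal sketch of the transformation step, assuming each gene's change between consecutive time points is encoded as an up (`+`) or down (`-`) item when it reaches a hypothetical `threshold`. For brevity it counts itemset support by brute force rather than with FP-growth; that finds the same frequent itemsets up to `max_size` on small data but scales far worse. The chain-rule network construction is not shown.

```python
from collections import Counter
from itertools import combinations

def to_transactions(expr, threshold=0.5):
    """Turn each time step t -> t+1 into a transaction of 'gene+' / 'gene-' items."""
    n_steps = len(next(iter(expr.values()))) - 1
    transactions = []
    for t in range(n_steps):
        items = set()
        for gene, series in expr.items():
            delta = series[t + 1] - series[t]
            if delta >= threshold:
                items.add(gene + "+")
            elif delta <= -threshold:
                items.add(gene + "-")
        transactions.append(frozenset(items))
    return transactions

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force support counting (a stand-in for FP-growth on small data)."""
    counts = Counter()
    for tx in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(tx), size):
                counts[combo] += 1
    return {items: c for items, c in counts.items() if c >= min_support}

# Hypothetical expression time series for three genes.
expr = {"A": [0, 1, 2, 1], "B": [0, 1, 2, 1], "C": [2, 2, 0, 0]}
print(frequent_itemsets(to_transactions(expr), min_support=2))
```

Here genes A and B rise together in two of three steps, so `("A+", "B+")` is the only frequent pair.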

IMPLEMENTATION OF SUBSEQUENCE MAPPING METHOD FOR SEQUENTIAL PATTERN MINING

  • Trang, Nguyen Thu;Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing 2006, Proceedings of ISRS 2006 PORSEC Volume II / pp.627-630 / 2006
  • Sequential pattern mining addresses the problem of discovering the maximal frequent sequences in a given database. Sequential data are ubiquitous in daily and scientific life, in forms such as text, weather data, satellite data streams, business transactions, telecommunications records, experimental runs, DNA sequences, and medical record histories. Discovering sequential patterns can help users and scientists predict upcoming activities, interpret recurring phenomena, or extract similarities. To that end, the core of sequential pattern mining is finding the sequences that are contained frequently across all data sequences. Besides discovering frequent itemsets, sequential pattern mining requires arranging those itemsets into sequences and determining which of those sequences are frequent. Therefore, before mining sequences, the main task is to check whether one sequence is a subsequence of another sequence in the database. In this paper, we implement a subsequence matching method as a preprocessing step for sequential pattern mining. The matched sequences in our implementation are normalized sequences in the form of number chains. The output of this method is an overview of the matching information between the input mapped sequences.

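The containment check described above can be sketched for sequences of itemsets: a pattern is contained in a data sequence if each pattern element is a subset of a distinct data element, with order preserved. The sketch uses integers as items, loosely mirroring the "number chain" normalization; it is a generic check, not the authors' implementation.

```python
def contains(data_seq, pattern):
    """True if `pattern` (a list of itemsets) is a subsequence of `data_seq`.

    Each pattern element must be a subset of a distinct data element, and the
    matched elements must keep their order. Greedy earliest matching suffices.
    """
    pos = 0
    for element in pattern:
        while pos < len(data_seq) and not set(element) <= set(data_seq[pos]):
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next pattern element must match a later data element
    return True

def support(database, pattern):
    """Number of data sequences in `database` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in database)

# Hypothetical database of itemset sequences with integer-coded items.
database = [[{1}, {2, 3}, {4}], [{1, 2}, {4}], [{3}, {1}]]
print(support(database, [{1}, {4}]))  # 2
```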

Implementation of Subsequence Mapping Method for Sequential Pattern Mining

  • Trang Nguyen Thu;Lee Bum-Ju;Lee Heon-Gyu;Park Jeong-Seok;Ryu Keun-Ho
    • Korean Journal of Remote Sensing / Vol. 22, No. 5 / pp.457-462 / 2006
  • Sequential pattern mining addresses the problem of discovering the maximal frequent sequences in a given database. Sequential data are ubiquitous in daily and scientific life, in forms such as text, weather data, satellite data streams, business transactions, telecommunications records, experimental runs, DNA sequences, and medical record histories. Discovering sequential patterns can help users and scientists predict upcoming activities, interpret recurring phenomena, or extract similarities. To that end, the core of sequential pattern mining is finding the sequences that are contained frequently across all data sequences. Besides discovering frequent itemsets, sequential pattern mining requires arranging those itemsets into sequences and determining which of those sequences are frequent. Therefore, before mining sequences, the main task is to check whether one sequence is a subsequence of another sequence in the database. In this paper, we implement a subsequence matching method as a preprocessing step for sequential pattern mining. The matched sequences in our implementation are normalized sequences in the form of number chains. The output of this method is an overview of the matching information between the input mapped sequences.

A Study on Biomedical Information Processing for Biomedicine and Healthcare

  • 정현철;박병전;배상현
    • Journal of Integrated Natural Science / Vol. 2, No. 4 / pp.243-251 / 2009
  • This paper surveys research in bioinformatics. The surveyed work proposes a database architecture that combines a general view of bioinformatics data, as a graph of data objects and data relationships, with the efficiency and robustness of data management and querying provided by indexing and generic programming techniques. This work inverts the role of the index and makes it a first-class citizen in the query language. This can be done in a structured way, allowing users to mention indexes explicitly without resorting to a procedural query model, by converting functional relations into explicit functions. In the limit, the database becomes a graph whose edges are these indexes. Function composition can be specified either explicitly or implicitly as path queries. The net effect of the inversion is to convert the database into a hyperdatabase: a database of databases, connected by indexes or functions. The inversion approach was motivated by work on biological databases, for which hyperdatabases are a good model. The lack of a good model has slowed progress in bioinformatics.

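Treating indexes as first-class functions and expressing path queries as function composition can be sketched in a few lines. The index contents and identifiers are hypothetical, and a real hyperdatabase would add persistence and query optimization on top of this.

```python
def index_as_function(index):
    """Expose an index (a mapping from a key to related keys) as a first-class function."""
    return lambda key: index.get(key, [])

def compose(*steps):
    """Build a path query that applies each index-function in turn, flattening results."""
    def run(start):
        frontier = [start]
        for step in steps:
            frontier = [out for node in frontier for out in step(node)]
        return frontier
    return run

# Hypothetical indexes linking two small databases (genes -> proteins -> pathways).
gene_to_protein = index_as_function({"geneA": ["protA1", "protA2"], "geneB": ["protB1"]})
protein_to_pathway = index_as_function(
    {"protA1": ["pathway1"], "protA2": ["pathway2"], "protB1": ["pathway1"]})

pathways_of = compose(gene_to_protein, protein_to_pathway)
print(pathways_of("geneA"))  # ['pathway1', 'pathway2']
```

In this view each index is an edge type in the graph of databases, and a path query is just a chain of those edges.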

Power Load Pattern Classification from AMR Data

  • 박진형;이헌규;신진호;류근호
    • Korea Information Processing Society: Conference Proceedings / Korea Information Processing Society 2008 Spring Conference / pp.231-234 / 2008
  • An automated methodology based on data mining techniques is presented for predicting customer load patterns from load demand data. The main aim of this work is to forecast customers' contract information from their daily power consumption patterns and, based on the result, to evaluate the suitability of that contract information. The proposed approach consists of three stages: (i) data preprocessing, in which noise and outliers are detected and removed; (ii) cluster analysis, in which self-organizing map (SOM) clustering is used to create load patterns and representative load profiles; and (iii) classification, in which a k-nearest neighbor (k-NN) classifier predicts customers' contract information based on their power consumption patterns. In the proposed methodology, power loads measured by an AMR (automatic meter reading) system, together with customer indexes, are used as inputs. The output is the classification of representative load profiles (classes). Finally, to evaluate the k-NN classification technique, the methodology was applied to a set of high-voltage customers of the Korean power system, and the experimental results are presented.
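The classification stage can be sketched as a plain k-NN vote over load profiles. This assumes profiles are equal-length vectors (a daily profile might hold 24 hourly readings; the toy data use 4) compared by Euclidean distance, and it uses hypothetical contract labels; the SOM clustering and preprocessing stages are omitted.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest training profiles.

    `train` is a list of (profile, label) pairs; profiles are equal-length
    sequences of load values compared by Euclidean distance.
    """
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical load profiles (4 time slots) with hypothetical contract labels.
train = [
    ([1, 1, 5, 5], "contract_A"),
    ([1, 2, 5, 4], "contract_A"),
    ([4, 4, 1, 1], "contract_B"),
    ([5, 4, 1, 2], "contract_B"),
]
print(knn_predict(train, [1, 1, 4, 5], k=3))  # contract_A
```

An odd `k` avoids ties between two classes; with more classes, ties fall back to `Counter.most_common` ordering.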