• Title/Summary/Keyword: Biological Data Mining

The BIOWAY System: A Data Warehouse for Generalized Representation & Visualization of Bio-Pathways

  • Kim, Min Kyung;Seo, Young Joo;Lee, Sang Ho;Song, Eun Ha;Lee, Ho Il;Ahn, Chang Shin;Choi, Eun Chung;Park, Hyun Seok
    • Genomics & Informatics
    • /
    • v.2 no.4
    • /
    • pp.191-194
    • /
    • 2004
  • Exponentially increasing biopathway data in recent years provide us with a means to elucidate the large-scale modular organization of the cell. Given the existing information on metabolic and regulatory networks, inferring biopathway information through scientific reasoning or data mining of large-scale array or proteomics data has received great attention. Naturally, there is a need for a user-friendly system that allows the user to combine large and diverse pathway data sets from different resources. We built a data warehouse, BIOWAY, for analyzing and visualizing biological pathways by integrating and customizing resources. We have collected many different types of pathway data, including metabolic pathway data from KEGG/LIGAND, signaling pathway data from BIND, and protein information from SWISS-PROT. In addition to providing a general data retrieval mechanism, a successful user interface should provide a convenient visualization mechanism, since biological pathway data are difficult to conceptualize without graphical representations. Still, the visual interfaces of previous systems, at best, use static images only for specific categorized pathways, which makes it difficult to cope with more complex pathways. In the BIOWAY system, all pathway data can be displayed as computer-generated graphical networks rather than manually drawn images. Furthermore, the system is designed so that all pathway maps can be expanded or collapsed by introducing the concept of a super node. A subtle graphic layout algorithm has been applied to best display the pathway data.

Identifying literature-based significant genes and discovering novel drug indications on PPI network

  • Park, Minseok;Jang, Giup;Lee, Taekeon;Yoon, Youngmi
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.3
    • /
    • pp.131-138
    • /
    • 2017
  • New drug development is time-consuming and costly. Hence, it is necessary to repurpose old drugs for new indications. We suggest a way of repurposing old drugs using massive literature data and a biological network. We suppose that a disease-drug relationship is plausible if the signal pathways of the relationship include significant genes identified in the literature data. This research is composed of three steps: identifying significant genes using co-occurrence in the literature; analyzing the shortest paths on the biological network; and scoring a relationship by comparing the significant genes with the shortest paths. We identify significant genes based on the co-occurrence frequency between a gene and a disease in the literature. On a network whose edge weights represent the likelihood of interaction between genes, we use shortest paths as signal pathways. We compare the genes identified as significant with the genes included in the signal pathways, calculate the scores, and then identify candidate drugs. With this process, we show drugs with new repurposing potential and demonstrate the use of our method as a new approach to drug repurposing.
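
The shortest-path and scoring steps described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy gene names, the edge costs (taken as already converted from interaction likelihoods, so lower means more likely), and the scoring rule (the fraction of genes on the path that are in the significant set). The abstract does not give the paper's actual formula.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra over a dict-of-dicts graph; returns the node list or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def path_score(path, significant_genes):
    """Hypothetical score: share of path genes found in the significant set."""
    return sum(g in significant_genes for g in path) / len(path)

# Toy network (assumed): T1 is a drug target gene, D1 a disease gene.
graph = {
    "T1": {"A": 0.2, "B": 0.9},
    "A": {"T1": 0.2, "D1": 0.3},
    "B": {"T1": 0.9, "D1": 0.1},
    "D1": {"A": 0.3, "B": 0.1},
}
significant = {"A", "D1"}  # assumed output of the literature co-occurrence step

path = shortest_path(graph, "T1", "D1")
score = path_score(path, significant)
print(path, round(score, 3))
```

On this toy network the route through `A` costs 0.5 and the route through `B` costs 1.0, so the path through `A` is used as the signal pathway.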

Increasing Splicing Site Prediction by Training Gene Set Based on Species

  • Ahn, Beunguk;Abbas, Elbashir;Park, Jin-Ah;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.11
    • /
    • pp.2784-2799
    • /
    • 2012
  • Biological data have increased exponentially in recent years, and analyzing these data using data mining tools has become one of the major issues in the bioinformatics research community. This paper focuses on the protein construction process in higher organisms, in which the deoxyribonucleic acid (DNA) sequence is filtered. In this process, "unmeaningful" DNA sub-sequences (called introns) are removed, and their meaningful counterparts (called exons) are retained. Accurate recognition of the boundaries between these two classes of sub-sequences, however, is known to be a difficult problem. Conventional approaches to recognizing these boundaries have focused solely on enhancing machine learning techniques, while the inherent nature of the data themselves has been overlooked. In this paper we present an approach that makes use of the data attributes inherent to species in order to increase the accuracy of boundary recognition. For experimentation, we took the data sets for four different species from the University of California Santa Cruz (UCSC) data repository, divided the data sets by species, and then trained neural network (NN)-based and support vector machine (SVM)-based classifiers on a preprocessed version of the data sets. As a result, we observed that each species has its own specific features related to splice sites, which implies that there are related distances among species. To conclude, dividing the training data set by species would increase the accuracy of splice junction prediction and offer new insight for biological research.

EST Knowledge Integrated Systems (EKIS): An Integrated Database of EST Information for Research Application

  • Kim, Dae-Won;Jung, Tae-Sung;Choi, Young-Sang;Nam, Seong-Hyeuk;Kwon, Hyuk-Ryul;Kim, Dong-Wook;Choi, Han-Suk;Choi, Sang-Heang;Park, Hong-Seog
    • Genomics & Informatics
    • /
    • v.7 no.1
    • /
    • pp.38-40
    • /
    • 2009
  • The EST Knowledge Integrated System, EKIS (http://ekis.kribb.re.kr), was established as a part of Korea's Ministry of Education, Science and Technology initiative for genome sequencing and application research of the biological model organisms (GEAR) project. The goals of the EKIS are to collect EST information from GEAR projects and make an integrated database to provide transcriptomic and metabolomic information for biological scientists. The EKIS comprises five independent categories, with several retrieval systems in each category, for incorporating massive EST data from high-throughput sequencing of 65 different species. Through the EKIS database, scientists can freely access information including BLAST functional annotation as well as Genechip and pathway information for KEGG. By integrating complex data into a framework of existing EST knowledge information, the EKIS provides new insights into specialized metabolic pathway information for an applied industrial material.

MAPPING WETLANDS AND FLOODS IN THE TONLE SAP BASIN, CAMBODIA, USING AIRSAR DATA

  • Milne, A.K.;Tapley, I.J.
    • Proceedings of the KSRS Conference
    • /
    • 2002.10a
    • /
    • pp.441-441
    • /
    • 2002
  • In order to ensure a balance between economic development and a healthy Mekong Basin environment supporting natural resources diversity and productivity critical to the livelihood of its 65 million inhabitants, the Mekong River Commission (MRC) has been investigating the use of radar to remotely characterize and monitor the diversity, complexity, size and connectivity of the Basin's aquatic habitats. The PACRIM AIRSAR Mission provided an opportunity to evaluate the usefulness of radar technology to derive information for assessing, forecasting and mitigating possible cumulative and long-term impacts of development on the natural environment and the people's livelihood. This paper presents the results of mapping wetland cover types using multi-polarimetric radar for an area of the north-western corner of the Tonle Sap basin with data acquired from the AIRSAR Mission in September 2000. The implementation of a newly developed segmentation classification routine used to derive the image classification is described, and the results of a fieldwork campaign to check the classification are presented.

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.40 no.8
    • /
    • pp.726-732
    • /
    • 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction method for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we use an improved FCBF (fast correlation-based filter) feature selection method on supervised clustering data sets and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.
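
The general idea of reducing dimensionality with PCA before clustering can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's experimental setup: it extracts only the first principal component by power iteration, clusters the projected values with a two-cluster k-means, uses made-up data, and omits the FCBF feature selection and k-medoids steps entirely.

```python
def first_component_scores(rows, iters=200):
    """Project rows onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return [sum(x[j] * v[j] for j in range(d)) for x in centered]

def kmeans_1d(values, iters=20):
    """Two-cluster k-means on scalars, initialised at the min and max."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

# Toy expression-like data (assumed): two well-separated groups of samples.
rows = [
    [1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.9, 1.0, 1.1],
    [5.0, 5.1, 4.9], [5.2, 4.9, 5.0], [4.9, 5.0, 5.1],
]
scores = first_component_scores(rows)
labels = kmeans_1d(scores)
print(labels)
```

With real gene expression data one would normally keep several components and compare the cluster labels against known classes, which is the kind of evaluation the abstract describes.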

Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model (텍스트 마이닝 기반의 그래프 모델을 이용한 미발견 공공 지식 추론)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.31 no.1
    • /
    • pp.231-250
    • /
    • 2014
  • Due to the recent development of Information and Communication Technologies (ICT), the number of research publications has increased exponentially. In response to this rapid growth, the demand for automated text processing methods has risen to deal with the massive amount of text data. Biomedical text mining, which discovers hidden biological meanings and treatments from the biomedical literature, has become a pivotal methodology that helps medical disciplines reduce time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches require either an intensive manual process or a semi-automatic procedure to find and select biomedical entities. In addition, they are limited to showing one dimension, that is, the cause-and-effect relationship between two concepts. Thus, this study proposes a novel approach to discover various relationships among source and target concepts and their intermediate concepts by expanding intermediate concepts to multiple levels. This study provides distinct perspectives for literature-based discovery by not only discovering meaningful relationships among concepts in the biomedical literature through graph-based path inference but also generating feasible new hypotheses.

Co-occurrence Based Drug-disease Relationship Inference with Genes as Mediators (유전자를 중간 매개로 고려한 동시발생 기반의 약물-질병 관계 추론)

  • Shin, Sangwon;Sin, Yeeun;Jang, Giup;Yoo, Youngmi
    • The Journal of Korean Institute of Information Technology
    • /
    • v.16 no.11
    • /
    • pp.1-9
    • /
    • 2018
  • Drug repositioning is the discovery of new uses for existing drugs. Text mining derives knowledge from unstructured text. We propose a method to predict new drug-disease relationships by taking into account the relative frequency of genes that co-occur in both disease-gene and gene-drug pairs. Co-occurrences of drug-gene and gene-disease pairs in the biological literature are counted, and the rate of each gene is calculated for each drug and disease. Weights of drug-disease relationships are calculated using the average of the rates of the measured genes and are used to measure the accuracy for each disease. In measuring drug-disease relationships, counting frequency at the sentence level and considering multiple relationships identified relationships more accurately than the existing method.
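
A rough sketch of sentence-level co-occurrence counting with genes as mediators is given below. The data layout (each sentence as a tuple of drug, gene, and disease mention sets), the entity names, and especially the combination rule are assumptions: the abstract says only that weights use an average of gene rates, so summing the product of the two rates over shared genes is an illustrative stand-in, not the paper's formula.

```python
from collections import Counter
from itertools import product

def cooccurrence_rates(sentences, entity_index):
    """Rate of each gene per entity, from sentence-level co-occurrence counts.

    Each sentence is a tuple (drugs, genes, diseases) of sets of mentions;
    entity_index selects drugs (0) or diseases (2).
    """
    counts = Counter()
    for sent in sentences:
        for entity, gene in product(sent[entity_index], sent[1]):
            counts[(entity, gene)] += 1
    totals = Counter()
    for (entity, _), c in counts.items():
        totals[entity] += c
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}

def drug_disease_weight(drug, disease, drug_rates, disease_rates):
    """Hypothetical weight: sum over shared genes of the product of rates."""
    drug_genes = {g for (d, g) in drug_rates if d == drug}
    disease_genes = {g for (d, g) in disease_rates if d == disease}
    shared = drug_genes & disease_genes
    return sum(drug_rates[(drug, g)] * disease_rates[(disease, g)]
               for g in shared)

# Toy sentences (assumed): (drug mentions, gene mentions, disease mentions).
sentences = [
    ({"drugA"}, {"G1"}, set()),
    ({"drugA"}, {"G1", "G2"}, set()),
    (set(), {"G1"}, {"disX"}),
    (set(), {"G2"}, {"disX"}),
    (set(), {"G2"}, {"disY"}),
]
drug_rates = cooccurrence_rates(sentences, 0)
disease_rates = cooccurrence_rates(sentences, 2)
w_x = drug_disease_weight("drugA", "disX", drug_rates, disease_rates)
w_y = drug_disease_weight("drugA", "disY", drug_rates, disease_rates)
print(round(w_x, 3), round(w_y, 3))
```

Here `drugA` never appears in a sentence with either disease, yet both pairs receive a weight through the shared genes, which is the mediator idea the abstract describes.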

PACRIM SCIENCE APPLICATIONS: A DECADE WITH AIRSAR

  • Milne, A.K.;Tapley, I.J.
    • Proceedings of the KSRS Conference
    • /
    • 2002.10a
    • /
    • pp.428-428
    • /
    • 2002
  • The scientific objectives of PACRIM (Pacific Rim) are to advance the understanding of polarimetric and interferometric radar and to promote its application in environmental research designed to detect and quantify changes found in both the physical and human-dominated ecosystems on the earth's surface. The information derived is used to more readily identify environments at risk, improve environmental decision making and the management of resources, and thereby lead to the implementation of more effective and sustainable land use practices. PACRIM is a collaborative research project organized by NASA's Mission to Planet Earth, Airborne Sciences Program; the Jet Propulsion Laboratory; CSIRO-COSSA; and the Centre for Remote Sensing and GIS at the University of New South Wales. A decade of working with AIRSAR data (1993-2003) in the Australia-Asian-Pacific region has provided the opportunity for more than 400 investigators from 20 countries to collect, analyse, interpret and apply state-of-the-art radar data to earth-science studies. This has been achieved by scientists working within seven broad research themes: forestry and vegetation; geology and tectonic processes; interferometry; disaster management; coastal analysis; agriculture; and urban and regional development. This paper presents an overview of the three data acquisition missions (1993, 1996 and 2000) and the science research outcomes achieved from analyzing high quality radar data.

Implementing Biological Network Analysis System through Oriental Medical Literature Analysis (한의학 분야 문헌 분석을 통한 생물학적 네트워크 분석시스템 개발)

  • Yu, Seok Jong;Cho, Yongseong;Lee, Junehawk;Seo, Dongmin;Yea, Sang-Jun;Kim, Chul
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.10
    • /
    • pp.616-625
    • /
    • 2015
  • Currently, oriental medicine research uses modern research technology and validates its various biochemical effects by combining it with molecular biology technology. However, there are few search systems for finding the biochemical mechanisms related to the major compounds in oriental medicine. In this research, we aimed to develop a Korean herb database based on a text-mining system by analyzing PubMed data. We developed a prototype system for searching chemical, gene, and biological relations in oriental medicine. It characterizes modern oriental medicine research trends with major chemical, gene, and protein information. Analysis results can be searched on the prototype system with visualization of the biological interactions.