• Title/Summary/Keyword: 생물학적 정보 (biological information)

Cluster Analysis of SNPs with Entropy Distance and Prediction of Asthma Type Using SVM (엔트로피 거리와 SVM를 이용한 SNP 군집분석과 천식 유형 예측)

  • Lee, Jung-Seob;Shin, Ki-Seob;Wee, Kyu-Bum
    • The KIPS Transactions:PartB / v.18B no.2 / pp.67-72 / 2011
  • Single nucleotide polymorphisms (SNPs) are a very important tool for the study of human genome structure. Cluster analysis of large amounts of gene expression data is useful for identifying biologically relevant groups of genes and for generating networks of gene-gene interactions. In this paper we compare the clusters of SNPs within an asthma group and a normal control group, obtained by hierarchical cluster analysis with entropy distance. The 5-cluster collections of the two groups appear to be significantly different. Using representative SNPs from the clusters of the asthma group, we searched for the best set of SNPs for diagnosing the two types of asthma. Support vector machines are used to evaluate the prediction accuracy of the selected combinations. The best combination model turns out to be a set of five SNP loci, including one on the gene ALOX12, and its accuracy in predicting the risk of aspirin-tolerant asthma among asthmatic patients is 66.41%.
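
    An entropy-based distance between two SNPs can be sketched as follows. The abstract does not give the paper's exact definition, so this example assumes the variation of information, d(X, Y) = 2H(X, Y) - H(X) - H(Y); the function names and the genotype vectors are hypothetical toy values, not data from the study.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def entropy_distance(x, y):
    """Assumed distance: variation of information, 2H(X,Y) - H(X) - H(Y)."""
    joint = entropy(list(zip(x, y)))
    return 2 * joint - entropy(x) - entropy(y)

# Toy genotype calls (0/1/2 = copies of the minor allele) for 8 subjects.
snp_a = [0, 0, 1, 1, 2, 2, 0, 1]
snp_b = [0, 0, 1, 1, 2, 2, 0, 1]  # identical to snp_a
snp_c = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated pattern

d_same = entropy_distance(snp_a, snp_b)
d_diff = entropy_distance(snp_a, snp_c)
print(d_same, d_diff)
```

    A matrix of such pairwise distances could then be passed to any hierarchical clustering routine; identical SNPs get distance zero, and the distance grows as the two SNPs share less information.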

Building a Classifier for Integrated Microarray Datasets through Two-Stage Approach (2 단계 접근법을 통한 통합 마이크로어레이 데이타의 분류기 생성)

  • Yoon, Young-Mi;Lee, Jong-Chan;Park, Sang-Hyun
    • Journal of KIISE:Databases / v.34 no.1 / pp.46-58 / 2007
  • Since microarray data capture tens of thousands of gene expression values simultaneously, they can be very useful in identifying the phenotypes of diseases. However, analyses of several microarray datasets that were produced independently with the same biological objectives can yield different results. One of the main reasons is the limited number of samples involved in a single microarray experiment. To increase classification accuracy, it is desirable to enlarge the sample size by integrating independently conducted microarray datasets and making maximal use of them. In this paper, we propose a novel two-stage approach that first integrates individual microarray datasets, to overcome the problem caused by the limited number of samples, and identifies informative genes, and then builds a classifier using only the informative genes. On an independent test dataset, the classifier built from the larger integrated sample achieves high sensitivity and specificity, and its accuracy is up to 24.19% higher than that of the comparison methods.

Cloning and Characterization of Zebrafish Microsomal Epoxide Hydrolase Based on Bioinformatics (생물정보학을 이용한 Zebrafish Microsomal Epoxide Hydrolase 클로닝 및 특성연구)

  • Lee Eun-Yeol;Kim Hee-Sook
    • Microbiology and Biotechnology Letters / v.34 no.2 / pp.129-135 / 2006
  • A gene encoding a putative microsomal epoxide hydrolase (mEH) of the zebrafish, Danio rerio, was cloned and characterized. The putative mEH protein of D. rerio exhibited sequence similarity with mammalian mEH and some bacterial EHs. A structural model of the putative mEH was constructed by homology modeling based on the crystallographic templates 1qo7 and 1ehy. The catalytic triad consisting of $Asp^{233}$, $Glu^{413}$, and $His^{440}$ was identified, and characteristic features such as two tyrosine residues and the oxyanion hole were found to be highly conserved. Based on bioinformatic analysis together with an EH activity assay, the putative protein was annotated as the mEH of D. rerio. Enantiopure styrene oxide with an enantiopurity of 99% ee and a yield of 33.5% was obtained from racemic styrene oxide in 45 min by the enantioselective hydrolysis activity of the recombinant mEH of D. rerio.

A Method for Microarray Data Analysis based on Bayesian Networks using an Efficient Structural learning Algorithm and Data Dimensionality Reduction (효율적 구조 학습 알고리즘과 데이타 차원축소를 통한 베이지안망 기반의 마이크로어레이 데이타 분석법)

  • 황규백;장정호;장병탁
    • Journal of KIISE:Software and Applications / v.29 no.11 / pp.775-784 / 2002
  • Microarray data, obtained from DNA chip technologies, measure the expression levels of thousands of genes in cells or tissues. They are used for gene function prediction or cancer diagnosis based on gene expression patterns. Among the diverse methods for data analysis, the Bayesian network represents the relationships among data attributes in the form of a graph structure. This property enables us to discover various relations among genes and the characteristics of the tissue (e.g., the cancer type) through microarray data analysis. However, most present microarray data sets are so sparse that it is difficult to apply general analysis methods, including Bayesian networks, directly. In this paper, we harness an efficient structural learning algorithm and data dimensionality reduction in order to analyze microarray data using Bayesian networks. The proposed method was applied to the analysis of real microarray data, namely the NCI60 data set, and its usefulness was evaluated by how accurately the learned Bayesian networks represent known biological facts.

Inferring Disease-related Genes using Title and Body in Biomedical Text (생물학 문헌 데이터의 제목과 본문을 이용한 질병 관련 유전자 추론 방법)

  • Kim, Jeongwoo;Kim, Hyunjin;Yeo, Yunku;Shin, Mincheol;Park, Sanghyun
    • KIISE Transactions on Computing Practices / v.23 no.1 / pp.28-36 / 2017
  • Since the genome projects of the 1990s, a vast number of gene studies have been stored in online databases. By using these databases, several biological relationships can be inferred. In this study, we propose a method to infer disease-gene relationships using the title and body of biomedical texts. The title is used to extract hub genes from each article, whereas the body is used to extract sub genes that are related to the hub genes. Through these steps, we construct a local gene network for each article. By integrating the local gene networks, we then construct a global gene network. Subsequent analysis of the global gene network allows high-ranking disease-related genes to be inferred. We validated the proposed method by comparing it with previous methods. The results indicate that the proposed method is a meaningful approach to inferring disease-related genes.
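
    The title/body pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes gene mentions have already been extracted, that an edge's weight is the number of articles supporting it, and that genes are ranked by weighted degree. The gene identifiers are placeholders.

```python
from collections import defaultdict

def local_network(title_genes, body_genes):
    """Link each hub gene (from the title) to each sub gene (from the body)."""
    return {(hub, sub) for hub in title_genes for sub in body_genes if hub != sub}

def global_network(documents):
    """Merge local networks; edge weight = number of supporting articles."""
    weights = defaultdict(int)
    for title_genes, body_genes in documents:
        for edge in local_network(title_genes, body_genes):
            weights[edge] += 1
    return dict(weights)

def rank_genes(weights):
    """Rank genes by weighted degree (an assumed scoring rule)."""
    score = defaultdict(int)
    for (hub, sub), w in weights.items():
        score[hub] += w
        score[sub] += w
    return sorted(score, key=lambda g: (-score[g], g))

# Placeholder articles: (genes in title, genes in body).
docs = [
    ({"GENE_A"}, {"GENE_B", "GENE_C"}),
    ({"GENE_C"}, {"GENE_A", "GENE_D"}),
    ({"GENE_A"}, {"GENE_D"}),
]
ranking = rank_genes(global_network(docs))
print(ranking)
```

    Other centrality measures could replace the weighted degree; the abstract does not say which ranking criterion the paper uses.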

A Performance Comparison of Multi-Label Classification Methods for Protein Subcellular Localization Prediction (단백질의 세포내 위치 예측을 위한 다중레이블 분류 방법의 성능 비교)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.4 / pp.992-999 / 2014
  • This paper presents an extensive experimental comparison of a variety of multi-label learning methods for accurately predicting the subcellular localization of proteins that exist at multiple subcellular locations simultaneously. We compared several methods from three categories of multi-label classification algorithms: algorithm adaptation, problem transformation, and meta learning. Experimental results are analyzed using 12 multi-label evaluation measures to assess the behavior of the methods from a variety of viewpoints. We also use a new summarization measure to find the best-performing method. Experimental results show that the best-performing methods are the power-set method that prunes infrequently occurring subsets of labels and classifier chains that model relevant labels as additional features. Furthermore, ensembles of many classifiers built with these methods enhance performance further. The recommendation from this study is that the correlation among subcellular locations is an effective clue for classification, because the subcellular locations of proteins performing a certain biological function are not independent but correlated.
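
    The power-set transformation with pruning mentioned above can be illustrated with a short sketch. It assumes the simplest variant, in which each distinct label set becomes one class and examples with rare label sets are simply dropped (published pruned-sets methods may instead reassign them to frequent subsets). The function name, the `min_count` threshold, and the toy labels are hypothetical.

```python
from collections import Counter

def pruned_label_powerset(label_sets, min_count=2):
    """Map each frequent label set to a class id; rare sets map to None."""
    counts = Counter(frozenset(s) for s in label_sets)
    kept = sorted((s for s, c in counts.items() if c >= min_count),
                  key=lambda s: sorted(s))
    class_of = {s: i for i, s in enumerate(kept)}
    targets = [class_of.get(frozenset(s)) for s in label_sets]
    return class_of, targets

# Toy protein annotations: each protein has a set of locations.
labels = [
    {"nucleus"},
    {"nucleus", "cytoplasm"},
    {"nucleus"},
    {"nucleus", "cytoplasm"},
    {"membrane", "nucleus", "cytoplasm"},  # occurs once, so it is pruned
]
class_of, targets = pruned_label_powerset(labels)
print(targets)
```

    The resulting single-label targets can be fed to any multi-class classifier; because each class is a whole label set, correlations among locations are captured implicitly.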

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method (EPs-TFP 마이닝 기법을 이용한 단백질 Disorder/Order 지역 분류)

  • Lee, Heon Gyu;Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.59-72 / 2012
  • Since a protein displays its specific functions when a disorder region of the protein sequence transitions to an order region during a biological reaction, separating disorder regions from order regions in sequence data is necessary for predicting the three-dimensional structure and characteristics of the protein. To classify disorder and order regions efficiently, this paper proposes a classification/prediction method that uses sequence data, yields results that are not biased toward specific characteristics of a protein, and improves classification speed. The emerging-pattern-based EPs-TFP method uses only essential emerging patterns, from which redundant emerging patterns have been removed. This classification method finds sequence patterns of the disorder region, that is, patterns that occur frequently in disorder regions but relatively infrequently in order regions. To improve the performance of the proposed algorithm, we extend the TFP method, which is built on the P-tree and T-tree, into a classification/prediction method. We evaluated the EPs-TFP technique on DisProt 4.9 and CASP 7 data; the order/disorder classification results show a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.
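
    The notion of an emerging pattern, one that is frequent in disorder regions but infrequent in order regions, is usually quantified by a growth rate: the ratio of a pattern's support in the two classes. The sketch below assumes patterns are contiguous substrings and uses made-up toy sequences; it does not reproduce the P-tree/T-tree mining of the EPs-TFP method.

```python
def support(pattern, sequences):
    """Fraction of sequences that contain the pattern as a substring."""
    return sum(pattern in s for s in sequences) / len(sequences)

def growth_rate(pattern, target, background):
    """Support in the target class divided by support in the background class."""
    s_t, s_b = support(pattern, target), support(pattern, background)
    if s_b == 0:
        return float("inf") if s_t > 0 else 0.0
    return s_t / s_b

# Toy amino-acid fragments, for illustration only.
disorder = ["SSGPEE", "PEEKSS", "GSSPEQ"]
order = ["LVIAVL", "AVSSIL", "VLIAFW"]

gr_ss = growth_rate("SS", disorder, order)  # present in both classes
gr_pe = growth_rate("PE", disorder, order)  # present only in disorder
print(gr_ss, gr_pe)
```

    Patterns whose growth rate exceeds a chosen threshold would be kept as emerging patterns for the disorder class.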

The Training Data Generation and a Technique of Phylogenetic Tree Generation using Decision Tree (트레이닝 데이터 생성과 의사 결정 트리를 이용한 계통수 생성 방법)

  • Chae, Deok-Jin;Sin, Ye-Ho;Cheon, Tae-Yeong;Go, Heung-Seon;Ryu, Geun-Ho;Hwang, Bu-Hyeon
    • The KIPS Transactions:PartD / v.10D no.6 / pp.897-906 / 2003
  • The traditional animal phylogenetic tree arranges the body structures of the animal phyla from simple to complex, based on early developmental characters. Currently, molecular systematics, which is advancing rapidly, is re-evaluating these earlier views and revealing new genealogies and renewed interest in evolution. In this paper, we generate a training set from DNA sequences and apply it to classification. We used mitochondrial DNA for the experiment and then verified the accuracy with MEGA, an analysis program used in the field of biology. Although the mining results still have to be confirmed through biological experiments, the approach can provide a methodology for efficient classification and can reduce the time and effort required for experiments.

An Efficient Method to Find Accurate Spot-matching Patterns in Protein 2-DE Image Analysis (단백질 2-DE 이미지 분석에서 정확한 스팟 매칭 패턴 검색을 위한 효과적인 방법)

  • Jin, Yan-Hua;Lee, Won-Suk
    • Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.551-555 / 2010
  • In protein 2-DE image analysis, the accuracy of the spot-matching operation, which identifies the spot of the same protein in each 2-DE gel image, is strongly affected by errors caused by varying experimental conditions. This paper proposes an efficient method for finding more accurate spot-matching patterns based on multiple reference gel images. Additionally, to reduce the execution time, which increases exponentially with the number of gel images, a "partition then extension" framework is used to find long spot-matching patterns of higher accuracy. Experiments on real 2-DE images of human liver tissue confirm the accuracy and efficiency of the proposed algorithm.

Development of LED Irradiation System for Cell Proliferation (세포증식을 위한 LED 조사 시스템 개발)

  • Cheon, Min-Woo;Park, Yong-Pil;Lee, Ho-Shik;Kim, Tae-Gon;Kim, Young-Pyo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.581-582 / 2010
  • This paper presents a basic study toward developing photodynamic therapy equipment for medical treatment. We developed equipment for stimulating cell proliferation using a high-brightness LED. The equipment was built from a microcontroller and a high-brightness LED, and was designed so that the light irradiation time, intensity, frequency, and other parameters can be controlled. In particular, an FPGA was used to control the light irradiation frequency, and a TLC5941 was used to control changes in the output value. The control range is divided into 30 steps in software. Consequently, the current could be controlled by changing the level in both Continuous Wave (CW) and Pulse Width Modulation (PWM) modes, and the output of the high-brightness LED could be controlled step by step.