• Title/Summary/Keyword: Gene Algorithm


Variable Selection of Feature Pattern using SVM-based Criterion with Q-Learning in Reinforcement Learning (SVM-기반 제약 조건과 강화학습의 Q-learning을 이용한 변별력이 확실한 특징 패턴 선택)

  • Kim, Chayoung
    • Journal of Internet Computing and Services / v.20 no.4 / pp.21-27 / 2019
  • Feature patterns gathered from observations of RNA sequencing (RNA-seq) data are not all equally informative for identifying differential expression: some may be noisy, correlated, or irrelevant because of redundancy in big-data sets. Variable selection of feature patterns aims at a differentially expressed gene set that is significantly relevant to a specific task. This issue is complex and important in many domains. In machine learning, selection of feature patterns has been studied with methods such as Random Forest, K-Nearest Neighbors, and Support Vector Machine (SVM). SVM is one of the most well-known machine learning algorithms, both classical and original. One member of the SVM-criterion family is Support Vector Machine-Recursive Feature Elimination (SVM-RFE), which is used in our research. We propose a novel algorithm that combines SVM-RFE with Q-learning in reinforcement learning for better variable selection of feature patterns. Comparing our proposed algorithm with the well-known SVM-RFE combined with Welch's t-test on published data, our results show that the criterion derived from the weight vector of SVM-RFE is improved by Q-learning through its off-policy, more exploratory scheme.
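
The off-policy character of Q-learning mentioned in this abstract can be illustrated with the standard tabular update rule. This is a minimal sketch, not the authors' algorithm: the states, the actions `keep` and `drop`, the reward, and the parameters `alpha` and `gamma` are hypothetical placeholders, since the abstract does not say how feature selection is mapped onto states and actions.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q-value.

    Off-policy: the target uses the best next action (max), regardless of
    which action the exploratory behaviour policy actually takes next.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)        # all Q-values start at 0.0
actions = ["keep", "drop"]    # hypothetical actions, assumed for illustration
q_update(q, "s0", "drop", reward=1.0, next_state="s1", actions=actions)
q_update(q, "s1", "keep", reward=0.0, next_state="s0", actions=actions)
```

Because the target takes the maximum over next actions, the values learned are those of the greedy policy even when actions are chosen more exploratorily.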

Ovarian Cancer Microarray Data Classification System Using Marker Genes Based on Normalization (표준화 기반 표지 유전자를 이용한 난소암 마이크로어레이 데이타 분류 시스템)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.9 / pp.2032-2037 / 2011
  • Marker genes are defined as genes whose expression level characterizes a specific experimental condition. Genes whose expression levels differ significantly between groups are highly informative about the studied phenomenon. In this paper, the system first detects marker genes by ranking genes according to statistics after normalizing the data with the most widely used of the normalization methods proposed so far, and then compares and analyzes the performance of each normalization method using a multi-layer perceptron neural network. Applying the multi-layer perceptron algorithm to a microarray data set containing eight marker genes selected by the ANOVA method after Lowess normalization gave the highest classification accuracy, 99.32%, and the lowest prediction error estimate.
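
Ranking genes by an ANOVA statistic, as this abstract describes, can be sketched with a one-way F statistic per gene. This is an illustration only: the normalization step (Lowess) is omitted, and the function names and toy expression values are hypothetical, not taken from the paper.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of expression values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_marker_genes(expr, n_top):
    """Rank genes by F statistic; expr maps gene name -> list of per-class groups."""
    ranked = sorted(expr, key=lambda g: anova_f(expr[g]), reverse=True)
    return ranked[:n_top]

# Hypothetical toy data: two classes, three samples each.
expr = {
    "geneA": [[1.0, 1.1, 0.9], [3.0, 3.2, 2.8]],   # clearly separated
    "geneB": [[2.0, 2.1, 1.9], [2.0, 2.2, 1.8]],   # no separation
    "geneC": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],   # weak separation
}
markers = top_marker_genes(expr, n_top=2)
```

Genes whose between-class variance is large relative to their within-class variance rank first, which matches the notion of a marker gene given in the abstract.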

Automated Generation of Optimal Security Defense Strategy using Simulation-based Evolutionary Techniques (시뮬레이션 기반 진화기법을 이용한 최적 보안 대응전략 자동생성)

  • Lee, Jang-Se;Hwang, Hun-Gyu;Yun, Jin-Sik;Park, Geun-Woo
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.11 / pp.2514-2520 / 2010
  • This paper proposes a methodology for automatically generating optimal security defense strategies using evolutionary techniques. As damage from intrusions that exploit vulnerabilities in computer systems and networks increases, security techniques have been actively researched. However, it is difficult to generate optimal defense strategies because the various situations that arise in the network environment under different countermeasures must be considered. We therefore adopt a genetic algorithm to generate an optimal defense strategy as a combination of countermeasures. Gene information is represented by countermeasures, and fitness is evaluated by simulating the vulnerability of the system with the various countermeasures applied. Finally, we examine the feasibility of the approach through experiments on a system implemented with the proposed method.
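
The idea of evolving a combination of countermeasures can be sketched as a small genetic algorithm over bit strings, where each bit says whether a countermeasure is applied. This is a minimal sketch under stated assumptions: the paper evaluates fitness by simulation, which is replaced here by a hypothetical `toy_fitness` function with made-up benefit and cost values, and the operators (binary tournament, one-point crossover, bit-flip mutation, elitism) are generic choices, not necessarily the authors'.

```python
import random

def evolve(fitness, n_genes, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Minimal generational GA over bit strings (1 = countermeasure applied)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]                                  # elitism: keep the best
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)    # binary tournament
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, n_genes - 1)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [g ^ 1 if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best

# Hypothetical stand-in for the simulation: each countermeasure removes some
# vulnerability (benefit) but has a deployment cost.
BENEFIT = [5, 3, 8, 2, 6, 4]
COST = [2, 4, 3, 3, 1, 5]

def toy_fitness(genes):
    return sum((b - c) * g for b, c, g in zip(BENEFIT, COST, genes))

best = evolve(toy_fitness, n_genes=len(BENEFIT))
```

In a setting like the one described, `toy_fitness` would be replaced by a call into the simulator that scores the system's remaining vulnerability for a given countermeasure set.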

Design of the System and Algorithm for the Pattern Analysis of the Bio-Data (바이오 데이터 패턴 분석을 위한 시스템 및 알고리즘 설계)

  • Song, Young-Ohk;Kim, Sung-Young;Chang, Duk-Jin
    • The Journal of the Korea Contents Association / v.10 no.8 / pp.104-110 / 2010
  • In biotechnology, computers can play varied roles, such as ordinal analysis, ordinal comparison, mutation tracing, analogy comparison for drug design, estimation of protein function, study of cell mechanisms, and verification of the role of a gene in preventing diseases. In addition, by constructing a database, they can support the cloning process in other data research and serve as a basis for comparative genetics. Most biotechnology researchers need a tool that can do all of these jobs. This study examines the problems of existing systems for analyzing bio data and designs an improved analysis system that addresses them. In addition, the performance of each component is improved, and the components, which had been processed separately, are combined into a single system to overcome the problems of the existing systems.

Assessment of the crest cracks of the Pubugou rockfill dam based on parameters back analysis

  • Zhou, Wei;Li, Shao-Lin;Ma, Gang;Chang, Xiao-Lin;Cheng, Yong-Gang;Ma, Xing
    • Geomechanics and Engineering / v.11 no.4 / pp.571-585 / 2016
  • The crest of the Pubugou central core rockfill dam (CCRD) cracked in the first and second impounding periods. To evaluate the safety of the Pubugou CCRD, an inversion analysis of the constitutive model parameters for the rockfill materials is performed based on in situ deformation monitoring data. The aim of this work is to faithfully reflect the deformation state of the Pubugou CCRD and determine the causes of the dam crest cracks. A novel real-coded genetic algorithm based on the differences in gene fragments (DGFX) is proposed. It is used in combination with a radial basis function neural network (RBFNN) to perform the parameter back analysis. The simulated settlements show good agreement with the monitoring data, illustrating that the back analysis is reasonable and accurate. Furthermore, the deformation gradient of the dam crest has been analysed. The dam crest is highly likely to crack because of uncoordinated deformation, which agrees well with the field investigation. The deformation gradient decreases below the critical value and reaches a stable state after the second full reservoir.

Haplotype Analysis and Single Nucleotide Polymorphism Frequency of PEPT1 Gene (Exon 5 and 16) in Korean (한국인에 있어서 PEPT1 유전자(exon 5 및 16)의 단일염기변이 빈도 및 일배체형 분석)

  • Kim, Se-Mi;Lee, Sang-No;Kang, Hyun-Ah;Cho, Hea-Young;Lee, Il-Kwon;Lee, Yong-Bok
    • Journal of Pharmaceutical Investigation / v.39 no.6 / pp.411-416 / 2009
  • The aim of this study was to investigate the frequency of SNPs in PEPT1 exons 5 and 16 and to analyze haplotype frequencies for PEPT1 exons 5 and 16 in the Korean population. A total of 519 healthy subjects were genotyped for PEPT1 using pyrosequencing analysis and polymerase chain reaction-based diagnostic tests. Haplotypes were statistically inferred using an algorithm based on expectation-maximization (EM). PEPT1 exon 5 G381A genotyping revealed that the frequencies of the homozygous wild type (G/G), heterozygous type (G/A), and homozygous mutant type (A/A) were 30.4, 53.4, and 16.2%, respectively. PEPT1 exon 16 G1287C genotyping revealed that the frequencies of the homozygous G/G, heterozygous G/C, and homozygous C/C types were 88.8, 10.0, and 1.2%, respectively. Based on these genotype data, haplotype analysis between PEPT1 exon 5 G381A and exon 16 G1287C was performed using HapAnalyzer and PL-EM. The results revealed that linkage disequilibrium between the alleles is not pronounced (|D'|=0.3667).
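
The variant allele frequencies implied by the genotype frequencies in this abstract, and the standard definition of the normalized linkage disequilibrium D', can be worked through as follows. This is a sketch for illustration: the function names are hypothetical, and the abstract does not report the haplotype frequencies that D' requires, so `d_prime` is defined but not applied to the study data.

```python
def allele_freq(hom_ref, het, hom_alt):
    """Frequency of the alternate allele from genotype frequencies (in %)."""
    total = hom_ref + het + hom_alt
    return (hom_alt + het / 2) / total

def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium D' for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele a at locus 1 and
    allele b at locus 2; p_a and p_b are the marginal allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0

p_381A = allele_freq(30.4, 53.4, 16.2)    # exon 5 G381A, values from the abstract
p_1287C = allele_freq(88.8, 10.0, 1.2)    # exon 16 G1287C, values from the abstract
```

With the reported genotype frequencies, the 381A allele frequency comes to about 0.429 and the 1287C allele frequency to about 0.062. Computing |D'| would additionally need a haplotype frequency, such as one estimated by an EM procedure.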

Bayesian Survival Analysis of High-Dimensional Microarray Data for Mantle Cell Lymphoma Patients

  • Moslemi, Azam;Mahjub, Hossein;Saidijam, Massoud;Poorolajal, Jalal;Soltanian, Ali Reza
    • Asian Pacific Journal of Cancer Prevention / v.17 no.1 / pp.95-100 / 2016
  • Background: Survival time of lymphoma patients can be estimated with the help of microarray technology. In this study, the iterative Bayesian Model Averaging (BMA) method was used to estimate the survival time of Mantle Cell Lymphoma (MCL) patients, and, based on the findings, patients were divided into two groups, high-risk and low-risk. Materials and Methods: Gene expression data of MCL patients were used to select a subset of genes for survival analysis with microarray data, using the iterative BMA method. To evaluate the performance of the method, patients were divided into high-risk and low-risk groups based on their scores. Predictive performance was investigated using the log-rank test. The Bioconductor package "iterativeBMAsurv" was applied with R statistical software for classification and survival analysis. Results: In this study, 25 genes associated with survival for MCL patients were identified across 132 selected models. The maximum likelihood estimate coefficients of the selected genes and the posterior probabilities of the selected models were obtained from training data. Using this method, patients could be separated into high-risk and low-risk groups with high significance (p<0.001). Conclusions: The iterative BMA algorithm has high precision and ability for survival analysis. This method is capable of identifying a few predictive variables associated with survival among the many variables in a set of microarray data. Therefore, it can be used as a low-cost diagnostic tool in clinical research.

Linear-Time Search in Suffix Arrays (접미사 배열을 이용한 선형시간 탐색)

  • Sin Jeong Seop;Kim Dong Kyue;Park Heejin;Park Kunsoo
    • Journal of KIISE:Computer Systems and Theory / v.32 no.5 / pp.255-259 / 2005
  • Index data structures such as suffix trees and suffix arrays are widely used to search for a pattern P in a text, in diverse applications of string processing and computational biology. It is well known that searching in suffix trees is faster than in suffix arrays in terms of time complexity: searching for P on a constant-size alphabet takes $O(|P|)$ time in a suffix tree but $O(|P| + \log n)$ time in a suffix array, where $n$ is the length of the text. In this paper we present a linear-time search algorithm in suffix arrays for constant-size alphabets. For a general alphabet $\Sigma$, it takes $O(|P| \log |\Sigma|)$ time.
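
For context, pattern search in a suffix array is usually introduced through plain binary search. The sketch below is that textbook baseline, which takes $O(|P| \log n)$ time; it is not the paper's linear-time algorithm, and it does not use the extra longest-common-prefix information needed for the $O(|P| + \log n)$ bound. The function names and the naive construction are assumptions made for illustration.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions (fine for short texts)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Return sorted start positions of pattern using two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                       # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

text = "banana"
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "ana")   # occurrences start at 1 and 3
```

Each comparison inspects at most $|P|$ characters and there are $O(\log n)$ comparisons, which is where the baseline bound comes from.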

Implementation of the Image Processing Algorithm for HPV DNA chip (HPV DNA 칩의 영상처리 알고리즘 구현)

  • 김종대;연석희;이용업;김종원
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.8C / pp.803-810 / 2003
  • This paper addresses an image processing technique for the human papillomavirus (HPV) DNA chip that discriminates whether the probes are hybridized with the target DNA. The HPV DNA chip is designed to determine HPV genotypes using DNA probes for 22 HPV types. In addition to the probes, the HPV DNA chip has markers that always react with the sample DNA. The positions of the probe dots in the final scanned image are fixed relative to the marker-dot locations, with a small variation attributable to the accuracy of the dotter and the scanner. The probes are quadruplicated to enhance diagnostic fidelity. Prior knowledge, including the relative distances of the markers and the replication information of the probes, is integrated into the template matching technique with a normalized covariance measure. It was demonstrated that both kinds of prior knowledge can be exploited by simply averaging the template matching measures over the positions of the markers and probes. The resulting scheme yields stable marker locating and probe classification.
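
Averaging a normalized covariance measure over several known dot positions, as this abstract describes, can be sketched on a tiny synthetic image. This is an illustration under assumptions: the template, image size, dot layout, and search range are invented, and the function names are hypothetical rather than taken from the paper.

```python
from math import sqrt

def normalized_covariance(patch, template):
    """Covariance of two equal-sized 2D lists divided by both standard deviations."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    denom = sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0

def crop(image, top, left, h, w):
    return [row[left:left + w] for row in image[top:top + h]]

def averaged_score(image, template, positions, dy, dx):
    """Mean matching score over all expected dot positions shifted by (dy, dx)."""
    h, w = len(template), len(template[0])
    scores = [normalized_covariance(crop(image, r + dy, c + dx, h, w), template)
              for r, c in positions]
    return sum(scores) / len(scores)

# Synthetic example: two marker dots whose true locations are shifted by (1, 1)
# from their nominal layout.
TEMPLATE = [[0, 1, 0], [1, 2, 1], [0, 1, 0]]
image = [[0] * 10 for _ in range(6)]
for top, left in [(1, 1), (1, 5)]:
    for i in range(3):
        for j in range(3):
            image[top + i][left + j] = TEMPLATE[i][j]

nominal = [(0, 0), (0, 4)]               # expected dot layout before the shift
best = max(((dy, dx) for dy in range(3) for dx in range(3)),
           key=lambda s: averaged_score(image, TEMPLATE, nominal, *s))
```

Because the dots keep fixed positions relative to one another, a single shift is searched for the whole layout, and averaging the per-dot scores makes the estimate less sensitive to any one noisy dot.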

Investigation of Conserved Genes in Eukaryotes Common to Prokaryotes (원핵생물과 공통인 진핵생물의 보존적 유전자 탐색)

  • Lee, Dong-Geun
    • Journal of Life Science / v.23 no.4 / pp.595-601 / 2013
  • The clusters of orthologous groups of proteins (COG) algorithm was applied to identify essential proteins in eukaryotes and to measure their degree of conservation. Sixty-three orthologous groups that were conserved in 66 microbial genomes expanded to 104 eukaryotic orthologous groups (KOGs), and 71 KOGs were conserved in the nuclear genomes of 7 eukaryotes. Fifty-four of 71 translation-related genes were conserved, highlighting the importance of proteins in modern organisms. Translation initiation factors (KOG0343, KOG3271) and prolyl-tRNA synthetase (KOG4163) showed high conservation based on the distance value analysis. The genes of Caenorhabditis elegans appear to harbor high genetic variation, because its genome showed the highest variation across the 71 conserved proteins among the 7 genomes. The 71 conserved genes will be valuable in basic and applied research, for example as targets for antibiotic development.