Computational Approaches to Gene Prediction

  • Do Jin-Hwan;Choi Dong-Kug
    • Journal of Microbiology / v.44 no.2 / pp.137-144 / 2006
  • The problems associated with gene identification and the prediction of gene structure in DNA sequences have been the focus of increased attention over the past few years with the recent acquisition by large-scale sequencing projects of an immense amount of genome data. A variety of prediction programs have been developed in order to address these problems. This paper presents a review of the computational approaches and gene-finders used commonly for gene prediction in eukaryotic genomes. Two approaches, in general, have been adopted for this purpose: similarity-based and ab initio techniques. The information gleaned from these methods is then combined via a variety of algorithms, including Dynamic Programming (DP) or the Hidden Markov Model (HMM), and then used for gene prediction from the genomic sequences.

Introduction to Gene Prediction Using HMM Algorithm

  • Kim, Keon-Kyun;Park, Eun-Sik
    • Journal of the Korean Data and Information Science Society / v.18 no.2 / pp.489-506 / 2007
  • Gene structure prediction, which is to predict protein coding regions in a given nucleotide sequence, is the most important process in annotating genes and greatly affects gene analysis and genome annotation. As eukaryotic genes have more complicated structures in DNA sequences than those of prokaryotic genes, analysis programs for eukaryotic gene structure prediction have more diverse and more complicated computational models. The gene prediction methods for eukaryotic genes are the ab initio method, the similarity-based method, and the ensemble method, each of which uses various algorithms. This paper introduces how to predict genes using the HMM (Hidden Markov Model) algorithm and presents the process of gene prediction with well-known gene prediction programs.
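
As a rough illustration of the HMM idea mentioned in this abstract, the sketch below runs Viterbi decoding on a toy two-state model (coding vs. non-coding). The state names, probabilities, and the GC-rich/AT-rich emission assumption are made up for illustration; they are not taken from the paper or from any real gene finder, which would use far richer models (codon structure, splice signals, state durations).

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable state path for seq under a discrete HMM."""
    # Work in log space to avoid underflow on long sequences.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = []
    for ch in seq[1:]:
        col, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this position.
            prev = max(states, key=lambda p: V[-1][p] + math.log(trans_p[p][s]))
            col[s] = V[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][ch])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy parameters (illustrative assumptions, not trained values).
STATES = ("C", "N")  # C = coding, N = non-coding
START = {"C": 0.5, "N": 0.5}
TRANS = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
EMIT = {
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},  # assumed GC-rich
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # assumed AT-rich
}

path = viterbi("ATATGCGCGC", STATES, START, TRANS, EMIT)
print("".join(path))
```

The decoded path labels each nucleotide with a state, and contiguous runs of the coding state would be reported as predicted coding regions.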

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.526-538 / 2021
  • Machine and deep learning-based models are emerging techniques that are being used to address prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results when compared to conventional regression-based models. The prediction of the gene sequence that leads to cancerous diseases, such as prostate cancer, is crucial. Identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide an insight into the types of mutation in the gene is of great importance as it will lead to effective drug design and the promotion of the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences that were used in the experiment. We built a Deep Neural Network (DNN) and a Bi-directional Long Short-Term Memory (Bi-LSTM) model using a k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that the DNN model offers a training accuracy of 99 percent and a validation accuracy of 96 percent. The Bi-LSTM model has a training accuracy of 95 percent and a validation accuracy of 91 percent.
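
The k-mer and one-hot encodings named in this abstract can be sketched as follows. The abstract does not give the value of k, the stride, or the vocabulary scheme, so the overlapping 3-mers, the integer ids starting at 1, and the helper names here are all assumptions made for illustration.

```python
def kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def encode(seq, vocab, k=3):
    """Map each k-mer to an integer id, growing the vocabulary as needed.

    Ids start at 1 so that 0 stays free for padding.
    """
    return [vocab.setdefault(km, len(vocab) + 1) for km in kmers(seq, k)]

def one_hot(label, classes):
    """One-hot encode a class label against an ordered list of classes."""
    return [1 if c == label else 0 for c in classes]

vocab = {}
ids = encode("ATGATG", vocab)                 # k-mers: ATG, TGA, GAT, ATG
target = one_hot("tumor", ["normal", "tumor"])  # hypothetical class names
print(ids, target)
```

The integer ids would typically feed an embedding layer, and the one-hot vector would serve as the training target for the classifier.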

Feasibility study of deep learning based radiosensitivity prediction model of National Cancer Institute-60 cell lines using gene expression

  • Kim, Euidam;Chung, Yoonsun
    • Nuclear Engineering and Technology / v.54 no.4 / pp.1439-1448 / 2022
  • Background: We investigated the feasibility of in vitro radiosensitivity prediction with gene expression using deep learning. Methods: A microarray gene expression of the National Cancer Institute-60 (NCI-60) panel was acquired from the Gene Expression Omnibus. The clonogenic surviving fractions at an absorbed dose of 2 Gy (SF2) from previous publications were used to measure in vitro radiosensitivity. The radiosensitivity prediction model was based on the convolutional neural network. The 6-fold cross-validation (CV) was applied to train and validate the model. Then, the leave-one-out cross-validation (LOOCV) was applied by using the large-errored samples as a validation set, to determine whether the error was from the high bias of the folded CV. The criteria for correct prediction were defined as an absolute error<0.01 or a relative error<10%. Results: Of the 174 triplicated samples of NCI-60, 171 samples were correctly predicted with the folded CV. Through an additional LOOCV, one more sample was correctly predicted, representing a prediction accuracy of 98.85% (172 out of 174 samples). The average relative error and absolute errors of 172 correctly predicted samples were 1.351±1.875% and 0.00596±0.00638, respectively. Conclusion: We demonstrated the feasibility of a deep learning-based in vitro radiosensitivity prediction using gene expression.
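
The correctness criterion and the reported accuracy in this abstract can be checked with a few lines. The thresholds (absolute error below 0.01 or relative error below 10%) and the counts (172 of 174) come from the abstract; the function name and the sample SF2 values are hypothetical.

```python
def is_correct(pred, true, abs_tol=0.01, rel_tol=0.10):
    """A prediction counts as correct if either error bound is met."""
    abs_err = abs(pred - true)
    if abs_err < abs_tol:
        return True
    # Relative error is only defined for a non-zero reference value.
    return true != 0 and abs_err / abs(true) < rel_tol

# Hypothetical (predicted, measured) SF2 pairs for illustration only.
examples = [(0.50, 0.52), (0.05, 0.057), (0.30, 0.40)]
flags = [is_correct(p, t) for p, t in examples]

# Accuracy reported in the abstract: 172 of 174 samples.
accuracy = round(100 * 172 / 174, 2)
print(flags, accuracy)
```

The first pair passes on relative error, the second on absolute error, and the third fails both bounds.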

An XML-Based Analysis Tool for Gene Prediction Results (XML기반의 유전자 예측결과 분석도구)

  • Kim Jin-Hong;Byun Sang-Hee;Lee Myung-Joon;Park Yang-Su
    • The KIPS Transactions:PartD / v.12D no.5 s.101 / pp.755-764 / 2005
  • Recently, as identifying the function of all unknown genes in living things has come to be considered more important, many tools for gene prediction have been developed to identify genes in DNA sequences. Unfortunately, most of those tools use their own schemes to represent their results, requiring researchers to make additional efforts to understand the results they generate. It is therefore desirable to provide a standardized method of representing predicted gene information, which makes it possible to automatically produce the predicted results for a given set of gene data. In this paper, we describe an effective XML representation for various predicted gene information, and present an XML-based analysis tool for gene prediction results based on this representation. The developed system helps users of gene prediction tools to conveniently analyze the predicted results and to automatically produce the statistical results of the prediction. To show the usefulness of the tool, we applied our programs to the results generated by GenScan and GeneID, which are widely used gene prediction systems.

A Study on Construction of Integrated Prokaryotes Gene Prediction System (통합형 미생물 유전자 예측 시스템의 구축에 관한 연구)

  • Chang Jong-won;Ryoo Yoon-kyu;Ku Ja-hyo;Yoon Young-woo
    • Journal of the Institute of Convergence Signal Processing / v.6 no.1 / pp.27-32 / 2005
  • As genome sequencing has advanced at a surprising speed over a short period, automatic genome annotation has become a prerequisite. The most difficult step in genome annotation is finding the protein-coding genes within a genome. The two main subjects of gene prediction are eukaryotes and prokaryotes; because their genes have different structures, their gene prediction methods also differ. Of the 231 species whose genomes have been sequenced so far, 200 are prokaryotes, so prokaryotes may be more appropriate than eukaryotes for biotechnology studies based on comparative genomics. Moreover, prokaryotic genes do not contain introns, which makes gene prediction easier. Earlier prokaryotic gene prediction methods have shown 80% to 90% accuracy, and recent studies aim at 100% accuracy. In this paper, gene prediction accuracy on the E. coli K-12 and S. typhi genomes reached 98.5% and 98.7%, which was better than that of the earlier GLIMMER.
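
Because prokaryotic genes lack introns, a first approximation to gene finding is a plain open reading frame (ORF) scan. The sketch below is a naive forward-strand scan written for illustration; it is not the integrated system described in this abstract, nor GLIMMER, and the function name and `min_codons` threshold are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return half-open (start, end) spans of ATG..stop ORFs on the forward strand.

    min_codons counts the codons before the stop codon, start codon included.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # span includes the stop codon
                start = None
    return sorted(orfs)

seq = "CCATGAAATTTTAGGG"
orfs = find_orfs(seq)
print([(s, e, seq[s:e]) for s, e in orfs])
```

A practical gene finder would also scan the reverse complement, allow alternative start codons, and score candidates statistically rather than accepting every ORF above a length cutoff.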

Comparison and optimization of deep learning-based radiosensitivity prediction models using gene expression profiling in National Cancer Institute-60 cancer cell line

  • Kim, Euidam;Chung, Yoonsun
    • Nuclear Engineering and Technology / v.54 no.8 / pp.3027-3033 / 2022
  • Background: In this study, various types of deep-learning models for predicting in vitro radiosensitivity from gene-expression profiling were compared. Methods: The clonogenic surviving fractions at 2 Gy from previous publications and microarray gene-expression data from the National Cancer Institute-60 cell lines were used to measure the radiosensitivity. Seven different prediction models, including three distinct multi-layered perceptrons (MLP) and four different convolutional neural networks (CNN), were compared. Folded cross-validation was applied to train and evaluate model performance. The criteria for correct prediction were absolute error < 0.02 or relative error < 10%. The models were compared in terms of prediction accuracy, training time per epoch, training fluctuations, and required calculation resources. Results: The strength of MLP-based models was their fast initial convergence and short training time per epoch. Their prediction accuracy differed significantly depending on the model configuration. The CNN-based models showed relatively high prediction accuracy, low training fluctuations, and a relatively small increase in the memory requirement as the model deepens. Conclusion: Our findings suggest that a CNN-based model with moderate depth would be appropriate when the prediction accuracy is important, and a shallow MLP-based model can be recommended when either the training resources or time are limited.

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • 한국균학회소식:학술대회논문집 / 2018.05a / pp.22-22 / 2018
  • Fungal genome sequencing and assembly have become trivial these days. Genome analysis relies on high-quality gene prediction and annotation. An automatic fungal genome annotation pipeline is essential for handling exponentially accumulating genomic sequence data. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, the pipeline employs multiple gene prediction programs encompassing ab initio, evidence-based, and homology-based evaluation. FunGAP aims to evaluate all predicted genes by filtering gene models. To guide the removal of false-positive genes, we used a scoring function that seeks a consensus by scoring each gene model based on homology to known proteins or domains. FunGAP is freely available for non-commercial users at the GitHub site (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).

Ovarian Cancer Prognostic Prediction Model Using RNA Sequencing Data

  • Jeong, Seokho;Mok, Lydia;Kim, Se Ik;Ahn, TaeJin;Song, Yong-Sang;Park, Taesung
    • Genomics & Informatics / v.16 no.4 / pp.32.1-32.7 / 2018
  • Ovarian cancer is one of the leading causes of cancer-related deaths in gynecological malignancies. Over 70% of ovarian cancer cases are high-grade serous ovarian cancers and have high death rates due to their resistance to chemotherapy. Despite advances in surgical and pharmaceutical therapies, overall survival rates are not good, and making an accurate prediction of the prognosis is not easy because of the highly heterogeneous nature of ovarian cancer. To improve the patient's prognosis through proper treatment, we present a prognostic prediction model by integrating high-dimensional RNA sequencing data with their clinical data through the following steps: gene filtration, pre-screening, gene marker selection, integrated study of selected gene markers and prediction model building. These steps of the prognostic prediction model can be applied to other types of cancer besides ovarian cancer.

A Eukaryotic Gene Structure Prediction Program Using Duration HMM (Duration HMM을 이용한 진핵생물 유전자 예측 프로그램 개발)

  • Tae, Hong-Seok;Park, Gi-Jeong
    • Korean Journal of Microbiology / v.39 no.4 / pp.207-215 / 2003
  • Gene structure prediction, which is to predict protein coding regions in a given nucleotide sequence, is the most important process in annotating genes and greatly affects gene analysis and genome annotation. As eukaryotic genes have more complicated structures in DNA sequences than those of prokaryotic genes, analysis programs for eukaryotic gene structure prediction have more diverse and more complicated computational models. We have developed EGSP, a eukaryotic gene structure prediction program, using a duration hidden Markov model. The program consists of two major processes: a training process that produces parameter values from training data sets, and a prediction process that predicts protein coding regions based on those parameter values. The program predicts multiple genes rather than a single gene from a DNA sequence. A few computational models were implemented to detect signal patterns, and their scanning efficiency was tested. Prediction performance was calculated and compared with that of a few commonly used programs, GenScan, GeneID and Morgan, based on a few criteria. The results show that the program can be practically used both as a stand-alone program and as a module in a system. For gene prediction in eukaryotic microbial genomes, training and prediction analysis was done with Saccharomyces chromosomes, and the result shows that the program is practically applicable to real eukaryotic microbial genomes.