• Title/Abstract/Keywords: gene prediction

287 search results (processing time: 0.025 s)

Computational Approaches to Gene Prediction

  • Do Jin-Hwan;Choi Dong-Kug
    • Journal of Microbiology / Vol. 44, No. 2 / pp. 137-144 / 2006
  • The problems associated with gene identification and the prediction of gene structure in DNA sequences have been the focus of increased attention over the past few years with the recent acquisition by large-scale sequencing projects of an immense amount of genome data. A variety of prediction programs have been developed in order to address these problems. This paper presents a review of the computational approaches and gene-finders used commonly for gene prediction in eukaryotic genomes. Two approaches, in general, have been adopted for this purpose: similarity-based and ab initio techniques. The information gleaned from these methods is then combined via a variety of algorithms, including Dynamic Programming (DP) or the Hidden Markov Model (HMM), and then used for gene prediction from the genomic sequences.

Introduction to Gene Prediction Using HMM Algorithm

  • Kim, Keon-Kyun;Park, Eun-Sik
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 2 / pp. 489-506 / 2007
  • Gene structure prediction, which predicts the protein-coding regions in a given nucleotide sequence, is the most important step in annotating genes and greatly affects gene analysis and genome annotation. Because eukaryotic genes have more complicated structures than prokaryotic genes, programs for eukaryotic gene structure prediction use more diverse and more complicated computational models. Gene prediction methods for eukaryotic genes fall into ab initio, similarity-based, and ensemble methods, each of which uses various algorithms. This paper introduces how to predict genes using the HMM (Hidden Markov Model) algorithm and presents the process of gene prediction with well-known gene prediction programs.
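
As a rough illustration of HMM-based gene prediction in general (not the specific programs this paper covers), the sketch below runs Viterbi decoding on a two-state toy model. The states `C` (coding) and `N` (non-coding) and all probabilities are invented for the example; real gene finders use many more states and trained parameters.

```python
import math

# Toy two-state HMM. All numbers are illustrative assumptions, not trained values.
STATES = ("N", "C")  # N = non-coding, C = coding
START_P = {"N": 0.5, "C": 0.5}
TRANS_P = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT_P = {
    "N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # AT-rich
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},  # GC-rich
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a non-empty seq."""
    # Log probabilities avoid numeric underflow on long sequences.
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for landing in s at position t.
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans_p[p][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][seq[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


toy_path = viterbi("AAAAAAAAGGGGGGGG", STATES, START_P, TRANS_P, EMIT_P)
print("".join(toy_path))
```

With these toy parameters the decoded path labels the AT-rich prefix as non-coding and the GC-rich suffix as coding.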


Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security / Vol. 21, No. 12spc / pp. 526-538 / 2021
  • Machine and deep learning-based models are emerging techniques that are being used to address prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results when compared to conventional regression-based models. The prediction of the gene sequence that leads to cancerous diseases, such as prostate cancer, is crucial. Identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide an insight into the types of mutation in the gene is of great importance as it will lead to effective drug design and the promotion of the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences that were used in the experiment. We built a Deep Neural Network (DNN) and Bi-directional Long-Short Term Memory (Bi-LSTM) model using a k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that DNN model prediction offers a training accuracy of 99 percent and validation accuracy of 96 percent. The bi-LSTM model also has a training accuracy of 95 percent and validation accuracy of 91 percent.
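
The abstract does not state the k-mer length, stride, or vocabulary used, so the following is only a minimal sketch of k-mer encoding for a DNA sequence and one-hot encoding for a class label. The choice `k=3`, the stride of 1, the padding index 0, and the label names are assumptions made for illustration.

```python
from itertools import product


def kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer to an integer index; 0 is left free for padding."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, vocab, k=3):
    """Turn a sequence into k-mer indices, skipping k-mers with unknown bases."""
    return [vocab[m] for m in kmers(seq, k) if m in vocab]


def one_hot(label, classes):
    """Encode a class label as a one-hot vector over the given class order."""
    return [1 if c == label else 0 for c in classes]


vocab = build_vocab(3)
print(encode("ATGCA", vocab))                  # integer index for each 3-mer
print(one_hot("tumor", ["normal", "tumor"]))   # hypothetical label names
```

The integer sequence would typically feed an embedding layer, and the one-hot vector would serve as the classification target.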

Feasibility study of deep learning based radiosensitivity prediction model of National Cancer Institute-60 cell lines using gene expression

  • Kim, Euidam;Chung, Yoonsun
    • Nuclear Engineering and Technology / Vol. 54, No. 4 / pp. 1439-1448 / 2022
  • Background: We investigated the feasibility of in vitro radiosensitivity prediction with gene expression using deep learning. Methods: A microarray gene expression of the National Cancer Institute-60 (NCI-60) panel was acquired from the Gene Expression Omnibus. The clonogenic surviving fractions at an absorbed dose of 2 Gy (SF2) from previous publications were used to measure in vitro radiosensitivity. The radiosensitivity prediction model was based on the convolutional neural network. The 6-fold cross-validation (CV) was applied to train and validate the model. Then, the leave-one-out cross-validation (LOOCV) was applied by using the large-errored samples as a validation set, to determine whether the error was from the high bias of the folded CV. The criteria for correct prediction were defined as an absolute error<0.01 or a relative error<10%. Results: Of the 174 triplicated samples of NCI-60, 171 samples were correctly predicted with the folded CV. Through an additional LOOCV, one more sample was correctly predicted, representing a prediction accuracy of 98.85% (172 out of 174 samples). The average relative error and absolute errors of 172 correctly predicted samples were 1.351±1.875% and 0.00596±0.00638, respectively. Conclusion: We demonstrated the feasibility of a deep learning-based in vitro radiosensitivity prediction using gene expression.
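
The correctness criterion quoted in this abstract (absolute error < 0.01 or relative error < 10%) can be written as a small check. This is a sketch of the stated rule only; the function names and the handling of a zero true value are assumptions, and the sample values are made up.

```python
def is_correct(predicted, actual, abs_tol=0.01, rel_tol=0.10):
    """True if the absolute error is below abs_tol or the relative error is below rel_tol."""
    abs_err = abs(predicted - actual)
    if abs_err < abs_tol:
        return True
    # Relative error is undefined for a true value of 0 (assumption: count as incorrect).
    return actual != 0 and abs_err / abs(actual) < rel_tol


def accuracy(pairs, **tols):
    """Fraction of (predicted, actual) pairs that satisfy the criterion."""
    pairs = list(pairs)
    return sum(is_correct(p, a, **tols) for p, a in pairs) / len(pairs)


# Made-up SF2 values, for illustration only.
samples = [(0.505, 0.50), (0.54, 0.50), (0.60, 0.50)]
print(accuracy(samples))

# The reported accuracy follows from the counts in the abstract.
print(round(172 / 174 * 100, 2))
```

Passing `abs_tol=0.02` gives the looser absolute threshold used in the same authors' follow-up comparison study listed further down.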

An XML-Based Analysis Tool for Gene Prediction Results

  • 김진홍;변상희;이명준;박양수
    • 정보처리학회논문지D / Vol. 12D, No. 5 / pp. 755-764 / 2005
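  • As the importance of identifying all genes, the main functional elements of living organisms, has grown, gene prediction tools have recently been under active development. However, gene prediction programs report their results in their own formats, so users need considerable additional effort to understand them. It is therefore desirable to provide a standardized representation of gene prediction results and a way to automatically compute prediction results over a gene data set. This paper describes an effective XML representation for diverse gene prediction information and an analysis tool, built on that representation, that automatically analyzes predicted gene results. The tool lets users of gene prediction programs analyze prediction results conveniently and automatically produce statistics on them. To demonstrate its usefulness, we applied it to the output of two widely used gene prediction tools, GenScan and GeneID.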

A Study on Construction of Integrated Prokaryotes Gene Prediction System

  • 장종원;류윤규;구자효;윤영우
    • 융합신호처리학회논문지 / Vol. 6, No. 1 / pp. 27-32 / 2005
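  • With advances in sequencers, genome sequence data are growing rapidly, so a process that annotates genomes automatically is needed. The most difficult step in genome annotation is finding the protein-coding genes within the genome. Because eukaryotes and prokaryotes differ markedly in gene structure, the methods for predicting their genes must also differ. Of the 231 organisms whose complete genome sequences have been determined so far, 200 are prokaryotes. Prokaryotes are therefore better suited than eukaryotes to biotechnology research based on comparative genomics. Moreover, prokaryotes lack introns, which makes gene prediction simpler. Previously reported gene prediction accuracy for prokaryotes reaches 80% to 90%, and recent studies aim for 100% accuracy. In this paper, prediction accuracy for the E. coli K-12 and S. typhi genomes was 98.5% and 98.7%, respectively, outperforming the existing GLIMMER.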
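
Because prokaryotic genes lack introns, a candidate gene can be approximated as one uninterrupted open reading frame (ORF). The sketch below scans the three forward-strand reading frames for ATG-to-stop stretches. It is a simplified illustration, not the system described in this paper and not GLIMMER; the function name, the `min_codons` threshold, and the example sequence are assumptions, and the reverse strand and alternative start codons are ignored.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) half-open index pairs for forward-strand ORFs.

    An ORF runs from the first ATG seen in a frame through the next in-frame
    stop codon. min_codons counts both the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume the search after this stop codon
    return sorted(orfs)


example = "CCATGAAATTTTAGCC"  # made-up sequence
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```

Practical prokaryotic gene finders score such candidates with statistical models of coding sequence rather than accepting every ORF.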


Comparison and optimization of deep learning-based radiosensitivity prediction models using gene expression profiling in National Cancer Institute-60 cancer cell line

  • Kim, Euidam;Chung, Yoonsun
    • Nuclear Engineering and Technology / Vol. 54, No. 8 / pp. 3027-3033 / 2022
  • Background: In this study, various types of deep-learning models for predicting in vitro radiosensitivity from gene-expression profiling were compared. Methods: The clonogenic surviving fractions at 2 Gy from previous publications and microarray gene-expression data from the National Cancer Institute-60 cell lines were used to measure the radiosensitivity. Seven different prediction models, including three distinct multi-layered perceptrons (MLP) and four different convolutional neural networks (CNN), were compared. Folded cross-validation was applied to train and evaluate model performance. The criteria for correct prediction were absolute error < 0.02 or relative error < 10%. The models were compared in terms of prediction accuracy, training time per epoch, training fluctuations, and required calculation resources. Results: The strength of MLP-based models was their fast initial convergence and short training time per epoch. Their prediction accuracy differed significantly depending on the model configuration. The CNN-based models showed relatively high prediction accuracy, low training fluctuations, and a relatively small increase in the memory requirement as the model deepens. Conclusion: Our findings suggest that a CNN-based model with moderate depth would be appropriate when the prediction accuracy is important, and a shallow MLP-based model can be recommended when either the training resources or time are limited.

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • 한국균학회소식:학술대회논문집 / Korean Society of Mycology 2018 Spring Conference and Extraordinary General Meeting / pp. 22-22 / 2018
  • Fungal genome sequencing and assembly have become routine. Genome analysis relies on high-quality gene prediction and annotation. An automatic fungal genome annotation pipeline is essential for handling genomic sequence data, which are accumulating exponentially. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, the pipeline employs multiple gene prediction programs encompassing ab initio, evidence-, and homology-based evaluation. FunGAP aims to evaluate all predicted genes by filtering gene models. To guide the removal of false-positive genes, we used a scoring function that seeks a consensus by scoring each gene model on its homology to known proteins or domains. FunGAP is freely available for non-commercial users at the GitHub site (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).


Ovarian Cancer Prognostic Prediction Model Using RNA Sequencing Data

  • Jeong, Seokho;Mok, Lydia;Kim, Se Ik;Ahn, TaeJin;Song, Yong-Sang;Park, Taesung
    • Genomics & Informatics / Vol. 16, No. 4 / pp. 32.1-32.7 / 2018
  • Ovarian cancer is one of the leading causes of cancer-related deaths in gynecological malignancies. Over 70% of ovarian cancer cases are high-grade serous ovarian cancers and have high death rates due to their resistance to chemotherapy. Despite advances in surgical and pharmaceutical therapies, overall survival rates are not good, and making an accurate prediction of the prognosis is not easy because of the highly heterogeneous nature of ovarian cancer. To improve the patient's prognosis through proper treatment, we present a prognostic prediction model by integrating high-dimensional RNA sequencing data with their clinical data through the following steps: gene filtration, pre-screening, gene marker selection, integrated study of selected gene markers and prediction model building. These steps of the prognostic prediction model can be applied to other types of cancer besides ovarian cancer.

A Eukaryotic Gene Structure Prediction Program Using Duration HMM

  • 태홍석;박기정
    • 미생물학회지 / Vol. 39, No. 4 / pp. 207-215 / 2003
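  • Gene structure prediction, which predicts the protein-coding regions in a given nucleotide sequence, is the core of gene annotation and strongly affects gene analysis and genome projects as a whole. Because eukaryotic genes have more complex structures than prokaryotic genes, models for eukaryotic gene structure prediction are likewise more diverse and complex than those for prokaryotes. Our team developed EGSP, a eukaryotic gene structure prediction program based on the duration hidden Markov model. The program consists of a training function that generates the parameters needed to predict gene structure for each organism and a prediction function that takes a nucleotide sequence as input and outputs the predicted protein-coding regions; following the trend of recent programs, it can predict multiple genes. To assess how each parameter used in EGSP training and prediction affects overall performance, we analyzed the effects of the individual models for several signals. Using the human dataset, the most studied in eukaryotic gene structure prediction, we compared EGSP on several criteria against commonly used gene structure prediction programs such as GenScan, GeneID, and Morgan, and confirmed that it performs at a practical level. Testing on Saccharomyces cerevisiae, a eukaryotic microorganism, also showed satisfactory performance.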