• Title/Summary/Keyword: gene prediction

ORF Miner: a Web-based ORF Search Tool

  • Park, Sin-Gi;Kim, Ki-Bong
    • Genomics & Informatics / v.7 no.4 / pp.217-219 / 2009
  • The primary clue for locating protein-coding regions is the open reading frame (ORF), and determining ORFs is the first step toward gene prediction, especially for prokaryotes. To this end, we have developed a web-based ORF search tool called ORF Miner. ORF Miner is a graphical analysis utility that determines all possible open reading frames of a selectable minimum size in an input sequence. The tool identifies all open reading frames using alternative genetic codes as well as the standard one and reports a list of ORFs with their corresponding deduced amino acid sequences. ORF Miner can be employed for sequence annotation and provides a crucial clue for determining actual protein-coding regions.
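
The kind of scan the abstract describes (all ORFs above a selectable minimum size, under a chosen genetic code) can be sketched as follows. This is a minimal illustration, not ORF Miner's actual implementation: the function names, the `min_codons` and `stop_codons` parameters, the ATG-only start rule, and the tuple layout are all assumptions made for the example, and translation to amino acids is omitted.

```python
# Illustrative six-frame ORF scan. All names and parameters here are
# hypothetical; they are not taken from ORF Miner.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code
START_CODON = "ATG"  # assumption: only ATG is treated as a start codon

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30, stop_codons=STANDARD_STOPS):
    """Find ORFs (start codon through stop codon) in all six frames.

    Returns tuples (strand, frame, start, end, orf_sequence). start and end
    are 0-based, end-exclusive offsets on the scanned strand; for strand "-"
    they refer to the reverse complement. min_codons counts the codons
    before the stop. ORFs that run off the end without a stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # open a candidate ORF at the first start codon
                elif start is not None and codon in stop_codons:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and look for the next start
    return orfs
```

Passing a different `stop_codons` set is one simple way to approximate an alternative genetic code; for example, the vertebrate mitochondrial code uses TAA, TAG, AGA, and AGG as stops. A fuller treatment would also vary the start codons and the codon-to-amino-acid table.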

A prediction model for strength and strain of CFRP-confined concrete cylinders using gene expression programming

  • Sema, Alacali
    • Computers and Concrete / v.30 no.6 / pp.377-391 / 2022
  • The use of carbon fiber-reinforced polymers (CFRP) has increased widely because they enhance the ultimate strength and ductility of reinforced concrete (RC) structures. This study presents a prediction model for the axial compressive strength and strain of normal-strength concrete cylinders confined with CFRP. Soft computing approaches have been used extensively for modeling in many areas of civil engineering. Therefore, gene expression programming (GEP) models were used in this study to predict the axial compressive strength and strain of CFRP-confined concrete specimens. For this purpose, the parameters of 283 CFRP-confined concrete specimens collected from 38 experimental studies in the literature were used as input variables to develop the GEP-based models. The results of the GEP models were then statistically compared with those of models proposed by various researchers. The R² values for the strength and strain of CFRP-confined concrete were 0.897 and 0.713, respectively. The comparison shows that the proposed GEP-based models for CFRP-confined concrete perform best among the existing models.

Prediction of earthquake-induced crest settlement of embankment dams using gene expression programming

  • Evren, Seyrek;Sadettin, Topcu
    • Geomechanics and Engineering / v.31 no.6 / pp.637-651 / 2022
  • The seismic design of embankment dams requires more comprehensive studies to understand the behaviour of dams. The deformations that primarily control this behaviour occur during or after earthquake loading. Dam failures and incidents show that the impacts of deformations should be reviewed for existing and new embankment dams. Overtopping erosion failure can occur if crest deformations exceed the freeboard at the time of the deformations. Crest settlement is therefore one of the most critical deformations. This study developed empirical formulas using Gene Expression Programming (GEP) based on 88 cases. In the analyses, dam height (Hd), alluvium thickness (Ha), and the magnitude-acceleration-factor (MAF) values, developed in this study from earthquake magnitude (Mw) and peak ground acceleration (PGA), were chosen as variables. The results show that the GEP models developed in the paper are remarkably robust and accessible tools for predicting earthquake-induced crest settlement of embankment dams and that they outperform the existing formulation. Dam engineering professionals can also use them in practice because the variables of the prediction equations are easily accessible after an earthquake.

Correlation between MR Image-Based Radiomics Features and Risk Scores Associated with Gene Expression Profiles in Breast Cancer

  • Ga Ram Kim;You Jin Ku;Jun Ho Kim;Eun-Kyung Kim
    • Journal of the Korean Society of Radiology / v.81 no.3 / pp.632-643 / 2020
  • Purpose: To investigate the correlation between magnetic resonance (MR) image-based radiomics features and the genomic features of breast cancer by focusing on biomolecular intrinsic subtypes and gene expression profiles based on risk scores. Materials and Methods: We used the publicly available datasets from The Cancer Genome Atlas and The Cancer Imaging Archive to extract the radiomics features of 122 breast cancers on MR images. Furthermore, PAM50 intrinsic subtypes were classified and their risk scores were determined from gene expression profiles. The relationship between radiomics features and biomolecular characteristics was analyzed. A penalized generalized regression analysis was performed to build prediction models. Results: The PAM50 subtype demonstrated a statistically significant association with the maximum 2D diameter (p = 0.0189), degree of correlation (p = 0.0386), and inverse difference moment normalized (p = 0.0337). Among risk score systems, GGI and GENE70 shared 8 correlated radiomic features (p = 0.0008-0.0492) that were statistically significant. Although the maximum 2D diameter was most significantly correlated with both score systems (p = 0.0139 and p = 0.0008), the overall degree of correlation of the prediction models was weak, with the highest correlation coefficient, for GENE70, being 0.2171. Conclusion: Maximum 2D diameter, degree of correlation, and inverse difference moment normalized demonstrated significant relationships with the PAM50 intrinsic subtypes along with gene expression profile-based risk scores such as GENE70, despite weak correlations.

Classification of Genes Based on Age-Related Differential Expression in Breast Cancer

  • Lee, Gunhee;Lee, Minho
    • Genomics & Informatics / v.15 no.4 / pp.156-161 / 2017
  • Transcriptome analysis has been widely used to build biomarker panels for diagnosing cancers. In breast cancer, the age of the patient is known to be associated with clinical features. As clinical transcriptome data have accumulated significantly, we classified all human genes based on age-specific differential expression between normal and breast cancer cells using public data. We retrieved gene expression levels in breast cancer and matched normal cells from The Cancer Genome Atlas. In the first classification, we divided genes into two classes by paired t-test without considering age. In the secondary classification, we divided the genes of each class into eight groups based on the patterns of the p-values calculated for each of the three age groups we defined. Through this two-step classification, gene expression was eventually grouped into 16 classes. By comparing the performance of prediction models built with different combinations of genes, we showed that this classification method could be applied to establish a more accurate prediction model for diagnosing breast cancer. We expect that our classification scheme could be used for other types of cancer data.
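
The class count in the abstract (two first-step classes, eight second-step groups, 16 classes overall) is consistent with a significant/not-significant pattern over three age groups, since 2 × 2³ = 16. The sketch below illustrates that reading only. The abstract does not say how the two first-step classes or the eight patterns are defined, so the 0.05 threshold, the significance-based split, and the function name are assumptions.

```python
from itertools import product

ALPHA = 0.05  # hypothetical significance threshold, not stated in the abstract


def classify_gene(p_overall, p_by_age, alpha=ALPHA):
    """Assign a gene to one of 16 classes from precomputed p-values.

    p_overall: p-value of the age-independent paired t-test (first step).
    p_by_age:  three p-values, one per age group (second step).
    Returns (first_step_flag, pattern), where pattern is a 3-tuple of
    booleans marking which age groups fall below alpha.
    """
    if len(p_by_age) != 3:
        raise ValueError("expected one p-value per age group (three in total)")
    first_step = p_overall < alpha
    pattern = tuple(p < alpha for p in p_by_age)
    return first_step, pattern


# Enumerate every possible class: 2 first-step outcomes x 2**3 patterns.
ALL_CLASSES = [
    (first, pattern)
    for first in (True, False)
    for pattern in product((True, False), repeat=3)
]
```

Under these assumptions, a gene with an overall p-value of 0.001 that is significant only in the two older age groups would land in the class `(True, (False, True, True))`.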

Structure Prediction of the Peptide Synthesized with the Nonribosomal Peptide Synthetase Gene from Bradyrhizobium japonicum

  • JUNG BO-RA;LEE YUKYUNG;LIM YOONGHO;AHN JOONG-HOON
    • Journal of Microbiology and Biotechnology / v.15 no.3 / pp.656-659 / 2005
  • Small peptides synthesized by nonribosomal peptide synthetases (NRPSs) are found in bacteria and fungi. While some microbial taxa have few, others make a large number and variety. However, biochemical characterization of the products synthesized by NRPSs demands a great deal of effort. Since the completion of genome projects for numerous microorganisms, the number of available NRPS genes has been expanding. Predicting the peptides encoded by NRPS genes could save time and effort. We chose the NRPS gene from Bradyrhizobium japonicum as a model to predict the peptide structure encoded by NRPS genes. Using computational analyses, the domain structure of this gene was defined, and the structure of the peptide synthesized by this NRPS was deduced. The gene was found to encode a tripeptide consisting of proline-serine-phenylalanine. This method should help predict the structures of small peptides from the various NRPS genes in a genome sequence.

Multi-gene genetic programming for the prediction of the compressive strength of concrete mixtures

  • Ghahremani, Behzad;Rizzo, Piervincenzo
    • Computers and Concrete / v.30 no.3 / pp.225-236 / 2022
  • In this article, Multi-Gene Genetic Programming (MGGP) is proposed for the estimation of the compressive strength of concrete. MGGP is known to be a powerful algorithm able to find a relationship between certain input space features and a desired output vector. In contrast to most conventional machine learning algorithms, which are often used as "black boxes" that do not provide a mathematical formulation of the input-output relationship, MGGP is able to identify a closed-form formula for that relationship. In the study presented in this article, MGGP was used to predict the compressive strength of plain concrete, concrete with fly ash, and concrete with furnace slag. A formula was extracted for each mixture, and the performance and accuracy of the predictions were compared with the results of Artificial Neural Network (ANN) and Extreme Learning Machine (ELM) algorithms, which are conventional and well-established machine learning techniques. The results of the study showed that MGGP can achieve a desirable performance, as the coefficients of determination for plain concrete, concrete with ash, and concrete with slag from the testing phase were 0.928, 0.906, and 0.890, respectively. In addition, it was found that MGGP outperforms ELM in all cases and that its accuracy is slightly lower than that of ANN. However, MGGP models are practical and easy to use since they yield closed-form formulas that can be implemented and used for the prediction of compressive strength.

Statistical Analysis for Feature Subset Selection Procedures.

  • Kim, In-Young;Lee, Sun-Ho;Kim, Sang-Cheol;Rha, Sun-Young;Chung, Hyun-Cheol;Kim, Byung-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.101-106 / 2003
  • In this paper, we propose using Hotelling's T² statistic to detect a set of differentially expressed (DE) genes in colorectal cancer, based on gene expression levels in tumor tissues compared with those in normal tissues, and we evaluate its predictivity, which lets us rank genes for the development of biomarkers for population screening of colorectal cancer. We compared the prediction rates based on the DE genes selected by Hotelling's T² statistic and by the univariate t statistic using various prediction methods, such as regularized discriminant analysis and a support vector machine. The results show that the prediction rate based on T² is better than that based on the univariate t. This implies that it may not be sufficient to look at each gene in a separate universe and that evaluating combinations of genes reveals interesting information that would not be discovered otherwise.
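
For reference, the textbook two-sample form of Hotelling's T² is given below. The abstract does not state which variant the authors used (a paired tumor/normal design would use the one-sample form on the differences instead), so this is shown only as the standard definition. The symbols are assumptions introduced for illustration: group sizes $n_1$ and $n_2$, mean expression vectors $\bar{x}_1$ and $\bar{x}_2$ over a set of $p$ genes, and group covariance matrices $S_1$ and $S_2$.

```latex
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{x}_1 - \bar{x}_2)^{\top} S_p^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
S_p = \frac{(n_1 - 1)\,S_1 + (n_2 - 1)\,S_2}{n_1 + n_2 - 2}

% Under multivariate normality with equal covariances:
\frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\; T^2 \;\sim\; F_{p,\; n_1 + n_2 - p - 1}
```

For a single gene ($p = 1$), T² reduces to the square of the pooled two-sample t statistic, which is why it can be read as a multivariate generalization of the univariate t that the abstract compares against.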

Partial AUC maximization for essential gene prediction using genetic algorithms

  • Hwang, Kyu-Baek;Ha, Beom-Yong;Ju, Sanghun;Kim, Sangsoo
    • BMB Reports / v.46 no.1 / pp.41-46 / 2013
  • Identifying genes indispensable for an organism's life and their characteristics is one of the central questions in current biological research, and hence it would be helpful to develop computational approaches towards the prediction of essential genes. The performance of a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method by implementing genetic algorithms to maximize the partial AUC that is restricted to a specific interval of lower false positive rate (FPR), the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate the over-fitting problem and to weigh each feature's relevance to prediction. We evaluated our method using the proteome of budding yeast. Our implementation of genetic algorithms maximizing the partial AUC below 0.05 or 0.10 of FPR outperformed other popular classification methods.
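
The quantity being maximized, the partial AUC over a low-FPR interval, can be computed as in the sketch below. It covers only the objective, not the genetic algorithm or the feature selection wrapper described in the abstract. The function names, the 0/1 label encoding, and the trapezoidal integration with linear interpolation at the cutoff are assumptions chosen for the example.

```python
def roc_points(labels, scores):
    """Return ROC curve points (fpr, tpr), starting at (0.0, 0.0).

    labels are 1 (positive) or 0 (negative); a higher score means
    "more likely positive". Tied scores are added as a single step.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every example that shares the current score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(labels, scores, max_fpr=0.05):
    """Area under the ROC curve restricted to FPR in [0, max_fpr]."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area
```

The value is unnormalized, so a perfect ranking scores `max_fpr` rather than 1. A search procedure such as a genetic algorithm could use `partial_auc` as its fitness function, with `max_fpr` set to 0.05 or 0.10 as in the abstract.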

Prediction of creep in concrete using genetic programming hybridized with ANN

  • Hodhod, Osama A.;Said, Tamer E.;Ataya, Abdulaziz M.
    • Computers and Concrete / v.21 no.5 / pp.513-523 / 2018
  • Time-dependent strain due to creep is a significant factor in structural design. Multi-gene genetic programming (MGGP) and artificial neural network (ANN) are used to develop two models for the prediction of creep compliance in concrete. The first model was developed with the MGGP technique and the second with a hybridized MGGP-ANN. In the MGGP-ANN, the ANN works in parallel with MGGP to predict the errors of the MGGP model. A total of 187 experimental data sets containing 4242 data points were filtered from the NU-ITI database. These data are used in developing the MGGP and MGGP-ANN models. The models have six input variables: average compressive strength at 28 days, relative humidity, volume-to-surface ratio, cement type, age at the start of loading, and age at the creep measurement. A practical equation based on MGGP was developed. A parametric study was carried out with a group of hypothetical data generated within the range of the data used, to check the generalization ability of the MGGP and MGGP-ANN models. To confirm the validity of the MGGP and MGGP-ANN models, the results of two creep prediction code models (ACI209 and CEB) and two empirical models (B3 and GL 2000) are compared with the NU-ITI database.
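
The hybrid idea in the abstract, a second model that predicts the errors of the first, can be illustrated with a toy residual-correction sketch. This is not the paper's model. `base_model` is a hypothetical stand-in for a closed-form MGGP equation, and a least-squares line stands in for the ANN purely to keep the example short and dependency-free.

```python
def base_model(x):
    """Hypothetical stand-in for a closed-form MGGP equation."""
    return 2.0 * x


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_hybrid(xs, ys):
    """Fit an error model on the base model's residuals.

    Returns a predictor that adds the predicted error to the base output.
    In the paper an ANN plays the role of the error model; a line is used
    here only as a simple placeholder.
    """
    residuals = [y - base_model(x) for x, y in zip(xs, ys)]
    slope, intercept = fit_line(xs, residuals)

    def predict(x):
        return base_model(x) + (slope * x + intercept)

    return predict
```

The design point is that the closed-form equation stays usable on its own, while the error model supplies a correction wherever the equation is systematically off.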