• Title/Summary/Keyword: Gene Prediction


Cloning and Sequence Analysis of Glyceraldehyde-3-Phosphate Dehydrogenase Gene in Yak

  • Li, Sheng-Wei;Jiang, Ming-Feng;Liu, Yong-Tao;Yang, Tu-Feng;Wang, Yong;Zhong, Jin-Cheng
    • Asian-Australasian Journal of Animal Sciences / v.21 no.11 / pp.1673-1679 / 2008
  • To study the biological function of the gapdh gene in yak, and to test whether it is a useful internal reference gene for molecular biology research in yak, the cDNA sequence encoding glyceraldehyde-3-phosphate dehydrogenase was cloned from yak by RT-PCR using gene-specific primers. Sequencing showed that the cloned cDNA fragment (1,008 bp) contained a 1,002 bp open reading frame encoding 333 amino acids (AAs) with a molecular mass of 35.753 kDa. The deduced amino acid sequence showed a high level of sequence identity to Bos taurus (99.70%), Xenopus laevis (94.29%), Homo sapiens (97.01%), Mus musculus (97.90%) and Sus scrofa (98.20%). Expression of the yak gapdh gene was also examined in heart, spleen, kidney and brain tissues, and the gene was expressed in all of them. Further analysis of the yak GAPDH amino acid sequence implied that it contained a complete glyceraldehyde-3-phosphate dehydrogenase active site (ASCTTNCL) spanning amino acid residues 148 to 155. It also contained two conserved domains, a NAD binding domain at its N-terminus and a complete catalytic domain of sugar transport at its C-terminus. The phylogenetic analysis showed that yak and Bos taurus were the closest species. Secondary structure prediction indicated that yak GAPDH had a secondary structure similar to that of other isolated GAPDH proteins. These results suggested that the gapdh gene of yak was similar to that of other species and could be used as an internal reference for analyzing the expression of other genes in yak.
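
The relationship between the reported open reading frame length and the protein length can be checked with a short calculation. This is a minimal sketch, assuming the 1,002 bp figure includes the terminating stop codon (the abstract does not say so explicitly); the function name `orf_protein_length` is hypothetical.

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp bases.

    Assumption: when includes_stop is True, the last codon is a stop codon
    and does not contribute an amino acid.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 1,002 bp -> 334 codons -> 333 amino acids once the stop codon is excluded
print(orf_protein_length(1002))
```

Under that assumption the result agrees with the 333 amino acids reported above.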

Gene Sequences Clustering for the Prediction of Functional Domain (기능 도메인 예측을 위한 유전자 서열 클러스터링)

  • Han Sang-Il;Lee Sung-Gun;Hou Bo-Kyeng;Byun Yoon-Sup;Hwang Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems / v.12 no.10 / pp.1044-1049 / 2006
  • Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment tools rely on pairwise alignment and the Smith-Waterman algorithm to generate an alignment hierarchy. As a result, the runtime of existing multiple alignment methods increases exponentially with the number of sequences. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences in a single pass, without pairwise alignment. Among the common subsequences found, some may be cross-matching subsequences that trigger inexact matching, so this paper also proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST and CDD (Conserved Domain Database) searches were combined with the clustering tool. Our clustering and annotation tool consists of four steps: constructing the suffix tree, overlapping common subsequences, clustering gene sequences, and annotating gene clusters by BLAST and CDD search. The system was successfully evaluated on 36 gene sequences from the pentose phosphate pathway: it produced 10 clusters, found representative common subsequences, and identified functional domains by searching the CDD.
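
The idea of clustering sequences by shared subsequences, without pairwise alignment, can be illustrated with a small sketch. It deliberately swaps the paper's parallel suffix tree for a plain k-mer index, which is simpler but not the authors' algorithm, and it omits the cross-matching masking and annotation steps. The function names, the toy sequences and the choice of `k` are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_kmers(seqs, k):
    """Map each k-mer to the set of sequence indices that contain it,
    keeping only k-mers found in more than one sequence."""
    index = defaultdict(set)
    for i, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            index[s[j:j + k]].add(i)
    return {kmer: ids for kmer, ids in index.items() if len(ids) > 1}


def cluster_by_shared_kmers(seqs, k):
    """Group sequences that share at least one k-mer (union-find)."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for ids in shared_kmers(seqs, k).values():
        ordered = sorted(ids)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    groups = defaultdict(list)
    for i in range(len(seqs)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Toy sequences (made up): the first two share "ACGTA" and "TACGT".
toy = ["ACGTACGT", "TTACGTAA", "GGGGCCCC"]
print(cluster_by_shared_kmers(toy, k=5))
```

A suffix tree finds common subsequences of any length in one traversal, whereas this sketch fixes the length at `k`; it is meant only to show the clustering step.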

Multifactor-Dimensionality Reduction in the Presence of Missing Observations

  • Chung, Yu-Jin;Lee, Seung-Yeoun;Park, Tae-Sung
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.31-36 / 2005
  • Identifying and characterizing susceptibility genes for common complex multifactorial diseases is a challenging task, because the effect of a single genetic variation is likely to depend on other genetic variations (gene-gene interaction) and on environmental factors (gene-environment interaction). To address this issue, multifactor dimensionality reduction (MDR) was proposed and implemented by Ritchie et al. (2001), Moore et al. (2002), Hahn et al. (2003) and Ritchie et al. (2003). With MDR, multilocus genotypes effectively reduce the dimension of genotype predictors from n to one, which improves the identification of polymorphism combinations associated with disease risk. However, MDR cannot handle missing observations appropriately, because it treats a missing observation as an additional genotype category. This approach may suffer from a sparseness problem: when high-order interactions are considered, the additional missing category makes the contingency table cells even more sparse. We propose a new MDR approach that minimizes the loss of sample size by considering missing data over all possible multifactor classes. We evaluate the proposed MDR using prediction errors and cross-validation consistency.
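
The dimension reduction step, and the way a missing value becomes its own genotype category, can be sketched as follows. This illustrates the baseline behaviour the abstract criticizes, not the proposed method, whose details are not given here. The function names, the threshold of 1.0 on the case-to-control ratio and the toy data are assumptions.

```python
from collections import Counter


def mdr_high_risk_cells(genotypes, status, loci, threshold=1.0):
    """Label each multilocus genotype cell as high risk when its
    case/control ratio exceeds threshold.

    A missing genotype (None) is treated as one more category,
    which is the baseline handling described in the abstract.
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[locus] for locus in loci)
        (cases if y == 1 else controls)[cell] += 1
    # cases > threshold * controls avoids dividing by zero in empty cells
    return {cell for cell in set(cases) | set(controls)
            if cases[cell] > threshold * controls[cell]}


def mdr_predict(row, loci, high_risk):
    """Collapse a multilocus genotype to one binary predictor."""
    return 1 if tuple(row[locus] for locus in loci) in high_risk else 0


# Toy data (made up): two loci, genotypes coded 0/1/2, None = missing.
genotypes = [(0, 1), (0, 1), (0, 1), (1, 2), (1, 2), (None, 1)]
status = [1, 1, 0, 0, 0, 1]  # 1 = case, 0 = control
high = mdr_high_risk_cells(genotypes, status, loci=(0, 1))
print(high)
print(mdr_predict((1, 2), (0, 1), high))
```

In the toy data the single subject with a missing genotype forms a cell of its own, which shows how the extra category can create sparse cells as more loci are combined.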


Conditional Variational Autoencoder-based Generative Model for Gene Expression Data Augmentation (유전자 발현량 데이터 증대를 위한 Conditional VAE 기반 생성 모델)

  • Hyunsu Bong;Minsik Oh
    • Journal of Broadcast Engineering / v.28 no.3 / pp.275-284 / 2023
  • Gene expression data can be utilized in various studies, including the prediction of disease prognosis. However, there are challenges associated with collecting enough data due to cost constraints. In this paper, we propose a gene expression data generation model based on Conditional Variational Autoencoder. Our results demonstrate that the proposed model generates synthetic data with superior quality compared to two other state-of-the-art models for gene expression data generation, namely the Wasserstein Generative Adversarial Network with Gradient Penalty based model and the structured data generation models CTGAN and TVAE.

Prediction of Hypoxia-inducible Factor Binding Site in Whale Genome and Analysis of Target Genes Regulated by Predicted Sites (고래의 게놈에서 hypoxia-inducible factor binding site의 예측과 target gene에 대한 분석)

  • Yim, Hyung-Soon;Lee, Jae-Hak
    • Journal of Marine Bioscience and Biotechnology / v.7 no.2 / pp.35-41 / 2015
  • Whales are marine mammals that are fully adapted to the aquatic environment. Because whales breathe with lungs, they require a system for adapting to low oxygen concentration (hypoxia) during deep and prolonged dives. However, studies of the molecular mechanism underlying cetacean adaptation to hypoxia have been limited. Hypoxia-inducible factor (HIF) is the central transcription factor that regulates hypoxia-related gene expression. Here we identified HIF-binding sites in the whale genome by phylogenetic footprinting and analyzed HIF-target genes to understand how whales cope with hypoxia. Comparison with the HIF-target genes of terrestrial mammals suggested that whales may retain unique mechanisms of adaptation to hypoxia.

Gene Set Analysis - Absolute and Trim (절대치와 절삭을 이용한 유전자 집단 분석)

  • Lee, Kwang-Hyun;Lee, Sun-Ho
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.523-535 / 2008
  • Early work on microarray data analysis focused on identifying differentially expressed genes; more recently, the focus has moved to discovering significant sets of functionally related genes. We describe some problems of GSEA and PAGE and propose a modified method for identifying significant gene sets. Results from a simulated experiment and from an analysis of publicly available real data show that the newly proposed method, GSA-AT, is superior in detecting significant pathways with accurate prediction.

Prediction of hub genes of Alzheimer's disease using a protein interaction network and functional enrichment analysis

  • Wee, Jia Jin;Kumar, Suresh
    • Genomics & Informatics / v.18 no.4 / pp.39.1-39.8 / 2020
  • Alzheimer's disease (AD) is a chronic, progressive brain disorder that slowly destroys affected individuals' memory and reasoning faculties, and consequently, their ability to perform the simplest tasks. This study investigated the hub genes of AD. Proteins interact with other proteins and non-protein molecules, and these interactions play an important role in understanding protein function. Computational methods are useful for understanding biological problems, in particular, network analyses of protein-protein interactions. Through a protein network analysis, we identified the following top 10 hub genes associated with AD: PTGER3, C3AR1, NPY, ADCY2, CXCL12, CCR5, MTNR1A, CNR2, GRM2, and CXCL8. Through gene enrichment, it was identified that most gene functions could be classified as integral to the plasma membrane, G-protein coupled receptor activity, and cell communication under gene ontology, as well as involvement in signal transduction pathways. Based on the convergent functional genomics ranking, the prioritized genes were NPY, CXCL12, CCR5, and CNR2.
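
One common way to rank hub genes in a protein-protein interaction network is by node degree, the number of distinct interaction partners. The abstract does not state which centrality measure was used, so the sketch below is an assumption rather than the authors' procedure. The function name `top_hubs` and the placeholder node names G1 to G4 are hypothetical, and the edges are made up.

```python
from collections import Counter


def top_hubs(edges, n):
    """Return the n nodes with the highest degree in an undirected network.

    Duplicate edges and self-loops are ignored; ties are broken by name.
    """
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for pair in unique:
        for node in pair:
            degree[node] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Toy interaction list; ("G3", "G2") duplicates ("G2", "G3").
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G3", "G2")]
print(top_hubs(edges, 2))
```

Other measures, such as betweenness or closeness centrality, can rank nodes differently from degree.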

Evaluation and interpretation of transcriptome data underlying heterogeneous chronic obstructive pulmonary disease

  • Ham, Seokjin;Oh, Yeon-Mok;Roh, Tae-Young
    • Genomics & Informatics / v.17 no.1 / pp.2.1-2.12 / 2019
  • Chronic obstructive pulmonary disease (COPD) is a type of progressive lung disease characterized by airflow obstruction. Recently, a comprehensive analysis of the transcriptome in lung tissue of COPD patients was performed, but the heterogeneity of the samples was not seriously considered in characterizing the mechanistic dysregulation of COPD. Here, we established a new transcriptome analysis pipeline that uses a deconvolution process to reduce the heterogeneity, and clearly identified that these transcriptome data originated from patients at the mild or moderate stage of COPD. Differentially expressed or co-expressed genes in the protein interaction subnetworks were linked with mitochondrial dysfunction and the immune response, as expected. Computational protein localization prediction revealed that 19 proteins showing changes in subcellular localization were mostly related to mitochondria, suggesting that mislocalization of mitochondria-targeting proteins plays an important role in COPD pathology. Our extensive evaluation of COPD transcriptome data could provide guidelines for analyzing heterogeneous gene expression profiles and classifying potential candidate genes that are responsible for the pathogenesis of COPD.

Prediction model for concrete carbonation depth using gene expression programming

  • Murad, Yasmin Z;Tarawneh, Bashar K;Ashteyat, Ahmed M
    • Computers and Concrete / v.26 no.6 / pp.497-504 / 2020
  • Concrete can lose its alkalinity through carbonation, which causes steel corrosion, so determining the carbonation depth is necessary. This research proposes an empirical model that predicts the carbonation depth of concrete using gene expression programming (GEP). The GEP model was trained and validated using a large and reliable database collected from the literature. The model was developed using the six parameters that predominantly control the carbonation depth of concrete: carbon dioxide (CO2) concentration, relative humidity, water-to-cement ratio, maximum aggregate size, aggregate-to-binder ratio and carbonation period. The model was statistically evaluated and then compared to the Jiang et al. model. Finally, a parametric study was performed to check the sensitivity of the proposed GEP model to the selected input parameters.

MOTIF BASED PROTEIN FUNCTION ANALYSIS USING DATA MINING

  • Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.812-815 / 2006
  • Proteins are essential agents for controlling, effecting and modulating cellular functions. Proteins with similar sequences have diverged from a common ancestral gene and have similar structures and functions. Predicting the function of unknown proteins remains one of the most challenging problems in bioinformatics. Recently, various computational approaches have been developed for identifying short sequences that are conserved within a family of closely related protein sequences. Protein function is often correlated with highly conserved motifs. A motif is the smallest unit of protein structure and function and forms the core of a protein's structural and functional components. Prediction methods using data mining or machine learning have therefore been developed. In this paper, we describe an approach to protein function prediction with motif-based models using data mining. Our work consists of three phases. We build training and test data sets and construct a classifier using the training set. Through experiments, we also compare our classifier with other classifiers in terms of classification accuracy.
