• Title/Abstract/Keywords: gene prediction

Search results: 295 (processing time: 0.023 s)

Cloning and Sequence Analysis of Glyceraldehyde-3-Phosphate Dehydrogenase Gene in Yak

  • Li, Sheng-Wei;Jiang, Ming-Feng;Liu, Yong-Tao;Yang, Tu-Feng;Wang, Yong;Zhong, Jin-Cheng
    • Asian-Australasian Journal of Animal Sciences / Vol. 21, No. 11 / pp. 1673-1679 / 2008
  • To study the biological function of the gapdh gene in yak, and to test whether it is a useful internal reference gene for molecular biology research in yak, the cDNA sequence encoding glyceraldehyde-3-phosphate dehydrogenase from yak was cloned by RT-PCR using gene-specific primers. Sequencing indicated that the cloned cDNA fragment (1,008 bp) contained a 1,002 bp open reading frame encoding 333 amino acids (AAs) with a molecular mass of 35.753 kDa. The deduced amino acid sequence showed a high level of sequence identity to Bos taurus (99.70%), Xenopus laevis (94.29%), Homo sapiens (97.01%), Mus musculus (97.90%) and Sus scrofa (98.20%). Expression of the yak gapdh gene was also examined in heart, spleen, kidney and brain tissues; the gene was expressed in all of them. Further analysis of the yak GAPDH amino acid sequence indicated that it contained a complete glyceraldehyde-3-phosphate dehydrogenase active site (ASCTTNCL) spanning amino acid residues 148 to 155. It also contained two conserved domains, an NAD binding domain at its N-terminus and a complete catalytic domain of sugar transport at its C-terminus. Phylogenetic analysis showed that yak and Bos taurus were the closest species. Secondary structure prediction indicated that yak GAPDH had a secondary structure similar to that of other isolated GAPDHs. These results suggest that the yak gapdh gene is similar to that of other species and could be used as an internal reference for analyzing the expression of other genes in yak.
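
The ORF length and protein length reported in this abstract can be cross-checked with a short calculation. The sketch below assumes that the 1,002 bp open reading frame includes the stop codon, which the abstract does not state explicitly; the function name `orf_to_protein_length` is hypothetical.

```python
def orf_to_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp base pairs.

    Assumption: when includes_stop is True, the final codon is a stop codon
    and does not contribute an amino acid.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3  # three bases per codon
    return codons - 1 if includes_stop else codons


# 1,002 bp -> 334 codons -> 333 amino acids once the stop codon is excluded
print(orf_to_protein_length(1002))
```

Under that assumption the result agrees with the 333 amino acids given in the abstract.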

Gene Sequences Clustering for the Prediction of Functional Domain

  • 한상일;이성근;허보경;변윤섭;황규석
    • 제어로봇시스템학회논문지 / Vol. 12, No. 10 / pp. 1044-1049 / 2006
  • Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment tools rely on pairwise alignment and the Smith-Waterman algorithm to generate an alignment hierarchy. Consequently, with existing multiple alignment methods the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences at once, without pairwise alignment. Because the common subsequences found may include cross-matching subsequences that trigger inexact matches, this paper also proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST and CDD (Conserved Domain Database) searches were combined with the clustering tool. Our clustering and annotation tool consists of four steps: constructing the suffix tree, overlapping common subsequences, clustering gene sequences, and annotating gene clusters by BLAST and CDD search. The system was successfully evaluated on 36 gene sequences from the pentose phosphate pathway, producing 10 clusters, finding representative common subsequences, and finally identifying functional domains by searching the CDD.

Multifactor-Dimensionality Reduction in the Presence of Missing Observations

  • Chung, Yu-Jin;Lee, Seung-Yeoun;Park, Tae-Sung
    • 한국통계학회: Conference Proceedings / 한국통계학회 2005 Autumn Conference Proceedings / pp. 31-36 / 2005
  • Identifying and characterizing susceptibility genes for common complex multifactorial diseases is a challenging task, because the effect of a single genetic variation is likely to depend on other genetic variations (gene-gene interaction) and on environmental factors (gene-environment interaction). To address this issue, multifactor dimensionality reduction (MDR) has been proposed and implemented by Ritchie et al. (2001), Moore et al. (2002), Hahn et al. (2003) and Ritchie et al. (2003). With MDR, multilocus genotypes effectively reduce the dimension of genotype predictors from n to one, which improves the identification of polymorphism combinations associated with disease risk. However, MDR cannot handle missing observations appropriately, because a missing observation is treated as an additional genotype category. This approach may suffer from a sparseness problem: when high-order interactions are considered, the additional missing category makes the contingency table cells sparser. We propose a new MDR approach with minimum loss of sample size by considering missing data over all possible multifactor classes. We evaluate the proposed MDR using prediction errors and cross-validation consistency.
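
The dimension-reduction step described in this abstract can be illustrated with a small sketch of standard MDR: each multilocus genotype combination (a contingency-table cell) is labeled high risk or low risk according to its case/control ratio, so n genotype predictors collapse into a single binary one. This is not the missing-data method the authors propose. The function names, the default threshold of 1.0, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def mdr_high_risk_cells(genotypes, status, loci, threshold=1.0):
    """Return the set of multilocus genotype cells labeled high risk.

    genotypes: list of tuples, one genotype code per locus
    status:    list of 1 (case) or 0 (control), aligned with genotypes
    loci:      indices of the loci whose combination is evaluated
    threshold: assumed case/control ratio above which a cell is high risk
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[i] for i in loci)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    high_risk = set()
    for cell in set(cases) | set(controls):
        # Written as a product so cells without controls need no division.
        if cases[cell] > threshold * controls[cell]:
            high_risk.add(cell)
    return high_risk


def mdr_predict(row, loci, high_risk):
    """Reduce a multilocus genotype to one binary predictor (1 = high risk)."""
    return 1 if tuple(row[i] for i in loci) in high_risk else 0


# Toy data (made up): two loci, genotype codes 0/1/2
genotypes = [(0, 1), (0, 1), (1, 1), (1, 1), (2, 0), (2, 0)]
status = [1, 1, 0, 0, 1, 0]
high_risk = mdr_high_risk_cells(genotypes, status, loci=(0, 1))
print(high_risk)
print(mdr_predict((0, 1), (0, 1), high_risk))
```

In this sketch a missing genotype would simply be another code (for example `None`) and therefore another cell, which is the behavior the abstract identifies as the source of the sparseness problem.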

Conditional Variational Autoencoder-based Generative Model for Gene Expression Data Augmentation

  • 봉현수;오민식
    • 방송공학회논문지 / Vol. 28, No. 3 / pp. 275-284 / 2023
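  • Gene expression data can be used in studies aimed at understanding disease and realizing precision medicine, such as predicting disease prognosis and drug response, but collecting a sufficient amount of data is costly. In this paper, we propose a gene expression data generation model based on a Conditional VAE. Compared with a previous WGAN-GP-based gene expression generation model and with the tabular data generation models CTGAN and TVAE, the proposed Conditional VAE-based model is shown to generate synthetic data that are more meaningful biologically and statistically.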

Prediction of Hypoxia-inducible Factor Binding Site in Whale Genome and Analysis of Target Genes Regulated by Predicted Sites

  • 임형순;이재학
    • 한국해양바이오학회지 / Vol. 7, No. 2 / pp. 35-41 / 2015
  • Whales are marine mammals that are fully adapted to the aquatic environment. Because whales breathe with lungs, they require a system for adapting to low oxygen concentration (hypoxia) during deep and prolonged dives. However, studies of the molecular mechanism underlying cetacean adaptation to hypoxia have been limited. Hypoxia-inducible factor (HIF) is the central transcription factor that regulates hypoxia-related gene expression. Here we identified HIF-binding sites in the whale genome by phylogenetic footprinting and analyzed HIF-target genes to understand how whales cope with hypoxia. Comparison with the HIF-target genes of terrestrial mammals suggested that whales may retain unique mechanisms of adaptation to hypoxia.

Gene Set Analysis - Absolute and Trim

  • 이광현;이선호
    • 응용통계연구 / Vol. 21, No. 3 / pp. 523-535 / 2008
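  • The purpose of this study is to propose a more effective method for finding gene sets that are significantly associated with cancer or disease in microarray data. We examine the limitations of PAGE and GSEA, the representative methods of gene set analysis, and propose a method called GSA-AT to compensate for them. Analyses of simulated and real data showed that the proposed GSA-AT method produced more meaningful results.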

Prediction of hub genes of Alzheimer's disease using a protein interaction network and functional enrichment analysis

  • Wee, Jia Jin;Kumar, Suresh
    • Genomics & Informatics / Vol. 18, No. 4 / pp. 39.1-39.8 / 2020
  • Alzheimer's disease (AD) is a chronic, progressive brain disorder that slowly destroys affected individuals' memory and reasoning faculties, and consequently, their ability to perform the simplest tasks. This study investigated the hub genes of AD. Proteins interact with other proteins and non-protein molecules, and these interactions play an important role in understanding protein function. Computational methods are useful for understanding biological problems, in particular, network analyses of protein-protein interactions. Through a protein network analysis, we identified the following top 10 hub genes associated with AD: PTGER3, C3AR1, NPY, ADCY2, CXCL12, CCR5, MTNR1A, CNR2, GRM2, and CXCL8. Through gene enrichment, it was identified that most gene functions could be classified as integral to the plasma membrane, G-protein coupled receptor activity, and cell communication under gene ontology, as well as involvement in signal transduction pathways. Based on the convergent functional genomics ranking, the prioritized genes were NPY, CXCL12, CCR5, and CNR2.
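
As an illustration of how hub genes can be read off a protein interaction network, the sketch below ranks nodes by degree, meaning the number of distinct interaction partners. The abstract does not say which centrality measure the authors used, so degree is an assumption here; the function name `top_hubs` and the edge list are likewise hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def top_hubs(edges, k):
    """Return the k nodes with the most distinct interaction partners.

    edges: iterable of (node_a, node_b) pairs from an undirected network
    Ties are broken alphabetically so the ranking is deterministic.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        neighbors[u].add(v)
        neighbors[v].add(u)
    ranked = sorted(neighbors, key=lambda n: (-len(neighbors[n]), n))
    return ranked[:k]


# Placeholder network (made up): A has three partners, B, C and D two each
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(top_hubs(edges, 2))
```

Other centrality measures, such as betweenness, could be substituted by changing only the sort key.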

Evaluation and interpretation of transcriptome data underlying heterogeneous chronic obstructive pulmonary disease

  • Ham, Seokjin;Oh, Yeon-Mok;Roh, Tae-Young
    • Genomics & Informatics / Vol. 17, No. 1 / pp. 2.1-2.12 / 2019
  • Chronic obstructive pulmonary disease (COPD) is a type of progressive lung disease characterized by airflow obstruction. Recently, a comprehensive analysis of the transcriptome in lung tissue of COPD patients was performed, but the heterogeneity of the samples was not adequately considered in characterizing the mechanistic dysregulation of COPD. Here, we established a new transcriptome analysis pipeline using a deconvolution process to reduce the heterogeneity and clearly identified that these transcriptome data originated from patients with mild or moderate COPD. Differentially expressed or co-expressed genes in the protein interaction subnetworks were linked with mitochondrial dysfunction and the immune response, as expected. Computational protein localization prediction revealed that 19 proteins showing changes in subcellular localization were mostly related to mitochondria, suggesting that mislocalization of mitochondria-targeting proteins plays an important role in COPD pathology. Our extensive evaluation of COPD transcriptome data could provide guidelines for analyzing heterogeneous gene expression profiles and classifying potential candidate genes that are responsible for the pathogenesis of COPD.

Prediction model for concrete carbonation depth using gene expression programming

  • Murad, Yasmin Z;Tarawneh, Bashar K;Ashteyat, Ahmed M
    • Computers and Concrete / Vol. 26, No. 6 / pp. 497-504 / 2020
  • Concrete can lose its alkalinity through carbonation, which causes steel corrosion. Thus, determining the carbonation depth is necessary. This research proposes an empirical model to predict the carbonation depth of concrete using gene expression programming (GEP). The GEP model was trained and validated using a large and reliable database collected from the literature. The model was developed using the six parameters that predominantly control the carbonation depth of concrete: carbon dioxide (CO2) concentration, relative humidity, water-to-cement ratio, maximum aggregate size, aggregate-to-binder ratio and carbonation period. The model was statistically evaluated and then compared to the Jiang et al. model. Finally, a parametric study was performed to check the sensitivity of the proposed GEP model to the selected input parameters.

MOTIF BASED PROTEIN FUNCTION ANALYSIS USING DATA MINING

  • Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • 대한원격탐사학회: Conference Proceedings / 대한원격탐사학회 2006, Proceedings of ISRS 2006 PORSEC Volume II / pp. 812-815 / 2006
  • Proteins are essential agents for controlling, effecting and modulating cellular functions. Proteins with similar sequences have diverged from a common ancestral gene and have similar structures and functions. Function prediction for unknown proteins remains one of the most challenging problems in bioinformatics. Recently, various computational approaches have been developed to identify short sequences that are conserved within a family of closely related protein sequences. Protein function is often correlated with highly conserved motifs. A motif is the smallest unit of protein structure and function and forms the core of a protein's structural and functional components. Therefore, prediction methods using data mining or machine learning have been developed. In this paper, we describe an approach to protein function prediction with motif-based models using data mining. Our work consists of three phases. We build training and test data sets and construct a classifier from the training set. Through experiments, we then compare our classifier with other classifiers in terms of classification accuracy.
