• Title/Summary/Keyword: DNA sequence database


A Framework of Intelligent Middleware for DNA Sequence Analysis in Cloud Computing Environment (DNA 서열 분석을 위한 클라우드 컴퓨팅 기반 지능형 미들웨어 설계)

  • Oh, Junseok;Lee, Yoonjae;Lee, Bong Gyou
    • Journal of Internet Computing and Services / v.15 no.1 / pp.29-43 / 2014
  • The development of NGS technologies, such as scientific workflows, has reduced the time required for decoding DNA sequences. Although these automated technologies have changed the genome sequence analysis environment, limited computing resources still pose problems for the analysis. Most scientific workflow systems are pre-built platforms and are highly complex because many functions are implemented in a single system platform. It is also difficult to apply components of pre-built systems to a new system in the cloud environment. Cloud computing technologies can be applied to these systems to reduce analysis time and enable simultaneous analysis of massive DNA sequence data. Web service techniques are also introduced to improve the interoperability between DNA sequence analysis systems. This paper proposes a workflow-based middleware that supports Web services, a DBMS, and cloud computing, with the aim of reducing analysis time and supporting lightweight virtual instances. It uses the DBMS to manage the pipeline status and to support the creation of lightweight virtual instances in the cloud environment. In addition, RESTful Web services with simple URIs and XML content are applied to improve interoperability. The performance of the system still needs to be tested by comparing its results with those of other DNA analysis services at the stabilization stage.
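The abstract describes keeping pipeline status in a DBMS and exposing it as XML through RESTful services. A minimal sketch of that idea follows, using an in-memory SQLite database and the standard-library XML writer. The table name, column names, status values, and XML element names are all hypothetical and are not taken from the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_status_db():
    """Create an in-memory database with a hypothetical pipeline-status table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pipeline_step ("
        "job_id TEXT, step TEXT, status TEXT, PRIMARY KEY (job_id, step))"
    )
    return conn


def set_status(conn, job_id, step, status):
    """Insert or overwrite the status of one step of one analysis job."""
    conn.execute(
        "INSERT OR REPLACE INTO pipeline_step VALUES (?, ?, ?)",
        (job_id, step, status),
    )
    conn.commit()


def status_xml(conn, job_id):
    """Render the job's current status as a small XML document (assumed layout)."""
    root = ET.Element("job", {"id": job_id})
    rows = conn.execute(
        "SELECT step, status FROM pipeline_step WHERE job_id = ? ORDER BY step",
        (job_id,),
    )
    for step, status in rows:
        ET.SubElement(root, "step", {"name": step, "status": status})
    return ET.tostring(root, encoding="unicode")
```

A service handler for a URI such as `/jobs/{job_id}` (again an assumed route) could return the string produced by `status_xml` as its response body.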

Revealing Regulatory Networks of DNA Repair Genes in S. cerevisiae

  • Kim, Min-Sung;Lee, Do-Heon;Yi, Gwan-Su
    • Bioinformatics and Biosystems / v.2 no.1 / pp.12-16 / 2007
  • DNA repair is a collection of processes by which a cell identifies and corrects damage to its genome sequence. These processes are important because a cell would not be able to maintain its essential functions without them. In this research, we construct gene regulatory networks of DNA repair in S. cerevisiae to understand how each gene interacts with the others. Two approaches are adopted to build the networks: a Bayesian network and ARACNE. After the gene regulatory networks are constructed with the two approaches, the two networks are compared with each other to predict which genes play important roles in the DNA repair processes, by finding conserved interactions and looking for hubs. In addition, each interaction between genes in the networks is validated against interaction information in the S. cerevisiae genome database to support the meaning of the predicted interactions.

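The comparison step in this abstract (finding interactions conserved between two inferred networks and then looking for hubs) can be illustrated with a short sketch. It assumes each network is given as a list of gene pairs and treats interactions as undirected, which is a simplification, since Bayesian network edges are directed. The function names are hypothetical.

```python
from collections import Counter


def normalize(edges):
    """Treat each interaction as an undirected pair and drop self-loops."""
    return {frozenset(edge) for edge in edges if len(set(edge)) == 2}


def conserved_edges(network_a, network_b):
    """Interactions that appear in both networks, regardless of direction."""
    return normalize(network_a) & normalize(network_b)


def hubs(edges, top_n=3):
    """Genes ranked by degree, i.e. by the number of interactions they take part in."""
    degree = Counter()
    for edge in normalize(edges):
        for gene in edge:
            degree[gene] += 1
    return degree.most_common(top_n)
```

For example, `hubs(conserved_edges(bayesian_edges, aracne_edges))` ranks genes by how many conserved interactions they participate in; `bayesian_edges` and `aracne_edges` stand for the two inferred edge lists.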

Construction and Characterization of a cDNA Library from the Camelina sativa L. as an Alternative Oil-Seed Crop (신 바이오디젤 원료 작물인 Camelina의 cDNA library 제작 및 유전자 특성)

  • Park, Won;Jang, Young-Seok;Ahn, Sung-Ju
    • KOREAN JOURNAL OF CROP SCIENCE / v.55 no.2 / pp.151-158 / 2010
  • Camelina sativa L., known by the popular names "gold-of-pleasure" or "false flax", is an alternative oilseed crop that can be grown under different climatic and soil conditions. To date, however, the genomic information of Camelina has not been studied in detail. Therefore, a cDNA library was constructed from young leaves and characterized. The constructed cDNA library comprised 1,334 cDNA clones, and the average size of the inserted fragments was 736 base pairs. We generated a total of 1,269 high-quality expressed sequence tag (EST) sequences. Cluster analysis of the EST sequences showed that the number of unigenes was 851. According to subsequent analysis, 476 (55.9%) unigenes were highly homologous to genes of known function and the other 375 (44.1%) unigenes were of unknown function. Sixty-three (7.4%) unigenes had no homology with any peptide in the NCBI database, indicating that these may be novel genes expressed in leaves of Camelina. The database-matched ESTs were further classified into 17 categories according to their functional annotation. The most abundant categories were "protein with binding function or cofactor requirement (27%)", "metabolism (11%)", "subcellular localization (11%)", "cellular transport, transport facilities and transport routes (7%)", "energy (6%)", and "regulation of metabolism and protein function (6%)". The results of this study provide an overview of the mRNA expression profile and basic genetic information for Camelina as an oilseed crop.

A Practical Approximate Sub-Sequence Search Method for DNA Sequence Databases (DNA 시퀀스 데이타베이스를 위한 실용적인 유사 서브 시퀀스 검색 기법)

  • Won, Jung-Im;Hong, Sang-Kyoon;Yoon, Jee-Hee;Park, Sang-Hyun;Kim, Sang-Wook
    • Journal of KIISE:Databases / v.34 no.2 / pp.119-132 / 2007
  • In molecular biology, approximate subsequence search is one of the most important operations. In this paper, we propose an accurate and efficient method for approximate subsequence search in large DNA databases. The proposed method adopts a binary trie as its primary structure and stores all the window subsequences extracted from a DNA sequence. For approximate subsequence search, it traverses the binary trie in a breadth-first fashion and retrieves all the matching subsequences along the traversed paths within the trie using a dynamic programming technique. However, the proposed method stores only window subsequences of a pre-determined length, and thus suffers from a large post-processing time in the case of long query sequences. To overcome this problem, we divide a query sequence into shorter pieces, search for those pieces, and then merge their results. To verify the superiority of the proposed method, we conducted a performance evaluation via a series of experiments. The results reveal that the proposed method, which requires less storage space, achieves a 4- to 17-fold performance improvement over the suffix-tree-based method. Even when the query sequence is long, our method is more than an order of magnitude faster than the suffix-tree-based method and the Smith-Waterman algorithm.
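The general idea of indexing fixed-length windows in a trie and running an edit-distance dynamic program along each trie path can be sketched as follows. This is not the authors' implementation. It uses a plain character trie rather than a binary trie, traverses depth-first rather than breadth-first, stores start offsets at every node for simplicity, and assumes the query is no longer than the window. All names are hypothetical.

```python
def build_window_trie(sequence, window):
    """Index every length-`window` subsequence; each node records the starts passing through it."""
    root = {"starts": [], "children": {}}
    for start in range(len(sequence) - window + 1):
        node = root
        for ch in sequence[start:start + window]:
            node = node["children"].setdefault(ch, {"starts": [], "children": {}})
            node["starts"].append(start)
    return root


def approximate_search(trie, query, max_dist):
    """Return sorted (start, length, distance) triples whose edit distance to query is <= max_dist."""
    hits = set()
    first_row = list(range(len(query) + 1))  # distance from the empty path to each query prefix
    stack = [(trie, first_row, 0)]
    while stack:
        node, row, depth = stack.pop()
        if depth > 0 and row[-1] <= max_dist:
            for start in node["starts"]:
                hits.add((start, depth, row[-1]))
        for ch, child in node["children"].items():
            # Extend the DP table by one row for the character on this trie edge.
            new_row = [row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1, row[j] + 1, row[j - 1] + cost))
            # Row minima never decrease with depth, so this subtree can be pruned safely.
            if min(new_row) <= max_dist:
                stack.append((child, new_row, depth + 1))
    return sorted(hits)
```

Because all windows sharing a prefix share a trie path, each DP row is computed once per distinct prefix rather than once per window. A longer query would need the split-and-merge step described in the abstract, which this sketch omits.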

Analysis of partial cDNA sequence from Theileria sergenti

  • Park, Jin-ho;Chae, Joon-seok;Kim, Dae-hyuk;Jang, Yong-suk;Kwon, Oh-deog;Lee, Joo-mook
    • Korean Journal of Veterinary Research / v.39 no.4 / pp.797-805 / 1999
  • A T. sergenti cDNA library was constructed to obtain broader information about the structural, functional, and antigenic properties of its proteins, and partial cDNA sequences and expressed sequence tags (ESTs) were analyzed. mRNA was purified from T. sergenti isolates to obtain information on antigen genes, and first- and second-strand cDNA was then synthesized. The synthesized cDNA was subjected to EcoR I adaptor ligation and Xho I restriction digestion and ligated into a Uni-ZAP XR vector. The T. sergenti cDNA library was constructed by in vitro packaging and amplification. Antibody screening of the constructed library was performed using antisera against T. sergenti. Among the resulting clones, eight phagemids were rescued from the recombinants by in vivo excision with f1 helper phage. Restriction endonuclease analysis and PCR showed that the recombinant cDNAs had inserts of 0.5-3.0 kb. Partial sequences of the eight cDNA clones were obtained and examined for homology using BLASTN and BLASTX. The eight sequenced clones were classified into three groups on the basis of the database searches. A total of 3,045 bp of partial cDNA sequence was determined from six clones. The putatively identified clones contain a cytochrome c gene, a heat shock protein gene, a cyclophilin gene, and a ribosomal protein gene. The unidentified clones show homology to the ATP-binding protein (mtrA) gene of S. argillaceus, the DNA-binding protein (DBP) gene of Pseudorabies virus, the 85 kDa merozoite protein gene of B. bovis, the spm1 protein mRNA of T. annulata, and the glycine-rich RNA-binding protein mRNA of O. sativa, among others.


Screening of Differentially Expressed Genes in Heterosigma akashiwo, a Red-Tide Causing Organism, Induced by Exposure to High Light

  • Ko, Young-Seok;Cho, Kyung-Je;Moon, Byoung-Yong
    • Journal of Photoscience / v.8 no.3_4 / pp.93-97 / 2001
  • Heterosigma akashiwo has been reported as a red-tide-causing phytoplankton in Korean coastal areas during summer, when it is exposed to high light. It also shows photosynthetic adaptability to strong light during laboratory culture. On the basis of these observations, we sought genes specifically expressed in Heterosigma akashiwo during exposure to high light, assuming that the organism might have resistance mechanisms associated with light adaptation. For this purpose, we carried out DD-PCR to detect differentially expressed mRNAs from cells that had been illuminated under high light for 3 days. We found eight cDNA clones that were expressed specifically under high light. When they were further screened by reverse Northern hybridization, three of them were identified as positive cDNA clones. When these cDNA fragments were sequenced and their base sequences compared against the GenBank database, one of them showed 86% sequence identity to the partial sequence of the 16S rRNA gene of eubacterium CRO-18.


Trend and Technology of Gene and Genome Research (유전자 및 유전체 연구 기술과 동향)

  • 이진성;김기환;서동상;강석우;황재삼
    • Journal of Sericultural and Entomological Science / v.42 no.2 / pp.126-141 / 2000
  • A major step towards understanding the genetic basis of an organism is the complete sequence determination of all genes in the target genome. The nucleotide sequence encoded in the genome contains the information that specifies the amino acid sequence of every protein and functional RNA molecule. In principle, it will be possible to identify every protein responsible for the structure and function of the body of the target organism. The pattern of expression in different cell types will specify where and when each protein is used. The amino acid sequences of the proteins encoded by each gene will be derived from the conceptual translation of the nucleotide sequence. Comparison of these sequences with those of known proteins, whose sequences are stored in databases, will suggest an approximate function for many proteins. This mini-review describes the development of new sequencing methods and the optimization of sequencing strategies for whole-genome, various cDNA, and genomic analyses.


Effective Biological Sequence Alignment Method using Divide Approach

  • Choi, Hae-Won;Kim, Sang-Jin;Pi, Su-Young
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.41-50 / 2012
  • This paper presents a new sequence alignment method using the divide approach, which solves the problem by decomposing a sequence alignment into several sub-alignments with respect to exact matching subsequences. Exact matching subsequences in the proposed method are bounded on the generalized suffix tree of two sequences, such as protein domain length more than 7 and less than 7. Experimental results show that protein sequence pairs chosen from the PFAM database can be aligned using this method. In addition, the method reduces the time by about 15%, and also reduces the space, relative to the conventional dynamic programming approach. The sequences were classified with 94% accuracy.
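The divide idea in this abstract (split an alignment at an exact matching subsequence and align the flanking pieces separately) can be sketched as below. This is an illustration under stated assumptions rather than the paper's method. The anchor is found with a simple longest-common-substring table instead of a generalized suffix tree, and the `min_anchor` threshold and the scoring values are hypothetical.

```python
def longest_common_substring(a, b):
    """Return (start_a, start_b, length) of the longest exact match between a and b."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Conventional global alignment by dynamic programming; returns two gapped strings."""
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        score[i][0] = i * gap
    for j in range(1, len(b) + 1):
        score[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:  # trace back from the bottom-right corner
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def divide_align(a, b, min_anchor=4):
    """Split at the longest exact match and recurse on both flanks."""
    start_a, start_b, length = longest_common_substring(a, b)
    if length < min_anchor:
        return needleman_wunsch(a, b)  # no usable anchor: fall back to full DP
    left_a, left_b = divide_align(a[:start_a], b[:start_b], min_anchor)
    right_a, right_b = divide_align(a[start_a + length:], b[start_b + length:], min_anchor)
    anchor = a[start_a:start_a + length]
    return left_a + anchor + right_a, left_b + anchor + right_b
```

Each recursive call runs the quadratic dynamic program only on the flanks between anchors, which is where any time and space saving over a single full-matrix alignment would come from. The result is not guaranteed to be the optimal global alignment, because anchors are fixed greedily.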

Identification of Fusobacterium nucleatum isolated from Korean by F. nucleatum subspecies-specific DNA probes (Dot blot hybridization법을 이용한 Fusobacterium nucleatum 아종-특이 DNA 프로브의 특이성 평가)

  • Kim, Hwa-Sook;Kook, Joong-Ki
    • Journal of Korean society of Dental Hygiene / v.6 no.4 / pp.311-324 / 2006
  • The purpose of this investigation was to evaluate the specificity of Fusobacterium nucleatum subspecies-specific DNA probes using dot blot hybridization. To confirm whether the clinical isolates were F. nucleatum, their 16S rDNA was cloned and sequenced. The sequencing data were used in a homology search against the GenBank database. When the homology with the nucleotide sequence of a given bacterium was above 98%, the isolate was judged to be the same species as that bacterium. Twenty-three strains of F. nucleatum were isolated from subgingival plaque of periodontitis patients. The clinical isolates of F. nucleatum were classified into 10 groups using phylogenetic analysis of the 16S rDNA sequence. The F. nucleatum subspecies nucleatum-specific DNA probe Fu4 (1.3 kb) reacted with genomic DNAs from 8 type strains of F. nucleatum, and it reacted strongly with those from 8 clinical isolates. The Fp4 (0.8 kb) probe reacted with F. nucleatum subsp. polymorphum ATCC 10953 and one clinical isolate. The Fv35 (1.9 kb) and Fs17 (8.2 kb) probes reacted with genomic DNAs from F. nucleatum subsp. vincentii ATCC 49256 and F. nucleatum subsp. fusiforme ATCC 51190, respectively. Our results show that dot blot hybridization alone is not enough to evaluate the specificity of F. nucleatum subspecies-specific DNA probes. Therefore, Southern blot analysis will be necessary to confirm the specificity of these probes.

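The species-assignment rule in this abstract (homology above 98% to a reference 16S rDNA sequence) can be illustrated with a small sketch. It assumes the two sequences are already aligned to the same length and ignores gap columns when computing identity. Both are simplifications, since the study relied on a GenBank homology search, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of non-gap alignment columns in which the two sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are skipped rather than counted as mismatches
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def same_species(aligned_isolate, aligned_reference, threshold=98.0):
    """Apply the 'above 98%' rule from the abstract as a strict inequality."""
    return percent_identity(aligned_isolate, aligned_reference) > threshold
```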

Analyses of Expressed Sequence Tags from Chironomus riparius Using Pyrosequencing : Molecular Ecotoxicology Perspective

  • Nair, Prakash M. Gopalakrishnan;Park, Sun-Young;Choi, Jin-Hee
    • Environmental Analysis Health and Toxicology / v.26 / pp.10.1-10.7 / 2011
  • Objectives: Chironomus riparius, a non-biting midge (Chironomidae, Diptera), is extensively used as a model organism in aquatic ecotoxicological studies; however, despite the potential of C. riparius larvae as a bio-monitoring species, little is known about its genome sequence. This study reports the results of an Expressed Sequence Tag (EST) sequencing project conducted on C. riparius larvae using 454 pyrosequencing. Methods: To gain a better understanding of the C. riparius transcriptome, we generated an EST database of C. riparius using the pyrosequencing method. Results: Sequencing runs, using normalized cDNA collections from fourth instar larvae, yielded 20,020 expressed sequence tags, which were assembled into 8,565 contigs and 11,455 singletons. Sequence analysis was performed by BlastX search against the National Center for Biotechnology Information (NCBI) non-redundant (nr) and UniProt protein databases. Based on the gene ontology classifications, 24% (E-value $\leq 10^{-5}$) of the sequences had known gene functions, 24% had unknown functions, and 52% of sequences did not match any known sequences in the existing databases. Sequence comparison revealed that 81% of the genes have homologous genes among other insects belonging to the order Diptera, providing tools for comparative genome analyses. Targeted searches using these annotations identified genes associated with essential metabolic pathways, signaling pathways, detoxification of toxic metabolites, and stress response genes of ecotoxicological interest. Conclusions: The results obtained from this study would eventually make ecotoxicogenomics possible in a truly environmentally relevant species such as C. riparius.
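The annotation step in this abstract depends on an E-value cutoff applied to BlastX hits. The sketch below shows one simple way such a filter could work, keeping the best hit per query sequence. It reads the cutoff as 1e-5, which is an assumption about the abstract's notation, and the `(query_id, subject_id, evalue)` tuple layout is hypothetical rather than a real BLAST output format.

```python
def annotate(hits, max_evalue=1e-5):
    """Keep, for each query, its lowest-E-value hit that passes the cutoff."""
    best = {}
    for query_id, subject_id, evalue in hits:
        if evalue > max_evalue:
            continue  # too weak to count as a match
        if query_id not in best or evalue < best[query_id][1]:
            best[query_id] = (subject_id, evalue)
    return best


def matched_fraction(hits, query_ids, max_evalue=1e-5):
    """Fraction of the given query sequences with at least one passing hit."""
    annotated = annotate(hits, max_evalue)
    return sum(q in annotated for q in query_ids) / len(query_ids)
```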