• Title/Summary/Keyword: Biological Sequence Database

Phylogenomics and its Growing Impact on Algal Phylogeny and Evolution

  • Reyes-Prieto, Adrian;Yoon, Hwan-Su;Bhattacharya, Debashish
    • ALGAE
    • /
    • v.21 no.1
    • /
    • pp.1-10
    • /
    • 2006
  • Genomic data are accumulating in public databases at an unprecedented rate. Although presently dominated by the sequences of metazoan, plant, parasitic, and picoeukaryotic taxa, both expressed sequence tag (EST) data and complete genomes of free-living algae are also slowly appearing. This wealth of information offers the opportunity to clarify many long-standing issues in algal and plant evolution, such as the contribution of the plastid endosymbiont to nuclear genome evolution, using the tools of comparative genomics and multi-gene phylogenetics. A particularly powerful approach for the automated analysis of genome data from multiple taxa is termed phylogenomics. Phylogenomics is the convergence of genomics (the study of the function and structure of genes and genomes) and molecular phylogenetics (the study of the hierarchical evolutionary relationships among organisms, their genes, and their genomes). The use of phylogenetics to drive comparative genome analyses has facilitated the reconstruction of the evolutionary history of genes, gene families, and organisms. Here we survey the available genome data, introduce phylogenomic pipelines, and review some initial results of phylogenomic analyses of algal genome data.

Identification of Novel Cupredoxin Homologs Using Overlapped Conserved Residues Based Approach

  • Goyal, Amit;Madan, Bharat;Hwang, Kyu-Suk;Lee, Sun-Gu
    • Journal of Microbiology and Biotechnology
    • /
    • v.25 no.1
    • /
    • pp.127-136
    • /
    • 2015
  • Cupredoxin-like proteins are mainly copper-binding proteins that conserve a typical rigid Greek-key arrangement consisting of an eight-stranded β-sandwich, even though they share as little as 10-15% sequence similarity. The electron transport function of the Cupredoxins is critical for respiration and photosynthesis, and the proteins have therapeutic potential. Despite their crucial biological functions, identifying distant Cupredoxin homologs has been difficult because of their low sequence identity. In this study, the overlapped conserved residue (OCR) fingerprint for the Cupredoxin superfamily, which consists of residues conserved in three aspects (sequence, structure, and intramolecular interaction), was used to detect novel Cupredoxin homologs in the NCBI non-redundant protein sequence database. The OCR fingerprint identified 54 potential Cupredoxin sequences, which were validated by scanning them against the conserved Cupredoxin motif near the Cu-binding site. This study also attempted to model the 3D structures and to predict the functions of the identified potential Cupredoxins. The results suggest that the OCR-based approach can be used efficiently to detect novel homologous proteins with low sequence identity, such as Cupredoxins.

The List of Korean Organisms Registered in the NCBI Nucleotide Database for Environmental DNA Research

  • Ihn-Sil Kwak;Chang Woo Ji;Won-Seok Kim;Dongsoo Kong
    • Korean Journal of Ecology and Environment
    • /
    • v.55 no.4
    • /
    • pp.352-359
    • /
    • 2022
  • Recently, with advances in genetic technology, interest in using environmental DNA (eDNA) to study biodiversity through molecular biological approaches has been increasing. eDNA has many advantages over traditional research methods for biological communities distributed in the environment, but it depends heavily on established nucleotide sequence databases. This study conducted a comprehensive genus-level analysis of habitat status and classification for the genes mainly used in eDNA research (12S rRNA, 16S rRNA, 18S rRNA, COI, and CYTB), focusing on the taxon groups registered in Korea (phytoplankton, zooplankton, macroinvertebrates, and fish). Phytoplankton and zooplankton had the highest proportion of taxa registered for 18S rRNA, and macroinvertebrates had the highest proportion for COI. In fish, all genes except 18S rRNA showed a high proportion of taxa. For the top 20 genera by biological density in each Korean taxon group, most phytoplankton genera were registered for 18S rRNA, and macroinvertebrates had the largest number of COI nucleotide sequences. In addition, nucleotide sequences for 12S rRNA, 16S rRNA, and CYTB were confirmed for the top 20 fish genera. These results provide comprehensive information on the genes suitable for eDNA research in each taxon group.

Identification of $\sigma^{B}$-Dependent Promoters Using Consensus-Directed Search of Streptomyces coelicolor Genome

  • Lee, Eun-Jin;Cho, You-Hee;Kim, Hyo-Sub;Roe, Jung-Hye
    • Journal of Microbiology
    • /
    • v.42 no.2
    • /
    • pp.147-151
    • /
    • 2004
  • $\sigma^{B}$ plays an important role in both osmoprotection and proper differentiation in Streptomyces coelicolor A3(2). We searched the genome database for candidate members of the $\sigma^{B}$ regulon, using the consensus promoter sequence (GNNTN$_{14-16}$GGGTAC/T). The list consists of 115 genes and includes all the known $\sigma^{B}$ target genes as well as many other genes whose functions are related to stress protection and differentiation.
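
A consensus-directed search of this kind can be sketched as a regular-expression scan over both strands. The abstract does not describe the authors' actual tool, so the following is only an illustration: it assumes that N means any base, that N$_{14-16}$ is a spacer of 14 to 16 bases, and that C/T means either base at the last position. The function names are hypothetical.

```python
import re

# Assumed reading of GNNTN(14-16)GGGTAC/T: N = any base, C/T = C or T.
# The lookahead wrapper lets overlapping candidate sites be reported.
CONSENSUS = re.compile(r"(?=(G[ACGT]{2}T[ACGT]{14,16}GGGTA[CT]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_promoter_candidates(genome: str):
    """Return (strand, start, site) for every consensus hit.

    For the "-" strand, start is a 0-based offset into the
    reverse-complemented sequence, not into the input.
    """
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for match in CONSENSUS.finditer(seq):
            hits.append((strand, match.start(1), match.group(1)))
    return hits


# Toy sequence with one planted site (not real S. coelicolor data).
site = "GAAT" + "A" * 14 + "GGGTAC"
print(find_promoter_candidates("CC" + site + "CC"))
```

In a real analysis the hits would also need to be filtered by their position relative to annotated start codons, which this sketch does not attempt.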

Mining the Proteome of Fusobacterium nucleatum subsp. nucleatum ATCC 25586 for Potential Therapeutics Discovery: An In Silico Approach

  • Habib, Abdul Musaweer;Islam, Md. Saiful;Sohel, Md.;Mazumder, Md. Habibul Hasan;Sikder, Mohd. Omar Faruk;Shahik, Shah Md.
    • Genomics & Informatics
    • /
    • v.14 no.4
    • /
    • pp.255-264
    • /
    • 2016
  • The recent abundance of bacterial genome sequence information has ushered in many novel strategies for antibacterial drug discovery and has helped medical science take up the challenge of the increasing resistance of pathogenic bacteria to current antibiotics. In this study, we adopted a subtractive genomics approach to analyze the whole genome sequence of Fusobacterium nucleatum, a human oral pathogen associated with colorectal cancer. Our study revealed 1,499 proteins of F. nucleatum that have no homologs in the human genome. These proteins were screened further using the Database of Essential Genes (DEG), which identified 32 proteins vitally important for the bacterium. Subsequent analysis of these proteins using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Automated Annotation Server (KAAS) singled out 3 key enzymes of F. nucleatum that may be good candidate drug targets, since they are unique to the bacterium and absent in humans. In addition, we modeled the three-dimensional structures of these three proteins. Finally, the ligand binding sites of 2 key proteins were determined, and functional inhibitors that best fitted the ligand sites were screened to discover effective novel therapeutic compounds against F. nucleatum.
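
The subtractive steps described above reduce to set operations on protein identifiers. The sketch below is a minimal illustration, not the authors' pipeline: it assumes the homology searches were saved in BLAST tabular format (12 tab-separated columns, with the e-value in column 11), and both the e-value cutoff and the function names are hypothetical.

```python
def ids_with_hits(blast_tab_lines, max_evalue=1e-4):
    """Collect query IDs that have at least one hit at or below max_evalue.

    Each line is assumed to be in BLAST tabular format: the query ID is
    column 1 and the e-value is column 11. The cutoff is an assumption.
    """
    hits = set()
    for line in blast_tab_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def subtractive_filter(pathogen_ids, human_hit_ids, essential_hit_ids):
    """Keep proteins with no human homolog that also match an essential gene."""
    non_homologous = set(pathogen_ids) - set(human_hit_ids)
    return sorted(non_homologous & set(essential_hit_ids))


# Toy data with made-up identifiers.
vs_human = ["p1\thsa1\t90.0\t100\t0\t0\t1\t100\t1\t100\t1e-50\t200"]
vs_deg = [
    "p2\tdeg1\t60.0\t100\t0\t0\t1\t100\t1\t100\t1e-30\t150",
    "p3\tdeg9\t55.0\t100\t0\t0\t1\t100\t1\t100\t1e-20\t120",
]
candidates = subtractive_filter(
    ["p1", "p2", "p3", "p4"], ids_with_hits(vs_human), ids_with_hits(vs_deg)
)
print(candidates)
```

Pathway annotation and structure modeling, the later steps in the abstract, are outside the scope of this sketch.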

Development of the Search Program TFSCAN

  • Lee, Byung-Uk;Park, Kie-Jung;Kim, Ki-Bong;Park, Wan;Park, Yong-Ha
    • Microbiology and Biotechnology Letters
    • /
    • v.24 no.3
    • /
    • pp.371-375
    • /
    • 1996
  • TFD is a transcription factor database that consists of short functional DNA sequences, called signals, and their references. SIGNAL SCAN, developed by Dan S. Prestridge, is used to determine which TFD signals may exist in a DNA sequence. That program searches the TFD database using a simple character string comparison algorithm. We developed TFSCAN, which aims to search for signals in an input DNA sequence more efficiently than SIGNAL SCAN. Our algorithm consists of two parts: one constructs an automaton by scanning the sequences in TFD, and the other searches for signals using this automaton. Lookup of signal-related references is made much faster by an indexing method. TFSCAN is simple to use and its output is easy to read. We developed and installed a TFSCAN input form and a CGI program on the GINet Web server to provide access to TFSCAN. The automaton-based algorithm drastically reduced computing time. This approach may also apply to recognizing other biological patterns. We are continuing to develop the algorithm to optimize the automaton and to search for signals more sensitively.
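
The two-part design described above (build an automaton from all signals, then scan the input once) can be illustrated with an Aho-Corasick automaton. The abstract does not say which automaton TFSCAN uses, so Aho-Corasick is an assumption here, and the signal names and patterns below are made up rather than taken from TFD.

```python
from collections import deque


def build_automaton(signals):
    """Build an Aho-Corasick automaton from a dict {name: pattern}."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # (name, length) of signals ending at each state
    for name, pattern in signals.items():
        state = 0
        for ch in pattern.upper():
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((name, len(pattern)))
    # Breadth-first pass: fill failure links and inherit outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(sequence, automaton):
    """Return (start, name) for every signal occurrence, in one pass."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(sequence.upper()):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for name, length in out[state]:
            hits.append((i - length + 1, name))
    return hits


# Hypothetical signals for illustration only.
automaton = build_automaton(
    {"sig_A": "TATAAA", "sig_B": "CCAAT", "sig_C": "AAT"}
)
print(scan("GGCCAATCTATAAAG", automaton))
```

The scan visits each input character once regardless of how many signals are loaded, which is the kind of speedup over repeated string comparison that the abstract reports.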

Construction of a full-length cDNA library from Pinus koraiensis and analysis of EST dataset

  • Kim, Joon-Ki;Im, Su-Bin;Choi, Sun-Hee;Lee, Jong-Suk;Roh, Mark S.;Lim, Yong-Pyo
    • Korean Journal of Agricultural Science
    • /
    • v.38 no.1
    • /
    • pp.11-16
    • /
    • 2011
  • In this study, we report the generation and analysis of 1,211 expressed sequence tags (ESTs) from Pinus koraiensis. A cDNA library was generated from young leaf tissue, and a total of 1,211 cDNAs were partially sequenced. EST and unigene sequence quality was determined by computational filtering, manual review, and BLAST analyses. In all, 857 ESTs were retained after removal of vector sequences and filtering for a minimum length of 50 nucleotides. After assembly, a total of 411 unigenes, consisting of 89 contigs and 322 singletons, were identified. We also identified 77 new microsatellite-containing sequences among the unigenes and classified them according to their repeat unit. In a homology search with BLASTX against the NCBI database, 63.1% of ESTs were homologous to sequences of known function and 22.2% matched sequences of putative or unknown function. The remaining 14.6% of ESTs showed no significant similarity to any protein sequence in the public database. Gene ontology (GO) classification showed that the most abundant GO terms were transport, nucleotide binding, and plastid in the biological process, molecular function, and cellular component categories, respectively. The sequence data will be used to characterize the potential roles of new genes in Pinus and will serve as a useful genetic resource.

Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.9
    • /
    • pp.1816-1821
    • /
    • 2016
  • Protein secondary structure is important for studying the evolution, structure, and function of proteins, which play crucial roles in most biological processes. This paper attempts to extract protein secondary structure information effectively from a large protein structure database in order to predict the secondary structure of a query protein sequence. To find more remote homologs of a query sequence in the protein database, we used PSI-BLAST, which performs gapped iterative searches and uses profiles built from homologous protein sequences of the query. The secondary structures of the homologous sequences are combined into the prediction, weighted by their relative degree of similarity to the query sequence. When the homologous sequences were used together with a neural network predictor, the accuracies were higher than those of current state-of-the-art techniques, achieving a Q3 accuracy of 92.28% and a Q8 accuracy of 88.79%.
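
The weighted combination step can be pictured as a per-position weighted vote over the secondary structures of the aligned homologs. The sketch below covers only that step and leaves out the neural network predictor. The input layout (one structure string per homolog, already aligned to the query, with "-" for gaps), the default to coil "C" when no homolog covers a position, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict


def combine_secondary_structure(homologs, query_length):
    """Predict one state per query position by weighted voting.

    homologs: list of (weight, aligned_ss) pairs, where weight reflects
    similarity to the query and aligned_ss has exactly query_length
    characters, using "-" where the homolog does not cover the query.
    """
    prediction = []
    for pos in range(query_length):
        votes = defaultdict(float)
        for weight, aligned_ss in homologs:
            state = aligned_ss[pos]
            if state != "-":
                votes[state] += weight
        # Assumed fallback: call the position coil if nothing covers it.
        prediction.append(max(votes, key=votes.get) if votes else "C")
    return "".join(prediction)


# Toy example with three hypothetical homologs and similarity weights.
homologs = [(0.9, "HHHE-"), (0.5, "HEEE-"), (0.3, "CEEEC")]
print(combine_secondary_structure(homologs, 5))
```

A closer homolog outvotes several distant ones only when its weight exceeds their sum, so the choice of weighting scheme largely determines the behavior of this step.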

EST Knowledge Integrated Systems (EKIS): An Integrated Database of EST Information for Research Application

  • Kim, Dae-Won;Jung, Tae-Sung;Choi, Young-Sang;Nam, Seong-Hyeuk;Kwon, Hyuk-Ryul;Kim, Dong-Wook;Choi, Han-Suk;Choi, Sang-Heang;Park, Hong-Seog
    • Genomics & Informatics
    • /
    • v.7 no.1
    • /
    • pp.38-40
    • /
    • 2009
  • The EST Knowledge Integrated System, EKIS (http://ekis.kribb.re.kr), was established as part of the genome sequencing and application research of biological model organisms (GEAR) project, an initiative of Korea's Ministry of Education, Science and Technology. The goals of the EKIS are to collect EST information from GEAR projects and to build an integrated database that provides transcriptomic and metabolomic information to biological scientists. The EKIS comprises five independent categories, each with several retrieval systems, to incorporate massive EST data from high-throughput sequencing of 65 different species. Through the EKIS database, scientists can freely access information including BLAST functional annotation as well as Genechip and pathway information for KEGG. By integrating complex data into a framework of existing EST knowledge, the EKIS provides new insights into specialized metabolic pathway information for applied industrial materials.

HCoV-IMDB: Database for the Analysis of Interactions between HCoV and Host Immune Proteins

  • Kim, Mi-Ran;Lee, Ji-Hae;Son, Hyeon Seok;Kim, Hayeon
    • International journal of advanced smart convergence
    • /
    • v.8 no.1
    • /
    • pp.1-8
    • /
    • 2019
  • Coronaviruses are known respiratory pathogens. In the past, most human coronaviruses were thought to cause only mild symptoms, such as the common cold. Recently, however, as seen with Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS), coronaviruses have caused infectious diseases with severe pulmonary and respiratory symptoms, which has made research on coronaviruses important. Building on previous studies, we constructed 'HCoV-IMDB (Human Corona Virus Immune Database)' to systematically provide genetic information on human coronaviruses together with host immune information, which can be used to analyze the interactions between human coronaviruses and host immune proteins. HCoV-IMDB can be used to search for genetic information on human coronaviruses and host immune proteins and to download data. A BLAST search specific to human coronaviruses, one of the database functions, can be used to infer genetic information about a query sequence and its evolutionary relationships.