• Title/Summary/Keyword: Contig


Computational Detection of Prokaryotic Core Promoters in Genomic Sequences

  • Kim Ki-Bong;Sim Jeong Seop
    • Journal of Microbiology / v.43 no.5 / pp.411-416 / 2005
  • The high-throughput sequencing of microbial genomes has led to the rapid accumulation of an enormous amount of genomic sequence data. In this context, the computational detection of promoters in genomic DNA sequences has attracted considerable research attention in recent years. This paper presents a predictive model, the dependence decomposition weight matrix model (DDWMM), designed to detect the core promoter region, including the -10 region and the transcription start sites (TSSs), in prokaryotic genomic DNA sequences, a task of some importance for genome annotation. The model captures the most significant dependencies between positions (non-adjacent as well as adjacent) via the maximal dependence decomposition (MDD) procedure, which iteratively decomposes data sets into subsets based on the significant dependence between positions in the promoter region to be modeled. Such dependencies may be closely related to biological and structural concerns, since promoter elements occur in a variety of combinations and are separated by various distances. In this respect, the DDWMM may be well suited to detecting core promoter regions and TSSs in long microbial genomic contigs. To demonstrate the effectiveness of the model, we ran 10-fold cross-validation experiments on 607 experimentally verified promoter sequences, which showed good performance in terms of sensitivity.
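The split-selection step of an MDD-style procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes equal-length aligned sequences, scores each position by summing chi-square statistics between its consensus indicator and the nucleotides at every other position, and omits the significance thresholds and stopping rules a full model would need. The function names are hypothetical.

```python
from collections import Counter


def chi_square(seqs, i, j):
    """Chi-square statistic between the consensus indicator at position i
    (match / mismatch) and the nucleotide observed at position j."""
    consensus = Counter(s[i] for s in seqs).most_common(1)[0][0]
    n = len(seqs)
    rows = Counter(s[i] == consensus for s in seqs)
    cols = Counter(s[j] for s in seqs)
    observed = Counter((s[i] == consensus, s[j]) for s in seqs)
    stat = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def best_split_position(seqs):
    """Position whose consensus indicator depends most on the other positions."""
    length = len(seqs[0])
    scores = [
        sum(chi_square(seqs, i, j) for j in range(length) if j != i)
        for i in range(length)
    ]
    return max(range(length), key=lambda i: scores[i])


def split_on_consensus(seqs, i):
    """Partition sequences by whether they carry the consensus base at i."""
    consensus = Counter(s[i] for s in seqs).most_common(1)[0][0]
    matching = [s for s in seqs if s[i] == consensus]
    non_matching = [s for s in seqs if s[i] != consensus]
    return matching, non_matching


# Toy data (made up): position 2 is fully determined by position 0.
toy = ["AAT", "ACT", "AAT", "ACT", "CAG", "CCG", "CAG", "CCG"]
position = best_split_position(toy)
subsets = split_on_consensus(toy, position)
```

A full decomposition would apply these two steps recursively to each subset and fit a separate weight matrix to every leaf; that part is left out here.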

Draft genome sequence of Lactobacillus salivarius KLW001 isolated from a weaning piglet

  • Jin, Gwi-Deuk;Lee, Jun-Yeong;Kim, Eun Bae
    • Korean Journal of Microbiology / v.53 no.2 / pp.134-136 / 2017
  • Lactobacillus salivarius KLW001, a species of lactic acid bacteria (LAB), was isolated from a weaning piglet on a swine farm in South Korea, with the aim of developing an antimicrobial probiotic strain for piglets. Herein, we report the draft genome sequence of the strain. The genome contains 2,326,706 bp with a G+C content of 33.0% in 166 contigs (≥500 bp). In the genome, we identified 4 genes related to antibiotic resistance, 36 genes for phages, 3 genes for bile hydrolysis, and 27 CRISPR spacers.
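Summary figures of this kind (contig count above a length cutoff, total size, G+C content) can be computed from an assembly in FASTA format. The sketch below is only an illustration: the parser is a simplified one written for this example, the 500 bp default cutoff mirrors the threshold quoted in the abstract, and the function names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def assembly_summary(records, min_len=500):
    """Contig count, total length, and G+C percentage for contigs >= min_len."""
    kept = [seq for seq in records.values() if len(seq) >= min_len]
    total = sum(len(seq) for seq in kept)
    gc = sum(seq.count("G") + seq.count("C") for seq in kept)
    return {
        "contigs": len(kept),
        "total_bp": total,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


# Made-up two-contig example with a tiny cutoff so both branches are visible.
example = parse_fasta(">c1\nGGCC\n>c2\nAT\n")
summary_all = assembly_summary(example, min_len=1)
summary_long = assembly_summary(example, min_len=3)
```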

Complete genome sequence of Flavobacteriaceae strain KCTC 52651 isolated from seawater recirculating aquaculture system

  • Kim, Young-Sam;Jeon, Young Jae;Kim, Kyoung-Ho
    • Korean Journal of Microbiology / v.55 no.2 / pp.174-176 / 2019
  • A novel bacterium, designated strain RR4-38 (= KCTC 52651 = DSM 108068) and belonging to the family Flavobacteriaceae, was isolated from a biofilter in a seawater recirculating aquaculture system in South Korea. A single complete genome contig of 3,182,272 bp with a G+C content of 41.9% was generated using the PacBio RS II platform. The genome includes 2,829 protein-coding genes, 6 rRNA genes, 38 tRNA genes, 4 non-coding RNA genes, and 9 pseudogenes. The results will provide insights into microbial activity in the seawater recirculating aquaculture system.

Draft genome sequence of oligosaccharide producing Leuconostoc lactis CCK940 isolated from kimchi in Korea

  • Lee, Sulhee;Park, Young-Seo
    • Korean Journal of Microbiology / v.54 no.4 / pp.445-447 / 2018
  • Leuconostoc lactis CCK940, which was isolated from kimchi obtained from a Korean traditional market, produced an oligosaccharide with a degree of polymerization of more than 4. In this study, we report the draft genome sequence of L. lactis CCK940, obtained using the PacBio 20 kb platform. The genome assembly yielded 2 contigs. The genome was 1,741,511 base pairs in size with a G + C content of 43.33%, containing 1,698 coding sequences, 12 rRNA genes, and 68 tRNA genes. L. lactis CCK940 contained genes encoding glycosyltransferase, sucrose phosphorylase, maltose phosphorylase, and β-galactosidase, which could synthesize oligosaccharides.

The complete genome sequence of a white spot syndrome virus isolated from Litopenaeus vannamei

  • Lee, A-reum;Kong, Kyoung-Hui;Kim, Hwi-Jin;Oh, Myung-Joo;Kim, Do-Hyung;Kim, Jong-Oh;Kim, Wi-Sik
    • Journal of fish pathology / v.35 no.1 / pp.129-133 / 2022
  • The full genome sequence of a Korean white spot syndrome virus (WSSV, isolate: WSSV-GoC18) is presented here. A total of 12,320,554 reads were obtained and assembled into 1 contig of 291,172 bases, containing 170 genes and 170 coding DNA sequences. Phylogenetic analysis revealed that WSSV-GoC18 was closely related to a Chinese isolate (WSSV-PC) and distinct from a previously reported Korean isolate (WSSV K-LV1). The complete genome sequences of WSSV isolates will be of great help in molecular epidemiological studies, contributing to molecular diagnosis and disease prevention in shrimp aquaculture.

Draft Genome Sequence of Meropenem-Resistant Pseudomonas peli CJ30, Isolated from the Han River, South Korea

  • Yong-Seok Kim;Chang-Jun Cha
    • Microbiology and Biotechnology Letters / v.51 no.2 / pp.214-216 / 2023
  • Meropenem-resistant Pseudomonas peli CJ30 was isolated from the Han River, South Korea. The genome of strain CJ30, comprising 4,919,106 bp with a G + C content of 60.0%, was assembled into nine contigs. The draft genome sequence contained 5,411 protein-coding genes, 18 rRNA genes, and 70 tRNA genes. Strain CJ30 contained the blaSFC-3 and ampC β-lactamase genes.

Comparative analysis of HiSeq3000 and BGISEQ-500 sequencing platform with shotgun metagenomic sequencing data

  • Animesh Kumar;Espen M. Robertsen;Nils P. Willassen;Juan Fu;Erik Hjerde
    • Genomics & Informatics / v.21 no.4 / pp.49.1-49.11 / 2023
  • Recent advances in sequencing technologies have made it possible to generate metagenomic sequences on different sequencing platforms. In this study, we analyzed and compared shotgun metagenomic sequences generated by the HiSeq3000 and BGISEQ-500 platforms from 12 sediment samples collected across the Norwegian coast. Metagenomic DNA sequences were normalized to an equal number of bases for both platforms and further evaluated using different taxonomic classifiers, reference databases, and assemblers. Normalized BGISEQ-500 sequences retained more reads and base counts after preprocessing, while a slightly higher fraction of HiSeq3000 sequences were taxonomically classified. Kaiju classified a higher percentage of reads than Kraken2 for both platforms, and a comparison of reference databases for taxonomic classification showed that the MAR database outperformed RefSeq. Assembly using MEGAHIT produced longer assemblies and higher total contig counts than metaSPAdes in the majority of HiSeq3000 samples, but the assembly statistics notably improved with unprocessed or normalized reads. Our results indicate that both platforms perform comparably in terms of the percentage of taxonomically classified reads and assembled contig statistics for metagenomic samples. This study provides valuable insights for researchers selecting an appropriate sequencing platform and bioinformatics pipeline for their metagenomic studies.
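Normalizing two read sets to an equal number of bases can be sketched as a random downsampling to a shared base budget. The abstract does not say which tool or procedure the study used, so the function below is a hypothetical stand-in: it shuffles the reads with a fixed seed and keeps each read that still fits within the budget, so the result never exceeds the target and may fall slightly short of it.

```python
import random


def downsample_to_bases(reads, target_bases, seed=0):
    """Randomly keep reads until adding another would exceed target_bases."""
    rng = random.Random(seed)
    order = list(range(len(reads)))
    rng.shuffle(order)
    kept, total = [], 0
    for idx in order:
        read = reads[idx]
        if total + len(read) > target_bases:
            continue  # skip reads that would overshoot the budget
        kept.append(read)
        total += len(read)
    return kept


def normalize_pair(reads_a, reads_b, seed=0):
    """Downsample both read sets to the base count of the smaller one."""
    target = min(sum(map(len, reads_a)), sum(map(len, reads_b)))
    return (
        downsample_to_bases(reads_a, target, seed),
        downsample_to_bases(reads_b, target, seed),
    )


# Made-up reads: platform A has 40 bases in total, platform B has 30.
platform_a = ["ACGT"] * 10
platform_b = ["ACGTAC"] * 5
norm_a, norm_b = normalize_pair(platform_a, platform_b)
```

In this toy case the shared budget is 30 bases, so platform A keeps seven 4-base reads (28 bases) and platform B keeps all five of its reads.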

Complete Genome Sequence of Bifidobacterium longum subsp. longum DS0950 Isolated from Infant Feces with Obesity-Ameliorating Effects

  • Hana Jo;Yong-Sik Kim;Doo-Sang Park
    • Microbiology and Biotechnology Letters / v.52 no.2 / pp.218-220 / 2024
  • Bifidobacterium longum subsp. longum DS0950 (B. longum DS0950) was isolated from infant feces and has been reported to be effective in preventing obesity. The whole-genome sequence of B. longum DS0950 was obtained using the PacBio RS II platform and consists of a single chromosome of 2,433,092 bp. B. longum DS0950 contains genes associated with the synthesis of bacteriocins and a series of genes capable of producing xylitol from ribulose-5-phosphate.

Complete genome sequence of Treponema pedis GNW45 isolated from dairy cattle with active bovine digital dermatitis in Korea

  • Hector Espiritu;Lovelia Mamuad;Edeneil Jerome Valete;Sang-Suk Lee;Yong-Il Cho
    • Journal of Animal Science and Technology / v.66 no.5 / pp.1079-1082 / 2024
  • Treponema pedis, a fastidious anaerobic spirochete, is one of the main pathogens involved in the development and progression of bovine digital dermatitis (BDD), a lameness-causing hoof infection in cattle. Here, the complete genome sequence of T. pedis GNW45, isolated from a dairy cow infected with BDD, is presented. Libraries for long and short reads were sequenced using the PacBio RS II and Illumina HiSeq X Ten platforms, respectively. De novo assembly was performed using the long reads, producing a circular contig, to which the short reads were aligned to generate a more accurate genome sequence. The genome has a total size of 3,077,465 base pairs, with 36.84% guanine-cytosine content. A total of 2,749 protein-coding sequences, seven ribosomal RNAs, and 45 transfer RNAs were annotated. Functional analysis revealed genes associated with pathogenicity and survivability in the complex pathobiome of BDD. This study provides novel insights into the survival and pathogenic mechanisms of T. pedis GNW45.

Construction of a full-length cDNA library from Pinus koraiensis and analysis of EST dataset

  • Kim, Joon-Ki;Im, Su-Bin;Choi, Sun-Hee;Lee, Jong-Suk;Roh, Mark S.;Lim, Yong-Pyo
    • Korean Journal of Agricultural Science / v.38 no.1 / pp.11-16 / 2011
  • In this study, we report the generation and analysis of a total of 1,211 expressed sequence tags (ESTs) from Pinus koraiensis. A cDNA library was generated from young leaf tissue, and a total of 1,211 cDNAs were partially sequenced. EST and unigene sequence quality was determined by computational filtering, manual review, and BLAST analyses. In all, 857 ESTs were retained after removal of the vector sequence and filtering for a minimum length of 50 nucleotides. A total of 411 unigenes, consisting of 89 contigs and 322 singletons, were identified after assembly. We also identified 77 new microsatellite-containing sequences among the unigenes and classified their structure according to repeat unit. According to a homology search with BLASTX against the NCBI database, 63.1% of ESTs were homologous to sequences with known function and 22.2% of ESTs matched sequences with putative or unknown function. The remaining 14.6% of ESTs showed no significant similarity to any protein sequence in the public database. Gene ontology (GO) classification showed that the most abundant GO terms were transport, nucleotide binding, and plastid in terms of biological process, molecular function, and cellular component, respectively. The sequence data will be used to characterize potential roles of new genes in Pinus and will provide a useful genetic resource.
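Screening unigenes for microsatellites and classifying them by repeat unit can be sketched with a regular expression. This is an illustration only: the abstract does not state the detection tool or thresholds, so the minimum repeat counts below are hypothetical, only perfect di- and trinucleotide repeats are reported, and motifs that are really homopolymer runs are skipped.

```python
import re

# Hypothetical thresholds: repeat-unit length -> minimum number of repeats.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect di-/trinucleotide repeats."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One copy of the motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # e.g. "AA" repeated is a homopolymer, not a dinucleotide SSR
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Made-up sequence with one (AG)7 and one (CAT)5 repeat.
toy_unigene = "TTC" + "AG" * 7 + "CCG" + "CAT" * 5 + "G"
ssrs = find_ssrs(toy_unigene)
```

Grouping the returned motifs by length then gives a classification by repeat unit (dinucleotide, trinucleotide, and so on).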