• Title/Summary/Keyword: genome sequence assembly

Five Computer Simulation Studies of Whole-Genome Fragment Assembly: The Case of Assembling Zymomonas mobilis ZM4 Sequences

  • Jung, Cholhee;Choi, Jin-Young;Park, Hyun Seck;Seo, Jeong-Sun
    • Genomics & Informatics / v.2 no.4 / pp.184-190 / 2004
  • An approach to genome analysis based on assembling DNA fragments from the whole genome can be applied to obtain the complete nucleotide sequence of the Zymomonas mobilis genome. However, fragment assembly raises thorny computational issues. Computer simulation studies of sequence assembly usually reveal abnormal assemblies of artificial sequences containing repetitive or duplicated regions, and they suggest methods to correct those abnormalities. In this paper, we describe five simulation studies that were performed before the actual genome assembly of Zymomonas mobilis ZM4.
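The fragment-assembly problem mentioned above can be illustrated with a toy greedy overlap assembler. This is a minimal sketch that assumes error-free reads with no read contained in another. It is not the assembler used in the ZM4 study, and the function names `overlap` and `greedy_assemble` are hypothetical. At each step it merges the pair of reads with the longest suffix-prefix overlap.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the read pair with the longest overlap; return the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep first-seen order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["CCTTGAC", "ATGGCGT", "CGTACCT"]))
```

When a repeat is longer than the reads or the minimum overlap, the locally best merge can join reads from different copies of the repeat. That is the kind of abnormal assemblage that simulation studies of repetitive sequences are designed to expose.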

A Simple Java Sequence Alignment Editing Tool for Resolving Complex Repeat Regions

  • Ham, Seong-Il;Lee, Kyung-Eun;Park, Hyun-Seok
    • Genomics & Informatics / v.7 no.1 / pp.46-48 / 2009
  • Finishing is the most time-consuming step in sequencing, and many genome projects are left unfinished because of complex repeat regions. Here, we have developed BACContigEditor, a prototype shotgun sequence finishing tool. It is essentially an editor that visualizes assemblies of shotgun sequence fragment reads as gapped multiple alignments. The program offers the flexibility needed to resolve complex regions rapidly within a working session. The purpose of the release is to promote the collaborative creation of extensible fragment assembly editors, foster collaborative development, and lower the barrier to initial tool development. We describe our software architecture and identify current challenges. The program is available under an Open Source license.

Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

  • Lim, Jong-Sung;Choi, Beom-Soon;Lee, Jeong-Soo;Shin, Chan-Seok;Yang, Tae-Jin;Rhee, Jae-Sung;Lee, Jae-Seong;Choi, Ik-Young
    • Genomics & Informatics / v.10 no.1 / pp.1-8 / 2012
  • Recently, DNA sequence variation analysis and gene expression profiling have been widely used as approaches in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of the next-generation DNA sequencers (NGS) Roche/454 and Illumina/Solexa, together with bioinformatic technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing data and mate-pair sequencing data are important for constructing de novo assemblies of novel genomes. For effective de novo assembly, DNA sequence data from multiple NGS platforms are needed, with at least 2× genome coverage from Roche/454 and 30× from Illumina/Solexa. Massive short-read data from the Illumina/Solexa system are sufficient for discovering DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data are useful for approaching genome systems biology through the quantification of expressed RNAs from a whole-genome transcriptome, depending on the tissue samples. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome-sequenced species. Coverage of 20× and 50× of the estimated transcriptome using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average coverage of only 30× of a transcriptome with Illumina/Solexa short reads is sufficient for expression quantification against reference expressed sequence tag (EST) sequences.
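The coverage targets above (for example, 2× from Roche/454 and 30× from Illumina/Solexa) translate into read counts through the standard relation C = N·L/G, where N is the number of reads, L the read length, and G the genome size. The following sketch applies that relation together with the Lander-Waterman (Poisson) estimate e^(-C) for the fraction of bases left uncovered. The 100 Mbp genome size and the 400 bp and 100 bp read lengths are hypothetical illustration values, not figures from the survey.

```python
import math


def reads_needed(coverage: float, genome_size_bp: int, read_length_bp: int) -> int:
    """Coverage C = N * L / G, so N = C * G / L, rounded up to whole reads."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero reads: e^(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome = 100_000_000  # hypothetical 100 Mbp genome
    # Hypothetical read lengths; depths follow the 2x / 30x targets in the abstract.
    for platform, depth, read_len in [("Roche/454", 2, 400), ("Illumina/Solexa", 30, 100)]:
        n = reads_needed(depth, genome, read_len)
        print(f"{platform}: {depth}x -> {n:,} reads, uncovered fraction ~ {fraction_uncovered(depth):.2e}")
```

Under these assumptions, 2× coverage alone leaves roughly 13.5% of bases unsequenced (e^-2). This helps explain why the low-depth long reads are combined with high-depth short reads instead of being used on their own. Note that each end of a paired-end read counts as a separate read in N.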

Perspectives on Functional Genomics

  • Song, Kyuyoung
    • Biotechnology and Bioprocess Engineering:BBE / v.5 no.5 / pp.307-312 / 2000
  • With the announcement of the first assembly of the human genome on June 26, 2000, we have entered the post-genome era. The genome sequence represents a new starting point for science and medicine, with potential impact on research across the life sciences. In this review, I offer brief summaries of the history and progress of the Human Genome Project and of two major challenges ahead: functional genomics and DNA sequence variation research.

A data management system for microbial genome projects

  • Ki-Bong Kim;Hyeweon Nam;Hwajung Seo;Kiejung Park
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.83-85 / 2000
  • Many microbial genome sequencing projects have been under way in genome centers around the world since the first genome, that of Haemophilus influenzae, was sequenced in 1995. The deluge of microbial genome sequence data demands a new, highly automated data flow system so that genome researchers can manage and analyze their own bulky sequence data at every level, from low to high. To this end, we developed an automatic data management system for microbial genome projects. It consists mainly of a local database, analysis programs, and a user-friendly interface. We designed and implemented the local database for large-scale sequencing projects. It enables systematic and consistent data management and retrieval and is tightly coupled with the analysis programs and the web-based user interface. That is, the results of analysis programs can be parsed and stored in the local database, and users can retrieve data at any stage of processing through a web-based graphical user interface. The analysis programs in our system cover contig assembly, homology search, and ORF prediction, which are essential in genome projects. All except the contig assembly program are in the public domain. These programs are connected with one another through a number of utility programs. As a result, this system will maximize cost and time efficiency in genome research.

Complete genome sequences of Lactococcus lactis JNU 534, a potential food and feed preservative

  • Ryu, Sangdon;Kim, Kiyeop;Cho, Dae-Yeon;Kim, Younghoon;Oh, Sejong
    • Journal of Animal Science and Technology / v.64 no.3 / pp.599-602 / 2022
  • A new bacteriocin-producing lactic acid bacterium isolated from kimchi was identified as Lactococcus lactis JNU 534, which has preservative properties for foods of animal origin. In this study, we present the complete genome sequence of strain JNU 534. The final complete genome assembly consists of one circular chromosome (2,443,687 base pairs [bp]) with an overall guanine-cytosine (GC) content of 35.2%, one circular plasmid (46,387 bp) with a GC content of 34.5%, and one circular contig (7,666 bp) with a GC content of 36.2%.
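The per-replicon GC percentages reported above are simple base-composition ratios. The following is a minimal sketch of how such values could be computed from a multi-record FASTA assembly. It assumes that ambiguous bases such as N are excluded from the denominator; some tools count them, which shifts the result slightly. The helper names and record IDs are hypothetical and do not describe the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def read_fasta(lines) -> dict[str, str]:
    """Parse FASTA text lines into {record_id: sequence}; the id is the first word of the header."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].strip()
            name, chunks = (header.split()[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


if __name__ == "__main__":
    demo = [">chr1 circular", "ATGC", "GGCC", ">plasmid1", "AATT"]  # hypothetical records
    for rid, seq in read_fasta(demo).items():
        print(f"{rid}\t{len(seq)} bp\tGC {gc_content(seq):.1f}%")
```

To use it on a real assembly, pass an open file handle (for example, `read_fasta(open(path))`). Each chromosome, plasmid, or contig record is then reported separately, matching the way the abstract lists them.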

FASIM: Fragments Assembly Simulation using Biased-Sampling Model and Assembly Simulation for Microbial Genome Shotgun Sequencing

  • Hur Cheol-Goo;Kim Sunny;Kim Chang-Hoon;Yoon Sung-Ho;In Yong-Ho;Kim Cheol-Min;Cho Hwan-Gue
    • Journal of Microbiology and Biotechnology / v.16 no.5 / pp.683-688 / 2006
  • We have developed a program for generating shotgun data sets from known genome sequences. Synthetic data sets generated by computer are a useful alternative to real data, to which students and researchers have limited access. The uniform sampling of clones adopted by previous programs cannot account for the real situation, in which sampled reads tend to come from particular regions of the target genome. To reflect this situation, we developed a probabilistic model of biased sampling using an experimental data set from a microbial genome project. Among the experimental parameters tested (varied fragment or read lengths, chimerism, and sequencing error), the extent of sequencing error was the most critical factor hampering sequence assembly. We propose that an optimal sequencing strategy employing different insert lengths and redundancy can be established by performing a variety of simulations.
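The contrast between uniform and biased sampling can be made concrete with a small read simulator. FASIM's actual model was fitted to experimental data. This sketch substitutes a hypothetical piecewise-constant weight per genomic bin for start positions, plus uniform substitution errors. It does not model chimerism or variable insert lengths, and `simulate_reads` and its parameters are illustrative names, not FASIM's interface.

```python
import random

BASES = "ACGT"


def simulate_reads(genome: str, n_reads: int, read_len: int,
                   bin_weights: list[float], error_rate: float, seed: int = 0):
    """Return (start, read) pairs with binned, non-uniform start positions and substitution errors.

    bin_weights gives a relative sampling weight for each equal-width bin of start
    positions (a hypothetical stand-in for an empirically fitted bias model).
    """
    rng = random.Random(seed)  # seeded for reproducible data sets
    max_start = len(genome) - read_len  # last valid start position
    if max_start < 0:
        raise ValueError("read_len exceeds genome length")
    n_bins = len(bin_weights)
    bin_width = (max_start + 1) / n_bins
    reads = []
    for _ in range(n_reads):
        b = rng.choices(range(n_bins), weights=bin_weights)[0]  # biased choice of region
        lo = int(b * bin_width)
        hi = max(lo, int((b + 1) * bin_width) - 1)
        start = rng.randint(lo, min(hi, max_start))  # uniform within the chosen bin
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Substitution error: replace with one of the other bases.
                read[i] = rng.choice([x for x in BASES if x != base])
        reads.append((start, "".join(read)))
    return reads


if __name__ == "__main__":
    genome = "ACGT" * 50
    # Hypothetical bias: the first quarter of the genome is sampled 5x more often.
    for start, read in simulate_reads(genome, 5, 20, [5, 1, 1, 1], 0.01, seed=42):
        print(start, read)
```

Setting every weight equal recovers the uniform model that the abstract criticizes. Raising `error_rate` is the simplest way to reproduce, in miniature, the observation that sequencing error is the factor that most hampers assembly.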

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • Korean Society of Mycology News: Conference Proceedings (한국균학회소식:학술대회논문집) / 2018.05a / pp.22-22 / 2018
  • Fungal genome sequencing and assembly have become routine. Genome analysis relies on high-quality gene prediction and annotation. An automatic fungal genome annotation pipeline is essential for handling the exponentially accumulating genomic sequence data. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, the pipeline employs multiple gene prediction programs encompassing ab initio, evidence-, and homology-based evaluation. FunGAP aims to evaluate all predicted genes by filtering gene models. To guide the removal of false-positive genes, we used a scoring function that seeks a consensus by evaluating each gene model based on its homology to known proteins or domains. FunGAP is freely available to non-commercial users at the GitHub site (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).
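One generic way to filter competing gene predictions is to keep, among models whose coordinates overlap, only the best-scoring one and to discard models below a score threshold. This sketch assumes such a greedy selection with a single precomputed score per model. It is not FunGAP's actual scoring function, which the abstract describes only in outline. The `GeneModel` class, the score values, the model names, and `min_score` are all hypothetical, and strand and exon structure are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    name: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float  # hypothetical combined homology score (e.g., protein/domain evidence)


def select_models(models: list[GeneModel], min_score: float = 0.0) -> list[GeneModel]:
    """Greedy consensus filter: keep the highest-scoring model among mutually overlapping ones."""
    kept: list[GeneModel] = []
    for m in sorted(models, key=lambda g: g.score, reverse=True):
        if m.score < min_score:
            break  # sorted in descending order, so every remaining model scores lower
        if all(m.end < k.start or m.start > k.end for k in kept):  # no overlap with kept models
            kept.append(m)
    return sorted(kept, key=lambda g: g.start)


if __name__ == "__main__":
    candidates = [
        GeneModel("predA_1", 100, 500, 0.8),
        GeneModel("predB_1", 120, 480, 0.9),   # overlaps predA_1 and scores higher
        GeneModel("predC_1", 700, 900, 0.2),   # below threshold: treated as false positive
        GeneModel("predA_2", 1000, 1200, 0.6),
    ]
    for g in select_models(candidates, min_score=0.5):
        print(g.name, g.start, g.end, g.score)
```

Sorting by score first means each locus is settled by its strongest candidate. A real pipeline would also compare strands and exon boundaries before treating two models as competitors.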

Caution and Curation for Complete Mitochondrial Genome from Next-Generation Sequencing: A Case Study from Dermatobranchus otome (Gastropoda, Nudibranchia)

  • Do, Thinh Dinh;Choi, Yisoo;Jung, Dae-Wui;Kim, Chang-Bae
    • Animal Systematics, Evolution and Diversity / v.36 no.4 / pp.336-346 / 2020
  • The mitochondrial genome is an important molecule for systematic and evolutionary studies in metazoans. The development of next-generation sequencing (NGS) techniques has rapidly increased the number of mitogenome sequences. Generating a mitochondrial genome from NGS data involves several steps: DNA preparation, sequencing, assembly, and annotation. Despite efforts to improve sequencing, assembly, and annotation methods for mitogenomes, low-quality and/or low-quantity sequences can still end up in the final map. Therefore, mitochondrial genome sequences need to be checked and curated after annotation for proofreading and feedback. In this study, we introduce a pipeline for sequencing and curating mitogenomes based on NGS. For this purpose, two mitogenomes of Dermatobranchus otome were sequenced on the Illumina MiSeq system with different amounts of raw read data. The generated reads were assembled and annotated with commonly used programs. Because abnormal repeat regions were present in the mitogenomes after annotation, primers covering these regions were designed, and conventional PCR followed by Sanger sequencing was performed to curate the mitogenome sequences. The obtained sequences were used to replace the abnormal regions. After the replacement, each mitochondrial genome was compared with the other, as well as with sequences of closely related species available in GenBank, for confirmation. After curation, both D. otome mitogenomes were typical circular molecules, 14,559 bp in size, containing 13 protein-coding genes, 22 tRNA genes, and two rRNA genes. The phylogenetic tree revealed a close relationship between D. otome and Tritonia diomea. The findings of this study indicate the importance of caution and curation when generating mitogenomes from NGS data.
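The curation step above, in which an abnormal region is replaced with a Sanger-verified sequence, amounts to splicing a string between two anchor sequences. This minimal sketch assumes that the anchors are a forward-primer site and the reverse complement of a reverse-primer site, that both occur exactly once in the assembly, and that the region does not wrap around the origin of the circular molecule. The function names and the toy sequences are hypothetical, not taken from the study.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def replace_region(assembly: str, sanger: str, left_flank: str, right_flank: str) -> str:
    """Replace the assembly segment between two unique flanks with the Sanger read's segment."""
    for flank in (left_flank, right_flank):
        if assembly.count(flank) != 1:
            raise ValueError(f"flank {flank!r} must occur exactly once in the assembly")
    a_start = assembly.find(left_flank)
    a_end = assembly.find(right_flank, a_start + len(left_flank))
    s_start = sanger.find(left_flank)
    s_end = sanger.find(right_flank, s_start + len(left_flank)) if s_start != -1 else -1
    if a_end == -1 or s_start == -1 or s_end == -1:
        raise ValueError("flanks not found in the expected order")
    corrected = sanger[s_start + len(left_flank):s_end]  # Sanger-verified interior
    return assembly[:a_start + len(left_flank)] + corrected + assembly[a_end:]


if __name__ == "__main__":
    assembly = "CCCACGTACNNNNNNNNGATTGACCC"  # toy assembly with an abnormal interior
    sanger = "TTACGTACATATATGATTGATT"       # toy PCR/Sanger read spanning both flanks
    reverse_primer = "TCAATC"               # hypothetical primer; its revcomp is the right flank
    print(replace_region(assembly, sanger, "ACGTAC", revcomp(reverse_primer)))
```

Requiring each flank to be unique in the assembly matters here. The regions being curated are repeats, so a flank that also falls inside a repeat copy would make the splice point ambiguous.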

Workflow for Building a Draft Genome Assembly using Public-domain Tools: Toxocara canis as a Case Study (Generating a Draft Genome Assembly with Public Analysis Tools: An Application to the Dog Roundworm Genome)

  • Won, JungIm;Kong, JinHwa;Huh, Sun;Yoon, JeeHee
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.513-518 / 2014
  • Thanks to the reduction in sequencing cost brought about by next-generation sequencing (NGS), small laboratories can now interpret large-scale genomic DNA. De novo assembly creates a putative original sequence by reconstructing reads without using a reference sequence. Various studies on de novo assembly have been reported. However, it is still difficult to obtain the desired results even when following the assembly procedures and analysis tools suggested in those studies. This is mainly because there are no specific guidelines for the assembly procedures and little practical know-how for using such analysis tools. To resolve these problems, this study introduces the steps for determining a whole genome from unknown DNA via NGS technology and de novo assembly, and it describes the pros and cons of the various analysis tools used in each step. We used 350 Mbp of Toxocara canis DNA as an application case to explain each step in detail. We also extend our work to the prediction of protein-coding genes and their functions from the draft genome sequence by comparing its homology with reference sequences of other nematodes.
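Many short-read de novo assemblers reconstruct sequence without a reference using a de Bruijn graph. In that approach, each k-mer in the reads becomes an edge between its (k-1)-mer prefix and suffix, and the genome is read off as an Eulerian path. The abstract does not name the tools used, so this is a textbook sketch rather than the study's workflow. It assumes error-free reads that cover every k-mer, no repeat longer than k-1, and a graph that has an Eulerian path. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Edges from (k-1)-mer prefix to (k-1)-mer suffix for each distinct k-mer in the reads."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for km in sorted(kmers):  # sorted for deterministic edge order
        graph[km[:-1]].append(km[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    out_deg = {u: len(vs) for u, vs in graph.items()}
    in_deg: dict[str, int] = defaultdict(int)
    for vs in graph.values():
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # A path starts at the node with one more outgoing than incoming edge (if any).
    start = next((u for u in nodes if out_deg.get(u, 0) - in_deg[u] == 1), None)
    if start is None:
        start = next(iter(graph))
    adj = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads: list[str], k: int) -> str:
    """Spell the sequence along the Eulerian path: first node, then the last base of each next node."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(assemble(["ATGGCGT", "CGTACCT", "CCTTGAC"], 4))
```

Real assemblers add error correction, coverage-based pruning, and scaffolding with paired reads on top of this core. That is one reason the choice of tool at each step affects the final draft assembly.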