• Title/Abstract/Keywords: genome sequence assembly

69 search results

Five Computer Simulation Studies of Whole-Genome Fragment Assembly: The Case of Assembling Zymomonas mobilis ZM4 Sequences

  • Jung, Cholhee;Choi, Jin-Young;Park, Hyun Seck;Seo, Jeong-Sun
    • Genomics & Informatics / Vol. 2, No. 4 / pp. 184-190 / 2004
  • An approach to genome analysis based on assembling DNA fragments from the whole genome can be applied to obtain the complete nucleotide sequence of the Zymomonas mobilis genome. However, fragment assembly raises thorny computational issues. Computer simulation studies of sequence assembly usually reveal abnormal assemblies of artificial sequences that contain repetitive or duplicated regions, and suggest methods to correct these abnormalities. In this paper, we describe five simulation studies that were performed prior to the actual genome assembly of Zymomonas mobilis ZM4.
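The abstract does not say which assembler the simulations used. As a minimal sketch of this kind of experiment, assuming error-free reads and a greedy maximal-overlap merge (a classic textbook strategy, not necessarily the one the authors used), the illustrative functions `overlap` and `greedy_assemble` below assemble reads sampled from an artificial sequence:

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate positions for the overlap
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the ordered pair of contigs with the largest overlap."""
    # Drop reads contained in a different read, then exact duplicates (order kept).
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))
    while True:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:  # no overlaps left: remaining contigs are final
            return contigs
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])


if __name__ == "__main__":
    reads = ["AAACCCG", "CCCGGGT", "GGGTTTA", "TTTACGT"]
    print(greedy_assemble(reads))  # ['AAACCCGGGTTTACGT']
```

With unique sequence, the tiled reads merge back into the source string. When a repeat is longer than the reads, the overlaps alone cannot tell a greedy merger how many copies exist or where they belong, which is the kind of abnormal assembly such simulations are meant to expose.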

A Simple Java Sequence Alignment Editing Tool for Resolving Complex Repeat Regions

  • Ham, Seong-Il;Lee, Kyung-Eun;Park, Hyun-Seok
    • Genomics & Informatics / Vol. 7, No. 1 / pp. 46-48 / 2009
  • Finishing is the most time-consuming step in sequencing, and many genome projects are left unfinished due to complex repeat regions. Here, we have developed BACContigEditor, a prototype shotgun sequence finishing tool. It is essentially an editor that visualizes assemblies of shotgun sequence fragment reads as gapped multiple alignments. The program offers some flexibility that is needed to rapidly resolve complex regions within a working session. The sole purpose of the release is to promote collaborative creation of extensible software for fragment assembly editors, foster collaborative development, and reduce barriers to initial tool development effort. We describe our software architecture and identify current challenges. The program is available under an Open Source license.
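BACContigEditor's internal data model is not described in the abstract. A hypothetical minimal representation of a gapped multiple alignment stores each read as a column offset plus a gapped string, with `-` marking an alignment gap. The sketch below pads reads into aligned rows and derives a simple majority-vote consensus; both `layout_rows` and `consensus` are illustrative names, not the tool's API:

```python
from collections import Counter


def layout_rows(reads: list[tuple[int, str]]) -> list[str]:
    """Pad each (offset, gapped_read) pair so that reads line up by column."""
    width = max(offset + len(seq) for offset, seq in reads)
    return [(" " * offset + seq).ljust(width) for offset, seq in reads]


def consensus(rows: list[str]) -> str:
    """Majority symbol per column; blanks (no read coverage) are ignored,
    while gap characters '-' are counted like bases."""
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != " ")
        out.append(counts.most_common(1)[0][0] if counts else " ")
    return "".join(out)


if __name__ == "__main__":
    rows = layout_rows([(0, "ACGT-A"), (2, "GTTA"), (3, "T-AC")])
    print("\n".join(rows))
    print(consensus(rows))  # ACGT-AC
```

An editor built on such a layout would let a user shift a read's offset or insert and delete gaps, then recompute the consensus for the affected columns.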

Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

  • Lim, Jong-Sung;Choi, Beom-Soon;Lee, Jeong-Soo;Shin, Chan-Seok;Yang, Tae-Jin;Rhee, Jae-Sung;Lee, Jae-Seong;Choi, Ik-Young
    • Genomics & Informatics / Vol. 10, No. 1 / pp. 1-8 / 2012
  • Recently, technologies for analyzing DNA sequence variation and profiling gene expression have been widely adopted in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of the next-generation DNA sequencers (NGS) Roche/454 and Illumina/Solexa, together with bioinformatics technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing data and mate-pair sequencing data are important for constructing a de novo assembly of a novel genome. For effective de novo assembly, DNA sequence data from multiple NGS platforms are needed, with at least 2× genome coverage from Roche/454 and 30× from Illumina/Solexa. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data are useful for approaching genome systems biology by quantifying the RNAs expressed from a whole-genome transcriptome, which varies with the tissue sample. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20× and 50× of the estimated transcriptome using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average coverage of only 30× of a transcriptome with Illumina/Solexa short reads is enough to check expression quantification against reference expressed sequence tag (EST) sequences.
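The depth recommendations above can be turned into read counts with the standard coverage relation C = N × L / G (depth C, read count N, read length L, genome size G). Under the Lander-Waterman assumption of uniform random sampling, the expected fraction of the genome with zero coverage is about e^(-C). The genome size and read lengths in the example are hypothetical values, not figures from the survey:

```python
import math


def reads_needed(genome_size: int, read_len: int, depth: float) -> int:
    """Smallest read count N with N * read_len / genome_size >= depth (C = N*L/G)."""
    return math.ceil(depth * genome_size / read_len)


def fraction_uncovered(depth: float) -> float:
    """Lander-Waterman (Poisson) estimate of the genome fraction left uncovered."""
    return math.exp(-depth)


if __name__ == "__main__":
    genome = 100_000_000  # hypothetical 100 Mbp genome
    print(reads_needed(genome, 400, 2))     # hypothetical 400 bp Roche/454 reads -> 500000
    print(reads_needed(genome, 100, 30))    # hypothetical 100 bp Illumina reads -> 30000000
    print(round(fraction_uncovered(2), 3))  # 0.135
```

Under this idealized model, 2× coverage alone leaves roughly 13.5% of the genome uncovered, which is consistent with combining low-depth Roche/454 data with much deeper Illumina/Solexa data. Real sampling is not uniform, so the uncovered fraction in practice tends to be larger than this estimate.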

Perspectives on Functional Genomics

  • Song, Kyuyoung
    • Biotechnology and Bioprocess Engineering: BBE / Vol. 5, No. 5 / pp. 307-312 / 2000
  • With the announcement of the first assembly of the human genome on June 26, 2000, we have entered the post-genome era. The genome sequence represents a new starting point for science and medicine, with a potential impact on research across the life sciences. In this review, I offer brief summaries of the history and progress of the Human Genome Project and of two major challenges ahead: functional genomics and DNA sequence variation research.


A data management system for microbial genome projects

  • Kim, Ki-Bong;Nam, Hyeweon;Seo, Hwajung;Park, Kiejung
    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, 2000 International Symposium on Bioinformatics / pp. 83-85 / 2000
  • Many microbial genome sequencing projects are being carried out at genome centers around the world, since the first genome, Haemophilus influenzae, was sequenced in 1995. The deluge of microbial genome sequence data demands a new, highly automated data flow system so that genome researchers can manage and analyze their own bulky sequence data from low to high levels. To this end, we developed an automatic data management system for microbial genome projects, consisting mainly of a local database, analysis programs, and a user-friendly interface. We designed and implemented the local database for large-scale sequencing projects; it enables systematic and consistent data management and retrieval and is tightly coupled with the analysis programs and the web-based user interface. That is, the results of the analysis programs can be parsed and stored in the local database, and users can retrieve data at any stage of processing through the web-based graphical user interface. The analysis programs in our system cover contig assembly, homology search, and ORF prediction, which are essential in genome projects. All except the contig assembly program are in the public domain. These programs are connected to one another by many utility programs. As a result, this system is expected to maximize cost and time efficiency in genome research.

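The abstract names ORF prediction as one of the system's analysis programs but does not describe its method. A minimal sketch, assuming the standard genetic code, an ATG start codon, and a forward-strand scan only; `find_orfs` and its `min_codons` threshold are illustrative, not the system's program:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-initiated ORFs on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # wait for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 2, 14)]
```

A practical predictor would also scan the reverse complement and accept the alternative start codons (GTG, TTG) that are common in bacteria.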

Complete genome sequences of Lactococcus lactis JNU 534, a potential food and feed preservative

  • Ryu, Sangdon;Kim, Kiyeop;Cho, Dae-Yeon;Kim, Younghoon;Oh, Sejong
    • Journal of Animal Science and Technology / Vol. 64, No. 3 / pp. 599-602 / 2022
  • A new bacteriocin-producing lactic acid bacterium isolated from kimchi was identified as Lactococcus lactis JNU 534, which shows preservative properties for foods of animal origin. In this study, we present the complete genome sequence of strain JNU 534. The final complete genome assembly consists of one circular chromosome (2,443,687 base pairs [bp]) with an overall guanine-cytosine (GC) content of 35.2%, one circular plasmid (46,387 bp) with a GC content of 34.5%, and one circular contig (7,666 bp) with a GC content of 36.2%.
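GC content, as reported for each replicon above, is the share of G and C among the bases of a sequence. A minimal sketch; excluding ambiguous bases such as `N` from the denominator is an assumption here, since the abstract does not state how the authors handled them:

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    print(gc_content("ATGCNN"))  # 50.0 (the two N bases are ignored)
```

Counting ambiguous bases in the denominator would lower the reported percentage for draft sequences with many `N` runs; for finished, gap-free replicons like those above, the two conventions give the same result.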

FASIM: Fragments Assembly Simulation using Biased-Sampling Model and Assembly Simulation for Microbial Genome Shotgun Sequencing

  • Hur, Cheol-Goo;Kim, Sunny;Kim, Chang-Hoon;Yoon, Sung-Ho;In, Yong-Ho;Kim, Cheol-Min;Cho, Hwan-Gue
    • Journal of Microbiology and Biotechnology / Vol. 16, No. 5 / pp. 683-688 / 2006
  • We have developed a program for generating shotgun data sets from known genome sequences. Computer-generated synthetic data sets are a useful alternative to real data, to which students and researchers have limited access. The uniformly distributed clone sampling adopted by previous programs cannot account for the real situation, in which sampled reads tend to come from particular regions of the target genome. To reflect this, a probabilistic model of biased sampling was developed from an experimental data set derived from a microbial genome project. Among the experimental parameters tested (fragment and read lengths, chimerism, and sequencing error), the extent of sequencing error was the most critical factor hampering sequence assembly. We propose that an optimal sequencing strategy, employing different insert lengths and levels of redundancy, can be established by performing a variety of simulations.
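FASIM's biased-sampling model was fitted to experimental data that the abstract does not reproduce. The sketch below only illustrates the general idea under stated assumptions: read start positions are drawn from hypothetical per-position `weights` instead of uniformly, and each base is independently replaced by a different base with probability `error_rate` (substitutions only, with no chimeras or indels). `simulate_reads` is an illustrative name, not FASIM's interface:

```python
import random


def simulate_reads(genome: str, n_reads: int, read_len: int,
                   weights: list[float], error_rate: float,
                   seed: int = 0) -> list[str]:
    """Sample reads with biased start positions and per-base substitution errors.

    `weights` holds one hypothetical sampling weight per possible start position.
    """
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and the genome length")
    starts = range(len(genome) - read_len + 1)
    if len(weights) != len(starts):
        raise ValueError("need one weight per possible start position")
    rng = random.Random(seed)  # seeded for reproducible simulations
    reads = []
    for start in rng.choices(starts, weights=weights, k=n_reads):
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Substitute a different base to model a sequencing error.
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


if __name__ == "__main__":
    genome = "ACGTTGCATGCAGGTACCGATTACAGGC"  # hypothetical toy genome
    n_starts = len(genome) - 8 + 1
    # Hypothetical bias: the first half of the start positions is 5x more likely.
    weights = [5.0 if i < n_starts // 2 else 1.0 for i in range(n_starts)]
    for read in simulate_reads(genome, 5, 8, weights, error_rate=0.02, seed=42):
        print(read)
```

Varying `read_len`, the weights, and `error_rate` and then assembling the output is one way to explore the parameter effects the abstract describes.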

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • Korean Society of Mycology News: Conference Proceedings / Korean Society of Mycology 2018 Spring Conference and Extraordinary General Meeting / p. 22 / 2018
  • Fungal genome sequencing and assembly have become routine. Genome analysis relies on high-quality gene prediction and annotation, and an automatic fungal genome annotation pipeline is essential for handling the exponentially accumulating genomic sequence data. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, the pipeline employs multiple gene prediction programs encompassing ab initio, evidence-, and homology-based evaluation. FunGAP aims to evaluate all predicted genes by filtering gene models. To guide the removal of false-positive genes, we used a scoring function that seeks a consensus by evaluating each gene model based on its homology to known proteins or domains. FunGAP is freely available for non-commercial users at the GitHub site (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).

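The abstract does not give FunGAP's scoring function. Assuming each candidate gene model already carries a single numeric score (hypothetical, for example one summarizing homology evidence), one simple way to filter redundant predictions is to keep models greedily in order of decreasing score and discard any model that overlaps one already kept. `GeneModel` and `select_models` are illustrative names, strand is ignored for brevity, and this is not FunGAP's actual procedure:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    name: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float  # hypothetical consensus/homology score


def select_models(models: list[GeneModel], min_score: float = 0.0) -> list[GeneModel]:
    """Keep the highest-scoring, mutually non-overlapping models above min_score."""
    kept: list[GeneModel] = []
    for m in sorted(models, key=lambda g: g.score, reverse=True):
        if m.score < min_score:
            break  # all remaining models score even lower
        if all(m.end < k.start or m.start > k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda g: g.start)  # report in genomic order


if __name__ == "__main__":
    candidates = [
        GeneModel("a", 100, 500, 0.9),
        GeneModel("b", 400, 900, 0.7),    # overlaps "a", lower score
        GeneModel("c", 1000, 1500, 0.5),
        GeneModel("d", 1600, 1700, 0.1),  # below the threshold
    ]
    print([m.name for m in select_models(candidates, min_score=0.2)])  # ['a', 'c']
```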

Caution and Curation for Complete Mitochondrial Genome from Next-Generation Sequencing: A Case Study from Dermatobranchus otome (Gastropoda, Nudibranchia)

  • Do, Thinh Dinh;Choi, Yisoo;Jung, Dae-Wui;Kim, Chang-Bae
    • Animal Systematics, Evolution and Diversity / Vol. 36, No. 4 / pp. 336-346 / 2020
  • The mitochondrial genome is an important molecule for systematic and evolutionary studies in metazoans. The development of next-generation sequencing (NGS) techniques has rapidly increased the number of mitogenome sequences. Generating a mitochondrial genome with NGS involves several steps: DNA preparation, sequencing, assembly, and annotation. Despite efforts to improve the sequencing, assembly, and annotation of mitogenomes, sequence of low quality and/or quantity can still end up in the final map. Therefore, mitochondrial genome sequences need to be checked and curated after annotation for proofreading and feedback. In this study, we introduce a pipeline for sequencing and curating mitogenomes based on NGS. For this purpose, two mitogenome sequences of Dermatobranchus otome were sequenced on the Illumina MiSeq system with different amounts of raw read data. The generated reads were assembled and annotated with commonly used programs. Because abnormal repeat regions were present in the mitogenomes after annotation, primers covering these regions were designed, and conventional PCR followed by Sanger sequencing was performed to curate the mitogenome sequences. The obtained sequences were used to replace the abnormal regions. After the replacement, each mitochondrial genome was compared with the other and with the sequences of closely related species available in GenBank for confirmation. After curation, the two mitogenomes of D. otome showed a typical circular molecule 14,559 bp in size, containing 13 protein-coding genes, 22 tRNA genes, and two rRNA genes. The phylogenetic tree revealed a close relationship between D. otome and Tritonia diomedea. The findings of this study indicate the importance of caution and curation when generating mitogenomes from NGS data.
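The curation step above replaces an abnormal assembled region with Sanger-verified sequence. A minimal sketch, assuming (hypothetically) that the first and last `flank` bases of the Sanger read each match the assembly exactly once, locates the region by those flanks and splices the read in. `splice_by_flanks` is an illustrative name, and the sketch does not handle a region that wraps around the origin of a circular mitogenome:

```python
def unique_index(text: str, pattern: str) -> int:
    """Index of `pattern` in `text`, requiring exactly one (possibly overlapping) match."""
    i = text.find(pattern)
    if i == -1 or text.find(pattern, i + 1) != -1:
        raise ValueError(f"{pattern!r} must occur exactly once in the assembly")
    return i


def splice_by_flanks(assembly: str, sanger: str, flank: int = 20) -> str:
    """Replace the assembly segment spanned by the Sanger read's two end flanks."""
    if len(sanger) < 2 * flank:
        raise ValueError("Sanger read is shorter than its two flanks")
    left = unique_index(assembly, sanger[:flank])
    right = unique_index(assembly, sanger[-flank:])
    if right < left + flank:
        raise ValueError("right flank must lie downstream of the left flank")
    return assembly[:left] + sanger + assembly[right + flank:]


if __name__ == "__main__":
    # Hypothetical toy data: an inflated GA repeat is corrected by a Sanger read.
    assembly = "ACGTTGCA" + "GAGAGAGAGA" + "TCCATGGA"
    sanger = "TGCAGAGAGATCCA"
    print(splice_by_flanks(assembly, sanger, flank=4))  # ACGTTGCAGAGAGATCCATGGA
```

Requiring unique flank matches guards against anchoring the replacement inside the very repeat that is being corrected.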

Workflow for Building a Draft Genome Assembly using Public-domain Tools: Toxocara canis as a Case Study

  • 원정임;공진화;허선;윤지희
    • KIISE Transactions on Computing Practices / Vol. 20, No. 9 / pp. 513-518 / 2014
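  • As advances in NGS technology have sharply lowered sequencing costs, even small laboratories can now sequence large genomes. De novo assembly reconstructs the original full sequence of a newly sequenced species that lacks a reference genome, using the sequence information in the reads. Many related studies have been reported recently, but because sufficient analytical know-how and clear guidelines have not been made public, researchers who follow the same assembly procedures and analysis tools presented in these studies often fail to obtain satisfactory assemblies. To address this problem, this study introduces, step by step, the series of processes for determining the complete DNA sequence of a not-yet-sequenced organism using NGS and de novo assembly, and analyzes the strengths and weaknesses of the public-domain analysis tools needed at each step. To illustrate each step concretely, we use the 350 Mbp genome of the dog roundworm (Toxocara canis) as a case study. In addition, we perform homology analysis between the newly assembled sequence and related species to extract gene regions from the assembly and predict the functions of the extracted genes.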