• Title/Abstract/Keywords: de novo genome assembly

Draft genome of Semisulcospira libertina, a species of freshwater snail

  • Gim, Jeong-An;Baek, Kyung-Wan;Hah, Young-Sool;Choo, Ho Jin;Kim, Ji-Seok;Yoo, Jun-Il
    • Genomics & Informatics
    • /
    • Vol. 19, No. 3
    • /
    • pp.32.1-32.10
    • /
    • 2021
  • Semisulcospira libertina, a species of freshwater snail, is widespread in East Asia. It is important as a food source. Additionally, it is a vector of clonorchiasis, paragonimiasis, metagonimiasis, and other parasitic diseases. Although S. libertina has ecological, commercial, and clinical importance, its whole genome has not been reported yet. Here, we revealed the genome of S. libertina through de novo assembly. We assembled the whole genome of S. libertina and determined its transcriptome for the first time using the Illumina NovaSeq 6000 platform. According to the k-mer analysis, the genome size of S. libertina was estimated to be 3.04 Gb. Using RepeatMasker, repeats were found to make up 53.68% of the genome assembly. Genome data of S. libertina reported in this study will be useful for identification and conservation of S. libertina in East Asia.
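
The abstract above mentions a k-mer based genome size estimate without naming the tool or parameters. As a rough illustration of the general idea only (not the authors' pipeline), genome size can be approximated as the total number of counted k-mers divided by the depth at the main peak of the k-mer histogram. The function name, the `min_depth` error cutoff, and the toy histogram below are all assumptions made for this sketch.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    k-mers below min_depth (a hypothetical cutoff) are treated as
    sequencing errors and ignored.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers approximates per-base coverage.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences among the retained k-mers.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (made-up numbers): an error tail at depth 1-2
# and a main peak around depth 20.
toy = {1: 1000, 2: 300, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
size, peak = estimate_genome_size(toy)
print(size, peak)  # 360.0 20
```

Real genomes with high heterozygosity or repeat content need model-based fitting rather than this single-peak shortcut.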

Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

  • Lim, Jong-Sung;Choi, Beom-Soon;Lee, Jeong-Soo;Shin, Chan-Seok;Yang, Tae-Jin;Rhee, Jae-Sung;Lee, Jae-Seong;Choi, Ik-Young
    • Genomics & Informatics
    • /
    • Vol. 10, No. 1
    • /
    • pp.1-8
    • /
    • 2012
  • Recently, technologies for detecting DNA sequence variation and profiling gene expression have been widely used in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of the next-generation DNA sequencer (NGS) Roche/454 and Illumina/Solexa systems, along with bioinformatic analysis technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing and mate paired-end sequencing data are important for constructing a de novo assembly of novel genome sequencing data. For effective de novo assembly, it is necessary to have DNA sequence information from multiplatform NGS with at least 2× and 30× depth of genome coverage using Roche/454 and Illumina/Solexa, respectively. Massive short-read data from the Illumina/Solexa system are enough to discover DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data are useful for approaching genome systems biology through quantification of expressed RNAs from a whole-genome transcriptome, depending on the tissue samples. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20× and 50× of the estimated transcriptome sequences using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average of only 30× coverage of a transcriptome with Illumina/Solexa short reads is enough to check expression quantification against the reference expressed sequence tag sequence.
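
The coverage figures in the abstract above follow the usual relation depth = (number of reads × read length) / genome size. The sketch below turns a target depth into a read count; the 100 Mb genome and the 100 bp and 400 bp read lengths are hypothetical values chosen for illustration and do not come from the abstract.

```python
def reads_for_coverage(genome_size, read_length, depth):
    """Number of reads needed to reach a target depth (rounded up)."""
    needed_bases = depth * genome_size
    return (needed_bases + read_length - 1) // read_length


def coverage(n_reads, read_length, genome_size):
    """Average sequencing depth for a given number of reads."""
    return n_reads * read_length / genome_size


genome = 100_000_000  # hypothetical 100 Mb genome
print(reads_for_coverage(genome, 100, 30))  # 30000000 short reads for 30x
print(reads_for_coverage(genome, 400, 2))   # 500000 longer reads for 2x
```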

Birth of an 'Asian cool' reference genome: AK1

  • Kim, Changhoon
    • BMB Reports
    • /
    • Vol. 49, No. 12
    • /
    • pp.653-654
    • /
    • 2016
  • The human reference genome, maintained by the Genome Reference Consortium, is conceivably the most complete genome assembly to date. Since its first construction, it has continually been improved by incorporating corrections made to the previous assemblies, thanks to various technological advances. Many ongoing population sequencing projects have been based on this reference genome, heightening hopes of the development of useful medical applications of genomic information, thanks to the recent maturation of high-throughput sequencing technologies. However, just one reference genome does not fit all the populations across the globe, because of the large diversity in genomic structures and technical limitations inherent to short-read sequencing methods. The recent success in de novo construction of the highly contiguous Asian diploid genome AK1, by combining single-molecule technologies with routine sequencing data without resorting to traditional clone-by-clone sequencing and physical mapping, reveals the nature of genomic structural variation by detecting thousands of novel structural variations and by finally filling in some of the gaps that had persistently remained in the current human reference genome. It is now expected that the AK1 genome, soon to be paired with more upcoming de novo assembled genomes, will provide a chance to explore what it is really like to use ancestry-specific reference genomes instead of hg19/hg38 for population genomics. This is a major step towards advancing genetically based precision medicine.

Workflow for Building a Draft Genome Assembly using Public-domain Tools: Toxocara canis as a Case Study

  • Won, Jungim;Kong, Jinhwa;Huh, Sun;Yoon, Jeehee
    • KIISE Transactions on Computing Practices
    • /
    • Vol. 20, No. 9
    • /
    • pp.513-518
    • /
    • 2014
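  • As advances in NGS technology have sharply reduced sequencing costs, small laboratories can now sequence large genomes. When a new species without a reference genome is sequenced, de novo assembly reconstructs the original full sequence from the base sequences of the reads. Many related studies have been reported recently, but because sufficient analysis know-how and clear guidelines have not been made public, following the same assembly procedures and analysis tools presented in those studies does not always yield a satisfactory assembly. To address this problem, this study introduces, step by step, the series of processes for determining the complete DNA sequence of a previously uncharacterized organism using NGS and de novo assembly, and analyzes the strengths and weaknesses of the public-domain analysis tools needed at each step. To explain each step concretely, the study uses the 350 Mbp Toxocara canis genome as a case study. In addition, homology analysis between the newly assembled sequences and those of related species is performed to extract gene regions from the assembled sequences and to predict the functions of the extracted genes.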
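
To make the idea of reconstructing a sequence from reads concrete, here is a deliberately tiny greedy overlap-merge sketch. It is a teaching simplification and not the workflow of the paper above; production assemblers rely on de Bruijn or string graphs and handle sequencing errors, repeats, and reverse complements, none of which this toy does. The function names and the `min_len` threshold are assumptions for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three error-free reads drawn from the made-up sequence ATGGCGTGCAATCC.
print(greedy_assemble(["ATGGCGT", "GCGTGCAA", "GCAATCC"]))
# ['ATGGCGTGCAATCC']
```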

K-mer Based RNA-seq Read Distribution Method For Accelerating De Novo Transcriptome Assembly

  • Kwon, Hwijun;Jung, Inuk
    • Journal of the Korea Society of Computer and Information
    • /
    • Vol. 25, No. 8
    • /
    • pp.1-8
    • /
    • 2020
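  • This paper presents a method that uses gene family information to distribute RNA-seq reads across multiple nodes, with the aim of shortening the run time of de novo transcriptome assembly. To measure the performance of the proposed read-distribution technique, we ran experiments on Arabidopsis thaliana reads organized into four datasets (all reads unclassified, fully classified reads, model-classified reads, and randomly classified reads). The gene contigs generated were 95% consistent with those from the unclassified full dataset, and assembly in the proposed distributed environment ran 4.2 times faster than on a single node using the same resources.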

An Optimized Strategy for Genome Assembly of Sanger/pyrosequencing Hybrid Data using Available Software

  • Jeong, Hae-Young;Kim, Ji-Hyun F.
    • Genomics & Informatics
    • /
    • Vol. 6, No. 2
    • /
    • pp.87-90
    • /
    • 2008
  • During the last four years, the pyrosequencing-based 454 platform has rapidly displaced the traditional Sanger sequencing method due to its high throughput and cost effectiveness. Meanwhile, the Sanger sequencing methodology still provides the longest reads, and paired-end sequencing that is based on that chemistry offers an opportunity to ensure accurate assembly results. In this report, we describe an optimized approach for hybrid de novo genome assembly using pyrosequencing data and varying amounts of Sanger-type reads. 454 platform-derived contigs can be used as single non-breakable virtual reads or converted to simpler contigs that consist of editable, overlapping pseudoreads. These modified contigs maintain their integrity at the first jumpstarting assembly stage and are edited by fragmenting and rejoining. Pre-existing assembly software can then be applied for mixed assembly of 454-derived data and Sanger reads. An effective method for identifying genomic differences between reference and sample sequences in whole-genome resequencing procedures is also suggested.

Status of Philippine Mango Genomics: Enriching Molecular Genomics Towards a Globally Competitive Philippine Mango Industry

  • Eureka Teresa M. Ocampo;Cris Q. Cortaga;Jhun Laurence S. Rasco;John Albert P. Lachica;Darlon V. Lantican
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • Korean Society of Crop Science 2022 Fall Conference
    • /
    • pp.28-28
    • /
    • 2022
  • This paper presents the first genome assemblies of Philippine mangoes, which provide a valuable reference for varietal improvement and genomic studies on mango and related fruit crops. We sequenced the whole genomes of three species: Mangifera odorata (Huani), Mangifera altissima (Paho), and Mangifera indica 'Carabao' (Sweet Elena). 'Carabao' is the major export variety of the Philippines; Paho is identified as vulnerable by the IUCN Red List of Threatened Species; Huani has acrid fruit sap, which is its primary defense mechanism against insects and birds. We used Falcon, a diploid-aware de novo assembler, to assemble SMRT-generated long-read sequences. Falcon-unzip was employed to phase the output assembly, producing larger contig sets (primary contigs) and shorter contigs corresponding to haplotypes (haplotigs). Assembly statistics were generated by comparing the assembly to a reference genome, 'Tommy Atkins', using the Quality Assessment Tool (QUAST). Moreover, the extent of duplication and completeness of gene content was measured using Benchmarking Universal Single-Copy Orthologs (BUSCO). Draft assemblies with high duplication were processed using Purge Haplotigs and Purge Dups to reduce duplication with minimal impact on genome completeness. De novo assemblies of Huani, Paho, and 'Carabao' were then generated with primary contig sizes of 463.64 Mb, 508.95 Mb, and 401.51 Mb, respectively. These draft assemblies of Huani, Paho, and 'Carabao' showed 96.90%, 95.17%, and 99.07% complete BUSCOs, respectively, which is comparable to the 'Tommy Atkins' genome (98.6%). Using two mango transcriptome datasets (pooled RNA-seq from different mango varieties and tissues), 91-96% of reads, or 24-30 million reads, were successfully mapped back to each generated assembly, indicating a high degree of completeness. The results demonstrate highly contiguous, phased, and near-complete genome assemblies of three Philippine mango species for structural and functional annotation of gene units, especially those of economic importance.

De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data

  • Jo, Yeonhwa;Choi, Hoseong;Bae, Miah;Kim, Sang-Min;Kim, Sun-Lim;Lee, Bong Choon;Cho, Won Kyong;Kim, Kook-Hyung
    • The Plant Pathology Journal
    • /
    • Vol. 33, No. 5
    • /
    • pp.478-487
    • /
    • 2017
  • Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, a soybean transcriptome for soybean seed development analysis contained several virus-associated sequences. In this study, we identified five viruses infecting soybean, including soybean mosaic virus (SMV), by de novo transcriptome assembly followed by a BLAST search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) for SMVs in the soybean seed transcriptome, revealing 780 SNVs that were evenly distributed across the SMV genome. Four SNV types, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data for virus genome assembly and SNV analysis.
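
The four frequent SNV types named in the abstract above (C-U, U-C, A-G, G-A) are all transitions. As a minimal sketch of how substitution types could be tallied, the code below compares a reference RNA sequence with an already aligned sample of equal length. This is not the authors' variant-calling procedure, which the abstract does not describe; the function names and toy sequences are assumptions for illustration.

```python
from collections import Counter

# Purine<->purine and pyrimidine<->pyrimidine changes in RNA notation.
TRANSITIONS = {"A-G", "G-A", "C-U", "U-C"}


def tally_snvs(reference, sample):
    """Count substitution types between two aligned, equal-length sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref_base, alt_base in zip(reference, sample):
        # Skip matches and anything that is not a plain base (gaps, N).
        if ref_base != alt_base and ref_base in "ACGU" and alt_base in "ACGU":
            counts[f"{ref_base}-{alt_base}"] += 1
    return counts


def transition_fraction(counts):
    """Fraction of tallied SNVs that are transitions (0.0 if none)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for kind, n in counts.items() if kind in TRANSITIONS) / total


counts = tally_snvs("ACGUACGU", "AUGUGCGA")  # made-up toy sequences
print(dict(counts))  # {'C-U': 1, 'A-G': 1, 'U-A': 1}
```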

Draft Genome of Toxocara canis, a Pathogen Responsible for Visceral Larva Migrans

  • Kong, Jinhwa;Won, Jungim;Yoon, Jeehee;Lee, UnJoo;Kim, Jong-Il;Huh, Sun
    • Parasites, Hosts and Diseases
    • /
    • Vol. 54, No. 6
    • /
    • pp.751-758
    • /
    • 2016
  • This study aimed to construct a draft genome of the adult female worm Toxocara canis using next-generation sequencing (NGS) and de novo assembly, and to find new genes after annotation using functional genomics tools. Using an NGS machine, we produced DNA read data of T. canis. The de novo assembly of the read data was performed using SOAPdenovo. RNA read data were assembled using Trinity. Structural annotation, homology search, functional annotation, classification of protein domains, and KEGG pathway analysis were carried out. In addition, recently developed tools such as MAKER, PASA, Evidence Modeler, and Blast2GO were used. Scaffold DNA was obtained with an N50 of 108,950 bp and an overall length of 341,776,187 bp. The N50 of the transcriptome was 940 bp, and its length was 53,046,952 bp. The GC content of the entire genome was 39.3%. The total number of genes was 20,178, and the total number of protein sequences was 22,358. Of the 22,358 protein sequences, 4,992 were newly observed in T. canis. The following previously unknown proteins were found: E3 ubiquitin-protein ligase cbl-b and antigen T-cell receptor, zeta chain for T-cell and B-cell regulation; endoprotease bli-4 for cuticle metabolism; mucin 12Ea and polymorphic mucin variant C6/1/40r2.1 for mucin production; tropomodulin-family protein and ryanodine receptor calcium release channels for muscle movement. We were able to find new hypothetical polypeptide sequences unique to T. canis, and the findings of this study can serve as a basis for extending our biological understanding of T. canis.
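
The abstract above reports N50 and GC content as assembly statistics. The sketch below shows how these two numbers are commonly defined: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length, and GC content is the percentage of G and C bases. The function names and toy inputs are assumptions; dedicated tools such as QUAST report these values for real assemblies.

```python
def n50(lengths):
    """N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def gc_content(sequences):
    """Percentage of G and C bases across all sequences."""
    total = sum(len(s) for s in sequences)
    if total == 0:
        raise ValueError("no bases given")
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return 100.0 * gc / total


print(n50([2, 3, 4, 5, 6]))          # 5
print(gc_content(["GGCC", "ATAT"]))  # 50.0
```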

Whole Genome Sequencing of Two Musa Species Towards Disease Resistance and Fiber Quality Improvement

  • John Ivan Pasquil;Richellen Plaza;Roneil Christian Alonday;Damsel Bangcal;Julianne Villela;Antonio Lalusin;Maria Genaleen Diaz;Antonio Laurena
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • Korean Society of Crop Science 2022 Fall Conference
    • /
    • pp.32-32
    • /
    • 2022
  • Abaca (Musa textilis L. Nee) is a Musa species native to the Philippines and known for its natural fiber. Abaca fiber, also known as Manila hemp, is extracted from the plant's pseudostems and is considered one of the strongest fibers in the world. It is used for commodities such as ropes, paper, and banknotes. Abaca is vulnerable to pests and diseases such as Abaca Bunchy Top Disease (ABTD), caused by Abaca Bunchy Top Virus (ABTV) and Banana Bunchy Top Virus (BBTV). Inosa, one of the varieties of abaca utilized in the Philippines, is highly susceptible to ABTD. In contrast, Pacol (Musa balbisiana L.), a close relative of abaca, is highly resistant to the same disease. Here, we report the sequencing and de novo genome assembly of both abaca var. Inosa and banana var. Pacol. A total of ~16 Gb and ~21 Gb of raw reads for Inosa and Pacol, respectively, were generated using the PacBio HiFi sequencing method and assembled with hifiasm. High-quality de novo assemblies of both Musa species were obtained, with 99% of BUSCOs recovered. The assembled Inosa genome has a total length of ~654 Mb and an N50 of 7 Mb, while Pacol has a total length of 527 Mb and an N50 of 3 Mb; these lengths are close to the estimated genome sizes of ~638 Mb and ~503 Mb, respectively. The information that can be derived from the de novo assembled genomes would provide a solid foundation for further research in disease resistance and fiber quality improvement in abaca.
