• Title/Summary/Keyword: De novo assembly

Status of Philippine Mango Genomics: Enriching Molecular Genomics Towards a Globally Competitive Philippine Mango Industry

  • Eureka Teresa M. Ocampo;Cris Q. Cortaga;Jhun Laurence S. Rasco;John Albert P. Lachica;Darlon V. Lantican
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • 2022.10a
    • /
    • pp.28-28
    • /
    • 2022
  • This paper presents the first genome assemblies of Philippine mangoes, which provide a valuable reference for varietal improvement and genomic studies on mango and related fruit crops. We sequenced the whole genomes of three species: Mangifera odorata (Huani), Mangifera altissima (Paho), and Mangifera indica 'Carabao' (Sweet Elena). 'Carabao' is the major export variety of the Philippines; Paho is listed as vulnerable on the IUCN Red List of Threatened Species; and Huani has acrid fruit sap, which is its primary defense mechanism against insects and birds. We used Falcon, a diploid-aware de novo assembler, to assemble SMRT-generated long-read sequences. Falcon-unzip was employed to phase the output assembly, producing larger contig sets (primary contigs) and shorter contigs corresponding to haplotypes (haplotigs). Assembly statistics were generated by comparing each assembly to a reference genome, 'Tommy Atkins', using the Quality Assessment Tool (QUAST). Moreover, the extent of duplication and the completeness of gene content were measured using Benchmarking Universal Single-Copy Orthologs (BUSCO). Draft assemblies with high duplication were processed using Purge Haplotigs and Purge Dups to reduce duplication with minimal impact on genome completeness. De novo assemblies of Huani, Paho and 'Carabao' were then generated with primary contig sizes of 463.64 Mb, 508.95 Mb and 401.51 Mb, respectively. These draft assemblies of Huani, Paho and 'Carabao' showed 96.90%, 95.17% and 99.07% complete BUSCOs, respectively, which are comparable to the value for the 'Tommy Atkins' genome (98.6%). Using two mango transcriptome datasets (pooled RNA-seq from different mango varieties and tissues), 91-96% of reads, or 24-30 million reads, were successfully mapped back to each generated assembly, indicating a high degree of completeness. The results demonstrate highly contiguous, phased, and near-complete genome assemblies of three Philippine mango species for structural and functional annotation of gene units, especially those of economic importance.

Ab ovo or de novo? Mechanisms of Centriole Duplication

  • Loncarek, Jadranka;Khodjakov, Alexey
    • Molecules and Cells
    • /
    • v.27 no.2
    • /
    • pp.135-142
    • /
    • 2009
  • The centrosome, an organelle comprising centrioles and associated pericentriolar material, is the major microtubule organizing center in animal cells. For the cell to form a bipolar mitotic spindle and ensure proper chromosome segregation at the end of each cell cycle, it is paramount that the cell contains two and only two centrosomes. Because the number of centrosomes in the cell is determined by the number of centrioles, cells have evolved elaborate mechanisms to control centriole biogenesis and to tightly coordinate this process with DNA replication. Here we review key proteins involved in centriole assembly, compare two major modes of centriole biogenesis, and discuss the mechanisms that ensure stringency of centriole number.

An Optimized Strategy for Genome Assembly of Sanger/pyrosequencing Hybrid Data using Available Software

  • Jeong, Hae-Young;Kim, Ji-Hyun F.
    • Genomics & Informatics
    • /
    • v.6 no.2
    • /
    • pp.87-90
    • /
    • 2008
  • During the last four years, the pyrosequencing-based 454 platform has rapidly displaced the traditional Sanger sequencing method due to its high throughput and cost effectiveness. Meanwhile, the Sanger sequencing methodology still provides the longest reads, and paired-end sequencing based on that chemistry offers an opportunity to ensure accurate assembly results. In this report, we describe an optimized approach for hybrid de novo genome assembly using pyrosequencing data and varying amounts of Sanger-type reads. 454 platform-derived contigs can be used as single non-breakable virtual reads or converted to simpler contigs that consist of editable, overlapping pseudoreads. These modified contigs maintain their integrity at the first jumpstarting assembly stage and are edited by fragmenting and rejoining. Pre-existing assembly software can then be applied for mixed assembly of 454-derived data and Sanger reads. An effective method for identifying genomic differences between reference and sample sequences in whole-genome resequencing procedures is also suggested.

De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data

  • Jo, Yeonhwa;Choi, Hoseong;Bae, Miah;Kim, Sang-Min;Kim, Sun-Lim;Lee, Bong Choon;Cho, Won Kyong;Kim, Kook-Hyung
    • The Plant Pathology Journal
    • /
    • v.33 no.5
    • /
    • pp.478-487
    • /
    • 2017
  • Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, a soybean transcriptome generated for soybean seed development analysis contained several virus-associated sequences. In this study, we identified five viruses infecting soybean, including soybean mosaic virus (SMV), by de novo transcriptome assembly followed by BLAST search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) for SMVs in the soybean seed transcriptome, revealing 780 SNVs that were evenly distributed across the SMV genome. Four SNV types, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data for virus genome assembly and SNV analysis.
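
The four frequently observed SNV types listed above (C-U, U-C, A-G, G-A) are all transitions, that is, substitutions within the same base class (purine to purine or pyrimidine to pyrimidine). The following is a minimal Python sketch of that classification, not the authors' pipeline; the function name `classify_snv` is hypothetical and chosen only for illustration.

```python
# Minimal sketch (not from the paper): classify an RNA SNV as a
# transition or a transversion. The name classify_snv is hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def classify_snv(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    # Treat DNA-style T as RNA U so either alphabet can be passed in.
    ref = ref.upper().replace("T", "U")
    alt = alt.upper().replace("T", "U")
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, U (or T)")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("C", "U"), ("U", "C"), ("A", "G"), ("G", "A")]:
        print(f"{ref}-{alt}: {classify_snv(ref, alt)}")
```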

De Novo Assembly and Comparative Analysis of the Enterococcus faecalis Genome (KACC 91532) from a Korean Neonate

  • Ham, Jun Sang;Kwak, Woori;Chang, Oun Ki;Han, Gi Sung;Jeong, Seok Geun;Seol, Kuk Hwan;Kim, Hyoun Wook;Kang, Geun Ho;Park, Beom Young;Lee, Hyun-Jeong;Kim, Jong Geun;Kim, Kyu-Won;Sung, Samsun;Lee, Taeheon;Cho, Seoae;Kim, Heebal
    • Journal of Microbiology and Biotechnology
    • /
    • v.23 no.7
    • /
    • pp.966-973
    • /
    • 2013
  • Using a newly constructed de novo assembly pipeline, a finished-genome-level assembly was produced for the probiotic candidate strain E. faecalis KACC 91532, isolated from stool samples of Korean neonates. Our gene prediction identified 3,061 genes in the assembled genome of the strain. Among these, nine genes were specific to E. faecalis KACC 91532 when compared with all four known reference genomes (EF62, D32, V583, OG1RF). We identified genes related to phenotypic characters and detected E. faecalis KACC 91532-specific evolutionarily accelerated genes using dN/dS analysis. From these results, we found a potential risk of KACC 91532 as a useful probiotic strain and identified some candidate genetic variations that could affect the function of enzymes.

Identification and Characterization of Polymorphic Microsatellite Loci using Next Generation Sequencing in Quercus variabilis

  • Baek, Seung-Hoon;Lee, Jei-Wan;Hong, Kyung-Nak;Lee, Seok-Woo;Ahn, Ji-Young;Lee, Min-Woo
    • Journal of Korean Society of Forest Science
    • /
    • v.105 no.2
    • /
    • pp.186-192
    • /
    • 2016
  • This study was conducted to develop microsatellite markers in Quercus variabilis using next generation sequencing. A total of 305,771 reads (384 bp on average) were generated on a Roche GS-FLX system, yielding 117 Mbp of sequence. The de novo assembly resulted in 7,346 contigs. Of the 2,921 contigs longer than 500 bp, 606 (20.75%) contained a total of 911 microsatellite loci. A total of 180 primer sets were designed from the 911 microsatellite loci and screened in eight individual Q. variabilis trees sampled from a natural stand to obtain polymorphic loci. As a result, thirteen polymorphic microsatellite loci were selected and used for estimating population genetic parameters in 54 individual trees. The mean number of effective alleles was 4.996, ranging from 2.439 to 7.515. The observed heterozygosity ranged from 0.731 to 1.000 with an average of 0.873, and the expected heterozygosity ranged from 0.590 to 0.867 with an average of 0.766. No null alleles were detected at any locus, and no significant linkage disequilibrium was detected among the loci after Bonferroni correction. In the near future, these novel polymorphic microsatellite markers will be used to study the population and conservation genetics of Q. variabilis in Korea in more detail.
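
The per-locus statistics reported above (number of effective alleles, observed and expected heterozygosity) can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions, Ne = 1 / Σp² and He = 1 − Σp², without any small-sample correction; the abstract does not say which estimator or software was used, so this is not necessarily the authors' exact calculation. The function names and the toy genotypes are hypothetical.

```python
# Minimal sketch under stated assumptions: textbook estimators for one
# diploid microsatellite locus. Names and example data are hypothetical.
from collections import Counter


def allele_frequencies(genotypes):
    """Map each allele to its frequency; genotypes are (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def effective_alleles(freqs):
    """Number of effective alleles, Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def expected_heterozygosity(freqs):
    """Expected heterozygosity, He = 1 - sum(p_i^2) (no sample-size correction)."""
    return 1.0 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


if __name__ == "__main__":
    # Toy genotypes (allele sizes in bp) for four trees at one locus.
    locus = [(150, 154), (150, 150), (154, 154), (150, 154)]
    freqs = allele_frequencies(locus)
    print("Ne:", effective_alleles(freqs))
    print("He:", expected_heterozygosity(freqs))
    print("Ho:", observed_heterozygosity(locus))
```

With these definitions, He and Ne at a single locus are linked by He = 1 − 1/Ne, which is why loci with more evenly distributed alleles score higher on both measures.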

Whole Genome Sequencing of Two Musa Species Towards Disease Resistance and Fiber Quality Improvement

  • John Ivan Pasquil;Richellen Plaza;Roneil Christian Alonday;Damsel Bangcal;Julianne Villela;Antonio Lalusin;Maria Genaleen Diaz;Antonio Laurena
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • 2022.10a
    • /
    • pp.32-32
    • /
    • 2022
  • Abaca (Musa textilis L. Nee) is a Musa species native to the Philippines and known for its natural fiber. Abaca fiber, also known as Manila hemp, is extracted from the pseudostems and is considered one of the strongest fibers in the world. It is used for commodities such as ropes, papers, and money bills. Abaca is vulnerable to pests and diseases such as Abaca Bunchy Top Disease (ABTD), caused by Abaca Bunchy Top Virus (ABTV) and Banana Bunchy Top Virus (BBTV). Inosa, one of the abaca varieties utilized in the Philippines, is highly susceptible to ABTD. In contrast, Pacol (Musa balbisiana L.), a close relative of abaca, is highly resistant to the same disease. Here, we report the sequencing and de novo genome assembly of both abaca var. Inosa and banana var. Pacol. A total of ~16 Gb and ~21 Gb of raw reads for Inosa and Pacol, respectively, were generated using the PacBio HiFi sequencing method and assembled with hifiasm. High-quality de novo assemblies of both Musa species were obtained, with 99% of BUSCOs recovered. The assembled Inosa genome has a total length of ~654 Mb and an N50 of 7 Mb, while the Pacol assembly has a total length of 527 Mb and an N50 of 3 Mb; these lengths are close to the estimated genome sizes of ~638 Mb and ~503 Mb, respectively. The information that can be derived from the de novo assembled genomes would provide a solid foundation for further research on disease resistance and fiber quality improvement in abaca.
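
The N50 and total-length figures quoted above follow the usual assembly-statistics definitions. As a minimal Python sketch (not the authors' tooling), N50 is taken here as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function names `n50` and `size_ratio` and the toy contig lengths are hypothetical; only the Mb values passed to `size_ratio` come from the abstract.

```python
# Minimal sketch: contig N50 and assembled-to-estimated size ratio.
# Function names and the toy contig list are hypothetical.
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def size_ratio(assembled_mb, estimated_mb):
    """Ratio of assembled length to estimated genome size."""
    return assembled_mb / estimated_mb


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20]  # arbitrary toy lengths
    print("toy N50:", n50(toy_contigs))
    # Approximate values reported in the abstract (Mb).
    print("Inosa ratio:", round(size_ratio(654, 638), 3))
    print("Pacol ratio:", round(size_ratio(527, 503), 3))
```

A ratio slightly above 1, as for both assemblies here, means the assembly is a few percent longer than the estimated genome size.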

Next Generation Sequencing and Bioinformatics

  • Kim, Ki-Bong
    • Journal of Life Science
    • /
    • v.25 no.3
    • /
    • pp.357-367
    • /
    • 2015
  • With the ongoing development of next-generation sequencing (NGS) platforms and advancements in the latest bioinformatics tools at an unprecedented pace, the ultimate goal of sequencing the human genome for less than $1,000 may become feasible in the near future. The rapid technological advances in NGS have brought about increasing demands for statistical methods and bioinformatics tools for the analysis and management of NGS data. Even in the early stages of the commercial availability of NGS platforms, a large number of applications or tools already existed for analyzing, interpreting, and visualizing NGS data. However, the availability of this plethora of NGS data presents a significant challenge for storage, analysis, and data management. Intrinsically, the analysis of NGS data includes the alignment of sequence reads to a reference, base-calling and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection, and genome browsing. While NGS technologies have allowed a massive increase in available raw sequence data, a number of new informatics challenges and difficulties must be addressed to improve the current state and fulfill the promise of genome research. This review aims to provide an overview of major NGS technologies and bioinformatics tools for NGS data analyses.

RNA-Seq De Novo Assembly and Differential Transcriptome Analysis of Korean Medicinal Herb Cirsium japonicum var. spinossimum

  • Roy, Neha Samir;Kim, Jung-A;Choi, Ah-Young;Ban, Yong-Wook;Park, Nam-Il;Park, Kyong-Cheul;Yang, Hee-sun;Choi, Ik-Young;Kim, Soonok
    • Genomics & Informatics
    • /
    • v.16 no.4
    • /
    • pp.34.1-34.9
    • /
    • 2018
  • Cirsium japonicum belongs to the Asteraceae or Compositae family and is a medicinal plant in Asia that has a variety of effects, including tumour inhibition, improved immunity with flavones, and antidiabetic and hepatoprotective effects. Silymarin is synthesized from 4-coumaroyl-CoA via both the flavonoid and phenylpropanoid pathways, which produce the immediate precursors taxifolin and coniferyl alcohol. The oxidative radicalization of taxifolin and coniferyl alcohol then produces silymarin. We identified the expression of genes related to the synthesis of silymarin in three different tissues of C. japonicum, namely flowers, leaves, and roots, through RNA sequencing. We obtained 51,133 unigenes from transcriptome sequencing by de novo assembly using Trinity v2.1.1, TransDecoder v2.0.1, and CD-HIT v4.6 software. The differentially expressed gene analysis revealed that the expression of genes related to the flavonoid pathway was higher in the flowers, whereas genes of the phenylpropanoid pathway were more highly expressed in the roots. In this study, we established a global transcriptome dataset for C. japonicum. The data will be useful not only for focusing more deeply on the genes related to the production of medicinal metabolites, including flavolignan, but also for studying functional genomics for the genetic engineering of C. japonicum.

De novo assembly, annotation and gene expression profiles of gonads of Cytorace-3, a hybrid lineage of Drosophila nasuta nasuta and D. n. albomicans

  • Ponnanna, Koushik;DSouza, Stafny M.;Ramachandra, Nallur B.
    • Genomics & Informatics
    • /
    • v.19 no.1
    • /
    • pp.8.1-8.12
    • /
    • 2021
  • Cytorace-3 is a laboratory-evolved hybrid lineage of Drosophila nasuta nasuta males and Drosophila nasuta albomicans females that has currently passed ~850 generations. To assess the effects of interracial hybridization on gene expression in Cytorace-3, we profiled the transcriptomes of mature ovaries and testes by employing Illumina sequencing technology and de novo transcriptome assembly strategies. We found 26% of ovarian genes and 14% of testis genes to be differentially expressed in Cytorace-3 relative to the genes expressed in the parental gonadal transcriptomes. About 5% of genes exhibited an additive gene expression pattern in the ovary and 3% in the testis, while the remaining genes were misexpressed in Cytorace-3. Nearly 772 of these misexpressed genes in the ovary and 413 in the testis were either over- or under-dominant. In the ovary, about twice as many genes followed D. n. nasuta dominance (270 genes) as D. n. albomicans dominance (133 genes). In contrast, only 105 genes showed D. n. nasuta dominance and 207 showed D. n. albomicans dominance in the testis transcriptome. Of the six expression inheritance patterns, the conserved inheritance pattern was predominant for both ovary (73%) and testis (85%) in Cytorace-3. This study is the first to provide an overview of the expression divergence and inheritance patterns of the transcriptomes in an independently evolving, distinct hybrid lineage of Drosophila. The recorded expression divergence in Cytorace-3 surpasses that between the parental lineages, illustrating the strong impact of hybridization in driving rapid gene expression changes.