• Title/Abstract/Keywords: De novo assembly

Search results: 55 items (processing time 0.028 s)

Status of Philippine Mango Genomics: Enriching Molecular Genomics Towards a Globally Competitive Philippine Mango Industry

  • Eureka Teresa M. Ocampo;Cris Q. Cortaga;Jhun Laurence S. Rasco;John Albert P. Lachica;Darlon V. Lantican
    • Korean Society of Crop Science: Conference Proceedings / Korean Society of Crop Science 2022 Autumn Conference / pp.28-28 / 2022
  • This paper presents the first genome assemblies of Philippine mangoes, which provide a valuable reference for varietal improvement and genomic studies on mango and related fruit crops. We sequenced the whole genomes of three species: Mangifera odorata (Huani), Mangifera altissima (Paho), and Mangifera indica 'Carabao' (Sweet Elena). 'Carabao' is the major export variety of the Philippines; Paho is listed as vulnerable on the IUCN Red List of Threatened Species; Huani has acrid fruit sap, which is its primary defense mechanism against insects and birds. We used Falcon, a diploid-aware de novo assembler, to assemble SMRT-generated long-read sequences. Falcon-unzip was employed to phase the output assembly, producing a set of larger contigs (primary contigs) and shorter contigs corresponding to haplotypes (haplotigs). Assembly statistics were generated by comparing each assembly to the 'Tommy Atkins' reference genome using the Quality Assessment Tool (QUAST). The extent of duplication and the completeness of gene content were measured using Benchmarking Universal Single-Copy Orthologs (BUSCO). Draft assemblies with high duplication were processed with Purge Haplotigs and Purge Dups to reduce duplication with minimal impact on genome completeness. The resulting de novo assemblies of Huani, Paho and 'Carabao' had primary contig sizes of 463.64 Mb, 508.95 Mb and 401.51 Mb, respectively. These draft assemblies showed 96.90%, 95.17% and 99.07% complete BUSCOs, respectively, which is comparable to the 'Tommy Atkins' genome (98.6%). Using two mango transcriptome datasets (pooled RNA-seq from different mango varieties and tissues), 91-96% of reads (24-30 million) were successfully mapped back to each assembly, indicating a high degree of completeness. The results demonstrate highly contiguous, phased, and near-complete genome assemblies of three Philippine mango species, suitable for structural and functional annotation of genes, especially those of economic importance.

Ab ovo or de novo? Mechanisms of Centriole Duplication

  • Loncarek, Jadranka;Khodjakov, Alexey
    • Molecules and Cells / Vol. 27, No. 2 / pp.135-142 / 2009
  • The centrosome, an organelle comprising centrioles and associated pericentriolar material, is the major microtubule organizing center in animal cells. For the cell to form a bipolar mitotic spindle and ensure proper chromosome segregation at the end of each cell cycle, it is paramount that the cell contains two and only two centrosomes. Because the number of centrosomes in the cell is determined by the number of centrioles, cells have evolved elaborate mechanisms to control centriole biogenesis and to tightly coordinate this process with DNA replication. Here we review key proteins involved in centriole assembly, compare two major modes of centriole biogenesis, and discuss the mechanisms that ensure stringency of centriole number.

An Optimized Strategy for Genome Assembly of Sanger/pyrosequencing Hybrid Data using Available Software

  • Jeong, Hae-Young;Kim, Ji-Hyun F.
    • Genomics & Informatics / Vol. 6, No. 2 / pp.87-90 / 2008
  • During the last four years, the pyrosequencing-based 454 platform has rapidly displaced the traditional Sanger sequencing method due to its high throughput and cost effectiveness. Meanwhile, the Sanger sequencing methodology still provides the longest reads, and paired-end sequencing based on that chemistry offers an opportunity to ensure accurate assembly results. In this report, we describe an optimized approach for hybrid de novo genome assembly using pyrosequencing data and varying amounts of Sanger-type reads. 454 platform-derived contigs can be used as single non-breakable virtual reads or converted to simpler contigs that consist of editable, overlapping pseudoreads. These modified contigs maintain their integrity at the first jumpstarting assembly stage and are edited by fragmenting and rejoining. Pre-existing assembly software can then be applied to the mixed assembly of 454-derived data and Sanger reads. An effective method for identifying genomic differences between reference and sample sequences in whole-genome resequencing procedures is also suggested.

De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data

  • Jo, Yeonhwa;Choi, Hoseong;Bae, Miah;Kim, Sang-Min;Kim, Sun-Lim;Lee, Bong Choon;Cho, Won Kyong;Kim, Kook-Hyung
    • The Plant Pathology Journal / Vol. 33, No. 5 / pp.478-487 / 2017
  • Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, one generated for soybean seed development analysis contained several virus-associated sequences. In this study, we identified five viruses infecting soybean, including soybean mosaic virus (SMV), by de novo transcriptome assembly followed by BLAST search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) for SMVs in the soybean seed transcriptome, revealing 780 SNVs that were evenly distributed across the SMV genome. Four SNV types, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data to virus genome assembly and SNV analysis.
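
As a rough illustration of the SNV tally described in this abstract, the sketch below counts substitution types and reports what fraction are transitions (the four types named above). The input format, a list of (reference base, variant base) pairs, and the function name `tally_snvs` are assumptions made for illustration; the abstract does not say how the authors performed this step.

```python
from collections import Counter

# The four transition substitutions named in the abstract (RNA alphabet).
TRANSITIONS = {"C-U", "U-C", "A-G", "G-A"}


def tally_snvs(snvs):
    """Count SNVs by substitution type.

    snvs: iterable of (reference_base, variant_base) pairs
          (a hypothetical input format chosen for this sketch).
    Returns (Counter of "ref-alt" types, fraction that are transitions).
    """
    counts = Counter(f"{ref}-{alt}" for ref, alt in snvs)
    total = sum(counts.values())
    n_transitions = sum(n for kind, n in counts.items() if kind in TRANSITIONS)
    fraction = n_transitions / total if total else 0.0
    return counts, fraction


# Toy data, not taken from the paper.
example = [("C", "U"), ("U", "C"), ("A", "G"), ("G", "A"), ("A", "C"), ("C", "U")]
counts, fraction = tally_snvs(example)
```

In practice the pairs would come from a variant-calling step; that step is outside the scope of this sketch.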

De Novo Assembly and Comparative Analysis of the Enterococcus faecalis Genome (KACC 91532) from a Korean Neonate

  • Ham, Jun Sang;Kwak, Woori;Chang, Oun Ki;Han, Gi Sung;Jeong, Seok Geun;Seol, Kuk Hwan;Kim, Hyoun Wook;Kang, Geun Ho;Park, Beom Young;Lee, Hyun-Jeong;Kim, Jong Geun;Kim, Kyu-Won;Sung, Samsun;Lee, Taeheon;Cho, Seoae;Kim, Heebal
    • Journal of Microbiology and Biotechnology / Vol. 23, No. 7 / pp.966-973 / 2013
  • Using a newly constructed de novo assembly pipeline, a finished-genome-level assembly was produced for the probiotic candidate strain E. faecalis KACC 91532, isolated from stool samples of Korean neonates. Our gene prediction identified 3,061 genes in the assembled genome of the strain. Among these, nine genes were specific to E. faecalis KACC 91532 when compared with all four known reference genomes (EF62, D32, V583, OG1RF). We identified genes related to phenotypic characters and detected E. faecalis KACC 91532-specific evolutionarily accelerated genes using dN/dS analysis. From these results, we identified a potential risk in using KACC 91532 as a probiotic strain and identified candidate genetic variations that could affect enzyme function.

Identification and Characterization of Polymorphic Microsatellite Loci using Next Generation Sequencing in Quercus variabilis

  • 백승훈;이제완;홍경낙;이석우;안지영;이민우
    • Journal of Korean Society of Forest Science / Vol. 105, No. 2 / pp.186-192 / 2016
  • This study was conducted to develop and characterize microsatellite markers for Quercus variabilis using next-generation sequencing. Using the GS-FLX Titanium next-generation sequencer, we obtained 305,771 reads, yielding 117 Mbp of data. De novo assembly produced 7,326 contigs, of which 2,921 were 500 bp or longer. Of these, 606 contigs (20.75%) contained microsatellite regions, with 911 microsatellites identified in total. Thirteen of the microsatellite loci were polymorphic among Q. variabilis individuals. For these loci, the effective number of alleles ($A_e$) observed in the Juwangsan population averaged 4.966 (2.439-7.515). The mean observed heterozygosity ($H_o$) and mean expected heterozygosity ($H_e$) were 0.873 (0.731-1.000) and 0.766 (0.590-0.867), respectively. No null alleles were observed at any of the polymorphic microsatellite loci, and no linkage disequilibrium was detected between markers. The 13 microsatellite markers developed in this study should therefore be useful for analyzing genetic variation in Q. variabilis populations.
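
For readers unfamiliar with the $A_e$, $H_o$ and $H_e$ statistics reported in this abstract, the following minimal sketch computes them for a single locus from diploid genotypes. It uses the textbook uncorrected definitions, $H_e = 1 - \sum p_i^2$ and $A_e = 1 / \sum p_i^2$; whether the authors applied a sample-size correction is not stated, so this is an assumption. The function name `locus_stats` and the toy genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Per-locus diversity statistics from diploid genotypes.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    Uses the uncorrected estimators (an assumption; published values
    may use a sample-size-corrected form of He).
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    alleles = [allele for genotype in genotypes for allele in genotype]
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]
    sum_sq = sum(p * p for p in freqs)  # expected homozygosity
    observed_het = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return {
        "Ae": 1.0 / sum_sq,   # effective number of alleles
        "He": 1.0 - sum_sq,   # expected heterozygosity
        "Ho": observed_het,   # observed heterozygosity
    }


# Toy data: allele sizes in bp for four individuals (not from the paper).
stats = locus_stats([(150, 154), (150, 150), (154, 158), (150, 158)])
```

Averaging these per-locus values over all polymorphic loci gives population-level means of the kind quoted in the abstract.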

Whole Genome Sequencing of Two Musa Species Towards Disease Resistance and Fiber Quality Improvement

  • John Ivan Pasquil;Richellen Plaza;Roneil Christian Alonday;Damsel Bangcal;Julianne Villela;Antonio Lalusin;Maria Genaleen Diaz;Antonio Laurena
    • Korean Society of Crop Science: Conference Proceedings / Korean Society of Crop Science 2022 Autumn Conference / pp.32-32 / 2022
  • Abaca (Musa textilis L. Nee) is a Musa species native to the Philippines and known for its natural fiber. Abaca fiber, also known as Manila hemp, is extracted from the pseudostems and is considered one of the strongest fibers in the world. It is used for commodities such as ropes, paper, and banknotes. Abaca is vulnerable to pests and diseases such as Abaca Bunchy Top Disease (ABTD), caused by Abaca Bunchy Top Virus (ABTV) and Banana Bunchy Top Virus (BBTV). Inosa, one of the abaca varieties used in the Philippines, is highly susceptible to ABTD. In contrast, Pacol (Musa balbisiana L.), a close relative of abaca, is highly resistant to the same disease. Here, we report the sequencing and de novo genome assembly of both abaca var. Inosa and banana var. Pacol. A total of ~16 Gb and ~21 Gb of raw reads for Inosa and Pacol, respectively, were generated using the PacBio HiFi sequencing method and assembled with Hifiasm. High-quality de novo assemblies of both Musa species were obtained, with 99% of BUSCOs recovered. The assembled Inosa genome has a total length of ~654 Mb and an N50 of 7 Mb, while Pacol has a total length of 527 Mb and an N50 of 3 Mb; these lengths are close to the estimated genome sizes of ~638 Mb and ~503 Mb, respectively. The information that can be derived from the de novo assembled genomes would provide a solid foundation for further research on disease resistance and fiber quality improvement in abaca.
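
The N50 values quoted in this abstract follow the standard definition: sort contigs from longest to shortest and take the length of the contig at which the running total first reaches half of the assembly length. The sketch below is a minimal illustration of that definition; the function name `n50` and the toy lengths are assumptions, and tools such as QUAST report this statistic directly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


# Toy contig lengths (arbitrary units, not from the paper).
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Here the lengths sum to 54, and the three longest contigs (10, 9, 8) reach 27, so the N50 of the toy set is 8.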

Next Generation Sequencing and Bioinformatics

  • 김기봉
    • Journal of Life Science / Vol. 25, No. 3 / pp.357-367 / 2015
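  • Thanks to rapidly advancing next-generation sequencing (NGS) platforms and state-of-the-art bioinformatics tools, the ultimate goal of sequencing a human genome for under US$1,000 appears likely to be realized soon. Rapid technical progress in NGS is steadily increasing the demand for statistical methods and bioinformatics tools for analyzing and managing NGS data. Since the early days when NGS platforms were first commercialized, many applications and tools for analyzing, interpreting, and visualizing NGS data have been developed and used. However, the enormous flood of NGS data has brought to the fore many problems that remain to be solved in data storage, analysis, and management. NGS data analysis inherently includes alignment of reads to reference sequences, base calling, polymorphism discovery, assembly using paired-end or unpaired reads, structural variant discovery, and genome browsing. This paper gives an overview of the major next-generation sequencing technologies and the bioinformatics tools used for NGS data analysis.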

RNA-Seq De Novo Assembly and Differential Transcriptome Analysis of Korean Medicinal Herb Cirsium japonicum var. spinossimum

  • Roy, Neha Samir;Kim, Jung-A;Choi, Ah-Young;Ban, Yong-Wook;Park, Nam-Il;Park, Kyong-Cheul;Yang, Hee-sun;Choi, Ik-Young;Kim, Soonok
    • Genomics & Informatics / Vol. 16, No. 4 / pp.34.1-34.9 / 2018
  • Cirsium japonicum belongs to the Asteraceae (Compositae) family and is a medicinal plant in Asia with a variety of effects, including tumour inhibition, improved immunity with flavones, and antidiabetic and hepatoprotective effects. Silymarin is synthesized from 4-coumaroyl-CoA via both the flavonoid and phenylpropanoid pathways, which produce the immediate precursors taxifolin and coniferyl alcohol. The oxidative radicalization of taxifolin and coniferyl alcohol then produces silymarin. Using RNA sequencing, we identified the expression of genes related to silymarin synthesis in three tissues of C. japonicum: flowers, leaves, and roots. We obtained 51,133 unigenes from transcriptome sequencing by de novo assembly using Trinity v2.1.1, TransDecoder v2.0.1, and CD-HIT v4.6. Differentially expressed gene analysis revealed that expression of genes related to the flavonoid pathway was higher in the flowers, whereas genes of the phenylpropanoid pathway were more highly expressed in the roots. In this study, we established a global transcriptome dataset for C. japonicum. The data should be useful not only for closer study of the genes involved in producing medicinal metabolites, including flavonolignans, but also for functional genomics aimed at the genetic engineering of C. japonicum.

De novo assembly, annotation and gene expression profiles of gonads of Cytorace-3, a hybrid lineage of Drosophila nasuta nasuta and D. n. albomicans

  • Ponnanna, Koushik;DSouza, Stafny M.;Ramachandra, Nallur B.
    • Genomics & Informatics / Vol. 19, No. 1 / pp.8.1-8.12 / 2021
  • Cytorace-3 is a laboratory-evolved hybrid lineage of Drosophila nasuta nasuta males and Drosophila nasuta albomicans females that has now passed ~850 generations. To assess the effects of interracial hybridization on gene expression in Cytorace-3, we profiled the transcriptomes of mature ovaries and testes using Illumina sequencing technology and de novo transcriptome assembly strategies. We found 26% of ovarian genes and 14% of testis genes to be differentially expressed in Cytorace-3 relative to the expressed genes in the parental gonadal transcriptomes. About 5% of genes exhibited an additive expression pattern in the ovary and 3% in the testis, while the remaining genes were misexpressed in Cytorace-3. Nearly 772 of these misexpressed genes in the ovary and 413 in the testis were either over- or under-dominant. In the ovary, about twice as many genes followed D. n. nasuta dominance (270 genes) as D. n. albomicans dominance (133 genes). In contrast, only 105 genes showed D. n. nasuta dominance and 207 showed D. n. albomicans dominance in the testis transcriptome. Of the six expression inheritance patterns, the conserved pattern was predominant in both ovary (73%) and testis (85%) in Cytorace-3. This study is the first to provide an overview of the expression divergence and inheritance patterns of the transcriptomes in an independently evolving, distinct hybrid lineage of Drosophila. The recorded expression divergence in Cytorace-3 surpasses that between the parental lineages, illustrating the strong impact of hybridization in driving rapid gene expression changes.