• Title/Abstract/Keywords: Transcriptome assembly

K-mer Based RNA-seq Read Distribution Method For Accelerating De Novo Transcriptome Assembly

  • Kwon, Hwijun;Jung, Inuk
    • Journal of the Korea Society of Computer and Information / Vol. 25, No. 8 / pp. 1-8 / 2020
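  • This paper presents a method that uses gene family information to distribute RNA-seq reads across multiple nodes, with the aim of shortening the running time of de novo transcriptome assembly. To measure the performance of the proposed read distribution technique, experiments were run on Arabidopsis reads organized into four datasets (all reads unclassified, perfectly classified reads, model-classified reads, and randomly classified reads). Compared with the unclassified full dataset, the generated gene contigs were 95% identical, and the assembly time in the proposed distributed environment was 4.2 times shorter than on a single node using the same resources.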
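The abstract does not describe the classifier itself, so the following is only a minimal sketch of one way k-mer based read distribution could work, not the authors' implementation. It assumes each gene family has a few reference sequences, indexes their k-mers, and sends each read to the family it shares the most k-mers with. The function names, the voting rule, and the default `k=5` are all illustrative assumptions.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_family_index(families, k):
    """Map each k-mer to the gene families whose reference sequences contain it."""
    index = defaultdict(set)
    for family, seqs in families.items():
        for seq in seqs:
            for km in kmers(seq, k):
                index[km].add(family)
    return index


def assign_read(read, index, k):
    """Pick the family sharing the most k-mers with the read (None if no k-mer matches)."""
    votes = defaultdict(int)
    for km in kmers(read, k):
        for family in index.get(km, ()):
            votes[family] += 1
    if not votes:
        return None
    # Iterate over sorted names so ties are broken deterministically.
    return max(sorted(votes), key=lambda f: votes[f])


def distribute(reads, families, k=5):
    """Group reads into per-family bins; each bin could be assembled on its own node."""
    index = build_family_index(families, k)
    bins = defaultdict(list)
    for read in reads:
        bins[assign_read(read, index, k)].append(read)
    return dict(bins)


# Toy data (hypothetical): two families and three short reads.
families = {"famA": ["ACGTACGTGG"], "famB": ["TTTTCCCCAA"]}
reads = ["ACGTACG", "TTTCCCC", "GGGGGGG"]
bins = distribute(reads, families, k=5)
```

Reads that match no family land in the `None` bin; a real pipeline would need a policy for them, for example assembling them together as a separate job.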

De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data

  • Jo, Yeonhwa;Choi, Hoseong;Bae, Miah;Kim, Sang-Min;Kim, Sun-Lim;Lee, Bong Choon;Cho, Won Kyong;Kim, Kook-Hyung
    • The Plant Pathology Journal / Vol. 33, No. 5 / pp. 478-487 / 2017
  • Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, one generated for the analysis of soybean seed development contained several virus-associated sequences. In this study, we identified five viruses infecting soybean, including soybean mosaic virus (SMV), by de novo transcriptome assembly followed by a BLAST search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) for SMV in the soybean seed transcriptome, revealing 780 SNVs that were evenly distributed across the SMV genome. Four SNV types, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data for virus genome assembly and SNV analysis.
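The four frequent SNV types listed above (C-U, U-C, A-G, G-A) are all transitions, that is, purine-to-purine or pyrimidine-to-pyrimidine changes. As a small illustration, and not the authors' pipeline, the sketch below tallies substitution types from a list of (reference, variant) base pairs and counts how many are transitions. The input list is made-up toy data.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def is_transition(ref, alt):
    """True when both bases are purines or both are pyrimidines (and they differ)."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def tally_snvs(snvs):
    """Count each substitution type, written as 'ref-alt' like in the abstract."""
    return Counter(f"{ref}-{alt}" for ref, alt in snvs)


# Toy SNV calls on an RNA genome (hypothetical data).
snvs = [("C", "U"), ("U", "C"), ("A", "G"), ("G", "A"), ("A", "C"), ("C", "U")]
counts = tally_snvs(snvs)
n_transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
```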

Application of Pac-Bio Sequencing, Trinity, and rnaSPAdes Assembly for Transcriptome Analysis in Medicinal Crop Astragalus membranaceus

  • Ji-Nam Kang;Si Myung Lee
    • Korean Society of Crop Science: Conference Proceedings / Korean Society of Crop Science 2022 Fall Conference / pp. 254-254 / 2022
  • Astragalus membranaceus (A. membranaceus) has traditionally been used as a medicinal plant in East Asia for the treatment of various diseases. A. membranaceus belongs to the legume family and is known to be rich in substances such as flavonoids and saponins. Recent pharmacological studies of A. membranaceus have shown that the plant has immunomodulatory, anti-oxidant, anti-cancer, and anti-inflammatory effects. However, knowledge of the major biosynthetic pathways in A. membranaceus is still lacking. Recently developed sequencing techniques enable high-quality transcriptome analysis in plants, which is recognized as an important part of elucidating the regulatory mechanisms of many plant secondary metabolic pathways. However, it is difficult to predict the number of transcripts because plant transcripts contain a large number of isoforms due to alternative splicing events, and the number can vary depending on the assembly platform used. In this study, we constructed three unigene sets using Pac-Bio isoform sequencing, Trinity, and rnaSPAdes assembly for detailed transcriptome analysis in A. membranaceus. Furthermore, all genes involved in the flavonoid biosynthetic pathway were searched for in the three unigene sets, and structural comparisons and expression profiles of these genes were analyzed. Isoflavone synthesis was active in most tissues. Flavonol synthesis was mainly active in leaves and flowers, and anthocyanin synthesis was specific to flowers. Gene structural analysis revealed structural differences in the flavonoid-related genes derived from the three unigene sets. This study suggests the need to apply multiple unigene sets when analyzing key biosynthetic pathways in plants.

De novo genome assembly and single nucleotide variations for Soybean yellow common mosaic virus using soybean flower bud transcriptome data

  • Jo, Yeonhwa;Choi, Hoseong;Kim, Sang-Min;Lee, Bong Choon;Cho, Won Kyong
    • Journal of Applied Biological Chemistry / Vol. 63, No. 3 / pp. 189-195 / 2020
  • The soybean (Glycine max L.), also known as the soya bean, is an economically important legume species. Pathogens are a constant major threat to soybean cultivation, and several of them reduce soybean production. The soybean is also known to be a susceptible host for many viruses. Recently, we carried out systematic analyses to identify viruses infecting soybeans using soybean transcriptome data. Our screening results showed that only a few soybean transcriptomes contained virus-associated sequences. In this study, we further carried out bioinformatics analyses using a soybean flower bud transcriptome for virus identification, genome assembly, and single nucleotide variations (SNVs). We assembled the genome of Soybean yellow common mosaic virus (SYCMV) isolate China and revealed two SNVs. Phylogenetic analyses using three viral proteins suggested that SYCMV isolate China is closely related to SYCMV isolates from South Korea. Furthermore, we found that the levels of replication and mutation of SYCMV are relatively low, which might be associated with flower bud tissue. The most interesting finding was that SYCMV was not detected in the cytoplasmic male sterility (CMS) line derived from the non-CMS line that was severely infected by SYCMV. In summary, in silico analyses identified SYCMV from the soybean flower bud transcriptome, and a nearly complete genome of SYCMV was successfully assembled. Our results suggest that the low level of virus replication and mutation for SYCMV might be associated with plant tissues. Moreover, we provide the first evidence that male sterility might be used to eliminate viruses in crop plants.

A Study on Transcriptome Analysis Using de novo RNA-sequencing to Compare Ginseng Roots Cultivated in Different Environments

  • Yang, Byung Wook
    • The Plant Resources Society of Korea: Conference Proceedings / The Plant Resources Society of Korea 2018 Spring Conference / pp. 5-5 / 2018
  • Ginseng (Panax ginseng C.A. Meyer), one of the most widely used medicinal plants in traditional oriental medicine, is used for the treatment of various diseases. It is classified according to its cultivation environment, for example as field cultivated ginseng (FCG) or mountain cultivated ginseng (MCG). However, little is known about differences in gene expression in ginseng roots between field cultivated and mountain cultivated ginseng. To investigate the whole transcriptome landscape of ginseng, we employed high-throughput sequencing on the Illumina HiSeq 2500 system and generated a large amount of transcriptome sequence data from ginseng roots. Approximately 77 million and 87 million high-quality reads were produced in the FCG and MCG root transcriptome analyses, respectively, and we obtained 256,032 assembled unigenes with an average length of 1,171 bp by de novo assembly. Functional annotation of the unigenes was performed using sequence similarity comparisons against the following databases: the non-redundant nucleotide database, the InterPro domains database, the Gene Ontology Consortium database, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. A total of 4,207 unigenes were assigned to specific metabolic pathways, and all of the known enzymes involved in starch and sucrose metabolism pathways were also identified in the KEGG library. This study indicated that alpha-glucan phosphorylase 1, putative pectinesterase/pectinesterase inhibitor 17, beta-amylase, and alpha-glucan phosphorylase isozyme H might be important factors in the differences in starch and sucrose metabolism between FCG and MCG grown in different environments.

RNA-Seq De Novo Assembly and Differential Transcriptome Analysis of Korean Medicinal Herb Cirsium japonicum var. spinossimum

  • Roy, Neha Samir;Kim, Jung-A;Choi, Ah-Young;Ban, Yong-Wook;Park, Nam-Il;Park, Kyong-Cheul;Yang, Hee-sun;Choi, Ik-Young;Kim, Soonok
    • Genomics & Informatics / Vol. 16, No. 4 / pp. 34.1-34.9 / 2018
  • Cirsium japonicum belongs to the Asteraceae or Compositae family and is a medicinal plant in Asia that has a variety of effects, including tumour inhibition, improved immunity with flavones, and antidiabetic and hepatoprotective effects. Silymarin is synthesized from 4-coumaroyl-CoA via both the flavonoid and phenylpropanoid pathways, which produce the immediate precursors taxifolin and coniferyl alcohol. The oxidative radicalization of taxifolin and coniferyl alcohol then produces silymarin. We identified the expression of genes related to the synthesis of silymarin in C. japonicum in three different tissues, namely, flowers, leaves, and roots, through RNA sequencing. We obtained 51,133 unigenes from transcriptome sequencing by de novo assembly using Trinity v2.1.1, TransDecoder v2.0.1, and CD-HIT v4.6 software. The differentially expressed gene analysis revealed that the expression of genes related to the flavonoid pathway was higher in the flowers, whereas genes of the phenylpropanoid pathway were more highly expressed in the roots. In this study, we established a global transcriptome dataset for C. japonicum. The data should be useful not only for closer study of the genes related to the production of medicinal metabolites, including flavonolignans, but also for functional genomics aimed at the genetic engineering of C. japonicum.

Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

  • Lim, Jong-Sung;Choi, Beom-Soon;Lee, Jeong-Soo;Shin, Chan-Seok;Yang, Tae-Jin;Rhee, Jae-Sung;Lee, Jae-Seong;Choi, Ik-Young
    • Genomics & Informatics / Vol. 10, No. 1 / pp. 1-8 / 2012
  • Technologies for detecting DNA sequence variation and profiling gene expression are now widely used in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of the next-generation DNA sequencer (NGS) Roche/454 and Illumina/Solexa systems, along with bioinformatics analysis technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing and mate paired-end sequencing data are important for constructing a de novo assembly of a novel genome. For effective de novo assembly, it is necessary to have DNA sequence information from multiplatform NGS with at least 2× and 30× depth of genome coverage using Roche/454 and Illumina/Solexa, respectively. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data are useful for genome systems biology, since they quantify the expressed RNAs of a whole-genome transcriptome in a given tissue sample. Hybrid mRNA sequence data from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome-sequenced species. Coverage of 20× and 50× of the estimated transcriptome using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average of only 30× coverage of a transcriptome with Illumina/Solexa short reads is enough to quantify expression against reference expressed sequence tag sequences.
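The coverage figures quoted in this abstract follow from the usual depth relation, depth = (number of reads × read length) / target size. The sketch below works through that arithmetic in both directions. The read length and target size in the example are arbitrary assumptions chosen for round numbers, not values from the paper.

```python
import math


def coverage_depth(num_reads, read_length, target_size):
    """Average sequencing depth over a genome or transcriptome of target_size bases."""
    return num_reads * read_length / target_size


def reads_needed(depth, read_length, target_size):
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(depth * target_size / read_length)


# Hypothetical example: 100 bp reads over a 10 Mb target.
depth = coverage_depth(num_reads=3_000_000, read_length=100, target_size=10_000_000)
needed = reads_needed(depth=30, read_length=100, target_size=10_000_000)
```

This is an average only; real depth varies along the sequence, and for transcriptomes it also varies with expression level.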

De novo gene set assembly of the transcriptome of diploid, oilseed-crop species Perilla citriodora

  • Kim, Ji-Eun;Choe, Junkyoung;Lee, Woo Kyung;Kim, Sangmi;Lee, Myoung Hee;Kim, Tae-Ho;Jo, Sung-Hwan;Lee, Jeong Hee
    • Journal of Plant Biotechnology / Vol. 43, No. 3 / pp. 293-301 / 2016
  • High-quality gene sets are necessary for functional research of genes. Although Perilla is a commonly cultivated oil crop and vegetable crop in Southeast Asia, the quality of its available gene set is insufficient. To construct a high-quality Perilla gene set, we sequenced mRNAs extracted from different tissues of Perilla citriodora, the wild species (2n = 20) of Perilla. We compared the quality of assemblies produced by Velvet and Trinity, two well-known de novo assemblers, and improved the de novo assembly pipeline by optimizing k-mers and removing redundant sequences. We then selected representative transcripts for loci according to several criteria. The improved assembly yielded a total of 86,396 transcripts and 38,413 representative transcripts. We evaluated the assembled transcripts by comparing them to 638 homologous Arabidopsis genes involved in fatty acid and TAG biosynthesis pathways. High proportions of full-length genes and transcripts in the assembled transcripts matched known genes in other species, indicating that the P. citriodora gene set can be applied in future functional studies. Our study provides a reference P. citriodora gene set for further studies. It will serve as a valuable genetic resource for elucidating the molecular basis of various metabolic processes.

De novo transcriptome sequencing and gene expression profiling with/without B-chromosome plants of Lilium amabile

  • Park, Doori;Kim, Jong-Hwa;Kim, Nam-Soo
    • Genomics & Informatics / Vol. 17, No. 3 / pp. 27.1-27.9 / 2019
  • Supernumerary B chromosomes were found in Lilium amabile (2n = 2x = 24), an endemic Korean lily that grows in the wild throughout the Korean Peninsula. The extra B chromosomes do not affect the host-plant morphology; therefore, whole transcriptome analysis was performed in 0B and 1B plants to identify differentially expressed genes. A total of 154,810 transcripts were obtained from over 10 Gbp of data by de novo assembly. By mapping the raw reads to the de novo transcripts, we identified 7,852 differentially expressed genes (|log2FC| > 10), of which 4,059 and 3,794 were up- and down-regulated, respectively, in 1B plants compared to 0B plants. Functional enrichment analysis revealed that various differentially expressed genes were involved in cellular processes including the cell cycle, chromosome breakage and repair, and microtubule formation, all of which may be related to the occurrence and maintenance of B chromosomes. Our data provide insight into transcriptomic changes and the evolution of plant B chromosomes and deliver an informative database for future study of B chromosome transcriptomes in the Korean lily.
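The log2 fold-change cutoff in this abstract can be illustrated with a short sketch. The abstract does not say which tool or normalization was used, so this is an assumption-laden toy: it takes already-normalized expression values for 0B and 1B plants, adds a pseudocount of 1 to avoid dividing by zero, and splits genes into up- and down-regulated lists when the absolute log2 fold change exceeds a threshold. Gene names and values are invented.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression, with a pseudocount (an assumption)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def split_by_direction(expression, threshold):
    """Return (up, down) gene lists where |log2FC| exceeds the threshold.

    expression maps gene name -> (control value, treated value).
    """
    up, down = [], []
    for gene, (control, treated) in expression.items():
        lfc = log2_fold_change(treated, control)
        if lfc > threshold:
            up.append(gene)
        elif lfc < -threshold:
            down.append(gene)
    return up, down


# Toy values (hypothetical): control = 0B plants, treated = 1B plants.
expression = {"g1": (0, 2047), "g2": (2047, 0), "g3": (100, 120)}
up, down = split_by_direction(expression, threshold=10)
```

A real analysis would also test statistical significance across replicates rather than rely on the fold change alone.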

Draft genome of Semisulcospira libertina, a species of freshwater snail

  • Gim, Jeong-An;Baek, Kyung-Wan;Hah, Young-Sool;Choo, Ho Jin;Kim, Ji-Seok;Yoo, Jun-Il
    • Genomics & Informatics / Vol. 19, No. 3 / pp. 32.1-32.10 / 2021
  • Semisulcospira libertina, a species of freshwater snail, is widespread in East Asia. It is important as a food source. Additionally, it is a vector of clonorchiasis, paragonimiasis, metagonimiasis, and other parasitic diseases. Although S. libertina has ecological, commercial, and clinical importance, its whole genome has not yet been reported. Here, we revealed the genome of S. libertina through de novo assembly. We assembled the whole genome of S. libertina and determined its transcriptome for the first time using the Illumina NovaSeq 6000 platform. According to the k-mer analysis, the genome size of S. libertina was estimated to be 3.04 Gb. Using RepeatMasker, repeats accounting for 53.68% of the genome assembly were identified. The genome data of S. libertina reported in this study will be useful for the identification and conservation of S. libertina in East Asia.
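The abstract reports a k-mer based genome size estimate without giving the procedure. A common simplified approach, assumed here for illustration only, divides the total number of k-mer occurrences by the depth at the main peak of the k-mer frequency histogram, after dropping low-multiplicity k-mers that mostly come from sequencing errors. The histogram values and the `min_multiplicity` cutoff below are made up.

```python
def estimate_genome_size(histogram, min_multiplicity=3):
    """Estimate genome size as (total k-mer occurrences) / (peak k-mer depth).

    histogram maps multiplicity -> number of distinct k-mers seen that many times.
    K-mers below min_multiplicity are treated as likely sequencing errors.
    """
    solid = {m: c for m, c in histogram.items() if m >= min_multiplicity}
    if not solid:
        raise ValueError("no k-mers at or above min_multiplicity")
    peak_depth = max(solid, key=solid.get)          # multiplicity with the most k-mers
    total_kmers = sum(m * c for m, c in solid.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical): an error tail at 1-2 and a main peak at depth 20.
histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
size = estimate_genome_size(histogram)
```

This simple ratio ignores heterozygosity and repeat structure, which matter for a repeat-rich genome like the one described here, so dedicated model-fitting tools are normally preferred.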