• Title/Abstract/Keywords: de novo sequence assembly

18 search results (processing time: 0.026 s)

Genome Sequencing and Genome-Wide Identification of Carbohydrate-Active Enzymes (CAZymes) in the White Rot Fungus Flammulina fennae

  • Lee, Chang-Soo;Kong, Won-Sik;Park, Young-Jin
    • Microbiology and Biotechnology Letters (한국미생물·생명공학회지) / Vol. 46, No. 3 / pp. 300-312 / 2018
  • Whole-genome sequencing of the wood-rotting fungus, Flammulina fennae, was carried out to identify carbohydrate-active enzymes (CAZymes). De novo genome assembly (k-mer size of 31) of short reads from next-generation sequencing revealed a total genome length of 32,423,623 base pairs (39% GC). A total of 11,591 gene models in the assembled genome sequence of F. fennae were predicted by ab initio gene prediction using the AUGUSTUS tool. In a genome-wide comparison, 6,715 orthologous groups shared at least one gene with F. fennae, and 10,667 (92%) of the 11,591 F. fennae genes had orthologs among the Dikarya. Additionally, F. fennae contained 23 species-specific genes, of which 16 were paralogous. CAZyme identification and annotation revealed 513 CAZymes, including 82 auxiliary activities, 220 glycoside hydrolases, 85 glycosyltransferases, 20 polysaccharide lyases, 57 carbohydrate esterases, and 45 carbohydrate-binding modules in the F. fennae genome. The genome information of F. fennae increases the understanding of this basidiomycete fungus. CAZyme gene information will be useful for detailed studies of lignocellulosic biomass degradation for biotechnological and industrial applications.

De novo assembly of a large volume of genome using NGS data

  • 원정임;홍상균;공진화;허선;윤지희
    • Korean Institute of Information Scientists and Engineers (한국정보과학회): Conference Proceedings / Proceedings of the KIISE Korea Computer Congress 2012, Vol. 39, No. 1(C) / pp. 25-27 / 2012
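  • De novo assembly reconstructs reads into a sequence presumed to be the original full sequence, using only the base sequence information of the reads and no reference sequence. Recent NGS (Next Generation Sequencing) technology can generate large volumes of reads far more easily and at low cost, and much research now relies on it. However, research on de novo assembly using NGS read data remains very limited, both in Korea and abroad. The reason is that de novo assembly of NGS read data consumes a great deal of time and space because of the large data volume and the complex data structures and processing involved, and because a sufficient range of analysis tools and know-how has not yet been developed. This study verifies the effectiveness and accuracy of assembly using NGS read data. It also proposes using read alignment against a similar species to address the time and space overhead of de novo assembly.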
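To make the definition of de novo assembly in the abstract above concrete, here is a minimal Python sketch of reference-free reconstruction by greedy overlap merging. It is a textbook simplification chosen for illustration, not the method used in the paper; the function names, the `min_len` threshold, and the toy reads are all assumptions, and real NGS assemblers must also handle sequencing errors, repeats, and both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining contigs. The pairwise search is quadratic per round,
    which hints at why assembly of large read sets is so expensive.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy reads sampled from the hypothetical sequence ATGGCGTGCAATCC.
reads = ["TGCAATCC", "ATGGCGTG", "GCGTGCAA"]
print(greedy_assemble(reads))  # ['ATGGCGTGCAATCC']
```

The nested loop compares every pair of sequences in every round, so the cost grows quickly with the number of reads, which is the kind of time and space overhead the abstract describes.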

The comparative gene expression concern to the seed pigmentation in maize (Zea mays L.)

  • Sa, Kyu Jin;Choi, Ik-Young;Lee, Ju Kyong
    • Genomics & Informatics / Vol. 18, No. 3 / pp. 29.1-29.11 / 2020
  • Maize seed pigmentation is an important issue in maize seed breeding. Differential gene expression was characterized and compared among three inbred lines, one with pigment-accumulating seeds (CM22) and two with non-pigmented seeds (CM5 and CM19), at 10 days after pollination. We obtained a total of 63,870, 82,496, and 54,555 contigs by de novo assembly to identify gene expression in CM22, CM5, and CM19, respectively. Differentially expressed gene analysis revealed that 7,044 genes were differentially expressed by at least two-fold, with 4,067 upregulated in colored maize inbred lines and 2,977 upregulated in colorless maize inbred lines. Of these, 18 genes were included in the anthocyanin biosynthesis pathways, while 15 genes were upregulated in both CM22/5 and CM22/19. Additionally, 37 genes were detected in metabolic pathways related to seed pigmentation by BIN analysis using MAPMAN software. These differentially expressed genes may aid research on seed pigmentation in maize breeding programs.

Analysis of the chloroplast genome and SNP detection in a salt tolerant breeding line in Korean ginseng

  • Jo, Ick-Hyun;Bang, Kyong-Hwan;Hong, Chi Eun;Kim, Jang-Uk;Lee, Jung-Woo;Kim, Dong-Hwi;Hyun, Dong-Yun;Ryu, Hojin;Kim, Young-Chang
    • Journal of Plant Biotechnology / Vol. 43, No. 4 / pp. 417-421 / 2016
  • The complete chloroplast genome sequence of Panax ginseng breeding line 'G07006', showing higher salt tolerance, was confirmed by de novo assembly using whole genome next-generation sequences. The complete chloroplast (CP) genome size is 156,356 bp, including two inverted repeats (IRs) of 52,060 bp, separated by the large single-copy (LSC 86,174 bp) and the small single-copy (SSC 18,122 bp) regions. One hundred fourteen genes were annotated, including 80 protein-coding genes, 30 tRNA genes, and 4 rRNA genes. Among them, 18 sites were duplicated in the inverted repeat regions. By comparative analyses of the previously identified CP genome sequences of nine cultivars of P. ginseng and that of G07006, five useful SNPs were defined in this study. Since three of the five SNPs were cultivar-specific to Chunpoong and Sunhyang, they could be easily used for distinguishing from other ginseng accessions. However, on arranging SNPs according to their gene location, the G07006 genotype was 'GTGGA', which was distinct from other accessions. This complete chloroplast DNA sequence could be conducive to discrimination of the line G07006 (salt-tolerant) and further enhancement of the genetic improvement program for this important medicinal plant.
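The region lengths reported in the abstract above can be checked with a few lines of Python. This sketch assumes that 52,060 bp is the combined length of both inverted repeat copies (the reported regions only add up to the genome size under that reading) and that the two copies are equal in length; the variable names are illustrative.

```python
# Region lengths in bp, as reported in the abstract.
LSC = 86_174       # large single-copy region
SSC = 18_122       # small single-copy region
IR_TOTAL = 52_060  # assumed: combined length of both inverted repeat copies

genome_size = LSC + SSC + IR_TOTAL
ir_each = IR_TOTAL // 2  # assumes the two IR copies have equal length

print(f"genome size: {genome_size:,} bp")  # genome size: 156,356 bp
print(f"each IR copy: {ir_each:,} bp")     # each IR copy: 26,030 bp
```

Under these assumptions the four regions sum to the stated 156,356 bp, with each inverted repeat copy about 26,030 bp long.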

Application of next generation sequencing (NGS) system for whole-genome sequencing of porcine reproductive and respiratory syndrome virus (PRRSV)

  • 문성현;아미나 카툰;김원일;후세인 엠디 묵터;오연수;조호성
    • Korean Journal of Veterinary Service (한국동물위생학회지) / Vol. 39, No. 1 / pp. 41-49 / 2016
  • In the present study, fast and robust next-generation sequencing (NGS) methods were developed for analyzing full genome sequences of PRRSV, a positive-sense RNA virus with a high degree of genetic variability among isolates. Two PRRSV strains (VR2332 and VR2332-R) maintained in our laboratory were used to validate the methods and were compared with the sequence registered in GenBank (GenBank accession no. EF536003). The results suggested that both strains had 100% coverage of the reference; coverage depth ranged from a minimum of 3 to a maximum of 23,012 for VR2332 and from a minimum of 3 to a maximum of 41,348 for VR2332-R, with an average depth of 22,712. Genomic data produced by the massive sequencing capacity of NGS have enabled the study of PRRSV at an unprecedented rate and level of detail. Unlike conventional sequencing methods, which require knowledge of conserved regions, NGS allows de novo assembly of full viral genomes. Our results therefore suggest that these NGS methods greatly facilitate the generation of more full-genome PRRSV sequences, locally as well as nationally, while saving time and cost.
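The coverage figures quoted in the abstract above (percent of the reference covered, and minimum, maximum, and average depth) can be illustrated with a short Python sketch. The interval representation (0-based start, end-exclusive), the function names, and the toy alignments are assumptions made for illustration; the abstract does not describe how the authors computed these values.

```python
def depth_profile(ref_len: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-base read depth for a reference of length `ref_len`.

    Each alignment is a (start, end) pair, 0-based and end-exclusive.
    """
    depth = [0] * ref_len
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, ref_len)):
            depth[pos] += 1
    return depth


def summarize(depth: list[int]) -> dict[str, float]:
    """Breadth of coverage plus minimum, maximum, and mean depth."""
    covered = sum(1 for d in depth if d > 0)
    return {
        "breadth": covered / len(depth),  # 1.0 means 100% of bases covered
        "min": min(depth),
        "max": max(depth),
        "mean": sum(depth) / len(depth),
    }


# Three hypothetical reads aligned to a 10-base reference.
profile = depth_profile(10, [(0, 6), (4, 10), (2, 8)])
print(profile)             # [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
print(summarize(profile))  # breadth 1.0, min 1, max 3, mean 1.8
```

A breadth of 1.0 corresponds to the "100% coverage" reported in the abstract, while the minimum, maximum, and mean describe how unevenly the reads are distributed along the genome.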

Transcriptome analysis of a medicinal plant, Pistacia chinensis

  • Choi, Ki-Young;Park, Duck Hwan;Seong, Eun-Soo;Lee, Sang Woo;Hang, Jin;Yi, Li Wan;Kim, Jong-Hwa;Na, Jong-Kuk
    • Journal of Plant Biotechnology / Vol. 46, No. 4 / pp. 274-281 / 2019
  • Pistacia chinensis Bunge has not only been used as a medicinal plant to treat various illnesses, but its young shoots and leaves have also been used as vegetables. In addition, P. chinensis is used as a rootstock for Pistacia vera (pistachio). Here, the transcriptome of P. chinensis was sequenced using Illumina RNA-seq methods to enrich genetic resources and identify secondary metabolite biosynthetic pathways. De novo assembly resulted in 18,524 unigenes with an average length of 873 bp from 19 million RNA-seq reads. A Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation tool assigned KO (KEGG orthology) numbers to 6,553 (36.2%) unigenes, among which 4,061 unigenes were mapped into 391 different metabolic pathways. For the terpenoid backbone and carotenoid biosynthesis pathways, 44 and 22 unigenes encode enzymes corresponding to 30 and 16 entries, respectively. As for the phenylpropanoid and flavonoid biosynthesis pathways, 63 and 24 unigenes were homologous to 17 and 14 entry proteins, respectively. Simple sequence repeat (SSR) mining identified 2,599 SSRs in the P. chinensis unigenes. The results of the present study provide a valuable resource for in-depth studies on comparative and functional genomics to unravel the underlying mechanisms of the medicinal properties of Pistacia L.
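The simple sequence repeat mining mentioned in the abstract above can be sketched as a search for perfect tandem repeats using regular-expression backreferences. This is a minimal illustration only: the abstract does not name the tool or thresholds used, so the `MIN_REPEATS` values, the function names, and the test sequence below are all hypothetical.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for each perfect tandem repeat in `seq`."""
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_copies - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq.upper()):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' or 'ATAT'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Made-up sequence containing (AG)7 and (CTT)5.
seq = "GGC" + "AG" * 7 + "TTCA" + "CTT" * 5 + "GA"
print(find_ssrs(seq))  # [(3, 'AG', 7), (21, 'CTT', 5)]
```

Only perfect repeats are reported here; imperfect or compound repeats would need additional logic.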

Comparative Genomic Analysis of Lactobacillus plantarum GB-LP1 Isolated from Traditional Korean Fermented Food

  • Yu, Jihyun;Ahn, Sojin;Kim, Kwondo;Caetano-Anolles, Kelsey;Lee, Chanho;Kang, Jungsun;Cho, Kyungjin;Yoon, Sook Hee;Kang, Dae-Kyung;Kim, Heebal
    • Journal of Microbiology and Biotechnology / Vol. 27, No. 8 / pp. 1419-1427 / 2017
  • As probiotics play an important role in maintaining a healthy gut flora environment through antitoxin activity and inhibition of pathogen colonization, they have long been of interest to the medical research community. Probiotic bacteria such as Lactobacillus plantarum, which can be found in fermented food, are of particular interest given their easy accessibility. We performed whole-genome sequencing and genomic analysis on the GB-LP1 strain of L. plantarum isolated from Korean traditional fermented food; this strain is well known for its functions in immune response, suppression of pathogen growth, and antitoxin effects. The complete genome sequence of GB-LP1 is a single chromosome of 3,040,388 bp with 2,899 predicted open reading frames. Genomic analysis of GB-LP1 revealed two CRISPR regions and genes showing accelerated evolution, which may have antibiotic and antitoxin functions. The aim of the present study was to predict strain-specific genomic characteristics and assess the potential of this new strain as a lactic acid bacterium at the genomic level using in silico analysis. These results provide insight into the L. plantarum species and confirm its potential utility as a candidate probiotic.

Blood transcriptome resources of chinstrap (Pygoscelis antarcticus) and gentoo (Pygoscelis papua) penguins from the South Shetland Islands, Antarctica

  • Kim, Bo-Mi;Jeong, Jihye;Jo, Euna;Ahn, Do-Hwan;Kim, Jeong-Hoon;Rhee, Jae-Sung;Park, Hyun
    • Genomics & Informatics / Vol. 17, No. 1 / pp. 5.1-5.9 / 2019
  • The chinstrap (Pygoscelis antarcticus) and gentoo (P. papua) penguins are distributed throughout Antarctica and the sub-Antarctic islands. In this study, high-quality de novo assemblies of blood transcriptomes from these penguins were generated using the Illumina MiSeq platform. A total of 22.2 and 21.8 raw reads were obtained from chinstrap and gentoo penguins, respectively. These reads were assembled using the Oases assembly platform and resulted in 26,036 and 21,854 contigs with N50 values of 929 and 933 base pairs, respectively. Functional gene annotations through pathway analyses of the Gene Ontology, EuKaryotic Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases were performed for each blood transcriptome, resulting in a similar compositional order between the two transcriptomes. Ortholog comparisons with previously published transcriptomes from the Adélie (P. adeliae) and emperor (Aptenodytes forsteri) penguins revealed that a high proportion of the four penguins' transcriptomes had significant sequence homology. Because blood and tissues of penguins have been used to monitor pollution in Antarctica, immune parameters in blood could be important indicators for understanding the health status of penguins and other Antarctic animals. In the blood transcriptomes, KEGG analyses detected many essential genes involved in the major innate immunity pathways, which are key metabolic pathways for maintaining homeostasis against exogenous infections or toxins. Blood transcriptome studies such as this may be useful for checking the immune and health status of penguins without sacrifice.
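The N50 values quoted in the abstract above summarize contig length distributions. As a minimal sketch, N50 can be computed as the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name and the example lengths below are illustrative and are not data from the study.

```python
def n50(lengths: list[int]) -> int:
    """Contig length at which the cumulative sum (longest first)
    first reaches at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in bp (total 1,500).
print(n50([100, 200, 300, 400, 500]))  # 400
```

In this example the two longest contigs (500 + 400 = 900 bp) already cover more than half of the 1,500 bp total, so the N50 is 400 bp.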