• Title/Abstract/Keywords: long-read sequencing

11 search results

Storing Digital Information in Long-Read DNA

  • Ahn, TaeJin;Ban, Hamin;Park, Hyunsoo
    • Genomics & Informatics / Vol. 16, No. 4 / pp. 30.1-30.6 / 2018
  • There is an urgent need for effective and cost-efficient data storage, as the worldwide demand for data storage is growing rapidly. DNA has emerged as a new medium for storing digital information. Recent studies have successfully stored digital information such as text and GIF animations. Previous studies tackled technical hurdles arising from errors in DNA synthesis and sequencing, and have focused on strategies that use 100-150-bp read sizes in both synthesis and sequencing. In this paper, we suggest a novel data encoding/decoding scheme that makes use of long-read DNA (~1,000 bp). This scheme enables accurate recovery of stored digital information with fewer reads than the previous approach, and it also reduces sequencing time.
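The abstract does not describe the encoding itself. For orientation only, the simplest mapping used in the DNA-storage literature assigns two bits to each nucleotide. The sketch below implements that generic mapping with hypothetical helper names (`encode`, `decode`); it is not the authors' long-read scheme, and it omits the error correction, addressing, and sequence constraints (e.g. homopolymer avoidance) that practical systems add.

```python
# Generic 2-bits-per-base mapping (an illustrative assumption, NOT the paper's scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(): map each base back to two bits and regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    seq = encode(b"Hi")
    print(seq)  # CAGACGGC
    assert decode(seq) == b"Hi"
```

Because every byte becomes exactly four bases, the density is fixed at 2 bits per nucleotide; real schemes trade some of that density for redundancy.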

Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data

  • Lee, Yuna;Park, Kiejung;Koh, Insong
    • Genomics & Informatics / Vol. 17, No. 4 / pp. 40.1-40.9 / 2019
  • While studies detecting and analyzing indels and single-nucleotide polymorphisms within human genomic sequences have been actively conducted, detecting long insertions/deletions remains difficult. Over the last 10 years, the availability of long-read data of human genomes from the PacBio and Nanopore platforms has increased, making long insertions/deletions easier to detect. However, because long-read data have the critical disadvantage of relatively high cost, most next-generation sequencing data are still produced mainly by short-read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, where no reads are mapped to the reference genome), scanned 40 Korean genomes to select UMR long-deletion candidates, and compared the candidates with the long-deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs was found in the 40 Korean genomes tested; 284 UMRs were common to all 40 genomes, and a total of 37,943 UMRs were found. When compared with the 74,045 break points provided by the 1KGP, 30,698 UMRs overlapped them. As the number of compared samples increased from 1 to 40, the number of UMRs overlapping the break points also increased, eventually reaching a peak of 80.9% of the total UMRs found in this study. Because the number of overlapping UMRs would likely grow toward the 74,045 break points as more Korean genomes are included, this approach could be practically useful for studies of long deletions using short-read data.
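The abstract does not publish the UMR-detection programs. A minimal sketch of the idea, assuming per-base read depth for one chromosome is already available (for example, from `samtools depth -a`, which also reports zero-depth positions), is to collect maximal runs of zero depth and then test them for overlap with known deletion intervals. The function names, the `min_len` threshold, and the representation of 1KGP break points as half-open intervals are all assumptions for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def find_umrs(depth: List[int], min_len: int = 1) -> List[Interval]:
    """Return maximal runs of zero read depth that are at least min_len bases long."""
    umrs: List[Interval] = []
    start: Optional[int] = None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos                      # a zero-coverage run begins
        elif d > 0 and start is not None:
            if pos - start >= min_len:
                umrs.append((start, pos))    # run ended at the previous base
            start = None
    if start is not None and len(depth) - start >= min_len:
        umrs.append((start, len(depth)))     # run extends to the end of the sequence
    return umrs


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(umrs: List[Interval], deletions: List[Interval]) -> int:
    """Count UMRs that overlap at least one known deletion interval."""
    return sum(any(overlaps(u, d) for d in deletions) for u in umrs)


if __name__ == "__main__":
    depth = [5, 6, 0, 0, 0, 4, 0, 3, 0, 0]
    umrs = find_umrs(depth, min_len=2)
    print(umrs)  # [(2, 5), (8, 10)]
    print(count_overlapping(umrs, [(3, 4), (20, 30)]))  # 1
```

The pairwise overlap test is quadratic; for genome-wide break-point sets, a sorted sweep or an interval tree would be the usual choice.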

Novel High-Throughput DNA Part Characterization Technique for Synthetic Biology

  • Bak, Seong-Kun;Seong, Wonjae;Rha, Eugene;Lee, Hyewon;Kim, Seong Keun;Kwon, Kil Koang;Kim, Haseong;Lee, Seung-Goo
    • Journal of Microbiology and Biotechnology / Vol. 32, No. 8 / pp. 1026-1033 / 2022
  • This study presents a novel DNA part characterization technique that increases throughput by combining combinatorial DNA part assembly, a solid plate-based quantitative fluorescence assay for phenotyping, and barcode tagging-based long-read sequencing for genotyping. We confirmed that the fluorescence intensities of colonies on plates were comparable to single-cell fluorescence measured with a high-end flow cytometer, and we developed a high-throughput image analysis pipeline. The barcode tagging-based long-read sequencing technique enabled rapid identification of all DNA parts and their combinations in a single sequencing experiment. Using our techniques, forty-four DNA parts (21 promoters and 23 RBSs) were successfully characterized in 72 h without any automated equipment. We anticipate that this high-throughput, easy-to-use part characterization technique will help increase part diversity and will be useful for building genetic circuits and metabolic pathways in synthetic biology.
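The abstract does not detail how reads are matched to barcodes. As a hypothetical illustration of barcode-based genotyping, each read can be assigned to the construct whose barcode best matches a fixed window of the read. The sketch assumes equal-length barcodes at the very start of each read and uses Hamming distance; real long reads also carry insertions and deletions, so an actual pipeline would more likely use an edit-distance or alignment-based matcher. The construct names in the test data are made up.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, barcodes: Dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Assign a read to the construct whose barcode best matches the read's prefix.

    Hypothetical assumptions: barcodes sit at the start of the read; reads whose
    best match exceeds max_mismatch, or that tie between constructs, stay unassigned.
    """
    best: Optional[str] = None
    best_dist = max_mismatch + 1
    tie = False
    for name, bc in barcodes.items():
        prefix = read[: len(bc)]
        if len(prefix) < len(bc):
            continue                          # read too short to contain this barcode
        d = hamming(prefix, bc)
        if d < best_dist:
            best, best_dist, tie = name, d, False
        elif d == best_dist:
            tie = True                        # ambiguous at the current best distance
    return None if tie or best is None else best
```

Leaving ambiguous reads unassigned rather than guessing keeps genotype calls conservative when two barcodes are close in sequence.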

Comparison of the Performance of MiSeq and HiSeq 2500 in a Microbiome Study

  • Na, Hee Sam;Yu, Yeuni;Kim, Si Yeong;Lee, Jae-Hyung;Chung, Jin
    • 한국미생물·생명공학회지 (Microbiology and Biotechnology Letters) / Vol. 48, No. 4 / pp. 574-581 / 2020
  • Next-generation sequencing is commonly used to characterize microbiome structure. MiSeq is commonly used for microbiome analysis because of its relatively long read length. Recently, however, Illumina introduced the 250x2 chip for the HiSeq 2500. The purpose of this study was to compare the performance of MiSeq and HiSeq on oral microbiome samples. The MiSeq Reagent Kit V3 and the HiSeq Rapid SBS Kit V2 were used for the MiSeq and HiSeq 2500 analyses, respectively. Total read count, read quality score, relative bacterial abundance, community diversity, and relative abundance correlation were analyzed. HiSeq produced significantly more reads and assigned taxa than MiSeq, whereas community diversity was similar on the two platforms. However, the correlation between the two platforms differed depending on relative abundance: the correlation between HiSeq and MiSeq sequencing data for highly abundant taxa (> 2%), low-abundance taxa (0.2-2%), and rare taxa (< 0.2%) was 0.994, 0.860, and 0.416, respectively. Therefore, the HiSeq 2500 may also be suitable for microbiome studies. Importantly, the HiSeq platform may allow high-resolution, massively parallel sequencing for the detection of rare taxa.
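To make the tiered comparison concrete, the sketch below groups taxa by relative abundance using the thresholds quoted above (> 2%, 0.2-2%, < 0.2%) and computes a Pearson correlation within each tier. The abstract does not say which correlation coefficient was used, how a taxon's tier was assigned, or whether the boundaries are inclusive; here the tier is taken from the mean of the two platforms' abundances, and all function names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Optional


def pearson(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson correlation; None when undefined (fewer than 2 points or zero variance)."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def tier(abundance: float) -> str:
    """Assign an abundance tier (values in %); boundary handling is an assumption."""
    if abundance > 2.0:
        return "high"
    if abundance >= 0.2:
        return "low"
    return "rare"


def tiered_correlation(miseq: Dict[str, float], hiseq: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Correlate per-taxon relative abundances (%) within each abundance tier.

    Assumption: a taxon's tier is set by its mean abundance across both platforms.
    """
    groups: Dict[str, List[str]] = {"high": [], "low": [], "rare": []}
    for taxon in miseq.keys() & hiseq.keys():   # taxa detected on both platforms
        groups[tier((miseq[taxon] + hiseq[taxon]) / 2)].append(taxon)
    return {
        name: pearson([miseq[t] for t in taxa], [hiseq[t] for t in taxa])
        for name, taxa in groups.items()
    }
```

Restricting to taxa seen on both platforms is itself a design choice; taxa detected by only one platform (likely common among rare taxa) would otherwise need an explicit zero.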

A Comparative Analysis of the Illumina Truseq Synthetic Long-read Haplotyping Sequencing Platform versus the 10X Genomics Chromium Genome Sequencing Platform for Haplotype Phasing and the Identification of Single-nucleotide Variants (SNVs) in Hanwoo (Korean Native Cattle)

  • 박원철;크리스나무티 스리칸스;박종은;신동현;고해수;임다정;조인철
    • 생명과학회지 (Journal of Life Science) / Vol. 29, No. 1 / pp. 1-8 / 2019
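  • Few studies have performed comparative analyses of high-density sequencing for haplotype phasing in Hanwoo (Korean native cattle). Among such high-density sequencing platforms, no study has specifically compared the Truseq Synthetic Long-Read Haplotyping sequencing platform (TSLRH) offered by Illumina with the Chromium Genome sequencing platform (10XG) offered by 10X Genomics. We extracted DNA from the semen of a Hanwoo breeding bull (ID: TN1505D2184 or 27214) from the Hanwoo Research Institute and produced sequencing data from this DNA on each platform. We then identified single-nucleotide variants (SNVs) using the analysis method appropriate to each platform. For TSLRH and 10XG, respectively, the total number of reads was 355,208,304 and 1,632,772,004; the number of mapped reads was 351,992,768 (99.09%) and 1,526,641,824 (93.50%); Q30 was 89.04% and 88.60%; mean depth was 13.04X and 74.3X; the longest phase block was 1,982,706 bp and 1,480,081 bp; the N50 phase block was 57,637 bp and 114,394 bp; the total number of SNVs was 4,534,989 and 8,496,813; and the overall phasing rate was 72.29% and 87.67%. Furthermore, by comparing the two platforms, we identified, for each chromosome, the SNVs unique to each platform and those shared by both, and found that the number of SNVs is directly proportional to chromosome length. In conclusion, we recommend using 10XG rather than TSLRH when research funding is limited, because 10XG yields higher or better values than TSLRH for total reads, number of SNVs, N50 phase block, longest phase block, phasing rate, and mean depth.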
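N50 phase block length, one of the metrics compared above, can be computed from a list of phase block lengths as follows. This is the standard length-based N50 definition; the exact computation inside each platform's analysis pipeline may differ, and the block lengths in the example are made up.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Return the N50 of a set of block lengths.

    N50 is the length L such that blocks of length >= L together cover
    at least half of the total length.
    """
    if not lengths:
        raise ValueError("no blocks")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):   # longest blocks first
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-negative lengths")


if __name__ == "__main__":
    blocks = [100, 80, 50, 30, 20, 10]  # made-up phase block lengths (bp)
    print(max(blocks), n50(blocks))  # 100 80
```

The example illustrates why the two metrics can rank platforms differently: the longest block reflects a single outlier, whereas N50 depends on how the whole length distribution is weighted.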

Toward Complete Bacterial Genome Sequencing Through the Combined Use of Multiple Next-Generation Sequencing Platforms

  • Jeong, Haeyoung;Lee, Dae-Hee;Ryu, Choong-Min;Park, Seung-Hwan
    • Journal of Microbiology and Biotechnology / Vol. 26, No. 1 / pp. 207-212 / 2016
  • PacBio's long-read sequencing technologies can be used successfully for complete bacterial genome assembly with recently developed non-hybrid assemblers, without high-quality second-generation short reads. However, standardized procedures that take multiple pre-existing second-generation sequencing platforms into account are scarce. In addition to Illumina HiSeq and Ion Torrent PGM-based genome sequencing results derived from previous studies, we generated further sequencing data, including data from the PacBio RS II platform, and applied various bioinformatics tools to obtain complete genome assemblies for five bacterial strains. Our approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler produced nearly complete assemblies at a moderate coverage of ~75x, but that different versions produced incompatible results that required post-processing. The other two platforms further improved the PacBio assembly through scaffolding and final error correction.

Exome and genome sequencing for diagnosing patients with suspected rare genetic disease

  • Seo, Go Hun;Lee, Hane
    • Journal of Genetic Medicine / Vol. 20, No. 2 / pp. 31-38 / 2023
  • Although a rare disease is defined in South Korea as one affecting fewer than 20,000 patients, more than 8,000 rare Mendelian disorders have been identified, and collectively they affect 6-8% of the global population. Many rare diseases pose significant challenges to patients, their families, and the healthcare system. The diagnostic journey for patients with rare diseases is often lengthy and arduous, hampered by the genetic diversity and phenotypic complexity of these conditions. With the advent of next-generation sequencing technology and the clinical implementation of exome sequencing (ES) and genome sequencing (GS), the diagnostic rate for rare diseases is 25-50%, depending on the disease category. These technologies are also enabling more rapid discovery of new gene-disease associations and equipping us to practice precision medicine by offering tailored medical management plans, early intervention, and family planning options. However, a substantial number of patients remain undiagnosed, which could be due to several factors. Some may not have genetic disorders. Some may have disease-causing variants that are not detectable or interpretable by ES and GS. It is also possible that some patients have a disease-causing variant in a gene that has not yet been linked to a disease. For patients who remain undiagnosed, reanalysis of existing data has shown promise in providing new molecular diagnoses through new gene-disease associations, new variant discovery, and variant reclassification, leading to a 5-10% increase in the diagnostic rate. More advanced approaches, such as long-read sequencing, transcriptome sequencing, and the integration of multi-omics data, may help uncover elusive genetic causes.

Birth of an 'Asian cool' reference genome: AK1

  • Kim, Changhoon
    • BMB Reports / Vol. 49, No. 12 / pp. 653-654 / 2016
  • The human reference genome, maintained by the Genome Reference Consortium, is arguably the most complete genome assembly ever produced. Since its first construction, it has been continually improved by incorporating corrections to previous assemblies, thanks to various technological advances. Many ongoing population sequencing projects are based on this reference genome, raising hopes for useful medical applications of genomic information now that high-throughput sequencing technologies have matured. However, a single reference genome does not fit all populations across the globe, because of the large diversity in genomic structure and the technical limitations inherent to short-read sequencing methods. The recent de novo construction of the highly contiguous Asian diploid genome AK1, which combined single-molecule technologies with routine sequencing data without resorting to traditional clone-by-clone sequencing and physical mapping, reveals the nature of genomic structural variation by detecting thousands of novel structural variations and by finally filling in some of the gaps that had persisted in the current human reference genome. The AK1 genome, soon to be joined by more de novo assembled genomes, is now expected to provide an opportunity to explore what it is really like to use ancestry-specific reference genomes instead of hg19/hg38 for population genomics. This is a major step toward genetically based precision medicine.

Microbial Community Dysbiosis and Functional Gene Content Changes in Apple Flowers due to Fire Blight

  • Kong, Hyun Gi;Ham, Hyeonheui;Lee, Mi-Hyun;Park, Dong Suk;Lee, Yong Hwan
    • The Plant Pathology Journal / Vol. 37, No. 4 / pp. 404-412 / 2021
  • Although the plant microbiota plays an important role in plant health, little is known about the potential interactions of the flower microbiota with pathogens. In this study, we investigated the microbial community of apple blossoms infected with Erwinia amylovora. Long-read sequencing technology, which significantly increased genome sequence resolution, enabled the characterization of fire blight-induced changes in the flower microbial community. Each sample showed a unique microbial community at the species level. Pantoea agglomerans and P. allii were the predominant bacteria in healthy flowers, whereas E. amylovora accounted for more than 90% of the microbial population in diseased flowers. Furthermore, gene function analysis revealed that glucose and xylose metabolism were enriched in diseased flowers. Overall, our results showed that the microbiome of apple blossoms is rich in specific bacteria and that the nutritional composition of flowers is important for the incidence and spread of bacterial disease.

Ongoing endeavors to detect mobilization of transposable elements

  • Lee, Yujeong;Ha, Una;Moon, Sungjin
    • BMB Reports / Vol. 55, No. 7 / pp. 305-315 / 2022
  • Transposable elements (TEs) are DNA sequences capable of moving from one location to another in the genome. Since Barbara McClintock's discovery of the 'Dissociation (Ds) locus' in maize (1), mounting evidence in the genomics era has shown that a significant fraction of most eukaryotic genomes is composed of TE sequences, which are involved in various biological processes such as development, physiology, disease, and evolution. Although technical advances in genomics have revealed numerous functional impacts of TEs across species, our understanding of TEs remains incomplete owing to challenges arising from the complexity and abundance of TEs in the genome. In this mini-review, we briefly summarize the biology of TEs and their impacts on the host genome, emphasizing the importance of understanding the TE landscape in the genome. We then introduce recent endeavors, especially in vivo retrotransposition assays and long-read sequencing technology, for identifying de novo insertions and TE polymorphisms, which will broaden our knowledge of the extraordinary relationship between these genomic cohabitants and their host.