• Title/Summary/Keyword: Genome sequencing


An Optimized Strategy for Genome Assembly of Sanger/pyrosequencing Hybrid Data using Available Software

  • Jeong, Hae-Young;Kim, Ji-Hyun F.
    • Genomics & Informatics / v.6 no.2 / pp.87-90 / 2008
  • During the last four years, the pyrosequencing-based 454 platform has rapidly displaced the traditional Sanger sequencing method because of its high throughput and cost effectiveness. Meanwhile, Sanger sequencing still provides the longest reads, and paired-end sequencing based on that chemistry offers an opportunity to ensure accurate assembly results. In this report, we describe an optimized approach for hybrid de novo genome assembly using pyrosequencing data and varying amounts of Sanger-type reads. 454 platform-derived contigs can be used as single non-breakable virtual reads or converted to simpler contigs that consist of editable, overlapping pseudoreads. These modified contigs maintain their integrity at the first jumpstarting assembly stage and are edited by fragmenting and rejoining. Pre-existing assembly software can then be applied for mixed assembly of 454-derived data and Sanger reads. An effective method for identifying genomic differences between reference and sample sequences in whole-genome resequencing procedures is also suggested.
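
The conversion of a contig into overlapping pseudoreads can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `contig_to_pseudoreads` and the default read length (500 bp) and overlap (100 bp) are assumptions chosen only for illustration.

```python
def contig_to_pseudoreads(contig, read_len=500, overlap=100):
    """Split a contig into overlapping pseudoreads.

    read_len and overlap are hypothetical defaults, not values from the paper.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    if not contig:
        return []
    step = read_len - overlap  # distance between consecutive pseudoread starts
    reads = []
    start = 0
    while True:
        reads.append(contig[start:start + read_len])
        if start + read_len >= len(contig):
            break  # the last pseudoread reaches the end of the contig
        start += step
    return reads


# Toy 1,200 bp contig; each pseudoread shares `overlap` bases with the next.
toy_contig = "ACGT" * 300
pseudoreads = contig_to_pseudoreads(toy_contig)
```

Because adjacent pseudoreads share their overlapping bases, an assembler can rejoin them or break the contig at any read boundary, which is the property the abstract describes as editable.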

A Primer for Disease Gene Prioritization Using Next-Generation Sequencing Data

  • Wang, Shuoguo;Xing, Jinchuan
    • Genomics & Informatics / v.11 no.4 / pp.191-199 / 2013
  • High-throughput next-generation sequencing (NGS) technology produces a tremendous amount of raw sequence data. The challenges for researchers are to process the raw data, to map the sequences to the genome, to discover variants that differ from the reference genome, and to prioritize or rank the variants for the question of interest. The recent development of many computational algorithms and programs has vastly improved the ability to translate sequence data into valuable information for disease gene identification. However, NGS data analysis is complex and can be overwhelming for researchers who are not familiar with the process. Here, we outline the analysis pipeline and describe some of the most commonly used principles and tools for analyzing NGS data for disease gene identification.

Generation and analysis of whole-genome sequencing data in human mammary epithelial cells

  • Jong-Lyul Park;Jae-Yoon Kim;Seon-Young Kim;Yong Sun Lee
    • Genomics & Informatics / v.21 no.1 / pp.11.1-11.5 / 2023
  • Breast cancer is the most common cancer worldwide, and advanced breast cancer with metastases is largely incurable with currently available therapies. Therefore, it is essential to understand the molecular characteristics of breast carcinogenesis as it progresses. Here, we report a dataset of whole genomes from the human mammary epithelial cell system derived from a reduction mammoplasty specimen. This system comprises pre-stasis 184D cells, considered normal, and seven cell lines along a cancer progression series that are immortalized or have additionally acquired anchorage-independent growth. Our analysis of the whole-genome sequencing (WGS) data indicates that the seven cancer progression series cell lines carry between 8,393 and 39,564 somatic mutations (30,591 on average) compared to 184D cells. These WGS data and our mutation analysis will provide helpful information for identifying driver mutations and elucidating the molecular mechanisms of breast carcinogenesis.
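
The idea of counting mutations in a derived cell line relative to a normal baseline can be sketched as a set difference over variant calls. This is a simplified illustration, not the authors' pipeline: production somatic callers work from read-level evidence, and the function name `somatic_candidates` and the toy variants below are hypothetical.

```python
def somatic_candidates(sample_variants, baseline_variants):
    """Return variants present in the sample but absent from the baseline.

    Each variant is a (chromosome, position, ref_allele, alt_allele) tuple.
    """
    return set(sample_variants) - set(baseline_variants)


# Toy variant calls (invented for illustration only).
baseline = [("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")]
derived = [("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"),
           ("chr3", 42, "G", "A"), ("chr17", 7, "T", "C")]

candidates = somatic_candidates(derived, baseline)
n_candidates = len(candidates)
```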

Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

  • Lim, Jong-Sung;Choi, Beom-Soon;Lee, Jeong-Soo;Shin, Chan-Seok;Yang, Tae-Jin;Rhee, Jae-Sung;Lee, Jae-Seong;Choi, Ik-Young
    • Genomics & Informatics / v.10 no.1 / pp.1-8 / 2012
  • Recently, technologies for DNA sequence variation discovery and gene expression profiling have been widely used in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of next-generation sequencing (NGS) systems, namely Roche/454 and Illumina/Solexa, along with bioinformatics analysis technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing data and mate-pair sequencing data are important for constructing de novo assemblies of novel genomes. Effective de novo assembly requires DNA sequence information from multiple NGS platforms, with at least 2× and 30× genome coverage from Roche/454 and Illumina/Solexa, respectively. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data are useful for genome systems biology, quantifying the expressed RNAs of a whole-genome transcriptome according to the tissue sample. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20× and 50× of the estimated transcriptome using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average of only 30× transcriptome coverage with Illumina/Solexa short reads is enough for expression quantification against reference expressed sequence tag sequences.
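
Coverage targets such as these translate into read counts through the standard relation depth = (number of reads × read length) / genome size. The sketch below applies that relation; the genome size (100 Mb) and read length (100 bp) are hypothetical values chosen for illustration and do not come from the paper.

```python
import math


def expected_depth(n_reads, read_len, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


def reads_needed(target_depth, read_len, genome_size):
    """Minimum number of reads needed to reach target_depth on average."""
    return math.ceil(target_depth * genome_size / read_len)


# Hypothetical example: 100 Mb genome, 100 bp short reads, 30x target.
GENOME_SIZE = 100_000_000
READ_LEN = 100
n_short_reads = reads_needed(30, READ_LEN, GENOME_SIZE)
depth_check = expected_depth(n_short_reads, READ_LEN, GENOME_SIZE)
```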

New Lung Cancer Panel for High-Throughput Targeted Resequencing

  • Kim, Eun-Hye;Lee, Sunghoon;Park, Jongsun;Lee, Kyusang;Bhak, Jong;Kim, Byung Chul
    • Genomics & Informatics / v.12 no.2 / pp.50-57 / 2014
  • We present a new next-generation sequencing-based method to identify somatic mutations of lung cancer. It is a comprehensive mutation profiling protocol to detect somatic mutations in 30 genes found frequently in lung adenocarcinoma. The total length of the target regions is 107 kb, and a capture assay was designed to cover 99% of it. This method exhibited about 97% mean coverage at 30× sequencing depth and 42% average specificity when sequencing of more than 3.25 Gb was carried out for the normal sample. We discovered 513 variations from targeted exome sequencing of lung cancer cells, which is 3.9-fold higher than in the normal sample. The variations in cancer cells included previously reported somatic mutations in the COSMIC database, such as variations in TP53, KRAS, and STK11 of sample H-23 and in EGFR of sample H-1650, especially with more than 1,000× coverage. Among the somatic mutations, up to 91% of single nucleotide polymorphisms from the two cancer samples were validated by DNA microarray-based genotyping. Our results demonstrated the feasibility of high-throughput mutation profiling with lung adenocarcinoma samples, and the profiling method can be used as a robust and effective protocol for somatic variant screening.
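
One plausible reading of "coverage at 30× sequencing depth" is the fraction of target bases sequenced to a depth of at least 30. The sketch below computes that fraction from per-base depths; the interpretation, the function name `fraction_covered`, and the toy depth values are assumptions, not details taken from the paper.

```python
def fraction_covered(depths, min_depth=30):
    """Fraction of target positions whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d >= min_depth for d in depths) / len(depths)


# Toy per-base depths for a 10 bp target (invented for illustration only).
toy_depths = [0, 12, 30, 45, 1000, 31, 29, 60, 33, 90]
covered_30x = fraction_covered(toy_depths)
covered_1000x = fraction_covered(toy_depths, min_depth=1000)
```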

Genetic Diagnosis of Inherited Metabolic Disorders using Next-Generation Sequencing

  • Chang-Seok Ki
    • Journal of The Korean Society of Inherited Metabolic disease / v.23 no.2 / pp.1-7 / 2023
  • Inherited metabolic disorders (IMD) are a group of disorders involving various metabolic pathways. Genetic diagnosis of IMD has been challenging because of their extremely heterogeneous nature and extensive laboratory and/or phenotype overlap. Conventional genetic diagnosis was a gene-by-gene approach that required a priori information on the causative genes that might underlie the IMD. The recent implementation of next-generation sequencing (NGS) technologies has changed the process of genetic diagnosis from a gene-by-gene approach to simultaneous analysis of targeted genes possibly associated with the IMD, using either gene panels or whole exome/genome sequencing (WES/WGS) covering all human genes. Clinical NGS tests can be a cost-effective approach for the rapid diagnosis of IMD with genetic heterogeneity and are becoming standard diagnostic procedures.


AGB (Ancestral Genome Browser): A Web Interface for Browsing Reconstructed Ancestral Genomes

  • Lee, Daehwan;Lee, Jongin;Hong, Woon-Young;Jang, Eunji;Kim, Jaebum
    • Journal of KIISE / v.42 no.12 / pp.1584-1589 / 2015
  • With the advancement of next-generation sequencing (NGS) technologies, various genome browsers have been introduced. Because existing browsers focus on comparison of the genomic data of extant species, however, there is a need for a genome browser for ancestral genomes and their evolution. In this paper, we introduce a genome browser, AGB (Ancestral Genome Browser), that displays ancestral genome data reconstructed from existing species. With AGB, it is possible to trace genomic variations that occurred during evolution in a simple and intuitive way. We explain the capability of AGB in terms of visualizing ancestral genomic information and evolutionary genomic variations. AGB is now available at http://bioinfo.konkuk.ac.kr/genomebrowser/.

Whole Genome Sequencing and Gene Prediction of Cynodon transvaalensis

  • Sol Ji Lee;Chang soo Kim
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / pp.237-237 / 2022
  • Cynodon transvaalensis is a warm-season grass and an economically and ecologically important crop. Cynodon species are highly heterozygous and therefore difficult to assemble, so their genomes have not been actively studied. In this study, hybrid assembly was performed using Illumina and PacBio sequencing. The resulting assembly had 1,392 scaffolds and an N50 of 928 kb. BUSCO analysis indicated an assembly completeness of 98.3%. In addition, K-mer analysis (k=25) estimated the genome size at approximately 413 Mb. A total of 37,060 coding sequences (CDS) were annotated in the assembled genome, and their functions were identified using BLAST. Next, we will attempt to extend the assembly to a pseudochromosome-level genome using Hi-C technology. These results will not only help to understand the complex genome composition of African bermudagrass, but also provide a resource for genomic and evolutionary studies of grasses and other plant species.
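
The N50 statistic reported here is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the scaffold lengths are toy values, not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy scaffold lengths (invented for illustration only).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
```

In this toy set the total length is 32, and the two longest scaffolds (8 + 8) already reach half of it, so the N50 is 8.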


IVAG: An Integrative Visualization Application for Various Types of Genomic Data Based on R-Shiny and the Docker Platform

  • Lee, Tae-Rim;Ahn, Jin Mo;Kim, Gyuhee;Kim, Sangsoo
    • Genomics & Informatics / v.15 no.4 / pp.178-182 / 2017
  • Next-generation sequencing (NGS) technology has become a trend in genomics research. There are many software programs and automated pipelines for analyzing NGS data, which can ease the pain for traditional scientists who are not familiar with computer programming. However, downstream analyses, such as finding differentially expressed genes or visualizing linkage disequilibrium maps and genome-wide association study (GWAS) data, remain a challenge. Here, we introduce a dockerized web application written in R using the Shiny platform to visualize pre-analyzed RNA sequencing and GWAS data. In addition, we have integrated a genome browser based on the JBrowse platform and an automated intermediate parsing process required for custom track construction, so that users can easily build and navigate their personal genome tracks with in-house datasets. This application will help scientists perform a series of downstream analyses and obtain a more integrative understanding of various types of genomic data by interactively visualizing them with customizable options.