• Title/Summary/Keyword: NGS data analysis

Short Reads Phasing to Construct Haplotypes in Genomic Regions That Are Associated with Body Mass Index in Korean Individuals

  • Lee, Kichan;Han, Seonggyun;Tark, Yeonjeong;Kim, Sangsoo
    • Genomics & Informatics / v.12 no.4 / pp.165-170 / 2014
  • Genome-wide association (GWA) studies have found many important genetic variants that affect various traits. Since these studies can be used to investigate untyped but causal variants through linkage disequilibrium (LD), it would be useful to explore the haplotypes of single-nucleotide polymorphisms (SNPs) within the same LD block of significant associations based on high-density variants from population references. Here, we tried to build a catalog of haplotypes affecting body mass index (BMI) through an integrative analysis of previously published whole-genome next-generation sequencing (NGS) data from 7 representative Korean individuals and previously known Korean GWA signals. We selected 435 SNPs that were significantly associated with BMI in the GWA analysis and searched 53 LD ranges near those SNPs. Using the NGS data, the haplotypes were phased within the LD ranges. A total of 44 possible haplotype blocks for Korean BMI were cataloged. Although the current result is based on limited data, this study provides new insights that may help to identify important haplotypes for traits and low variants near significant SNPs. Furthermore, we can build a more comprehensive catalog as a larger dataset becomes available.
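
As a rough illustration of the LD and haplotype concepts in this abstract, the sketch below computes the standard pairwise LD statistics (D and r²) between two biallelic SNPs from phased haplotypes and tallies the haplotypes observed over a set of SNP columns. It is not the authors' pipeline: the function names, the '0'/'1' allele encoding, and the toy haplotypes are all assumptions made for the example.

```python
from collections import Counter


def ld_r2(haplotypes, i, j):
    """Return (D, r2) between SNP columns i and j.

    `haplotypes` is a list of equal-length strings of '0'/'1' alleles,
    one string per phased chromosome (hypothetical encoding).
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n   # allele frequency at SNP i
    p_b = sum(h[j] == "1" for h in haplotypes) / n   # allele frequency at SNP j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b                              # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r2 is undefined when either SNP is monomorphic in the sample
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


def haplotype_counts(haplotypes, cols):
    """Count the distinct haplotypes formed by the SNP columns in `cols`."""
    return Counter("".join(h[c] for c in cols) for h in haplotypes)


if __name__ == "__main__":
    # Toy data only: 7 diploid individuals would give 14 phased haplotypes.
    toy = ["110", "110", "110", "001", "001", "001", "110",
           "001", "110", "001", "111", "000", "110", "001"]
    print(ld_r2(toy, 0, 1))
    print(haplotype_counts(toy, [0, 1, 2]).most_common())
```

With only 14 chromosomes, as in a 7-individual sample, frequency and r² estimates of this kind are coarse, which is consistent with the abstract's own caveat about limited data.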

Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms

  • Franke, Karl R.;Crowgey, Erin L.
    • Genomics & Informatics / v.18 no.1 / pp.10.1-10.9 / 2020
  • Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field, as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of the software and hardware used to compute large-scale human genomic (NGS) datasets, the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared to the results of the original application (GATK v4.1.0, Intel x86 CPUs). Parabricks was able to process a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision compared to stock GATK. Sentieon's somatic pipeline achieved similar results, also greater than 99%. Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different tools for alignment/mapping.

Bacterial Community and Diversity from the Watermelon Cultivated Soils through Next Generation Sequencing Approach

  • Adhikari, Mahesh;Kim, Sang Woo;Kim, Hyun Seung;Kim, Ki Young;Park, Hyo Bin;Kim, Ki Jung;Lee, Youn Su
    • The Plant Pathology Journal / v.37 no.6 / pp.521-532 / 2021
  • Knowledge and a better understanding of the functions of the microbial community are pivotal for crop management. This study examined bacterial community structure and diversity, including that of Acidovorax species, in watermelon-cultivated soils from different regions of South Korea. Soil samples were collected from watermelon cultivation areas across South Korea, and microbiome analysis was performed to characterize the bacterial communities, including the Acidovorax species community. Next generation sequencing (NGS) was performed on genomic DNA extracted from 92 soil samples from 8 different provinces using a fast genomic DNA extraction kit. NGS data analysis revealed that a total of 39,367 operational taxonomic units (OTUs) were obtained. The most dominant phylum in all the soil samples was Proteobacteria (37.3%), and the most abundant genus in all the samples was Acidobacterium (1.8%). To analyze species diversity among the collected soil samples, OTUs, community diversity, and the Shannon index were measured. The highest diversity scores, Shannon (9.297) and inverse Simpson (0.996), were found in the greenhouse soil sample from Gyeonggi-do province (GG4). The NGS results suggest that most of the soil samples show a similar trend in bacterial community and diversity. Environmental factors play a key role in shaping the bacterial community and diversity. To test this, further correlation analysis between soil physical and chemical parameters and the dominant bacterial community will be carried out to observe their interactions.
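
For readers unfamiliar with the diversity scores quoted above, the sketch below shows the textbook definitions computed from per-sample OTU counts. The abstract does not state the logarithm base or which Simpson variant was used, so the natural logarithm is an assumption here, and both the Gini-Simpson (1 - D) and inverse Simpson (1 / D) forms are returned. The function name and the example counts are hypothetical.

```python
import math


def diversity_indices(otu_counts):
    """Compute Shannon and Simpson-type diversity from OTU read counts.

    Assumes natural-log Shannon entropy; the study's log base is not stated.
    """
    total = sum(otu_counts)
    if total <= 0:
        raise ValueError("at least one OTU must have a positive count")
    # Relative abundance of each observed OTU (zero counts contribute nothing)
    props = [c / total for c in otu_counts if c > 0]
    shannon = -sum(p * math.log(p) for p in props)
    simpson_d = sum(p * p for p in props)     # Simpson's dominance index D
    return {
        "shannon": shannon,
        "gini_simpson": 1.0 - simpson_d,      # bounded between 0 and 1
        "inverse_simpson": 1.0 / simpson_d,   # always >= 1
    }


if __name__ == "__main__":
    # Hypothetical OTU counts for a single soil sample
    print(diversity_indices([120, 80, 40, 10, 5, 1]))
```

Because 1 / D can never fall below 1, a reported value between 0 and 1 matches the 1 - D form of the index rather than 1 / D.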

COEX-Seq: Convert a Variety of Measurements of Gene Expression in RNA-Seq

  • Kim, Sang Cheol;Yu, Donghyeon;Cho, Seong Beom
    • Genomics & Informatics / v.16 no.4 / pp.36.1-36.3 / 2018
  • Next generation sequencing (NGS), a high-throughput DNA sequencing technology, is widely used for molecular biological studies. Within NGS, RNA-sequencing (RNA-Seq), a short-read massively parallel sequencing method, is a major quantitative tool for transcriptome studies. To utilize RNA-Seq data, various quantification and analysis methods have been developed to address specific research goals, including identification of differentially expressed genes and detection of novel transcripts. Because of the accumulation of RNA-Seq data in public databases, there is a demand for integrative analysis. However, the available RNA-Seq data are stored in different formats, such as read count, transcripts per million (TPM), and fragments per kilobase of transcript per million mapped reads (FPKM). This hinders the integrative analysis of RNA-Seq data. To solve this problem, we have developed a web-based application using Shiny, COEX-Seq (Convert a Variety of Measurements of Gene Expression in RNA-Seq), that easily converts data among the measurement formats of gene expression used in most bioinformatic tools for RNA-Seq. It provides a workflow that includes loading a data set, selecting measurement formats of gene expression, and identifying gene names. COEX-Seq is freely available for academic purposes and can be run on Windows, Mac OS, and Linux operating systems. Source code, sample data sets, and supplementary documentation are available as well.
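
The conversions such a tool performs can be illustrated with the commonly used definitions of FPKM and TPM. The sketch below is not COEX-Seq's code (the tool itself is a Shiny application); the function names are hypothetical, and it assumes one count and one transcript length in base pairs per gene, with the library size taken as the sum of the supplied counts.

```python
def counts_to_fpkm(counts, lengths_bp):
    """FPKM = count * 1e9 / (total mapped fragments * transcript length in bp)."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def counts_to_tpm(counts, lengths_bp):
    """TPM: length-normalize each count, then scale the rates to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


def fpkm_to_tpm(fpkm):
    """TPM can be recovered from FPKM by rescaling so the values sum to 1e6."""
    fpkm_sum = sum(fpkm)
    return [f * 1e6 / fpkm_sum for f in fpkm]


if __name__ == "__main__":
    # Hypothetical three-gene example
    counts = [100, 200, 300]
    lengths = [1000, 2000, 1500]
    fpkm = counts_to_fpkm(counts, lengths)
    print(fpkm)
    print(counts_to_tpm(counts, lengths))
    print(fpkm_to_tpm(fpkm))
```

Note that going from FPKM or TPM back to raw counts needs the library size and transcript lengths, which are often missing from published tables, so the conversion is only straightforward in some directions.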

Application of next generation sequencing (NGS) system for whole-genome sequencing of porcine reproductive and respiratory syndrome virus (PRRSV) (돼지생식기호흡기증후군바이러스(PRRSV)의 전장 유전체 염기서열(whole-genome sequencing) 분석을 위한 차세대 염기서열 분석법의 활용)

  • Moon, Sung-Hyun;Khatun, Amina;Kim, Won-Il;Hossain, Md Mukter;Oh, Yeonsu;Cho, Ho-Seong
    • Korean Journal of Veterinary Service / v.39 no.1 / pp.41-49 / 2016
  • In the present study, fast and robust next generation sequencing (NGS) methods were developed for the analysis of full genome sequences of PRRSV, a positive-sense RNA virus with a high degree of genetic variability among isolates. Two PRRSV strains (VR2332 and VR2332-R) that have been maintained in our laboratory were used to validate our methods and to compare with the sequence registered in GenBank (GenBank accession no. EF536003). The results showed that both strains had 100% coverage of the reference; the coverage depth ranged from 3 to 23,012 for VR2332 and from 3 to 41,348 for VR2332-R, with an average depth of 22,712. Genomic data produced by the massive sequencing capacity of NGS have enabled the study of PRRSV at an unprecedented rate and level of detail. Unlike conventional sequencing methods, which require knowledge of conserved regions, NGS allows de novo assembly of full viral genomes. Therefore, our results suggest that these NGS-based methods greatly facilitate the generation of more full-genome PRRSV sequences, locally as well as nationally, while saving time and cost.
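
Coverage figures like those quoted above (percent of the reference covered, and minimum, maximum, and average depth) can be derived from per-base depth values. The sketch below is one way to do so, assuming tab-separated `chrom, position, depth` lines such as those written by `samtools depth -a`; the abstract does not say which tool the authors used, and the function names and example lines are hypothetical.

```python
def parse_depth_lines(lines):
    """Extract the depth column from tab-separated 'chrom<TAB>pos<TAB>depth' lines."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _chrom, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    return depths


def coverage_summary(depths):
    """Summarize breadth of coverage (%) and min / max / mean depth."""
    if not depths:
        raise ValueError("no depth values supplied")
    n = len(depths)
    covered = sum(d > 0 for d in depths)  # positions with at least one read
    return {
        "breadth_pct": 100.0 * covered / n,
        "min_depth": min(depths),
        "max_depth": max(depths),
        "mean_depth": sum(depths) / n,
    }


if __name__ == "__main__":
    # Hypothetical four-position example
    example = ["ref\t1\t3", "ref\t2\t120", "ref\t3\t95", "ref\t4\t40"]
    print(coverage_summary(parse_depth_lines(example)))
```

Breadth is only meaningful if every reference position is present in the input, including positions with zero depth.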

Construction of PANM Database (Protostome DB) for rapid annotation of NGS data in Mollusks

  • Kang, Se Won;Park, So Young;Patnaik, Bharat Bhusan;Hwang, Hee Ju;Kim, Changmu;Kim, Soonok;Lee, Jun Sang;Han, Yeon Soo;Lee, Yong Seok
    • The Korean Journal of Malacology / v.31 no.3 / pp.243-247 / 2015
  • A stand-alone BLAST server is available that provides a convenient platform for the analysis of molluscan sequence information, especially the EST sequences generated by traditional sequencing methods. However, the server has limitations in the annotation of molluscan sequences generated using next-generation sequencing (NGS) platforms, due to inconsistencies in the molluscan sequences available at NCBI. We constructed a web-based interface for a new stand-alone BLAST, called PANM-DB (Protostome DB), for the analysis of molluscan NGS data. PANM-DB includes the amino acid sequences from the protostome groups (Arthropoda, Nematoda, and Mollusca), downloaded from GenBank using the NCBI Taxonomy Browser. The sequences were converted into multi-FASTA format and stored in the database using the formatdb program from NCBI. PANM-DB contains 6% of the NCBInr database sequences (as of 24-06-2015), and for an input of 10,000 RNA-seq sequences, processing was 15 times faster with PANM-DB than with the NCBInr DB. It was also noted that PANM-DB shows twice as many significant hits, with diverse annotation profiles, as the Mollusks DB. Hence, the construction of PANM-DB is a significant step in the annotation of molluscan sequence information obtained from NGS platforms. PANM-DB is freely downloadable from the web-based interface (Malacological Society of Korea, http://malacol.or/kr/blast) as a compressed file system and can run on any compatible operating system.

Multi-omics integration strategies for animal epigenetic studies - A review

  • Kim, Do-Young;Kim, Jun-Mo
    • Animal Bioscience / v.34 no.8 / pp.1271-1282 / 2021
  • Genome-wide studies provide considerable insights into the genetic background of animals; however, the inheritance of several heritable factors cannot be elucidated. Epigenetics explains these heritabilities, including those of genes influenced by environmental factors. Knowledge of the mechanisms underlying epigenetics enables an understanding of the processes of gene regulation through interactions with the environment. Recently developed next-generation sequencing (NGS) technologies help to reveal the interactional changes in epigenetic mechanisms. Large sets of NGS data are available; however, integrative data analysis approaches still have limitations in reliably interpreting epigenetic changes. This review focuses on epigenetic mechanisms, profiling methods, and multi-omics integration methods that can provide comprehensive biological insights in animal genetic studies.

The Protostome database (PANM-DB): Version 2.0 release with updated sequences (연체동물 NGS 데이터 분석을 위한 PANM 데이터베이스 업데이트 (Version II))

  • Kang, Se Won;Park, So Young;Patnaik, Bharat Bhusan;Hwang, Hee Ju;Chung, Jong Min;Song, Dae Kwon;Park, Young-Su;Lee, Jun Sang;Han, Yeon Soo;Park, Hong Seog;Lee, Yong Seok
    • The Korean Journal of Malacology / v.32 no.3 / pp.185-188 / 2016
  • PANM-DB (version 1.0) was constructed as a web-based interface for the analysis and annotation of next-generation sequencing (NGS) data of Mollusca, Arthropoda, and Nematoda. The database collected the sequences of protostomes (Mollusca, Arthropoda, and Nematoda) from the NCBI Taxonomy Browser, which were compiled in multi-FASTA format and stored using the formatdb program. This improved the processing of RNA-seq sequences in terms of speed and hit percentage. PANM-DB has been successfully used for the transcriptome annotation of butterfly, land snail, and other commercial mollusks. We have improved the database by updating it with new sequences; version 2.0 contains a total of 7,571,246 protein sequences (twice as many as version 1.0). Furthermore, the updated version contains the Cephalopoda database. Following these updates, the constructed web interface supports independent analyses and is an improvement over the mollusks BLAST server. The updated version of PANM-DB will be helpful for the analysis of NGS-based sequencing data of non-model species, especially Mollusca, Arthropoda, and Nematoda.

CNVDAT: A Copy Number Variation Detection and Analysis Tool for Next-generation Sequencing Data (CNVDAT : 차세대 시퀀싱 데이터를 위한 유전체 단위 반복 변이 검출 및 분석 도구)

  • Kang, Inho;Kong, Jinhwa;Shin, JaeMoon;Lee, UnJoo;Yoon, Jeehee
    • Journal of KIISE:Databases / v.41 no.4 / pp.249-255 / 2014
  • Copy number variations (CNVs) are a recently recognized class of human structural variations and are associated with a variety of human diseases, including cancer. To find important cancer genes, researchers identify novel CNVs in patients with a particular cancer and analyze large amounts of genomic and clinical data. We present a tool called CNVDAT, which detects CNVs from NGS data and systematically analyzes the genomic and clinical data associated with the variations. CNVDAT consists of two modules, CNV Detection Engine and Sequence Analyser. CNV Detection Engine extracts CNVs using a multi-resolution system of scale-space filtering, enabling detection of the types and exact locations of CNVs of all sizes even when the coverage level of the read data is low. Sequence Analyser is a user-friendly program for viewing and comparing variation regions between tumor and matched normal samples. It also provides a complete analysis function for refGene and OMIM data and makes it possible to discover CNV-gene-phenotype relationships. CNVDAT source code is freely available from http://dblab.hallym.ac.kr/CNVDAT/.

Caution and Curation for Complete Mitochondrial Genome from Next-Generation Sequencing: A Case Study from Dermatobranchus otome (Gastropoda, Nudibranchia)

  • Do, Thinh Dinh;Choi, Yisoo;Jung, Dae-Wui;Kim, Chang-Bae
    • Animal Systematics, Evolution and Diversity / v.36 no.4 / pp.336-346 / 2020
  • The mitochondrial genome is an important molecule for systematic and evolutionary studies in metazoans. The development of next-generation sequencing (NGS) techniques has rapidly increased the number of mitogenome sequences. Generating a mitochondrial genome from NGS involves several steps, from DNA preparation and sequencing to assembly and annotation. Despite efforts to improve mitogenome sequencing, assembly, and annotation methods, low-quality and/or low-quantity sequence can still end up in the final map. Therefore, it is necessary to check and curate the mitochondrial genome sequence after annotation for proofreading and feedback. In this study, we introduce a pipeline for NGS-based mitogenome sequencing and curation. For this purpose, two mitogenomes of Dermatobranchus otome were sequenced on the Illumina MiSeq system with different amounts of raw read data. The generated reads were assembled and annotated with commonly used programs. Because abnormal repeat regions were present in the mitogenomes after annotation, primers covering these regions were designed, and conventional PCR followed by Sanger sequencing was performed to curate the mitogenome sequences. The obtained sequences were used to replace the abnormal regions. Following the replacement, each mitochondrial genome was compared with the other, as well as with the sequences of closely related species available in GenBank, for confirmation. After curation, the two mitogenomes of D. otome each showed a typical circular molecule 14,559 bp in size, containing 13 protein-coding genes, 22 tRNA genes, and two rRNA genes. The phylogenetic tree revealed a close relationship between D. otome and Tritonia diomea. The findings of this study indicate the importance of caution and curation when generating mitogenomes from NGS.