• Title/Summary/Keyword: Molecular Sequencing Data

A Universal Analysis Pipeline for Hybrid Capture-Based Targeted Sequencing Data with Unique Molecular Indexes

  • Kim, Min-Jung;Kim, Si-Cho;Kim, Young-Joon
    • Genomics & Informatics / v.16 no.4 / pp.29.1-29.5 / 2018
  • Hybrid capture-based targeted sequencing is increasingly used for genomic variant profiling in patients with tumors. Unique molecular index (UMI) technology has recently been developed and helps to increase the accuracy of variant calling by minimizing polymerase chain reaction biases and sequencing errors. However, the analysis of UMI-adopted targeted sequencing data differs slightly from the methods used for other types of omics data, and the variant-calling pipeline is still being optimized by individual study groups for their own purposes. Because tool usage is therefore specific to each group, our group built an analysis pipeline that can be applied broadly to targeted sequencing data generated with different methods. First, we generated hybrid capture-based data using genomic DNA extracted from tumor tissues of colorectal cancer patients. Sequencing libraries were prepared and pooled, and an 8-plexed capture library was taken through the enrichment step before 150-bp paired-end sequencing on the Illumina HiSeq series. For the analysis, we evaluated several published tools, focusing mainly on the compatibility of each tool's input and output. Finally, our laboratory built an analysis pipeline specialized for UMI-adopted data. With this pipeline, we were able to estimate on-target rates and obtain filtered consensus reads for more accurate variant calling. These results suggest the potential of our analysis pipeline for precisely examining the quality and efficiency of the experiments conducted.
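
The two quantities named in this abstract, consensus reads and on-target rate, can be illustrated with a minimal sketch. It is not the authors' pipeline: the read tuples, the grouping key (chromosome, start, UMI), the `min_family_size` filter, and the per-column majority vote are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def consensus_reads(reads, min_family_size=2):
    """Collapse reads sharing (chrom, start, umi) into one consensus each.

    `reads` is an iterable of (chrom, start, umi, sequence) tuples; reads
    within one family are assumed to have equal length.
    """
    families = defaultdict(list)
    for chrom, start, umi, seq in reads:
        families[(chrom, start, umi)].append(seq)

    consensus = []
    for (chrom, start, _umi), seqs in sorted(families.items()):
        if len(seqs) < min_family_size:
            continue  # families this small cannot outvote an error
        # A per-column majority vote suppresses PCR and sequencing errors.
        bases = [Counter(col).most_common(1)[0][0] for col in zip(*seqs)]
        consensus.append((chrom, start, "".join(bases)))
    return consensus


def on_target_rate(reads, targets):
    """Fraction of reads whose start lies inside any target interval.

    `targets` maps chrom -> list of half-open (start, end) intervals.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(
        any(s <= start < e for s, e in targets.get(chrom, []))
        for chrom, start, *_ in reads
    )
    return hits / len(reads)


# Hypothetical toy data, not taken from the paper.
example_reads = [
    ("chr1", 100, "AACG", "ACGTAC"),
    ("chr1", 100, "AACG", "ACGTAC"),
    ("chr1", 100, "AACG", "ACCTAC"),  # one read with an error in column 3
    ("chr1", 100, "TTGA", "ACGTAC"),  # singleton family, filtered out
    ("chr2", 500, "GGCA", "TTTTTT"),  # off-target singleton
]
example_targets = {"chr1": [(50, 200)]}
example_consensus = consensus_reads(example_reads)
example_rate = on_target_rate(example_consensus, example_targets)
```

Production tools typically work on aligned BAM records and weigh base qualities; the sketch only shows why grouping by UMI lets a single erroneous read be outvoted.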

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning (앙상블 기법을 활용한 RNA-Sequencing 데이터의 폐암 예측 연구)

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association / v.2 no.1 / pp.7-14 / 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predicting lung cancer, a leading cause of cancer mortality worldwide, and to informing its treatment strategies. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers such as ADGRF5 are identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.
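
As a rough illustration of the two steps this abstract describes (standardizing expression across samples, then combining several models), the sketch below uses log2 counts-per-million and a simple soft vote. Both choices are assumptions: the abstract does not state which normalization was used, and the probabilities stand in for the outputs of already trained models rather than for Random Forest, XGBoost, or LightGBM themselves.

```python
import math


def log_cpm(counts):
    """Convert one sample's raw gene counts to log2(CPM + 1).

    Scaling by library size makes samples of different sequencing depth
    comparable before they are fed to a classifier.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: math.log2(c / total * 1e6 + 1) for gene, c in counts.items()}


def soft_vote(probabilities, threshold=0.5):
    """Average per-model class-1 probabilities and threshold the mean."""
    if not probabilities:
        raise ValueError("need at least one model output")
    mean = sum(probabilities) / len(probabilities)
    return int(mean >= threshold), mean


# Hypothetical sample: gene names and counts are illustrative only.
example_sample = {"ADGRF5": 10, "GENE_B": 990}
example_expr = log_cpm(example_sample)

# Hypothetical class-1 probabilities from three separately trained models.
example_label, example_mean = soft_vote([0.9, 0.6, 0.3])
```

Averaging probabilities is only one way to build an ensemble; its appeal is that a single overconfident model cannot decide the outcome alone.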

ChIP-seq Library Preparation and NGS Data Analysis Using the Galaxy Platform (ChIP-seq 라이브러리 제작 및 Galaxy 플랫폼을 이용한 NGS 데이터 분석)

  • Kang, Yujin;Kang, Jin;Kim, Yea Woon;Kim, AeRi
    • Journal of Life Science / v.31 no.4 / pp.410-417 / 2021
  • Next-generation sequencing (NGS) is a high-throughput technique for sequencing large numbers of DNA fragments that are prepared from a genome. This sequencing technique has been used to elucidate whole genome sequences of living organisms and to analyze complementary DNA (cDNA) or chromatin immunoprecipitated DNA (ChIPed DNA) at the genome level. After NGS, the use of proper tools with reasonable parameters is important for processing and analyzing the data. However, handling large-scale sequencing data and programming for data analysis can be difficult. The Galaxy platform, a public web service system, provides many different tools for NGS data analysis, and it allows researchers to analyze their data in a web browser without deep knowledge of bioinformatics and/or programming. In this study, we explain the procedure for preparing chromatin immunoprecipitation-sequencing (ChIP-seq) libraries and the steps for analyzing ChIP-seq data using the Galaxy platform. The data analysis steps include uploading the NGS data to Galaxy, quality checking of the NGS data, pre-mapping processes, read mapping, post-mapping processes, peak calling, and visualization by window view, heatmaps, average profiles, and correlation analysis. Analysis of our histone H3K4me1 ChIP-seq data in K562 cells shows that it correlates with public data. Thus, NGS data analysis using the Galaxy platform can provide an easy approach to bioinformatics.
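
The correlation analysis mentioned as the last step can be pictured as a Pearson correlation between binned coverage from two datasets. This is a sketch under assumptions: the abstract does not say which correlation measure or bin size was used, and the coverage vectors below are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length coverage vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts per genomic bin for two ChIP-seq datasets.
in_house_bins = [1, 2, 3, 4, 5]
public_bins = [2, 4, 6, 8, 10]
example_r = pearson(in_house_bins, public_bins)
```

Because Pearson correlation is insensitive to a constant scaling factor, two datasets sequenced at different depths can still correlate strongly if their signal has the same shape.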

ChIP-seq Analysis of Histone H3K27ac and H3K27me3 Showing Different Distribution Patterns in Chromatin

  • Kang, Jin;Kim, AeRi
    • Biomedical Science Letters / v.28 no.2 / pp.109-119 / 2022
  • Histone proteins can be modified by the addition of acetyl or methyl groups to specific amino acids, and these modifications have different distribution patterns in chromatin. Histone modifications have recently been studied using ChIP-seq data, which requires the sequencing data to be analyzed in a way that suits each distribution pattern. Here we analyzed histone H3K27ac and H3K27me3 ChIP-seq data and found that H3K27ac is enriched at narrow regions while H3K27me3 is distributed broadly. To analyze the ChIP-seq data properly, we called peaks for H3K27ac and H3K27me3 using MACS2 (narrow option and broad option) and SICER, and compared the suitability of the peaks using the signal-to-background ratio. As a result, H3K27ac-enriched regions were well identified by both methods, while H3K27me3 peaks were properly identified by SICER, which indicates that the choice of peak calling method is more critical for broadly distributed histone modifications. When ChIP-seq data were compared at different sequencing depths (15, 30, 60, and 120 M), high sequencing depth caused a high false-positive rate in H3K27ac peak calling, but it reflected the broad distribution pattern of H3K27me3 more accurately. These results suggest that sequencing depth affects peak calling from ChIP-seq data and that high sequencing depth is required for H3K27me3. Taken together, the peak calling tool and sequencing depth should be chosen according to the distribution pattern of the histone modification in ChIP-seq analysis.
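
One plausible way to compute a signal-to-background ratio for a set of called peaks is sketched below. The definition used here (mean coverage inside peaks divided by mean coverage outside them) is an assumption, since the abstract does not define the ratio, and the binned coverage values are invented.

```python
def signal_to_background(coverage, peaks):
    """Mean coverage inside peaks divided by mean coverage outside them.

    `coverage` is a list of per-bin read counts; `peaks` is a list of
    half-open (start_bin, end_bin) intervals.
    """
    in_peak = [False] * len(coverage)
    for start, end in peaks:
        for i in range(max(start, 0), min(end, len(coverage))):
            in_peak[i] = True
    signal = [c for c, flagged in zip(coverage, in_peak) if flagged]
    background = [c for c, flagged in zip(coverage, in_peak) if not flagged]
    if not signal or not background:
        raise ValueError("need at least one peak bin and one background bin")
    background_mean = sum(background) / len(background)
    if background_mean == 0:
        return float("inf")
    return (sum(signal) / len(signal)) / background_mean


# Hypothetical profiles: a sharp mark and a broad, low-amplitude mark.
narrow_profile = [1, 1, 20, 22, 1, 1, 1, 1]
broad_profile = [2, 6, 7, 6, 7, 6, 2, 2]
narrow_ratio = signal_to_background(narrow_profile, [(2, 4)])
broad_ratio = signal_to_background(broad_profile, [(1, 6)])
```

The toy numbers mirror the contrast in the abstract: a narrow mark stands far above its background, while a broad mark yields a modest ratio that depends heavily on whether the caller captures the whole domain.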

Generation and analysis of whole-genome sequencing data in human mammary epithelial cells

  • Jong-Lyul Park;Jae-Yoon Kim;Seon-Young Kim;Yong Sun Lee
    • Genomics & Informatics / v.21 no.1 / pp.11.1-11.5 / 2023
  • Breast cancer is the most common cancer worldwide, and advanced breast cancer with metastases is largely incurable with currently available therapies. Therefore, it is essential to understand the molecular characteristics of breast carcinogenesis as it progresses. Here, we report a dataset of whole genomes from the human mammary epithelial cell system derived from a reduction mammoplasty specimen. This system comprises pre-stasis 184D cells, considered normal, and seven cell lines along a cancer progression series that are immortalized or have additionally acquired anchorage-independent growth. Our analysis of the whole-genome sequencing (WGS) data indicates that the seven cell lines in the cancer progression series carry between 8,393 and 39,564 somatic mutations (30,591 on average) relative to 184D cells. These WGS data and our mutation analysis will provide helpful information for identifying driver mutations and elucidating the molecular mechanisms of breast carcinogenesis.

Clinical Application of ABO Genotyping: 10 Years' Experience in the Southeastern Korea

  • Sae Am Song;Eun-Kyung Yu;Seung Hwan Oh
    • Journal of Interdisciplinary Genomics / v.6 no.1 / pp.6-13 / 2024
  • Background: ABO typing is crucial for ensuring safe blood transfusion and is commonly performed by examining antigen-antibody interactions. Determining the ABO blood group can be difficult in cases of ABO discrepancy or ABO subgroups, and ABO genotyping may be necessary to resolve such discrepancies. ABO genotyping primarily involves direct sequencing, although other molecular methods can also be used. Methods: PCR and direct sequencing of exons 6 and 7 were performed for a total of 108 samples from June 2010 to December 2019. Other molecular methods, including cloning sequencing and short tandem repeat analysis, were carried out when needed. Sequencing data were compared with allele information in blood group antigen mutation databases. Results: The predominant causal allele among the 108 ABO-discrepant cases was cis-AB01, with 28 cases. This was followed by rare ABO alleles (B309, B306, A204, Bw29, and Ax01) with 14 cases, and blood chimera with 5 cases. Five new alleles were identified during the investigation. Conclusion: This study reaffirms that cis-AB is the most common cause of inherited ABO discrepancies and that cis-AB01 is the most prevalent cis-AB allele in the Korean population, including in the southeastern region. In addition, we discovered five new alleles and five blood chimeras by adopting sequencing analysis and additional molecular techniques to resolve ABO discrepancies, which provides regional data on rare alleles. This study presents rare and new ABO alleles and blood chimeras identified over a ten-year period at two major university hospitals in southeastern Korea.

Genetic tests by next-generation sequencing in children with developmental delay and/or intellectual disability

  • Han, Ji Yoon;Lee, In Goo
    • Clinical and Experimental Pediatrics / v.63 no.6 / pp.195-202 / 2020
  • Developments in next-generation sequencing (NGS) technologies have helped clarify the diagnosis and treatment of developmental delay/intellectual disability (DD/ID) via molecular genetic testing. Advances in DNA sequencing technology have not only allowed the evolution of targeted panels but also, more recently, enabled genome-wide analyses to progress from the research era to clinical practice. Broad acceptance of accuracy-guided targeted gene panels, whole-exome sequencing (WES), and whole-genome sequencing (WGS) for DD/ID requires prospective analyses of their increasing cost-effectiveness versus conventional genetic testing. Choosing the appropriate sequencing method requires individual planning. Data are required to guide best-practice recommendations for genomic testing across various clinical phenotypes in an etiologic approach. Targeted panel testing may be recommended as a first-tier testing approach for children with DD/ID. Family-based trio testing by WES/WGS can be used as a second test for DD/ID in undiagnosed children who previously tested negative on a targeted panel. The role of NGS in molecular diagnostics, treatment, and prediction of prognosis will continue to grow in the coming years. Given the rapid pace of change over the past 10 years, all medical providers should be aware of the changes in the transformative field of genetics.

Transcriptional Heterogeneity of Cellular Senescence in Cancer

  • Junaid, Muhammad;Lee, Aejin;Kim, Jaehyung;Park, Tae Jun;Lim, Su Bin
    • Molecules and Cells / v.45 no.9 / pp.610-619 / 2022
  • Cellular senescence plays a paradoxical role in tumorigenesis through the expression of diverse senescence-associated (SA) secretory phenotypes (SASPs). The heterogeneity of SA gene expression in cancer cells not only promotes cancer stemness but also protects these cells from chemotherapy. Despite the potential correlation between cancer and SA biomarkers, many transcriptional changes across distinct cell populations remain largely unknown. During the past decade, single-cell RNA sequencing (scRNA-seq) technologies have emerged as powerful experimental and analytical tools to dissect such diverse senescence-derived transcriptional changes. Here, we review the recent sequencing efforts that successfully characterized scRNA-seq data obtained from diverse cancer cells and elucidated the role of senescent cells in tumor malignancy. We further highlight the functional implications of SA genes expressed specifically in cancer and stromal cell populations in the tumor microenvironment. Translational research leveraging scRNA-seq profiling of SA genes will facilitate the identification of novel expression patterns underlying cancer susceptibility, providing new therapeutic opportunities in the era of precision medicine.

Deep sequencing of B cell receptor repertoire

  • Kim, Daeun;Park, Daechan
    • BMB Reports / v.52 no.9 / pp.540-547 / 2019
  • The immune repertoire is a collection of enormously diverse adaptive immune cells within an individual. As the repertoire shapes and represents immunological conditions, identification of clones and characterization of diversity are critical for understanding how to protect ourselves against various illnesses such as infectious diseases and cancers. Over the past several years, fast-growing technologies for high-throughput sequencing have facilitated rapid advancement of repertoire research, enabling us to observe repertoire diversity at an unprecedented level. Here, we focus on the B cell receptor (BCR) repertoire and review approaches to B cell isolation and sequencing library construction. These experiments should be carefully designed according to the BCR regions to be interrogated, such as the full-length heavy chain, complementarity determining regions, and isotypes. We also highlight preprocessing steps that remove sequencing and PCR errors using unique molecular indexes and bioinformatics techniques. Because of the massive sequence variation in BCRs, caution is warranted when interpreting repertoire diversity from error-prone sequencing data. Furthermore, we provide a summary of statistical frameworks and bioinformatics tools for clonal evolution and diversity. Finally, we discuss limitations of current BCR-seq technologies and future perspectives on advances in repertoire sequencing.
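
Repertoire diversity is often summarized with an entropy-style index. The sketch below computes Shannon diversity over clonotype labels; the choice of index and the CDR3-like strings are assumptions for illustration, as the abstract does not name the specific statistics the review covers.

```python
import math
from collections import Counter


def shannon_diversity(clonotypes):
    """Shannon entropy (natural log) of clonotype frequencies.

    `clonotypes` is an iterable with one clonotype label per sequenced
    molecule, for example a CDR3 amino acid sequence after UMI collapsing.
    """
    counts = Counter(clonotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clonotypes given")
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical repertoires built from made-up CDR3-like labels.
balanced_repertoire = ["CARDYW", "CARDYW", "CAKGGW", "CAKGGW"]
clonal_repertoire = ["CARDYW"] * 4
balanced_diversity = shannon_diversity(balanced_repertoire)
clonal_diversity = shannon_diversity(clonal_repertoire)
```

The caution raised in the abstract applies directly here: an uncorrected sequencing error creates a spurious extra label, which inflates the index, so error removal should precede any diversity estimate.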

Whole-exome sequencing analysis in a case of primary congenital glaucoma due to the partial uniparental isodisomy

  • Zavarzadeh, Parisima Ghaffarian;Bonyadi, Morteza;Abedi, Zahra
    • Genomics & Informatics / v.20 no.3 / pp.28.1-28.7 / 2022
  • We describe the clinical, laboratory, and genetic presentation of a pathogenic variant of the CYP1B1 gene through a case report of primary congenital glaucoma, a trio analysis of this candidate variant in the family by Sanger sequencing, and an assessment of secondary/incidental findings. The case is a rare one: an 8-year-old girl with primary congenital glaucoma, uncontrolled intraocular pressure, and no family history of glaucoma. Whole-exome sequencing data analysis of this case revealed a homozygous pathogenic single nucleotide variant in the CYP1B1 gene (NM_000104:exon3:c.G1103A:p.R368H). The same variant was found in a heterozygous state in her unaffected father but was absent in her mother. The diagnosis was made based on the molecular findings of the whole-exome sequencing data analysis, and the clinical reports and bioinformatics findings supported the relation between the candidate pathogenic variant and the disease. However, it should not be forgotten that primary congenital glaucoma is not exclusive to the CYP1B1 gene. Because an autosomal recessive disorder caused by a low-frequency allele is extraordinarily unlikely in the offspring of unrelated parents, further analysis of the whole-exome sequencing data and Sanger sequencing were applied to determine the type of mutation and how it was transmitted to the offspring.
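
The genotype pattern in this case (child homozygous, father heterozygous, mother without the variant) fails a simple Mendelian consistency check, which is what prompts the search for an explanation such as uniparental isodisomy. The sketch below encodes genotypes as alternate-allele counts; the function names and encoding are hypothetical and not taken from the paper.

```python
def transmissible_alleles(genotype):
    """Alleles (0 = reference, 1 = alternate) a parent can pass on.

    `genotype` is the parent's alternate-allele count: 0, 1, or 2.
    """
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can arise from one allele per parent."""
    return any(
        paternal + maternal == child
        for paternal in transmissible_alleles(father)
        for maternal in transmissible_alleles(mother)
    )


# Pattern reported in the abstract: child 1/1, father 0/1, mother 0/0.
case_consistent = is_mendelian_consistent(child=2, father=1, mother=0)

# An ordinary carrier child of the same parents, for comparison.
carrier_consistent = is_mendelian_consistent(child=1, father=1, mother=0)
```

An inconsistent trio does not by itself establish isodisomy; a deletion on the other allele, a de novo event, or a sample mix-up can produce the same pattern, which is why additional analysis of the exome data is needed.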