• Title/Abstract/Keywords: Molecular Sequencing Data

A Universal Analysis Pipeline for Hybrid Capture-Based Targeted Sequencing Data with Unique Molecular Indexes

  • Kim, Min-Jung;Kim, Si-Cho;Kim, Young-Joon
    • Genomics & Informatics / Vol. 16, No. 4 / pp. 29.1-29.5 / 2018
  • Hybrid capture-based targeted sequencing is increasingly used for genomic variant profiling in tumor patients. Unique molecular index (UMI) technology, developed recently, helps increase the accuracy of variant calling by minimizing polymerase chain reaction (PCR) biases and sequencing errors. However, the analysis of UMI-based targeted sequencing data differs somewhat from the methods used for other types of omics data, and its variant-calling pipelines are still being optimized by individual study groups for their own purposes. Because tool usage has remained so group-specific, our group built an analysis pipeline intended for general use across targeted sequencing studies generated with different methods. First, we generated hybrid capture-based data using genomic DNA extracted from tumor tissues of colorectal cancer patients. Sequencing libraries were prepared and pooled, and an 8-plex capture library was carried through the enrichment step before 150-bp paired-end sequencing on an Illumina HiSeq series instrument. For the analysis, we evaluated several published tools, focusing mainly on the compatibility of each tool's input and output. Finally, our laboratory built an analysis pipeline specialized for UMI-based data. With this pipeline, we were able to estimate on-target rates and obtain filtered consensus reads for more accurate variant calling. These results suggest the potential of our pipeline for precisely examining the quality and efficiency of the experiments performed.
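The abstract mentions estimating on-target rates but does not define the metric or name the tool used. As a rough illustration only, the Python sketch below assumes that a read counts as on-target when its aligned interval overlaps at least one base of a capture target. The function names and toy coordinates are hypothetical, and real pipelines compute this from BAM and BED files with dedicated tools.

```python
from bisect import bisect_left

def merge_targets(targets):
    """Merge half-open (chrom, start, end) capture targets per chromosome."""
    merged = {}
    for chrom, start, end in sorted(targets):
        intervals = merged.setdefault(chrom, [])
        if intervals and start <= intervals[-1][1]:
            # Overlapping or touching the previous target: extend it.
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([start, end])
    return merged

def is_on_target(merged, chrom, start, end):
    """True if the half-open read interval [start, end) overlaps any merged target."""
    intervals = merged.get(chrom, [])
    starts = [s for s, _ in intervals]
    # Index of the last target that starts before the read ends.
    i = bisect_left(starts, end) - 1
    return i >= 0 and intervals[i][1] > start

def on_target_rate(reads, targets):
    """Fraction of aligned reads that overlap at least one capture target."""
    if not reads:
        return 0.0
    merged = merge_targets(targets)
    hits = sum(is_on_target(merged, chrom, s, e) for chrom, s, e in reads)
    return hits / len(reads)

# Hypothetical toy data: targets and aligned reads as (chrom, start, end).
targets = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [("chr1", 90, 110), ("chr1", 300, 400), ("chr2", 40, 60), ("chr3", 0, 10)]
print(on_target_rate(reads, targets))
```

Merging the targets first means one binary search per read is enough. Merged intervals are disjoint and sorted, so their end coordinates are sorted too, and only the last target starting before the read's end can overlap it.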

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association / Vol. 2, No. 1 / pp. 7-14 / 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predicting lung cancer, a leading cause of cancer mortality worldwide, and to informing treatment strategies for it. The research uses Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to improve predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and on applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers, such as ADGRF5, were identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift toward more precise and personalized treatment approaches. The results support further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in improving patient survival rates and quality of life. This study lays the groundwork for future research on applying RNA-sequencing data and ensemble machine learning techniques in clinical settings.

ChIP-seq Library Preparation and NGS Data Analysis Using the Galaxy Platform

  • 강유진;강진;김예운;김애리
    • Journal of Life Science / Vol. 31, No. 4 / pp. 410-417 / 2021
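  • NGS (next-generation sequencing) is a technique that breaks genome-scale amounts of DNA into small fragments and reads the sequences of those fragments simultaneously. NGS is now used for everything from sequencing the genomes of diverse organisms to analyzing cDNA (complementary DNA) and ChIPed DNA (chromatin-immunoprecipitated DNA), and appropriately processing and analyzing the resulting data is important for obtaining biologically meaningful results. However, storing and handling large datasets and analyzing data through computer programming are difficult for the bench biologists who perform the experiments. The Galaxy platform is a web service that provides a wide range of NGS data analysis tools free of charge and gives researchers without expertise in bioinformatics or programming an environment in which to analyze data using only a web browser. In this paper, we describe the library preparation process for ChIP-seq (chromatin immunoprecipitation sequencing) and the analysis of ChIP-seq data using the Galaxy platform, and we show that histone H3K4me1 ChIP-seq results obtained in the K562 cell line agree with public data. NGS data analysis using the Galaxy platform is therefore expected to provide an easy entry point to bioinformatics.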

ChIP-seq Analysis of Histone H3K27ac and H3K27me3 Showing Different Distribution Patterns in Chromatin

  • Kang, Jin;Kim, AeRi
    • Biomedical Science Letters / Vol. 28, No. 2 / pp. 109-119 / 2022
  • Histone proteins can be modified by the addition of acetyl or methyl groups to specific amino acids, and these modifications have different distribution patterns in chromatin. Histone modifications have recently been studied using ChIP-seq data, which requires analyzing the sequencing data in a way that suits each modification's distribution pattern. Here we analyzed histone H3K27ac and H3K27me3 ChIP-seq data and found that H3K27ac is enriched in narrow regions, whereas H3K27me3 is broadly distributed. To analyze the ChIP-seq data properly, we called peaks for H3K27ac and H3K27me3 using MACS2 (narrow and broad options) and SICER, and compared the suitability of the resulting peaks using the signal-to-background ratio. As a result, H3K27ac-enriched regions were well identified by both methods, while H3K27me3 peaks were properly identified only by SICER, indicating that the choice of peak-calling method is more critical for broadly distributed histone modifications. When ChIP-seq data were compared at different sequencing depths (15, 30, 60, and 120 M), higher sequencing depth produced a higher false-positive rate in H3K27ac peak calling but better reflected the broad distribution pattern of H3K27me3. These results suggest that sequencing depth affects peak calling from ChIP-seq data and that high sequencing depth is required for H3K27me3. Taken together, the peak-calling tool and sequencing depth should be chosen according to the distribution pattern of the histone modification in ChIP-seq analysis.
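The abstract does not give the exact formula used for the signal-to-background ratio. The Python sketch below assumes one simple definition for illustration: per-base read density inside called peaks divided by per-base read density outside them, on a single chromosome. Function and parameter names are hypothetical.

```python
def signal_to_background(peaks, read_positions, chrom_length):
    """Ratio of per-base read density inside peaks to that outside peaks.

    peaks: non-overlapping half-open (start, end) intervals on one chromosome.
    read_positions: 0-based read positions (e.g. fragment centres) on that chromosome.
    """
    peak_bp = sum(end - start for start, end in peaks)
    background_bp = chrom_length - peak_bp
    in_peaks = sum(1 for p in read_positions if any(s <= p < e for s, e in peaks))
    outside = len(read_positions) - in_peaks
    if peak_bp == 0 or background_bp == 0 or outside == 0:
        raise ValueError("peak length, background length and background reads must be > 0")
    return (in_peaks / peak_bp) / (outside / background_bp)

# Hypothetical toy data: one 100-bp peak on a 1,000-bp chromosome.
peaks = [(100, 200)]
reads = list(range(100, 110)) + list(range(500, 509))  # 10 reads in the peak, 9 outside
print(signal_to_background(peaks, reads, chrom_length=1000))
```

Because each count is divided by the length it covers, the ratio can be compared between peak sets of different total size, such as narrow and broad calls made on the same data.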

Generation and analysis of whole-genome sequencing data in human mammary epithelial cells

  • Jong-Lyul Park;Jae-Yoon Kim;Seon-Young Kim;Yong Sun Lee
    • Genomics & Informatics / Vol. 21, No. 1 / pp. 11.1-11.5 / 2023
  • Breast cancer is the most common cancer worldwide, and advanced breast cancer with metastases remains largely incurable with currently available therapies. It is therefore essential to understand the molecular characteristics of breast carcinogenesis as it progresses. Here, we report a whole-genome dataset from a human mammary epithelial cell system derived from a reduction mammoplasty specimen. This system comprises pre-stasis 184D cells, considered normal, and seven cell lines along a cancer progression series that are immortalized or have additionally acquired anchorage-independent growth. Our analysis of the whole-genome sequencing (WGS) data indicates that, compared with 184D cells, the seven progression-series cell lines carry between 8,393 and 39,564 somatic mutations (30,591 on average). These WGS data and our mutation analysis will provide helpful information for identifying driver mutations and elucidating the molecular mechanisms of breast carcinogenesis.
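The comparison "relative to 184D cells" can be pictured, in greatly simplified form, as removing every variant already present in the baseline line and then summarizing the per-line counts. This is a naive set difference chosen only for illustration. It is not the authors' stated method, and dedicated somatic callers model sequencing depth and error rather than subtracting call sets. The variant tuples, line names and helper names are hypothetical.

```python
from statistics import mean

# A variant call is represented here as a hypothetical (chrom, pos, ref, alt) tuple.
def naive_somatic(sample_calls, baseline_calls):
    """Variants called in a sample but absent from the baseline (e.g. 184D) calls."""
    return set(sample_calls) - set(baseline_calls)

def summarize(somatic_by_line):
    """Return (min, max, mean) somatic-variant counts across cell lines."""
    counts = [len(calls) for calls in somatic_by_line.values()]
    return min(counts), max(counts), mean(counts)

# Hypothetical toy data, not the study's results.
baseline = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
lines = {
    "line_A": naive_somatic([("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A")], baseline),
    "line_B": naive_somatic([("chr2", 500, "C", "T"), ("chr4", 7, "T", "C"),
                             ("chr5", 9, "A", "T"), ("chr6", 11, "G", "C")], baseline),
}
print(summarize(lines))
```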

Clinical Application of ABO Genotyping: 10 Years' Experience in the Southeastern Korea

  • Sae Am Song;Eun-Kyung Yu;Seung Hwan Oh
    • Journal of Interdisciplinary Genomics / Vol. 6, No. 1 / pp. 6-13 / 2024
  • Background: ABO typing is crucial for ensuring safe blood transfusion and is commonly performed by examining antigen-antibody interactions. Determining the ABO blood group can be difficult in cases of ABO discrepancy and ABO subgroups, and ABO genotyping may be necessary to resolve such discrepancies. ABO genotyping primarily involves direct sequencing, although other molecular methods can also be used. Methods: PCR and direct sequencing of exons 6 and 7 were performed on a total of 108 samples from June 2010 to December 2019. Other molecular methods, including cloning sequencing and short tandem repeat analysis, were also carried out as needed. Sequencing data were compared with allele information in blood group antigen mutation databases. Results: The predominant causal allele among the 108 ABO-discrepant cases was cis-AB01, with 28 cases. This was followed by rare ABO alleles (B309, B306, A204, Bw29, and Ax01) with 14 cases, and blood chimera with 5 cases. Five new alleles were identified during the investigation. Conclusion: This study reaffirms that cis-AB is the most common cause of inherited ABO discrepancies and that cis-AB01 is the most prevalent cis-AB allele in the Korean population, including the southeastern region. In addition, by adopting sequencing analysis and additional molecular techniques to resolve ABO discrepancies, we discovered five new alleles and five blood chimeras, providing regional data on rare alleles. This study presents rare and new ABO alleles and blood chimeras identified over a ten-year period at two major university hospitals in Southeastern Korea.

Genetic tests by next-generation sequencing in children with developmental delay and/or intellectual disability

  • Han, Ji Yoon;Lee, In Goo
    • Clinical and Experimental Pediatrics / Vol. 63, No. 6 / pp. 195-202 / 2020
  • Developments in next-generation sequencing (NGS) technologies have helped clarify the diagnosis and treatment of developmental delay/intellectual disability (DD/ID) through molecular genetic testing. Advances in DNA sequencing technology have not only allowed targeted panels to evolve but have also, more recently, enabled genome-wide analyses to move from research into clinical practice. Broad acceptance of accuracy-guided targeted gene panels, whole-exome sequencing (WES), and whole-genome sequencing (WGS) for DD/ID requires prospective analyses of their increasing cost-effectiveness compared with conventional genetic testing. Choosing the appropriate sequencing method requires individual planning. Data are needed to guide best-practice recommendations for genomic testing across various clinical phenotypes in an etiologic approach. Targeted panel testing may be recommended as a first-tier testing approach for children with DD/ID. Family-based trio testing by WES/WGS can be used as a second test for DD/ID in undiagnosed children who previously tested negative on a targeted panel. The role of NGS in molecular diagnostics, treatment, and prognosis prediction will continue to grow in the coming years. Given the rapid pace of change over the past 10 years, all medical providers should be aware of developments in the transformative field of genetics.

Transcriptional Heterogeneity of Cellular Senescence in Cancer

  • Junaid, Muhammad;Lee, Aejin;Kim, Jaehyung;Park, Tae Jun;Lim, Su Bin
    • Molecules and Cells / Vol. 45, No. 9 / pp. 610-619 / 2022
  • Cellular senescence plays a paradoxical role in tumorigenesis through the expression of diverse senescence-associated (SA) secretory phenotypes (SASPs). The heterogeneity of SA gene expression in cancer cells not only promotes cancer stemness but also protects these cells from chemotherapy. Despite the potential correlation between cancer and SA biomarkers, many transcriptional changes across distinct cell populations remain largely unknown. During the past decade, single-cell RNA sequencing (scRNA-seq) technologies have emerged as powerful experimental and analytical tools to dissect such diverse senescence-derived transcriptional changes. Here, we review the recent sequencing efforts that successfully characterized scRNA-seq data obtained from diverse cancer cells and elucidated the role of senescent cells in tumor malignancy. We further highlight the functional implications of SA genes expressed specifically in cancer and stromal cell populations in the tumor microenvironment. Translational research leveraging scRNA-seq profiling of SA genes will facilitate the identification of novel expression patterns underlying cancer susceptibility, providing new therapeutic opportunities in the era of precision medicine.

Deep sequencing of B cell receptor repertoire

  • Kim, Daeun;Park, Daechan
    • BMB Reports / Vol. 52, No. 9 / pp. 540-547 / 2019
  • The immune repertoire is the collection of enormously diverse adaptive immune cells within an individual. Because the repertoire shapes and reflects immunological conditions, identifying clones and characterizing diversity are critical for understanding how to protect ourselves against various illnesses, such as infectious diseases and cancers. Over the past several years, fast-growing high-throughput sequencing technologies have driven rapid advances in repertoire research, enabling us to observe repertoire diversity at an unprecedented level. Here, we focus on the B cell receptor (BCR) repertoire and review approaches to B cell isolation and sequencing library construction. These experiments should be carefully designed according to the BCR regions to be interrogated, such as the full-length heavy chain, complementarity-determining regions, and isotypes. We also highlight preprocessing steps that remove sequencing and PCR errors using unique molecular indexes and bioinformatics techniques. Given the massive sequence variation inherent in BCRs, caution is warranted when interpreting repertoire diversity from error-prone sequencing data. Furthermore, we summarize statistical frameworks and bioinformatics tools for analyzing clonal evolution and diversity. Finally, we discuss the limitations of current BCR-seq technologies and future perspectives on advances in repertoire sequencing.
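The UMI-based error removal described above can be illustrated, in much simplified form, by grouping reads that share a UMI and taking a per-position majority vote. This Python sketch rests on several assumptions: reads are equal length and already grouped by alignment position, UMIs must match exactly (errors inside the UMI itself are not corrected), and the minimum family size of two is a hypothetical choice. Published tools handle each of these points more carefully.

```python
from collections import Counter, defaultdict

def consensus(seqs):
    """Per-position majority vote over equal-length reads; ties become 'N'."""
    bases = []
    for column in zip(*seqs):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            bases.append("N")
        else:
            bases.append(ranked[0][0])
    return "".join(bases)

def collapse_by_umi(reads, min_family_size=2):
    """Group (umi, sequence) reads by exact UMI and return one consensus per family."""
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    return {umi: consensus(seqs)
            for umi, seqs in families.items()
            if len(seqs) >= min_family_size}

# Hypothetical reads from one alignment position; the third read carries an error.
reads = [("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"), ("GGT", "TTTT")]
print(collapse_by_umi(reads))
```

Ties are reported as N rather than broken arbitrarily, so a position without a clear majority is not passed on as a confident base. Singleton families are dropped because a single read offers no way to outvote an error.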

Whole-exome sequencing analysis in a case of primary congenital glaucoma due to the partial uniparental isodisomy

  • Zavarzadeh, Parisima Ghaffarian;Bonyadi, Morteza;Abedi, Zahra
    • Genomics & Informatics / Vol. 20, No. 3 / pp. 28.1-28.7 / 2022
  • We describe the clinical, laboratory, and genetic presentation of a pathogenic variant of the CYP1B1 gene through a case report of primary congenital glaucoma and a trio analysis of this candidate variant in the family using Sanger sequencing, and we conclude with the secondary/incidental findings. This study reports a rare case of primary congenital glaucoma in an 8-year-old girl with a negative family history of glaucoma and uncontrolled intraocular pressure. Whole-exome sequencing analysis in this case revealed a homozygous pathogenic single nucleotide variant in the CYP1B1 gene (NM_000104:exon3:c.G1103A:p.R368H). This variant was found in the heterozygous state in her unaffected father but not in her mother. The diagnosis was made on the basis of the molecular findings from whole-exome sequencing data analysis. The clinical reports and bioinformatics findings therefore supported the relationship between the candidate pathogenic variant and the disease. However, it should be remembered that primary congenital glaucoma is not specific to the CYP1B1 gene. Because an autosomal recessive disorder caused by a low-frequency allele is exceptionally unlikely in the offspring of unrelated parents, further whole-exome sequencing data analysis and Sanger sequencing were applied to determine the type of mutation and how it was transmitted to the offspring.
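The variant notation can be checked with a little arithmetic. With 1-based coding-sequence numbering, position c.1103 falls in codon (1103 - 1) // 3 + 1 = 368 at its second base, since 1103 = 3 × 367 + 2. This matches p.R368H. The abstract does not state the reference codon, so the Python sketch below tries all six arginine codons under the standard genetic code. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def locate(coding_pos):
    """Map a 1-based coding (c.) position to (codon number, base within codon)."""
    return (coding_pos - 1) // 3 + 1, (coding_pos - 1) % 3 + 1

def substitute(codon, base_in_codon, ref, alt):
    """Apply a single-base substitution, checking the expected reference base."""
    i = base_in_codon - 1
    if codon[i] != ref:
        raise ValueError(f"expected {ref} at base {base_in_codon} of {codon}")
    return codon[:i] + alt + codon[i + 1:]

codon_number, base = locate(1103)  # c.1103 -> codon 368, base 2
for ref_codon in ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG"):  # all Arg codons
    alt_codon = substitute(ref_codon, base, "G", "A")
    print(codon_number, ref_codon, CODON_TABLE[ref_codon], "->",
          alt_codon, CODON_TABLE[alt_codon])
```

Only CGT and CGC become histidine codons (CAT, CAC). CGA and CGG become glutamine, and AGA and AGG become lysine. A second-base G>A change reported as p.R368H is therefore consistent with a CGT or CGC reference codon.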