Search | Korea Science

Analysis of Whole Transcriptome Sequencing Data: Workflow and Software

Yang, In Seok; Kim, Sangwoo
Genomics & Informatics / v.13 no.4 / pp.119-125 / 2015
RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq) approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome, and it accelerates the development of bioinformatics software. In this review, we introduce a routine RNA-seq workflow together with related software, focusing particularly on transcriptome reconstruction and expression quantification.
https://doi.org/10.5808/GI.2015.13.4.119

Variational Autoencoder Based Dimension Reduction and Clustering for Single-Cell RNA-seq Gene Expression (단일세포 RNA-SEQ의 유전자 발현 군집화를 위한 변이 자동인코더 기반의 차원감소와 군집화)

Chi, Sang-Mun
Journal of the Korea Institute of Information and Communication Engineering / v.25 no.11 / pp.1512-1518 / 2021
Since single-cell RNA sequencing provides the expression profiles of individual cells, it offers higher cellular differential resolution than traditional bulk RNA sequencing. Using these single-cell RNA sequencing data, clustering analysis is generally conducted to find cell types and understand high-level biological processes. In order to effectively process the high-dimensional single-cell RNA sequencing data for the clustering analysis, this paper uses a variational autoencoder to transform a high-dimensional data space into a lower-dimensional latent space, expecting to produce a latent space that can give more accurate clustering results. By clustering the features in the transformed latent space, we compare the performance of various classical clustering methods for single-cell RNA sequencing data. Experimental results demonstrate that the proposed framework outperforms many state-of-the-art methods under various clustering performance metrics.
https://doi.org/10.6109/jkiice.2021.25.11.1512

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning (앙상블 기법을 활용한 RNA-Sequencing 데이터의 폐암 예측 연구)

Geon AN; JooYong PARK
Journal of Korea Artificial Intelligence Association / v.2 no.1 / pp.7-14 / 2024
In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predict lung cancer, a leading cause of cancer mortality worldwide, and to inform its treatment strategies. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers such as ADGRF5 are identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.
https://doi.org/10.24225/jkaia.2024.2.1.7

COEX-Seq: Convert a Variety of Measurements of Gene Expression in RNA-Seq

Kim, Sang Cheol; Yu, Donghyeon; Cho, Seong Beom
Genomics & Informatics / v.16 no.4 / pp.36.1-36.3 / 2018
Next-generation sequencing (NGS), a high-throughput DNA sequencing technology, is widely used for molecular biological studies. Within NGS, RNA-sequencing (RNA-Seq), a short-read massively parallel sequencing method, is a major quantitative tool for a range of transcriptome studies. To utilize RNA-Seq data, various quantification and analysis methods have been developed to meet specific research goals, including identification of differentially expressed genes and detection of novel transcripts. Because of the accumulation of RNA-Seq data in public databases, there is a demand for integrative analysis. However, the available RNA-Seq data are stored in different formats such as read count, transcripts per million, and fragments per kilobase million, which hinders integrative analysis. To solve this problem, we have developed a web-based application using Shiny, COEX-Seq (Convert a Variety of Measurements of Gene Expression in RNA-Seq), that easily converts data between the measurement formats of gene expression used in most bioinformatic tools for RNA-Seq. It provides a workflow that includes loading a data set, selecting measurement formats of gene expression, and identifying gene names. COEX-Seq is freely available for academic purposes and can be run on Windows, Mac OS, and Linux operating systems. Source code, sample data sets, and supplementary documentation are available as well.
https://doi.org/10.5808/GI.2018.16.4.e36
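
The conversions between read counts, FPKM, and TPM mentioned in the COEX-Seq abstract follow standard definitions, which can be sketched in a few lines. The Python below is an illustrative sketch only and is not COEX-Seq's own code (the abstract describes a Shiny application). The function names and the `lengths_bp` input (per-gene length in base pairs) are assumptions made for this example.

```python
from typing import Sequence


def counts_to_fpkm(counts: Sequence[float], lengths_bp: Sequence[float]) -> list[float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    if total == 0:
        raise ValueError("total count is zero; FPKM is undefined")
    # count / (length in kb) / (library size in millions)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def counts_to_tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> list[float]:
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    if scale == 0:
        raise ValueError("all rates are zero; TPM is undefined")
    return [r / scale * 1e6 for r in rates]


def fpkm_to_tpm(fpkm: Sequence[float]) -> list[float]:
    """TPM is FPKM rescaled so that the values in a sample sum to 1e6."""
    total = sum(fpkm)
    if total == 0:
        raise ValueError("all FPKM values are zero; TPM is undefined")
    return [f / total * 1e6 for f in fpkm]


# Hypothetical toy data: three genes with raw counts and lengths in bp.
example_counts = [100, 200, 300]
example_lengths = [1000, 2000, 3000]
example_tpm = counts_to_tpm(example_counts, example_lengths)
example_tpm_via_fpkm = fpkm_to_tpm(counts_to_fpkm(example_counts, example_lengths))
```

Both routes to TPM agree, which is why FPKM tables can be converted to TPM without the original counts. Going from TPM or FPKM back to raw counts needs the library size as well, which this sketch does not cover.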

Development of an RNA sequencing panel to detect gene fusions in thyroid cancer

Kim, Dongmoung; Jung, Seung-Hyun; Chung, Yeun-Jun
Genomics & Informatics / v.19 no.4 / pp.41.1-41.10 / 2021
In addition to mutations and copy number alterations, gene fusions are commonly identified in cancers. In thyroid cancer, fusions of important cancer-related genes have been commonly reported; however, extant panels do not cover all clinically important gene fusions. In this study, we aimed to develop a custom RNA-based sequencing panel to identify the key fusions in thyroid cancer. Our ThyChase panel was designed to detect 87 types of gene fusion. As quality control for RNA sequencing, five housekeeping genes were included in this panel. When we applied this panel to the analysis of fusion-containing reference RNA (HD796), three expected fusions (EML4-ALK, CCDC6-RET, and TPM3-NTRK1) were successfully identified. We confirmed the fusion breakpoint sequences of the three fusions from HD796 by Sanger sequencing. Regarding the limit of detection, this panel could detect the target fusions from a tumor sample containing a 1% fusion-positive tumor cellular fraction. Taken together, our ThyChase panel would be useful to identify gene fusions in the clinical field.
https://doi.org/10.5808/gi.21061

Transcriptomic Analysis of Cellular Senescence: One Step Closer to Senescence Atlas

Kim, Sohee; Kim, Chuna
Molecules and Cells / v.44 no.3 / pp.136-145 / 2021
Senescent cells that gradually accumulate during aging are one of the leading causes of aging. While senolytics can improve aging in humans as well as mice by specifically eliminating senescent cells, the effect of the senolytics varies in different cell types, suggesting variations in senescence. Various factors can induce cellular senescence, and the rate of accumulation of senescent cells differs depending on the organ. In addition, since the heterogeneity is due to the spatiotemporal context of senescent cells, in vivo studies are needed to increase the understanding of senescent cells. Since current methods are often unable to distinguish senescent cells from other cells, efforts are being made to find markers commonly expressed in senescent cells using bulk RNA-sequencing. Moreover, single-cell RNA (scRNA) sequencing, which analyzes the transcripts of each cell, has been utilized to understand the in vivo characteristics of the rare senescent cells. Recently, transcriptomic cell atlases for each organ using this technology have been published in various species. Novel senescent cells that do not express previously established marker genes have been discovered in some organs. However, there is still insufficient information on senescent cells due to the limited throughput of the scRNA sequencing technology. Therefore, it is necessary to improve the throughput of the scRNA sequencing technology or develop a way to enrich the rare senescent cells. The in vivo senescent cell atlas that is established using rapidly developing single-cell technologies will contribute to precise rejuvenation by specifically removing senescent cells in each tissue and individual.
https://doi.org/10.14348/molcells.2021.2239

Type-specific Amplification of 5S rRNA from Panax ginseng Cultivars Using Touchdown (TD) PCR and Direct Sequencing

Sun, Hun; Wang, Hong-Tao; Kwon, Woo-Saeng; Kim, Yeon-Ju; Yang, Deok-Chun
Journal of Ginseng Research / v.33 no.1 / pp.55-58 / 2009
Generally, direct sequencing through PCR is faster, easier, cheaper, and more practical than clone sequencing. However, standard PCR amplification is frequently disrupted by mispriming at internal or external regions of the target template. Normally, DNA fragments are eluted from the gel using a gel extraction kit and subjected to direct sequencing or clone sequencing. Clone sequencing is often troublesome and takes more time when many samples are analyzed. Since touchdown (TD) PCR can generate sufficient and highly specific amplification, it reduces unwanted amplicon generation. Accordingly, TD PCR is a good method for direct sequencing because it amplifies the wanted fragment. In plants, the 5S-rRNA gene is separated by simple spacers. The 5S-rRNA gene sequence is well conserved between plant species, while the spacer is species-specific. Therefore, the sequence has been used for phylogenetic studies and species identification. However, spurious bands caused by complex genomes are frequently encountered in the product spectrum of standard PCR amplification. In conclusion, the TD PCR method can be applied easily to amplify the main 5S-rRNA and to directly sequence Panax ginseng cultivars.
https://doi.org/10.5142/JGR.2009.33.1.055

Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene

Ajaykumar, Atul; Yang, Jung Jin
Journal of Microbiology and Biotechnology / v.32 no.2 / pp.149-159 / 2022
Cancers of the lung and liver are among the top 10 leading causes of cancer death worldwide. Thus, it is essential to identify the genes specifically expressed in these two cancer types to develop new therapeutics. Although many messenger RNA (mRNA) sequencing data sets related to these cancer cells are available due to the advancement of next-generation sequencing (NGS) technologies, optimized data processing methods need to be developed to identify novel cancer-specific genes. Here, we conducted an analytical comparison between Bowtie2, a Burrows-Wheeler transform-based alignment tool, and Kallisto, which adopts pseudoalignment based on a transcriptome de Bruijn graph, using mRNA sequencing data on normal cells and lung/liver cancer tissues. Before using cancer data, simulated mRNA sequencing reads were generated, and the high Transcripts Per Million (TPM) values were compared. mRNA sequencing read data on lung/liver cancer cells were also extracted and quantified. While Kallisto could directly give the output in TPM values, Bowtie2 provided the counts. Thus, TPM values were calculated by processing the Sequence Alignment Map (SAM) file in R using the package Rsubread and subsequently in Python. The analysis of the simulated sequencing data revealed that Kallisto could detect more transcripts and had a higher overlap over Bowtie2. The evaluation of these two data processing methods using the known lung cancer biomarkers concludes that, in standard settings without any dedicated quality control, Kallisto is more effective at producing faster and more accurate results than Bowtie2. Such conclusions were also drawn and confirmed with the known biomarkers specific to liver cancer.
https://doi.org/10.4014/jmb.2110.10017

One-step spectral clustering of weighted variables on single-cell RNA-sequencing data (단세포 RNA 시퀀싱 데이터를 위한 가중변수 스펙트럼 군집화 기법)

Park, Min Young; Park, Seyoung
The Korean Journal of Applied Statistics / v.33 no.4 / pp.511-526 / 2020
Single-cell RNA-sequencing (scRNA-seq) data consist of each cell's RNA expression extracted from large populations of cells. One main purpose of using scRNA-seq data is to identify inter-cellular heterogeneity. However, scRNA-seq data pose statistical challenges when applying traditional clustering methods because they have many missing values and a high level of noise due to technical and sampling issues. In this paper, motivated by the analysis of scRNA-seq data, we propose a novel spectral-based clustering method that imposes different weights on genes when computing the similarity between cells. Assigning weights to genes and clustering cells are performed simultaneously in the proposed clustering framework. We solve the proposed non-convex optimization problem using an iterative algorithm. Both a real data application and a simulation study suggest that the proposed clustering method identifies underlying clusters better than existing clustering methods.
https://doi.org/10.5351/KJAS.2020.33.4.511

Identification of Alternative Splicing and Fusion Transcripts in Non-Small Cell Lung Cancer by RNA Sequencing

Hong, Yoonki; Kim, Woo Jin; Bang, Chi Young; Lee, Jae Cheol; Oh, Yeon-Mok
Tuberculosis and Respiratory Diseases / v.79 no.2 / pp.85-90 / 2016
Background: Lung cancer is the most common cause of cancer-related death. Alterations in gene sequence, structure, and expression have an important role in the pathogenesis of lung cancer. Fusion genes and alternative splicing of cancer-related genes have the potential to be oncogenic. In the current study, we performed RNA-sequencing (RNA-seq) to investigate potential fusion genes and alternative splicing in non-small cell lung cancer. Methods: RNA was isolated from lung tissues obtained from 86 subjects with lung cancer. The RNA samples from lung cancer and normal tissues were processed with RNA-seq using the HiSeq 2000 system. Fusion genes were evaluated using deFuse and ChimeraScan. Candidate fusion transcripts were validated by Sanger sequencing. Alternative splicing was analyzed using multivariate analysis of transcript sequencing and validated using quantitative real-time polymerase chain reaction. Results: RNA-seq data identified oncogenic fusion genes EML4-ALK and SLC34A2-ROS1 in three of 86 normal-cancer paired samples. Nine distinct fusion transcripts were selected using deFuse and ChimeraScan, of which four were validated by Sanger sequencing. In 33 squamous cell carcinomas, 29 tumor-specific skipped exon events and six mutually exclusive exon events were identified. ITGB4 and PYCR1 were the top genes that showed significant tumor-specific splice variants. Conclusion: RNA-seq data identified novel potential fusion transcripts and splice variants. Further evaluation of their functional significance in the pathogenesis of lung cancer is required.
https://doi.org/10.4046/trd.2016.79.2.85
