• Title/Summary/Keyword: NGS data analysis


Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene

  • Ajaykumar, Atul;Yang, Jung Jin
    • Journal of Microbiology and Biotechnology
    • /
    • v.32 no.2
    • /
    • pp.149-159
    • /
    • 2022
  • Cancers of the lung and liver are among the top 10 leading causes of cancer death worldwide. Thus, it is essential to identify the genes specifically expressed in these two cancer types to develop new therapeutics. Although many messenger RNA (mRNA) sequencing datasets related to these cancer cells are available thanks to advances in next-generation sequencing (NGS) technologies, optimized data processing methods need to be developed to identify novel cancer-specific genes. Here, we conducted an analytical comparison between Bowtie2, a Burrows-Wheeler transform-based alignment tool, and Kallisto, which adopts pseudoalignment based on a transcriptome de Bruijn graph, using mRNA sequencing data on normal cells and lung/liver cancer tissues. Before using cancer data, simulated mRNA sequencing reads were generated, and the high Transcripts Per Million (TPM) values were compared. mRNA sequencing read data on lung/liver cancer cells were also extracted and quantified. While Kallisto could directly give the output in TPM values, Bowtie2 provided only counts. Thus, TPM values were calculated by processing the Sequence Alignment Map (SAM) file in R using the Rsubread package and subsequently in Python. The analysis of the simulated sequencing data revealed that Kallisto could detect more transcripts and had a higher overlap than Bowtie2. The evaluation of these two data processing methods using known lung cancer biomarkers indicates that, in standard settings without any dedicated quality control, Kallisto produces faster and more accurate results than Bowtie2. The same conclusions were drawn and confirmed with known biomarkers specific to liver cancer.
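
The count-to-TPM step mentioned in the abstract above can be sketched as follows. The abstract does not give the exact formula or code the authors used, so this is only an illustration of the standard TPM definition (length-normalize each count, then scale so the values sum to one million); the function name `counts_to_tpm` and the example transcript IDs are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM.

    counts     -- dict mapping transcript ID to read count
    lengths_bp -- dict mapping transcript ID to transcript length in base pairs
    Both arguments are assumed inputs; the abstract does not describe the
    authors' actual data structures.
    """
    if counts.keys() != lengths_bp.keys():
        raise ValueError("counts and lengths_bp must cover the same transcripts")

    # Reads per kilobase: normalize each count by transcript length in kb.
    rates = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}

    total = sum(rates.values())
    if total == 0:
        # No reads at all: every transcript gets a TPM of zero.
        return {t: 0.0 for t in counts}

    # Scale so that all TPM values sum to one million.
    return {t: rate / total * 1e6 for t, rate in rates.items()}


if __name__ == "__main__":
    example = counts_to_tpm({"tx1": 10, "tx2": 20}, {"tx1": 1000, "tx2": 2000})
    print(example)
```

Because TPM normalizes by length before scaling, two transcripts with the same reads-per-kilobase rate receive the same TPM even when their raw counts differ.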

Alternative Splicing Pattern Analysis from RNA-Seq data (RNA-Seq 데이터를 이용한 선택 스플라이싱 유형 분석)

  • Kong, Jin-Hwa;Lee, Jong-Keun;Lee, Un-Joo;Yoon, Jee-Hee
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06a
    • /
    • pp.37-40
    • /
    • 2011
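  • Alternative splicing is the process by which the exons of a pre-mRNA, the precursor of messenger RNA (mRNA), are rejoined in different patterns as the pre-mRNA is converted into mRNA. Through alternative splicing, a single gene gives rise to different mRNAs and therefore to different protein isoforms. About seven types of alternative splicing are currently known, and they are known to be closely associated with gene mutations and disease. In this study, we propose a new algorithm that classifies and extracts the alternative splicing pattern of each gene region from RNA-Seq data generated by next-generation sequencing (NGS) technology. The proposed algorithm maps the RNA-Seq data simultaneously to the DNA sequence and to mRNA transcript sequences, and uses the coverage of the RNA-Seq reads aligned to each exon region together with exon junction information to determine the types and amounts of the expressed transcripts. To demonstrate the validity of the algorithm, we carried out an experiment extracting alternative splicing patterns in human gene regions using simulated data, and compared and validated the results against a verified alternative splicing database.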

Precision Medicine in Head and Neck Cancer (두경부암에서 정밀의료)

  • Hye-sung Park;Jin-Hyoung Kang
    • Korean Journal of Head & Neck Oncology
    • /
    • v.39 no.1
    • /
    • pp.1-9
    • /
    • 2023
  • Technological advances in human genome analysis and information and communication technologies (ICT) have brought 'precision medicine' into our clinical practice. Precision medicine is a novel medical approach that provides personalized treatments tailored to each individual by precisely segmenting patient populations based on robust data, including a person's genetic information, disease information, and lifestyle information. Precision medicine has the potential to be applied to a range of tumors in addition to non-small cell lung cancer, in which precision oncology has been actively practiced. In this article, we review precision medicine in head and neck cancer (HNC) with a focus on tumor-agnostic biomarkers and treatments, such as NTRK, MSI-H/dMMR, TMB-H, and BRAF V600E, all of which were recently approved by the U.S. Food and Drug Administration (FDA).

Development of Simple Sequence Repeat Markers from Adenophora triphylla var. japonica (Regel) H. Hara using Next Generation Sequencing (차세대염기서열분석법을 이용한 잔대의 SSR 마커 개발)

  • Park, Ki Chan;Kim, Young Guk;Hwangbo, Kyeong;Gil, Jinsu;Chung, Hee;Park, Sin Gi;Hong, Chang Pyo;Lee, Yi
    • Korean Journal of Medicinal Crop Science
    • /
    • v.25 no.6
    • /
    • pp.411-417
    • /
    • 2017
  • Background: Adenophora triphylla var. japonica (Regel) H. Hara shows vegetative growth with radical leaves during the first year and shows reproductive growth with cauline leaves and bolting during the second year. In addition, the shape of the plant varies within the same species. For this reason, there are limitations to classifying the species by visual examination. However, there is not sufficient genetic information or molecular tools to analyze the genetic diversity of the plant. Methods and Results: Approximately 34.59 Gbp of raw data containing 342,487,502 reads was obtained from next generation sequencing (NGS) and these reads were assembled into 357,211 scaffolds. A total of 84,106 simple sequence repeat (SSR) regions were identified and 14,133 primer sets were designed. From the designed primer sets, 95 were randomly selected and were applied to the genomic DNA which was extracted from five plants and pooled. Thirty-nine primer sets showing more than two bands were finally selected as SSR markers, and were used for the genetic relationship analysis. Conclusions: The 39 novel SSR markers developed in this study could be used for the genetic diversity analysis, variety identification, new variety development and molecular breeding of A. triphylla.

Analysis of inquiry activities in the life science chapters of middle school 'science' textbooks: Focusing on Science Process Skills and 8 Scientific Practices (중학교 과학교과서 생명과학 단원의 탐구 활동 분석: 과학탐구 기능과 8가지 과학 실천을 중심으로)

  • Kim, Mijung;Hong, Juneuy;Kim, Sung-Ha;Lim, Chae-Seong
    • Journal of Science Education
    • /
    • v.41 no.3
    • /
    • pp.318-333
    • /
    • 2017
  • In this study, we analyzed the activities in the life science chapters of middle school 'science' textbooks for the 2009 revised Korea national curriculum and examined the difference between the analysis based on scientific practices and the analysis based on inquiry skills. As a result, the most frequent inquiry skills across all grades were 'reasoning', 'observing', and 'classification', in that order. Among the scientific practices, the activities were biased toward 'data analysis and interpretation' and 'constructing explanations and devising problem solving'. This shows that life science inquiry activities in middle school 'science' textbooks lack diversity in scientific practice elements as well as in inquiry skills, and that the goals of the activities are limited. In addition, through the interrelationships between scientific inquiry skills and scientific practice elements, we examined content relevance in the transition from an inquiry skill-centered approach to scientific practice and compared it with the results for the inquiry activities in the textbooks. The results were matched monotonously, tending toward the pairings of basic inquiry with data interpretation and basic inquiry with explanation. This stems from the lack of diversity in the activities presented in middle school 'science' textbooks. Based on this analysis of the textbooks for the 2009 revised Korea national curriculum, we suggest that efforts should be made to move beyond simple, undiversified inquiry activities and to include diverse scientific practice elements in realizing the 2015 revised Korea national curriculum.

Development of SNP marker set for marker-assisted backcrossing (MABC) in cultivating tomato varieties

  • Park, GiRim;Jang, Hyun A;Jo, Sung-Hwan;Park, Younghoon;Oh, Sang-Keun;Nam, Moon
    • Korean Journal of Agricultural Science
    • /
    • v.45 no.3
    • /
    • pp.385-400
    • /
    • 2018
  • Marker-assisted backcrossing (MABC) is useful for selecting offspring with a highly recovered genetic background for a recurrent parent at an early generation. Unlike in rice and other field crops, molecular marker sets applicable to practical MABC are scarce in vegetable crops, including tomatoes. In this study, we used the National Center for Biotechnology Information-Short Read Archive (NCBI-SRA) database, which provided the whole genome sequences of 234 tomato accessions, and selected 27,680 tag-single nucleotide polymorphisms (tag-SNPs) that can identify haplotypes in the tomato genome. From this SNP dataset, a total of 143 tag-SNPs that have a high polymorphism information content (PIC) value (> 0.3) and are physically evenly distributed on each chromosome were selected as a MABC marker set. This marker set was tested for its polymorphism in each pairwise cross combination constructed with 124 of the 234 tomato accessions, and a relatively high number of SNP markers polymorphic for the cross combination was observed. The reliability of the MABC SNP set was assessed by converting 18 SNPs into Luna probe-based high-resolution melting (HRM) markers and genotyping nine tomato accessions. The results show that the SNP information and HRM marker genotype matched in 98.6% of the experimental data points, indicating that our sequence analysis pipeline for SNP mining worked successfully. The tag-SNP set for MABC developed in this study can be useful not only for a practical backcrossing program but also for cultivar identification and F1 seed purity testing in tomatoes.
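
The PIC filter described above (PIC > 0.3) can be illustrated with a small sketch. The abstract does not state which PIC formula the authors used; the code below assumes the commonly used definition by Botstein et al. (1980), PIC = 1 − Σ pᵢ² − Σᵢ<ⱼ 2 pᵢ² pⱼ², and the function name `pic` is hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one marker.

    allele_freqs -- allele frequencies at the marker; they must sum to 1.
    Assumes the Botstein et al. (1980) definition, which the abstract
    does not confirm.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    # Sum of squared allele frequencies.
    sum_squares = sum(p * p for p in allele_freqs)

    # Correction term over every unordered pair of distinct alleles.
    cross_term = sum(2 * p * p * q * q for p, q in combinations(allele_freqs, 2))

    return 1.0 - sum_squares - cross_term


if __name__ == "__main__":
    for freqs in ([0.5, 0.5], [0.8, 0.2]):
        value = pic(freqs)
        print(freqs, round(value, 4), "kept" if value > 0.3 else "dropped")
```

Under this definition a biallelic SNP reaches its maximum PIC of 0.375 at equal allele frequencies, so a 0.3 threshold keeps only SNPs whose two alleles are both reasonably common.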

Experimental Infection of Different Tomato Genotypes with Tomato mosaic virus Led to a Low Viral Population Heterogeneity in the Capsid Protein Encoding Region

  • Sihelska, Nina;Vozarova, Zuzana;Predajna, Lukas;Soltys, Katarina;Hudcovicova, Martina;Mihalik, Daniel;Kraic, Jan;Mrkvova, Michaela;Kudela, Otakar;Glasa, Miroslav
    • The Plant Pathology Journal
    • /
    • v.33 no.5
    • /
    • pp.508-513
    • /
    • 2017
  • The complete genome sequence of a Slovak SL-1 isolate of Tomato mosaic virus (ToMV) was determined from next generation sequencing (NGS) data, further confirming a limited sequence divergence in this tobamovirus species. Tomato genotypes Monalbo, Mobaci and Moperou, respectively carrying the susceptible tm-2 allele or the Tm-1 and Tm-2 resistant alleles, were tested for their susceptibility to ToMV SL-1. Although the three tomato genotypes accumulated ToMV SL-1 to similar amounts as judged by semiquantitative DAS-ELISA, they showed variations in the rate of infection and symptomatology. Possible differences in the intra-isolate variability and polymorphism between viral populations propagating in these tomato genotypes were evaluated by analysis of the capsid protein (CP) encoding region. Irrespective of the genotype infected, the intra-isolate haplotype structure showed the presence of the same highly dominant CP sequence and a low level of population diversity (0.08-0.19%). Our results suggest that the ToMV CP encoding sequence is relatively stable in the viral population during its replication in vivo and provide further demonstration that RNA viruses may show high sequence stability, probably as a result of purifying selection.

A Framework of Intelligent Middleware for DNA Sequence Analysis in Cloud Computing Environment (DNA 서열 분석을 위한 클라우드 컴퓨팅 기반 지능형 미들웨어 설계)

  • Oh, Junseok;Lee, Yoonjae;Lee, Bong Gyou
    • Journal of Internet Computing and Services
    • /
    • v.15 no.1
    • /
    • pp.29-43
    • /
    • 2014
  • The development of NGS technologies, such as scientific workflows, has reduced the time required for decoding DNA sequences. Although automated technologies have changed the genome sequence analysis environment, limited computing resources still pose problems for the analysis. Most scientific workflow systems are pre-built platforms and are highly complex because many functions are implemented in one system platform. It is also difficult to apply components of pre-built systems to a new system in the cloud environment. Cloud computing technologies can be applied to these systems to reduce analysis time and enable simultaneous analysis of massive DNA sequence data. Web service techniques are also introduced to improve the interoperability between DNA sequence analysis systems. This paper proposes a workflow-based middleware that supports Web services, DBMS, and cloud computing, with the aim of reducing analysis time and supporting lightweight virtual instances. It uses a DBMS to manage the pipeline status and to support the creation of lightweight virtual instances in the cloud environment. In addition, RESTful Web services with simple URIs and XML contents are applied to improve interoperability. The performance of the system needs to be tested by comparing its results with those of other DNA analysis services at the stabilization stage.

Analysis of the Distribution and Diversity of the Microbial Community in Kimchi Samples from Central and Southern Regions in Korea Using Next-generation Sequencing (차세대 염기서열 분석법을 이용한 우리나라 중부지방과 남부지방의 김치 미생물 군집의 분포 및 다양성 분석)

  • Yunjeong Noh;Gwangsu Ha;Jinwon Kim;Soo-Young Lee;Do-Youn Jeong;Hee-Jong Yang
    • Journal of Life Science
    • /
    • v.33 no.1
    • /
    • pp.25-33
    • /
    • 2023
  • The fermentation process of kimchi, which is a traditional Korean food, influences the resulting composition of microorganisms, such as the genera Leuconostoc, Weissella, and Lactobacillus. In addition, several factors, including the type of kimchi, fermentation conditions, materials, and ingredients, can influence the distribution of the kimchi microbial community. In this study, next-generation sequencing (NGS) of kimchi samples obtained from central (Gangwon-do and Gyeonggi-do) and southern (Jeolla-do and Gyeongsang-do) regions in Korea was performed, and the microbial communities in samples from the two regions were compared. Good's coverage for all samples was higher than 99%, indicating that there was sufficient reliability for comparative analysis. However, in an α-diversity analysis, there was no significant difference in species richness and diversity between samples. The Firmicutes phylum was common in both regions. At the species level, Weissella kandleri dominated in the central (46.5%) and southern (30.8%) regions. Linear discriminant analysis effect size (LEfSe) analysis was performed to identify biomarkers representing the microbial community in each region. The LEfSe results pointed to statistically significant differences between the two regions in community composition, with Leuconostocaceae (71.4%) dominating in the central region and Lactobacillaceae (61.0%) dominating in the southern region. Based on these results, it can be concluded that the microbial communities of kimchi are significantly influenced by regional properties and that these results can provide useful scientific data for studying the relationship between the regional characteristics of kimchi and their microbial distribution.
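
The Good's coverage figure quoted above can be illustrated with a minimal sketch. It assumes the usual estimator, C = 1 − F₁/N, where F₁ is the number of taxa observed exactly once and N is the total number of reads; the abstract does not state which software or settings the authors used, and the function name `goods_coverage` and the example counts are hypothetical.

```python
def goods_coverage(taxon_counts):
    """Good's coverage estimate for one sample.

    taxon_counts -- read counts per taxon (or OTU) in the sample.
    Returns 1 - (number of singleton taxa / total reads).
    """
    total_reads = sum(taxon_counts)
    if total_reads == 0:
        raise ValueError("sample contains no reads")

    # Singletons are taxa supported by exactly one read.
    singletons = sum(1 for count in taxon_counts if count == 1)

    return 1.0 - singletons / total_reads


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    sample = [50, 30, 18, 1, 1]
    print(f"Good's coverage: {goods_coverage(sample):.2%}")
```

A value above 99% means fewer than one read in a hundred belongs to a taxon seen only once, which is why it is read as evidence that sequencing depth was sufficient for comparing samples.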

Determining the doses of probiotics for application in Scylla tranquebarica (Fabricius 1798) larvae to produce crablet

  • Gunarto, Gunarto;Yustian Rovi Alfiansah;Muliani Muliani;Bunga Rante Tampangalo;Herlinah Herlinah;Nurbaya Nurbaya;Rosmiati Rosmiati
    • Fisheries and Aquatic Sciences
    • /
    • v.27 no.3
    • /
    • pp.180-194
    • /
    • 2024
  • Mass mortalities of mud crab Scylla spp. larvae due to pathogenic Vibrio spp. outbreaks have frequently occurred in hatcheries. To overcome this problem, probiotics containing Bacillus subtilis bacteria are applied to inhibit pathogenic ones. We tested different doses of a probiotic containing B. subtilis (10^8 CFU/g) on Scylla tranquebarica larvae and investigated the microbiota population, including Vibrio. Water quality, larval development, and crablet production were also monitored. The recently hatched larvae were grown in twelve conical fiber tanks filled with 200 L of sterile seawater with a salinity of 30 ppt at a stocking density of 80 ind/L. Four different doses of probiotics were applied in the larval rearing, namely, A = 2.5 mg/L, B = 5 mg/L, C = 7.5 mg/L, and D = 0 mg/L, with three replicates. Next-generation sequencing (NGS) analysis was used to obtain the abundance of microbes in the whole body of megalopa and in the water media for larval rearing after applying probiotics. Sixteen raw deoxyribonucleic acid (DNA) samples were prepared: eight extracted from the whole body of megalopa from the four probiotic treatments, defined as A, B, C, and D, and eight extracted from the water media from the four probiotic treatments, defined as E, F, G, and H. They were then sent to the Genetics Science Laboratory for NGS analysis. Ammonia, nitrite, total organic matter (TOM), larvae, and crablet production were monitored. Based on the NGS analysis data, Vibrio spp. were significantly lower (p < 0.05) than in the control (D) in megalopa treated with probiotics at doses of 2.5 mg/L (A) and 7.5 mg/L (C) and in the water media for megalopa rearing treated with probiotics at a dose of 5.0 mg/L (F). Ammonia in the zoea stage in treatment B and TOM in the zoea and megalopa stages in treatments B and C decreased significantly (p < 0.05). This contributed to higher zoea survival in treatments B and C and, in turn, to significantly higher crablet production in those treatments. Therefore, a dose of 5 mg/L to 7.5 mg/L significantly improves S. tranquebarica crablet production.