http://dx.doi.org/10.5808/GI.2019.17.3.e26

FusionScan: accurate prediction of fusion genes from RNA-Seq data  

Kim, Pora (Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University)
Jang, Ye Eun (Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University)
Lee, Sanghyuk (Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University)
Abstract
Identification of fusion genes is of prominent importance in the cancer research field because of their potential as carcinogenic drivers. RNA sequencing (RNA-Seq) data have been the most useful source for identifying fusion transcripts. Although a number of algorithms have been developed thus far, most programs produce too many false positives, making experimental confirmation almost impossible. We still lack a reliable program that achieves high precision with a reasonable recall rate. Here, we present FusionScan, a highly optimized tool for predicting fusion transcripts from RNA-Seq data. We specifically search for split reads composed of intact exons at the fusion boundaries. Using 269 known fusion cases as the reference, we implemented various mapping and filtering strategies to remove false positives without discarding genuine fusions. In a performance test using three cell line datasets with validated fusion cases (NCI-H660, K562, and MCF-7), FusionScan outperformed other existing programs by a considerable margin, achieving precision and recall rates of 60% and 79%, respectively. A simulation test also demonstrated that FusionScan recovered most of the true positives without producing an overwhelming number of false positives, regardless of sequencing depth and read length. The computation time was comparable to that of other leading tools. We also provide several curation tools to help users easily investigate the details of fusion candidates. We believe that FusionScan would be a reliable, efficient, and convenient program for detecting fusion transcripts, meeting the requirements of the clinical and experimental community. FusionScan is freely available at http://fusionscan.ewha.ac.kr/.
Keywords
chromosomal translocation; fusion transcript; gene fusion; RNA-Seq; transcriptome sequencing
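The abstract states that FusionScan searches for split reads composed of intact exons at the fusion boundaries. The sketch below illustrates what such a check could look like; it is not FusionScan's implementation, and the mapping and filtering strategies described above are not modelled. It assumes that reads have already been aligned and split into a 5' segment and a 3' segment, each assigned to a gene. The `Gene` and `SplitRead` structures, the function names, the 1-based inclusive exon coordinates, and the gene names and coordinates in the example are all hypothetical. A read passes only if its 5' segment ends exactly at an exon's 3' end and its 3' segment starts exactly at an exon's 5' start, with both ends taken in transcript orientation, so that strand is respected.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene model: strand plus exon coordinates (1-based, inclusive)."""
    name: str
    strand: str                    # "+" or "-"
    exons: List[Tuple[int, int]]   # (start, end) genomic coordinates, start <= end


@dataclass(frozen=True)
class SplitRead:
    """Hypothetical split read, already aligned and assigned to two genes."""
    read_id: str
    gene5: str   # gene covering the 5' segment of the read
    break5: int  # genomic coordinate of the last base of the 5' segment (transcript orientation)
    gene3: str   # gene covering the 3' segment of the read
    break3: int  # genomic coordinate of the first base of the 3' segment (transcript orientation)


def at_exon_3prime_end(gene: Gene, pos: int) -> bool:
    """True if pos is the last transcribed base of some exon of the gene."""
    if gene.strand == "+":
        return any(end == pos for _, end in gene.exons)
    # On the minus strand, transcription ends at the smaller genomic coordinate.
    return any(start == pos for start, _ in gene.exons)


def at_exon_5prime_start(gene: Gene, pos: int) -> bool:
    """True if pos is the first transcribed base of some exon of the gene."""
    if gene.strand == "+":
        return any(start == pos for start, _ in gene.exons)
    # On the minus strand, transcription starts at the larger genomic coordinate.
    return any(end == pos for _, end in gene.exons)


def is_intact_exon_junction(read: SplitRead, genes: Dict[str, Gene]) -> bool:
    """Keep a read only if both breakpoints fall exactly on exon boundaries of two distinct genes."""
    g5 = genes.get(read.gene5)
    g3 = genes.get(read.gene3)
    if g5 is None or g3 is None or g5.name == g3.name:
        return False
    return at_exon_3prime_end(g5, read.break5) and at_exon_5prime_start(g3, read.break3)


def filter_split_reads(reads: List[SplitRead], genes: Dict[str, Gene]) -> List[SplitRead]:
    """Return the split reads whose fusion junction joins intact exons."""
    return [r for r in reads if is_intact_exon_junction(r, genes)]


# Toy example with made-up genes and coordinates.
genes = {
    "GENE_A": Gene("GENE_A", "+", [(100, 200), (300, 400)]),
    "GENE_B": Gene("GENE_B", "-", [(1000, 1100), (1200, 1300)]),
}
reads = [
    SplitRead("r1", "GENE_A", 200, "GENE_B", 1300),  # exon end -> exon start (minus strand): kept
    SplitRead("r2", "GENE_A", 250, "GENE_B", 1300),  # 5' breakpoint lies inside an intron: dropped
    SplitRead("r3", "GENE_A", 400, "GENE_B", 1000),  # 1000 is a 3' end on the minus strand: dropped
]
kept = filter_split_reads(reads, genes)
print([r.read_id for r in kept])
```

Requiring exact exon-boundary matches is a strict design choice. It discards reads whose breakpoints fall inside exons or introns, which trades some sensitivity for fewer false positives. A practical tool would also have to handle alignment ambiguity near the junction, which this sketch ignores.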