FusionScan: accurate prediction of fusion genes from RNA-Seq data

  • Kim, Pora (Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University) ;
  • Jang, Ye Eun (Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University) ;
  • Lee, Sanghyuk (Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University)
  • Received : 2019.02.26
  • Accepted : 2019.03.21
  • Published : 2019.09.30

Abstract

Identification of fusion genes is of prominent importance in cancer research because of their potential as carcinogenic drivers. RNA sequencing (RNA-Seq) data have been the most useful source for identifying fusion transcripts. Although a number of algorithms have been developed thus far, most programs produce too many false positives, making experimental confirmation almost impossible. We still lack a reliable program that achieves high precision with a reasonable recall rate. Here, we present FusionScan, a highly optimized tool for predicting fusion transcripts from RNA-Seq data. We specifically search for split reads composed of intact exons at the fusion boundaries. Using 269 known fusion cases as the reference, we implemented various mapping and filtering strategies to remove false positives without discarding genuine fusions. In a performance test on three cell line datasets with validated fusion cases (NCI-H660, K562, and MCF-7), FusionScan outperformed other existing programs by a considerable margin, achieving precision and recall of 60% and 79%, respectively. The simulation test also demonstrated that FusionScan recovered most true positives without producing an overwhelming number of false positives, regardless of sequencing depth and read length. Its computation time was comparable to that of other leading tools. We also provide several curation features that help users easily investigate the details of fusion candidates. We believe that FusionScan will be a reliable, efficient, and convenient program for detecting fusion transcripts that meets the requirements of the clinical and experimental communities. FusionScan is freely available at http://fusionscan.ewha.ac.kr/.
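Two ideas in the abstract can be made concrete with a short sketch: the filter that keeps only split reads whose breakpoint joins intact exons of the two partner genes, and the precision and recall figures used in the evaluation (precision = TP / (TP + FP), recall = TP / (TP + FN), counted over fusion gene pairs). The Python below is a minimal illustration under stated assumptions, not FusionScan's implementation: the `Exon` class, the function names, the 1-based forward-strand coordinate convention, and the toy genes and gene pairs are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical exon model (not FusionScan's data structure):
# 1-based, inclusive coordinates; both partner genes assumed on the forward strand.
@dataclass(frozen=True)
class Exon:
    start: int
    end: int


def is_intact_exon_junction(five_prime_exons: List[Exon],
                            three_prime_exons: List[Exon],
                            donor_pos: int,
                            acceptor_pos: int) -> bool:
    """Return True if a split read's breakpoint joins the last base of an exon
    in the 5' partner gene (donor_pos) to the first base of an exon in the
    3' partner gene (acceptor_pos), i.e. the fusion boundary falls exactly on
    exon boundaries rather than inside an exon or an intron."""
    donor_ok = any(exon.end == donor_pos for exon in five_prime_exons)
    acceptor_ok = any(exon.start == acceptor_pos for exon in three_prime_exons)
    return donor_ok and acceptor_ok


GenePair = Tuple[str, str]  # (5' partner, 3' partner)


def precision_recall(predicted: Set[GenePair],
                     validated: Set[GenePair]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), over gene pairs."""
    true_positives = len(predicted & validated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(validated) if validated else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy exon structures for two hypothetical genes.
    gene_a = [Exon(100, 200), Exon(300, 400)]
    gene_b = [Exon(1000, 1100), Exon(1200, 1300)]

    print(is_intact_exon_junction(gene_a, gene_b, 200, 1200))  # True
    print(is_intact_exon_junction(gene_a, gene_b, 250, 1200))  # False: breakpoint inside an intron

    predicted = {("A", "B"), ("C", "D"), ("E", "F")}
    validated = {("A", "B"), ("C", "D"), ("G", "H"), ("I", "J")}
    print(precision_recall(predicted, validated))  # (0.6666666666666666, 0.5)
```

A real fusion caller would additionally handle minus-strand genes, alternative transcripts, and minimum read support; those details are deliberately left out of this sketch.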
