References
- Bradley, P., Den Bakker, H.C., Rocha, E.P., McVean, G. and Iqbal, Z.: "Ultrafast search of all deposited bacterial and viral genomic data". Nature Biotechnology, vol. 37(2), pp. 152-159 (2019) https://doi.org/10.1038/s41587-018-0010-1
- Chikhi, R., Holub, J. and Medvedev, P.: "Data structures to represent sets of k-long DNA sequences". arXiv preprint arXiv:1903.12312 (2019)
- Brinda, K., Baym, M. and Kucherov, G.: "Simplitigs as an efficient and scalable representation of de Bruijn graphs". Genome Biology, vol. 22(1), pp. 1-24 (2021)
- Kryukov, K., Ueda, M.T., Nakagawa, S. and Imanishi, T.: "Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences". Bioinformatics, vol. 35(19), pp. 3826-3828 (2019) https://doi.org/10.1093/bioinformatics/btz144
- Al-Okaily, A., Almarri, B., Al Yami, S. and Huang, C.H.: "Toward a better compression for DNA sequences using Huffman encoding". Journal of Computational Biology, vol. 24(4), pp. 280-288 (2017) https://doi.org/10.1089/cmb.2016.0151
- Solomon, B. and Kingsford, C.: "Fast search of thousands of short-read sequencing experiments". Nature Biotechnology, vol. 34(3), pp. 300-302 (2016) https://doi.org/10.1038/nbt.3442
- Khairy, R., Safar, M. and El-Kharashi, M.W.: "Bloom filter acceleration: A high level synthesis approach". In: Proceedings of the 30th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1-6, Canada (2017)
- Deorowicz, S.: "FQSqueezer: k-mer-based compression of sequencing data". Scientific Reports, vol. 10(1), pp. 1-9 (2020) https://doi.org/10.1038/s41598-019-56847-4
- Bingmann, T., Bradley, P., Gauger, F. and Iqbal, Z.: "COBS: a compact bit-sliced signature index". In: Proceedings of the International Symposium on String Processing and Information Retrieval (SPIRE), pp. 285-303, Springer, Cham (2019)
- Holley, G., Wittler, R. and Stoye, J.: "Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage". Algorithms for Molecular Biology, vol. 11(1), pp. 1-9 (2016) https://doi.org/10.1186/s13015-016-0063-y
- Marchiori, D. and Comin, M.: "SKraken: Fast and Sensitive Classification of Short Metagenomic Reads based on Filtering Uninformative k-mers". Bioinformatics, pp. 59-67 (2017)
- Pratas, D., Pinho, A.J. and Ferreira, P.J.: "Efficient compression of genomic sequences". In: Proceedings of the Data Compression Conference (DCC), pp. 231-240, IEEE, USA (2016)
- Chandak, S., Tatwawadi, K., Ochoa, I., Hernaez, M. and Weissman, T.: "SPRING: a next-generation compressor for FASTQ data". Bioinformatics, vol. 35(15), pp. 2674-2676 (2019) https://doi.org/10.1093/bioinformatics/bty1015
- Liu, Y., Yu, Z., Dinger, M.E. and Li, J.: "Index suffix-prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression". Bioinformatics, vol. 35(12), pp. 2066-2074 (2019) https://doi.org/10.1093/bioinformatics/bty936
- Hernaez, M., Ochoa, I. and Weissman, T.: "A cluster-based approach to compression of quality scores". In: Proceedings of the Data Compression Conference (DCC), pp. 261-270, IEEE, USA (2016)
- Pratas, D., Hosseini, M., Silva, J.M. and Pinho, A.J.: "A reference-free lossless compression algorithm for DNA sequences using a competitive prediction of two classes of weighted models". Entropy, vol. 21(11), p. 1074 (2019) https://doi.org/10.3390/e21111074
- Long, H., Sung, W., Kucukyildirim, S., Williams, E., Miller, S.F., Guo, W., Patterson, C., Gregory, C., Strauss, C., Stone, C. and Berne, C.: "Evolutionary determinants of genome-wide nucleotide composition". Nature Ecology & Evolution, vol. 2(2), pp. 237-240 (2018) https://doi.org/10.1038/s41559-017-0425-y
- Hernaez, M., Pavlichin, D., Weissman, T. and Ochoa, I.: "Genomic data compression". Annual Review of Biomedical Data Science, vol. 2, pp. 19-37 (2019) https://doi.org/10.1146/annurev-biodatasci-072018-021229
- Hosseini, M., Pratas, D. and Pinho, A.J.: "A survey on data compression methods for biological sequences". Information, vol. 7(4), p. 56 (2016) https://doi.org/10.3390/info7040056
- Bonfield, J.K., McCarthy, S.A. and Durbin, R.: "Crumble: reference free lossy compression of sequence quality values". Bioinformatics, vol. 35(2), pp. 337-339 (2019) https://doi.org/10.1093/bioinformatics/bty608
- Chandak, S., Tatwawadi, K. and Weissman, T.: "Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis". Bioinformatics, vol. 34(4), pp. 558-567 (2018) https://doi.org/10.1093/bioinformatics/btx639
- Ginart, A.A., Hui, J., Zhu, K., Numanagic, I., Courtade, T.A., Sahinalp, S.C. and Tse, D.N.: "Optimal compressed representation of high throughput sequence data via light assembly". Nature Communications, vol. 9(1), pp. 1-9 (2018) https://doi.org/10.1038/s41467-017-02088-w
- Ochoa, I., Hernaez, M., Goldfeder, R., Weissman, T. and Ashley, E.: "Effect of lossy compression of quality scores on variant calling". Briefings in Bioinformatics, vol. 18(2), pp. 183-194 (2017)
- Pamela Vinitha, E., Gopalakrishnan, G. and Karunakaran, M.: "An optimal seed based compression algorithm for DNA sequences". Advances in Bioinformatics, vol. 2016, Article ID 3528406 (2016)
- Punitha, K. and Murugan, A.: "Pattern Matching Compression Algorithm for DNA Sequences". In: Proceedings of the International Conference on Sustainable Expert Systems, vol. 176, pp. 387-402, Nepal (2021)