Detecting outliers in segmented genomes of flu virus using an alignment-free approach

  • Received : 2019.10.25
  • Accepted : 2019.11.21
  • Published : 2020.03.31

Abstract

In this paper, we propose a new approach to detecting outliers in a set of segmented genomes of the flu virus, a data set made up of heterogeneous sequences. The approach has three computational phases: feature extraction, which maps each segmented genome into feature space; an alignment-free distance measure, which gives the distance between any two segmented genomes; and a mapping into distance space, where a quantum of distance values is analyzed. The approach is implemented in both supervised and unsupervised learning modes. The experiments show that it is robust in detecting outliers among segmented genomes of the flu virus.
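As a rough illustration of the three phases named in the abstract, the following minimal Python sketch runs an unsupervised pass over toy data. It is not the authors' method: the abstract does not specify the features, the distance measure, or the decision rule, so the sketch assumes normalized k-mer frequencies as features, Euclidean distance as the alignment-free measure, and a median/MAD cutoff in distance space. All names (`kmer_profile`, `genome_features`, `outlier_flags`, `toy_genomes`) and the parameters `k` and `cutoff` are hypothetical, and the sequences are invented for the example.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import median

ALPHABET = "ACGT"


def kmer_profile(seq, k=2):
    """Phase 1 (assumed form): map one segment to normalized k-mer frequencies."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def genome_features(segments, k=2):
    """Concatenate per-segment profiles so a segmented genome is one point."""
    vec = []
    for seg in segments:
        vec.extend(kmer_profile(seg, k))
    return vec


def euclidean(u, v):
    """Phase 2 (assumed form): alignment-free distance between two genomes."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def outlier_flags(genomes, k=2, cutoff=3.5):
    """Phase 3 (assumed form): score genomes in distance space, flag outliers.

    Each genome is scored by its median distance to all other genomes.
    A genome is flagged when its modified z-score (median/MAD based)
    exceeds `cutoff`.
    """
    feats = [genome_features(g, k) for g in genomes]
    scores = [
        median(euclidean(f, g) for j, g in enumerate(feats) if j != i)
        for i, f in enumerate(feats)
    ]
    med = median(scores)
    mad = median(abs(s - med) for s in scores) or 1e-12  # guard against MAD = 0
    return [0.6745 * (s - med) / mad > cutoff for s in scores]


# Toy data: each genome is a list of segments (invented, not real flu data).
toy_genomes = [
    ["ATATATATAT", "AATTAATTAA"],
    ["ATATATATAA", "AATTAATTAT"],
    ["ATATATATTT", "AATTAATTTA"],
    ["TATATATATA", "AATTAATTAA"],
    ["GCGCGCGCGC", "GGCCGGCCGG"],  # deliberately different composition
]
flags = outlier_flags(toy_genomes)
print(flags)
```

On this toy set only the last genome, whose composition differs from the rest, is flagged. Concatenating per-segment profiles is one simple way to treat a segmented genome as a single point; a supervised variant would instead need labeled examples, which the sketch does not cover.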
