http://dx.doi.org/10.5808/GI.2020.18.1.e2

Detecting outliers in segmented genomes of flu virus using an alignment-free approach  

Daoud, Mosaab (Independent Research Scientist)
Abstract
In this paper, we propose a new approach to detecting outliers in a set of segmented flu virus genomes, a data set made up of heterogeneous sequences. The approach has three computational phases: feature extraction, which maps the genomes into a feature space; an alignment-free distance measure, which computes the distance between any two segmented genomes; and a mapping into distance space, in which a quantum of distance values is analyzed. The approach is implemented in both supervised and unsupervised learning modes. The experiments show that the approach is robust in detecting outliers among the segmented genomes of the flu virus.
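The abstract names the three phases but does not define the feature map, the distance, or the outlier rule. The Python sketch below is a hypothetical illustration of how such a pipeline could fit together, not the author's method. Assumptions: relative 2-mer frequencies stand in for feature extraction; the distance between two segmented genomes is the sum of Euclidean distances between corresponding segments (a stand-in, not the paper's Mosaab-metric); and each genome's row of pairwise distances serves as its point in distance space. The outlier rules are swapped-in choices: a median/MAD modified z-score for the unsupervised mode and a leave-one-out threshold learned from known inliers for the supervised mode. All function names are hypothetical.

```python
import math
import statistics
from itertools import product

# ASSUMPTION: k-mer length and nucleotide alphabet used for feature extraction.
K = 2
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(segment):
    """Feature extraction (assumed): relative K-mer frequencies of one segment."""
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(segment) - K + 1):
        word = segment[i:i + K]
        if word in counts:  # skips k-mers with ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in KMERS]


def genome_profile(genome):
    """A segmented genome is a composite data point: one vector per segment."""
    return [kmer_profile(segment) for segment in genome]


def genome_distance(profile_a, profile_b):
    """Alignment-free distance (assumed, NOT the paper's Mosaab-metric): sum of
    Euclidean distances between corresponding segments. Both genomes must list
    their segments in the same order."""
    return sum(math.dist(x, y) for x, y in zip(profile_a, profile_b))


def unsupervised_outliers(genomes, cutoff=3.5):
    """Unsupervised mode (assumed rule): map each genome to its row of pairwise
    distances (its point in distance space), score it by the row mean, and flag
    scores whose median/MAD modified z-score exceeds `cutoff`."""
    profiles = [genome_profile(g) for g in genomes]
    n = len(profiles)
    rows = [[genome_distance(p, q) for q in profiles] for p in profiles]
    scores = [sum(row) / (n - 1) for row in rows]  # self-distance is 0
    med = statistics.median(scores)
    mad = statistics.median(abs(s - med) for s in scores)
    if mad == 0:
        # Degenerate case: at least half the scores equal the median.
        return [i for i, s in enumerate(scores) if s > med]
    return [i for i, s in enumerate(scores) if 0.6745 * (s - med) / mad > cutoff]


def supervised_outliers(reference, queries):
    """Supervised mode (assumed rule): `reference` holds at least two genomes
    labelled as inliers. A query is flagged when its mean distance to the
    reference set exceeds the largest leave-one-out mean distance among the
    reference genomes themselves."""
    refs = [genome_profile(g) for g in reference]
    m = len(refs)
    threshold = max(
        sum(genome_distance(p, q) for j, q in enumerate(refs) if j != i) / (m - 1)
        for i, p in enumerate(refs)
    )
    flagged = []
    for idx, genome in enumerate(queries):
        profile = genome_profile(genome)
        if sum(genome_distance(profile, q) for q in refs) / m > threshold:
            flagged.append(idx)
    return flagged


# Toy usage with two-segment "genomes" (real influenza A genomes have eight).
inlier = ["ACGTACGTAC", "GGCCAATTGG"]
outlier = ["AAAAAAAAAA", "TTTTTTTTTT"]
print(unsupervised_outliers([inlier] * 4 + [outlier]))  # [4]
print(supervised_outliers([inlier] * 4, [inlier, outlier]))  # [1]
```

The median and MAD are used in the unsupervised score because, unlike the mean and standard deviation, they are barely shifted by the very outliers being sought; 0.6745 and 3.5 are the conventional modified z-score constants of Iglewicz and Hoaglin.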
Keywords
composite data point; distance space; flu virus; Mosaab-metric space; outliers; statistical learning