http://dx.doi.org/10.9717/kmms.2021.24.9.1242

A Distance-based Outlier Detection Method using Landmarks in High Dimensional Data  

Park, Cheong Hee (Division of Computer Convergence, Chungnam National University)
Abstract
Detecting outliers that deviate from the normal data distribution in high dimensional data is an important technique in many application areas. In this paper, a distance-based outlier detection method using landmarks in high dimensional data is proposed. Given normal training data, the k-means clustering method is applied to the training data in order to extract the cluster centers as landmarks that represent the normal data distribution. For a test sample, the distance to the nearest landmark gives the outlier score. In experiments using high dimensional data such as images and documents, the proposed method, with landmarks amounting to one-tenth of the training data, gave comparable outlier detection performance while greatly reducing the time complexity of the testing stage.
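The procedure described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`fit_landmarks`, `outlier_score`), the use of Euclidean distance, the plain Lloyd's k-means with random initialization, and the toy 2-D data are all assumptions made for illustration. It uses only the Python standard library so that it runs anywhere; a library k-means would be the usual choice for real high dimensional data.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_landmarks(train, n_landmarks, n_iter=50, seed=0):
    """Run plain Lloyd's k-means on normal training data; return the centers."""
    rng = random.Random(seed)
    # Initialize the centers with distinct training samples.
    centers = [list(p) for p in rng.sample(train, n_landmarks)]
    for _ in range(n_iter):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in centers]
        for p in train:
            j = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(center)  # keep the center of an empty cluster
        if new_centers == centers:  # assignments are stable, so stop early
            break
        centers = new_centers
    return centers


def outlier_score(x, landmarks):
    """Outlier score of a test sample: distance to the nearest landmark."""
    return min(euclidean(x, c) for c in landmarks)


# Toy data (made up for illustration): two tight "normal" clusters in 2-D.
rng = random.Random(1)
train = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
train += [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(50)]

# One landmark per ten training samples, mirroring the one-tenth ratio.
landmarks = fit_landmarks(train, n_landmarks=len(train) // 10)

normal_score = outlier_score([0.05, -0.02], landmarks)  # near a normal cluster
far_score = outlier_score([2.5, 2.5], landmarks)        # far from both clusters
print(normal_score, far_score)
```

At test time each sample is compared against the landmarks only, not against every training sample, which is where the reduction in testing cost comes from: with one-tenth as many landmarks as training samples, a nearest-landmark search computes roughly one-tenth as many distances as an exhaustive nearest-neighbor search over the training set.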
Keywords
Distance-based Outlier Detection; High Dimensional Data; Landmark; Outlier Detection