http://dx.doi.org/10.3837/tiis.2022.12.002

Density-based Outlier Detection in Multi-dimensional Datasets  

Wang, Xite (Dalian Maritime University)
Cao, Zhixin (Dalian Maritime University)
Zhan, Rongjuan (Dalian Maritime University)
Bai, Mei (Dalian Maritime University)
Ma, Qian (Dalian Maritime University)
Li, Guanyu (Dalian Maritime University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.16, no.12, 2022, pp. 3815-3835
Abstract
Density-based outlier detection is one of the hot issues in data mining. A point is classified as an outlier based on the density of the points near it. Existing density-based detection algorithms have high time complexity. To reduce it, a new outlier detection algorithm, DODMD (Density-based Outlier Detection in Multidimensional Datasets), is proposed. Firstly, the concept of a micro-cluster is introduced on the basis of the ZH-tree. Each leaf node is regarded as a micro-cluster, and computations are performed at the micro-cluster level so that points can be filtered in batches. To obtain n approximate outliers quickly, a greedy method is used to calculate the boundary of the LOF, and the minimum value is marked as LOFmin. Secondly, LOFmin is used to filter points, the real outliers are computed, and the result set is then updated to tighten the boundary. Finally, the accuracy and efficiency of the DODMD algorithm are verified on real and synthetic datasets.
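To make the roles of LOF and LOFmin concrete, the following is a minimal sketch of a plain brute-force top-n LOF computation. It is not the DODMD algorithm: it uses no ZH-tree, no micro-clusters, and no pruning, and it assumes the standard LOF definition with exactly k neighbours per point (ties at the k-th distance are not expanded). The function names `knn`, `lof_scores`, and `top_n_outliers` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = [(math.dist(points[i], q), j) for j, q in enumerate(points) if j != i]
    return heapq.nsmallest(k, dists)


def lof_scores(points, k):
    """Brute-force LOF score for every point (quadratic time, no index)."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    # k-distance: distance to the k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neigh]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for nb in neigh:
        reach = sum(max(k_dist[j], d) for d, j in nb) / len(nb)
        lrd.append(1.0 / max(reach, 1e-12))  # guard against duplicate points
    # LOF: mean neighbour density divided by the point's own density
    return [
        sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i])
        for i, nb in enumerate(neigh)
    ]


def top_n_outliers(points, k, n):
    """Return the indices of the n highest-LOF points and the smallest LOF among them."""
    scores = lof_scores(points, k)
    top = heapq.nlargest(n, range(len(points)), key=scores.__getitem__)
    lof_min = scores[top[-1]]  # plays the role of the LOFmin threshold
    return top, lof_min


# A 5x5 grid of points plus one isolated point far from the grid.
points = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(20.0, 20.0)]
top, lof_min = top_n_outliers(points, k=3, n=1)
print(top, lof_min)
```

In this sketch the threshold is only known after every score has been computed. A pruning scheme of the kind the abstract describes would instead maintain such a threshold incrementally and skip any point, or batch of points, whose LOF upper bound falls below it.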
Keywords
outlier; multi-dimensional data; density-based; z-order curve; micro-cluster