http://dx.doi.org/10.9717/kmms.2020.23.8.915

Identification of Incorrect Data Labels Using Conditional Outlier Detection  

Hong, Charmgil (School of Computer Science and Electrical Engineering, Handong Global University)
Abstract
Outlier detection methods help identify unusual instances in data that may correspond to erroneous, exceptional, or surprising events or behaviors. This work studies conditional outlier detection, a special case of the outlier detection problem, in the context of identifying incorrect data labels. Unlike conventional (unconditional) outlier detection methods, which seek abnormalities across all data attributes, conditional outlier detection assumes that data are given as pairs of an input (condition) and an output (response or label). Accordingly, its goal is to identify incorrect or unusual output assignments, taking their inputs as the condition. As a solution, this paper proposes the ratio-based outlier scoring (ROS) approach and a variant of it. The proposed solutions adopt conventional outlier scores and apply them to identify conditional outliers in data. Experiments on synthetic and real-world image datasets are conducted to demonstrate the benefits of the proposed approaches.
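To make the idea of judging an output given its input more concrete, the following is a minimal Python sketch of one possible ratio-style score. It is an illustrative assumption, not the ROS definition from the paper, which the abstract does not spell out. The sketch takes a conventional outlier score (here, mean distance to the k nearest neighbors), computes it once among instances that share the instance's label and once among all instances, and uses the ratio of the two. The function names `knn_score` and `ratio_scores`, the parameter `k`, and the toy data are all hypothetical.

```python
import math

def knn_score(x, points, k):
    """Conventional outlier score: mean Euclidean distance from x to its k nearest points."""
    dists = sorted(math.dist(x, p) for p in points)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def ratio_scores(X, y, k=3, eps=1e-12):
    """Hypothetical ratio-style conditional score (an assumption, not the paper's ROS).

    For each instance, divide its outlier score among same-label instances
    by its outlier score among all instances. A large ratio means the input
    looks ordinary overall but unusual for the label it was assigned.
    """
    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [(xj, yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i]
        same_label = [xj for xj, yj in others if yj == yi]
        everyone = [xj for xj, _ in others]
        if not same_label:
            # No other instance carries this label, so the label cannot be supported.
            scores.append(math.inf)
            continue
        conditional = knn_score(xi, same_label, k)
        unconditional = knn_score(xi, everyone, k)
        scores.append(conditional / (unconditional + eps))
    return scores

# Toy data: two well-separated clusters; the last point lies in the
# label-0 cluster but carries label 1 (a deliberately incorrect label).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5),
     (0.4, 0.6)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

scores = ratio_scores(X, y, k=3)
suspect = max(range(len(scores)), key=scores.__getitem__)
print("most suspicious index:", suspect)
```

An unconditional detector would see nothing unusual about the last point, since it sits inside a dense cluster. Only the comparison against its assigned label makes it stand out, which is the distinction the abstract draws between conditional and unconditional outlier detection.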
Keywords
Conditional Outlier Detection; Outlier Analysis; Anomaly Detection
Citations & Related Records
Times Cited By KSCI: 2