http://dx.doi.org/10.9765/KSCOE.2021.33.6.265

An Outlier Detection Using Autoencoder for Ocean Observation Data  

Kim, Hyeon-Jae (Department of Ocean Sciences, College of Natural Science, Inha University)
Kim, Dong-Hoon (Artificial Intelligence Convergence Research Center, Inha University)
Lim, Chaewook (Department of Ocean Sciences, College of Natural Science, Inha University)
Shin, Yongtak (Department of Ocean Sciences, College of Natural Science, Inha University)
Lee, Sang-Chul (Department of Computer Engineering, Inha University)
Choi, Youngjin (Dept. of Marine Forecast, GeoSystem Research Corporation)
Woo, Seung-Buhm (Department of Ocean Sciences, College of Natural Science, Inha University)
Publication Information
Journal of Korean Society of Coastal and Ocean Engineers / v.33, no.6, 2021, pp. 265-274
Abstract
Outlier detection in ocean data has traditionally relied on statistical methods and distance-based machine learning algorithms. Recently, AI-based methods have attracted considerable attention, and supervised learning methods, which require classification information (labels) for the data, are the most widely used. Supervised learning is time-consuming and costly because a label must be assigned manually to every sample used for training. To overcome this problem, this study applies an autoencoder based on unsupervised learning to outlier detection. Two experiments were designed: univariate learning, which uses only sea surface temperature (SST) from the observation data of Deokjeok Island, and multivariate learning, which uses SST, air temperature, wind direction, wind speed, air pressure, and humidity. The data span 25 years, from 1996 to 2020, and were pre-processed to account for the characteristics of ocean data. The trained univariate and multivariate autoencoders were then used to detect outliers in actual SST data. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. In this quantitative evaluation, the accuracy was about 96% for the multivariate autoencoder and about 91% for the univariate autoencoder, indicating that the multivariate autoencoder detects outliers better. Because it reduces subjective classification errors as well as the cost and time required for data labeling, outlier detection using an unsupervised learning-based autoencoder is expected to find a wide range of uses.
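The evaluation idea in the abstract (insert artificial errors into clean data, flag points that an unsupervised model reconstructs poorly, then score accuracy) can be illustrated with a minimal sketch. This is not the authors' model or data: it swaps in a tied-weight linear autoencoder (equivalent to PCA) for the neural autoencoder, uses three made-up channels as stand-ins for SST, air temperature, and air pressure, and sets the threshold with a median-plus-MAD rule. All function names, the bottleneck size `k`, and the multiplier `k_mad` are assumptions chosen for illustration.

```python
import numpy as np


def make_synthetic(n=2000, n_outliers=40, seed=0):
    """Toy multivariate series with artificial errors injected into column 0."""
    rng = np.random.default_rng(seed)
    seasonal = np.sin(2 * np.pi * np.arange(n) / 365.0)
    # Three correlated channels: hypothetical stand-ins for SST,
    # air temperature, and air pressure (values are made up).
    X = np.column_stack([
        15 + 10 * seasonal + rng.normal(0, 0.3, n),
        14 + 12 * seasonal + rng.normal(0, 0.3, n),
        1013 - 5 * seasonal + rng.normal(0, 0.3, n),
    ])
    labels = np.zeros(n, dtype=bool)
    idx = rng.choice(n, size=n_outliers, replace=False)
    # Inserted errors: spikes of 5 to 10 units with random sign.
    X[idx, 0] += rng.choice([-1, 1], size=n_outliers) * rng.uniform(5, 10, n_outliers)
    labels[idx] = True
    return X, labels


def fit_linear_autoencoder(X, k=1):
    """Fit a tied-weight linear autoencoder (PCA) on standardized data."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std
    # Rows of Vt are principal directions; the first k serve as
    # both encoder and decoder weights.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return mean, std, Vt[:k].T


def reconstruction_error(X, model):
    """Squared reconstruction error per sample in standardized units."""
    mean, std, W = model
    Xs = (X - mean) / std
    recon = (Xs @ W) @ W.T  # encode to k dimensions, then decode
    return np.sum((Xs - recon) ** 2, axis=1)


def detect(errors, k_mad=20.0):
    """Flag samples whose error exceeds median + k_mad * MAD (assumed rule)."""
    med = np.median(errors)
    mad = np.median(np.abs(errors - med))
    return errors > med + k_mad * mad


X, labels = make_synthetic()
model = fit_linear_autoencoder(X)      # no labels are used for fitting
flags = detect(reconstruction_error(X, model))
accuracy = np.mean(flags == labels)    # labels are used only for scoring
recall = flags[labels].mean()
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

The labels enter only at the scoring step, which mirrors the point of the unsupervised setting: the model is fitted without any manual classification, and the inserted errors provide ground truth for evaluation. The sketch covers only a multivariate case; with a single input channel a linear bottleneck has nothing to cross-check against, so a univariate comparison would need a model over time windows instead.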
Keywords
artificial intelligence; ocean data; outlier detection; unsupervised learning; autoencoder