A Study on Optimizing the Use of Autoencoders for Detecting Anomalous Ocean Data

An Outlier Detection Using Autoencoder for Ocean Observation Data

  • Kim, Hyeon-Jae (Department of Ocean Sciences, College of Natural Science, Inha University) ;
  • Kim, Dong-Hoon (Artificial Intelligence Convergence Research Center, Inha University) ;
  • Lim, Chaewook (Department of Ocean Sciences, College of Natural Science, Inha University) ;
  • Shin, Yongtak (Department of Ocean Sciences, College of Natural Science, Inha University) ;
  • Lee, Sang-Chul (Department of Computer Engineering, Inha University) ;
  • Choi, Youngjin (Dept. of Marine Forecast, GeoSystem Research Corporation) ;
  • Woo, Seung-Buhm (Department of Ocean Sciences, College of Natural Science, Inha University)
  • Received : 2021.11.19
  • Reviewed : 2021.12.13
  • Published : 2021.12.31

Abstract

Outlier detection in ocean data has long been an active research area, and methods based on statistical and distance-based machine learning algorithms have been developed. Recently, AI-based methods have attracted considerable attention, and most of them are supervised learning methods that require labeled data. Supervised learning is costly and time-consuming because classification information (labels) must be assigned manually to all of the training data. To overcome this problem, this study applies an autoencoder based on unsupervised learning to outlier detection. Two experiments were designed to evaluate the autoencoder: univariate learning, which used only the sea surface temperature (SST) from the fixed-point buoy observations at Deokjeok Island provided by the Korea Meteorological Administration, and multivariate learning, which used SST together with air temperature, wind direction, wind speed, air pressure, and humidity. The data span the 25 years from 1996 to 2020, and preprocessing that accounts for the characteristics of ocean-meteorological data was applied to the training data. The trained univariate and multivariate autoencoders were then used to detect outliers in the observed SST. To compare model performance, several outlier detection methods, including the univariate and multivariate autoencoders, were applied to synthetic data with artificially inserted errors and evaluated quantitatively. The accuracy of the multivariate and univariate autoencoders was about 96% and 91%, respectively, indicating that the multivariate autoencoder detects outliers better. Because it reduces both subjective classification errors and the time and cost of data labeling, outlier detection with an unsupervised autoencoder is expected to find a wide range of applications.
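The idea of flagging outliers by autoencoder reconstruction error can be illustrated with a minimal sketch. It is not the authors' model: the network (a single tanh bottleneck unit trained by plain gradient descent in NumPy), the synthetic three-variable series standing in for SST, air temperature, and air pressure, the spike size, the 99th-percentile threshold, and the helper names `train_autoencoder` and `reconstruction_error` are all assumptions chosen for brevity. The autoencoder is fitted to spike-free data, and a sample is flagged when its reconstruction error exceeds the threshold taken from the training errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption; NOT the Deokjeok Island buoy record):
# one seasonal signal drives all three variables, plus small sensor noise.
n = 2000
season = np.sin(2 * np.pi * np.arange(n) / 365.0)
clean = np.column_stack([
    15.0 + 8.0 * season + rng.normal(0.0, 0.3, n),    # SST (deg C)
    13.0 + 12.0 * season + rng.normal(0.0, 0.5, n),   # air temperature (deg C)
    1015.0 - 7.0 * season + rng.normal(0.0, 0.3, n),  # air pressure (hPa)
])

# Insert artificial SST errors at known positions so detection can be scored.
spike_idx = rng.choice(n, size=40, replace=False)
observed = clean.copy()
observed[spike_idx, 0] += rng.choice([-6.0, 6.0], size=spike_idx.size)
truth = np.zeros(n, dtype=bool)
truth[spike_idx] = True

# Standardize with statistics of the spike-free training data.
mu, sigma = clean.mean(axis=0), clean.std(axis=0)
x_train = (clean - mu) / sigma
x_test = (observed - mu) / sigma


def train_autoencoder(x, hidden=1, epochs=2000, lr=0.02, seed=0):
    """Fit x -> tanh(x W1 + b1) W2 + b2 by full-batch gradient descent."""
    init = np.random.default_rng(seed)
    d = x.shape[1]
    w1, b1 = init.normal(0.0, 0.5, (d, hidden)), np.zeros(hidden)
    w2, b2 = init.normal(0.0, 0.5, (hidden, d)), np.zeros(d)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)               # encoder (bottleneck)
        out = h @ w2 + b2                      # decoder (reconstruction)
        g_out = (out - x) / len(x)             # gradient of mean 0.5*||out - x||^2
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return w1, b1, w2, b2


def reconstruction_error(x, params):
    """Per-sample squared reconstruction error."""
    w1, b1, w2, b2 = params
    out = np.tanh(x @ w1 + b1) @ w2 + b2
    return ((out - x) ** 2).sum(axis=1)


params = train_autoencoder(x_train)

# Flag samples whose error exceeds the 99th percentile of the training errors.
threshold = np.percentile(reconstruction_error(x_train, params), 99)
flags = reconstruction_error(x_test, params) > threshold

recall = flags[truth].mean()          # fraction of inserted spikes detected
false_alarm = flags[~truth].mean()    # fraction of normal samples flagged
accuracy = (flags == truth).mean()
print(f"recall={recall:.2f} false_alarm={false_alarm:.3f} accuracy={accuracy:.3f}")
```

The multivariate setting helps here because a spike in SST breaks the relationship with the other variables that the bottleneck has learned, so the spiked sample cannot be reconstructed well. A univariate variant would need some other source of context, for example sliding time windows, which this sketch does not show.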

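Funding Information

This research was supported by the Korea Institute of Marine Science & Technology Promotion, funded by the Ministry of Oceans and Fisheries in 2021 (Study on improving ocean prediction accuracy using ocean numerical modeling and intelligent information technology). This research was also supported by the Institute of Information & Communications Technology Planning & Evaluation, funded by the Korean government (Ministry of Science and ICT) in 2021 (2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University)).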
