Efficient Outlier Detection of the Water Temperature Monitoring Data

수온 관측 자료의 효율적인 이상 자료 탐지

  • Cho, Hongyeon [조홍연] (Marine Environments and Conservation Research Division, Korea Institute of Ocean Science and Technology) ;
  • Jeong, Shin Taek [정신택] (Department of Civil and Environmental Engineering, Wonkwang University; Institute of Industrial Technology Development, Wonkwang University) ;
  • Ko, Dong Hui [고동휘] (Department of Civil and Environmental Engineering, Wonkwang University; Institute of Industrial Technology Development, Wonkwang University) ;
  • Son, Kyeong-Pyo [손경표] (Resource Recycling Division, Resource Circulation Bureau, Ministry of Environment)
  • Received : 2014.08.30
  • Accepted : 2014.10.16
  • Published : 2014.10.31

Abstract

Statistics derived from coastal water temperature monitoring data can be biased by outliers and missing intervals. Although a number of outlier detection methods have been developed, their application to in-situ monitoring data is very limited, because they assume a priori information about the outliers and data with no missing values, and because some methods require excessive computation time. In this study, a practical robust method is developed that can efficiently and effectively detect outliers even in very large data sets. The model consists of two parts. The first part constructs the approximate components of the monitoring data using robust smoothing combined with data re-sampling, which greatly reduces the computation time. The second part is the main iterative outlier detection step, which diagnoses and removes outliers using the detailed (residual) components computed from the approximate components. The model was tested using two years of water temperature data measured at 5-minute intervals in Lake Saemangeum. The proportion of outliers in the data was estimated to be about 1.6-3.7%, and most of the outliers were satisfactorily detected and removed by the model. To detect and remove outliers effectively, outlier detection using long-span smoothing should be applied before that using short-span smoothing.
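
The two-part procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the abstract does not name the robust smoother or the outlier criterion, so a running median over the sub-sampled points stands in for the robust smoothing, and a threshold of `k` times the scaled median absolute deviation (MAD) of the residuals stands in for the detection rule. The function names, the parameters `spans`, `step`, `k`, and `max_iter`, and the synthetic test series are likewise hypothetical.

```python
import numpy as np

def approximate_component(t, y, keep, step, span):
    """Approximate (trend) component from a sub-sample of the valid points.

    Assumption: a running median over `span` sub-sampled points is used as
    a simple robust smoother; the paper's smoother may differ.
    """
    idx = np.flatnonzero(keep)[::step]        # re-sample: every `step`-th valid point
    ts, ys = t[idx], y[idx]
    half = span // 2
    smooth = np.array([np.median(ys[max(0, i - half): i + half + 1])
                       for i in range(len(ys))])
    return np.interp(t, ts, smooth)           # map back onto the full time axis

def detect_outliers(t, y, spans=(41, 11), step=6, k=5.0, max_iter=10):
    """Iteratively flag outliers, applying the long span before the short span."""
    missing = np.isnan(y)
    keep = ~missing                           # points still treated as valid
    for span in spans:                        # long-span pass first, then short-span
        for _ in range(max_iter):
            approx = approximate_component(t, y, keep, step, span)
            resid = y - approx                # detailed (residual) component
            r = resid[keep]
            med = np.median(r)
            scale = 1.4826 * np.median(np.abs(r - med))   # MAD-based scale (assumed rule)
            if scale == 0:
                break
            with np.errstate(invalid="ignore"):           # NaN residuals compare as False
                new_bad = keep & (np.abs(resid - med) > k * scale)
            if not new_bad.any():
                break                         # no new outliers at this span
            keep &= ~new_bad
    return ~keep & ~missing                   # True where a value was flagged

# Synthetic 30-day series at 5-minute intervals (288 samples per day).
rng = np.random.default_rng(0)
n = 30 * 288
t = np.arange(n) / 288.0                      # time in days
y = (15 + 5 * np.sin(2 * np.pi * t / 30)      # slow seasonal-like variation
     + 0.5 * np.sin(2 * np.pi * t)            # diurnal cycle
     + rng.normal(0, 0.05, n))                # sensor noise
picked = rng.choice(n, size=320, replace=False)
spikes, gaps = picked[:150], picked[150:]
y[spikes] += rng.choice([-1.0, 1.0], 150) * rng.uniform(4.0, 6.0, 150)  # injected outliers
y[gaps] = np.nan                              # scattered missing values

flags = detect_outliers(t, y)
truth = np.zeros(n, dtype=bool)
truth[spikes] = True
print("flagged fraction:", flags.mean())
print("recall:", (flags & truth).sum() / truth.sum())
```

In this sketch the long-span pass removes large spikes while the residuals still contain the diurnal cycle, and the short-span pass then works with a much tighter residual scale. Sub-sampling with `step` keeps the smoothing cost proportional to the reduced sample rather than to the full record.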
