Outlier Detection in Time Series Monitoring Datasets using Rule Based and Correlation Analysis Method

  • Jeon, Jesung (Department of Construction Information Engineering, Induk University) ;
  • Koo, Jakap (Department of Civil, Safety & Environmental Engineering, Hankyong National University) ;
  • Park, Changmok (Department of Technology & Systems Management, Induk University)
  • Received : 2015.02.24
  • Accepted : 2015.04.07
  • Published : 2015.05.01

Abstract

This study develops methods for detecting outliers in monitoring data of the kind that falls into the big data category, and applies them to both artificial data and real field monitoring data. Rule-based methods that use the rate of change (first-order difference) and the error probability of the monitoring data are effective at detecting large-scale short faults and constant faults, in which the measured value does not change over a certain period. However, because they rely on a single independent dataset, they can misjudge normal data with large variation as outliers. Rule-based detection of noise faults is of limited use on real monitoring data because it is difficult to choose a proper data window size and a threshold for outlier judgment. Correlation analysis between two different datasets was very effective at detecting localized outliers and abnormal variation in both short- and long-term monitoring data, provided that a reasonable range of training data could be selected.
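The two rule-based checks described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `short_fault_mask` and `constant_fault_mask`, the spike rule (a large jump immediately followed by a large jump back), and the parameters `threshold`, `min_run` and `tol` are all assumptions made here for illustration.

```python
import numpy as np


def short_fault_mask(x, threshold):
    """Flag isolated spikes using first-order differences (assumed rule)."""
    x = np.asarray(x, dtype=float)
    mask = np.zeros(x.shape, dtype=bool)
    if x.size < 3:
        return mask  # too short to see a jump in and a jump out
    d = np.diff(x)        # d[i] = x[i + 1] - x[i]
    jump_in = d[:-1]      # change arriving at each interior sample
    jump_out = d[1:]      # change leaving each interior sample
    # A spike is a large jump followed by a large jump in the other direction.
    spike = (
        (np.abs(jump_in) > threshold)
        & (np.abs(jump_out) > threshold)
        & (jump_in * jump_out < 0)
    )
    mask[1:-1] = spike
    return mask


def constant_fault_mask(x, min_run, tol=0.0):
    """Flag runs of at least min_run consecutive samples that do not change."""
    x = np.asarray(x, dtype=float)
    mask = np.zeros(x.shape, dtype=bool)
    run_start = 0
    for i in range(1, x.size + 1):
        # A run ends at the end of the series or when the value changes.
        if i == x.size or abs(x[i] - x[i - 1]) > tol:
            if i - run_start >= min_run:
                mask[run_start:i] = True
            run_start = i
    return mask
```

Both functions look at a single series only, so a genuine large change in the measured quantity can be flagged just like a sensor fault, which is the misjudgment problem the abstract points out. The value of `threshold` has to be chosen per sensor; nothing in the abstract fixes it.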

본 연구에서는 빅데이터 범주에 포함되는 각종 계측 데이터를 대상으로 각종 이상치를 판단하기 위한 기법을 고안하고, 인공 데이터 및 실 계측 데이터를 이용한 이상치 분석을 수행하였다. 계측결과에 대한 1차 차분 값 및 오차율을 적용한 규칙기반 방법은 큰 규모의 Short fault 분석 및 일정 기간 계측값에 변화가 발생하지 않는 경우의 Constant fault 분석에 효과적으로 적용될 수 있었으나, 독립적인 단일 데이터셋만을 이용하는 관계로 큰 변화폭을 보이는 실 계측 데이터의 정상 데이터를 이상치로 오판하는 문제점이 있었다. 규칙기반 방법을 이용한 Noise fault 분석은 적정 데이터 윈도우 사이즈의 선택 및 이상치 판정용 한계값 선정상의 문제로 인해 실 계측 데이터 적용에 한계가 있었다. 이종 데이터 간 상관분석 방법은 학습 데이터의 적정범위 선정이 선행된다면 장단기 계측 데이터의 이상 거동 및 국부적 이상치 판정에 매우 효과적으로 이용될 수 있음을 알 수 있었다.
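The correlation-based idea, in which one dataset is checked against a second, related dataset, can be sketched as follows. The abstract does not give the exact procedure, so this sketch assumes a simple linear relationship fitted on a training range and a residual threshold of `k` standard deviations; the function name `correlation_outlier_mask` and the parameters `train_slice` and `k` are hypothetical.

```python
import numpy as np


def correlation_outlier_mask(x, y, train_slice, k=3.0):
    """Flag samples of y that break the linear x-y relation learned in training.

    Assumed model: y ~ slope * x + intercept, fitted on train_slice only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares straight-line fit on the training range.
    slope, intercept = np.polyfit(x[train_slice], y[train_slice], 1)
    resid = y - (slope * x + intercept)
    # Residual spread in training; ddof=2 accounts for the two fitted parameters.
    sigma = resid[train_slice].std(ddof=2)
    return np.abs(resid) > k * sigma


# Usage with synthetic data (illustrative values only).
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + 0.1 * np.sin(7.0 * x)
y[150] += 5.0  # inject one localized outlier
mask = correlation_outlier_mask(x, y, train_slice=slice(0, 100))
```

Because the decision uses the relationship between two series rather than the size of a change in one series, a large but consistent variation in both is not flagged. The result depends on the training range: if it contains faulty data or does not cover the operating range, the fitted relation and the threshold are both unreliable, which matches the abstract's condition that a reasonable range of training data be selected.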
