Detecting unusual observations in time series: An application of the normal-exp model

Identifying abnormal observations in time series: Using the normal-exponential distribution

  • Ro Jin Pak (Department of Information Statistics, Dankook University)
  • 박노진 (단국대학교 정보통계학과)
  • Received : 2022.11.28
  • Accepted : 2023.01.04
  • Published : 2023.06.30

Abstract

A method is proposed for detecting unusual or abnormal observations in a time series or a discrete-time signal. The idea is borrowed from microarray background correction, which separates anomalous signals or noise from the microarray's real signal and is an important step in adjusting the data for the ambient intensity surrounding each feature. In that setting, the normal-exponential distribution, obtained by convolving the exponential and normal distributions, was proposed to model background noise and signal. Here the error terms of a time series are modeled with the normal-exponential distribution. Once the residuals are fitted well by the normal-exponential distribution, observations with unexpectedly large residuals are flagged as outliers or unusual observations. Marriage count data and real estate price index data were used for empirical studies, and several anomalous observations were found that coincide with periods when the Korean economy or society underwent dramatic changes.

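A new method is proposed for detecting outliers or abnormal observations in a time series or a discrete-time signal. In microarray analysis, the normal-exponential distribution, formed by convolving the normal and exponential distributions, is used to remove background noise from the expression values of a sample and to estimate signal intensity. We borrow the normal-exponential distribution to propose a method for detecting outliers in time series. The error term of a time series model is defined to be normally distributed, but in practice the residuals after model fitting may not fit a normal distribution. If the normal-exponential distribution fits the residual components well, observations whose residuals fall outside a probabilistically determined limit are classified as outliers. The number of marriages and a real estate price index were used as examples for empirical verification. The proposed method identified outliers or abnormal observations, and the identified time points were confirmed to coincide in part with periods of dramatic economic or social change.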
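
As a rough illustration of the idea in the abstract, the Python sketch below fits a normal-exponential convolution to a set of residuals and flags those beyond an upper probability limit. It is a sketch under stated assumptions, not the paper's procedure: the abstract does not say how the parameters are estimated, so a simple method-of-moments fit is swapped in here. The function names, the 0.99 limit, and the simulated residuals (with one planted unusual value) are all hypothetical. The model assumed is a residual equal to a N(mu, sigma^2) term plus an independent exponential term with mean alpha.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normexp_cdf(x, mu, sigma, alpha):
    """CDF of N(mu, sigma^2) + Exp(mean alpha), the normal-exponential convolution."""
    u = (x - mu) / sigma
    tail = math.exp(sigma**2 / (2.0 * alpha**2) - (x - mu) / alpha)
    return norm_cdf(u) - tail * norm_cdf(u - sigma / alpha)


def fit_moments(residuals):
    """Method-of-moments fit (an assumption; not necessarily the paper's estimator).

    Uses mean = mu + alpha, variance = sigma^2 + alpha^2,
    skewness = 2 * alpha^3 / variance^1.5.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    skew = m3 / m2**1.5
    # The model only admits skewness in (0, 2); clamp so the fit stays valid.
    skew = min(max(skew, 1e-3), 1.99)
    s = math.sqrt(m2)
    alpha = s * (skew / 2.0) ** (1.0 / 3.0)
    sigma = s * math.sqrt(1.0 - (skew / 2.0) ** (2.0 / 3.0))
    mu = mean - alpha
    return mu, sigma, alpha


def upper_quantile(p, mu, sigma, alpha):
    """Invert the CDF by bisection; intended for upper-tail p such as 0.95 or 0.99."""
    lo = mu + alpha  # the model mean, whose CDF value is well below 0.9
    hi = lo + 20.0 * math.sqrt(sigma**2 + alpha**2)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if normexp_cdf(mid, mu, sigma, alpha) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def flag_unusual(residuals, p=0.99):
    """Return indices of residuals above the fitted p-quantile, the cutoff, and the fit."""
    params = fit_moments(residuals)
    cut = upper_quantile(p, *params)
    flagged = [i for i, r in enumerate(residuals) if r > cut]
    return flagged, cut, params


# Simulated residuals: normal noise plus an exponential component (mean 2).
random.seed(0)
residuals = [random.gauss(0.0, 1.0) + random.expovariate(0.5) for _ in range(500)]
residuals[100] = 40.0  # planted unusual observation

flagged, cut, params = flag_unusual(residuals, p=0.99)
print("cutoff:", round(cut, 2), "flagged indices:", flagged)
```

One design caveat: moment estimates are themselves sensitive to extreme values, so a very large residual inflates the fitted tail and can mask milder anomalies. A likelihood-based or robust fit would be a natural replacement for `fit_moments` in practice.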

Acknowledgement

This research was supported by the Dankook University research fund in 2022.
