Technical Trends of Time-Series Data Imputation

시계열 데이터 결측치 처리 기술 동향

  • Published : 2021.08.01

Abstract

Data imputation is a crucial issue in data analysis because data quality is strongly correlated with the performance of AI models. High-quality time-series data are particularly difficult to collect because of unpredictable events (for example, power outages or delays caused by network conditions). Effective methods for time-series data imputation are therefore needed. Existing studies on time-series data imputation can be divided into five categories: statistical-based, matrix-based, regression-based, and deep learning-based (RNN-based and GAN-based) methodologies. This study reviews and organizes these methodologies. Recently developed deep learning-based imputation methods show excellent performance. However, their computational cost makes them difficult to use in real-time systems. Thus, future work should aim to develop imputation methods that combine low computational cost with high performance for application in the field.
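As an illustration of the simplest category named in the abstract (statistical-based imputation), the following is a minimal sketch of three common baselines: mean imputation, last observation carried forward (LOCF), and linear interpolation. The function names and the sample readings are hypothetical and are not taken from the reviewed studies. Missing values are assumed to be encoded as `None`, and each series is assumed to contain at least one observed value.

```python
from statistics import mean
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing observation (assumption)


def mean_impute(xs: Series) -> List[float]:
    """Replace every missing value with the mean of the observed values."""
    fill = mean(x for x in xs if x is not None)
    return [fill if x is None else x for x in xs]


def locf_impute(xs: Series) -> List[float]:
    """Carry the last observation forward.

    Leading gaps have no earlier observation, so they take the first
    observed value instead.
    """
    last = next(x for x in xs if x is not None)
    out = []
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out


def linear_impute(xs: Series) -> List[float]:
    """Linearly interpolate between neighboring observed values.

    Gaps before the first or after the last observation are filled with
    the nearest observed value.
    """
    observed = [i for i, x in enumerate(xs) if x is not None]
    out = list(xs)
    # Fill the edges with the nearest observed value.
    for i in range(observed[0]):
        out[i] = xs[observed[0]]
    for i in range(observed[-1] + 1, len(xs)):
        out[i] = xs[observed[-1]]
    # Fill each interior gap on the straight line between its endpoints.
    for a, b in zip(observed, observed[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = xs[a] + t * (xs[b] - xs[a])
    return out


# Hypothetical sensor readings with two gaps.
readings: Series = [1.0, None, None, 4.0, None, 6.0]
by_mean = mean_impute(readings)
by_locf = locf_impute(readings)
by_line = linear_impute(readings)
```

These baselines run in linear time, which is why they remain attractive for real-time use, but they ignore correlations between variables and longer temporal patterns that the matrix-based, regression-based, and deep learning-based methods try to capture.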

Acknowledgement

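This work was carried out as part of the Energy AI Convergence R&D Project, supported by the National IT Industry Promotion Agency (NIPA) with funding from the Korean government (Ministry of Science and ICT) in 2021 [No. S0317-21-1001].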
