Imputation of Medical Data Using Subspace Condition Order Degree Polynomials

  • Received : 2013.05.14
  • Accepted : 2014.07.07
  • Published : 2014.09.30

Abstract

Temporal medical data is often collected during patient treatments that require individual analysis. Each observation recorded in the temporal medical data is associated with a set of measurements and treatment times. A major problem in the analysis of temporal medical data is missing values, which are caused, for example, by patients dropping out of a study before completion. The imputation of missing data is therefore an important pre-processing step that can recover useful information before the data is mined. For each patient and each variable, this imputation replaces the missing data with a value drawn from an estimated distribution of that variable. In this paper, we propose a new method, called Newton's finite divided difference polynomial interpolation with condition order degree, for dealing with missing values in temporal medical data related to obesity. We compared the new imputation method with three existing subspace estimation techniques: the k-nearest neighbor, local least squares, and natural cubic spline approaches. The performance of each approach was evaluated using the normalized root mean square error and statistical significance tests. The experimental results show that the proposed method provides the best fit, with the smallest error among the compared methods.
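As a rough illustration of the kind of interpolation the abstract names, the sketch below fills gaps in a single patient's time series with a Newton divided-difference polynomial and scores the result with a normalized root mean square error. It is not the authors' method: the abstract does not state the "condition order degree" rule, so a fixed `degree` parameter stands in for it. The function names, the choice of the nearest observed time points, and the normalization by the population standard deviation of the true values are all assumptions made for this example.

```python
from math import sqrt
from statistics import pstdev


def divided_differences(xs, ys):
    """Return Newton coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    coef = list(ys)
    n = len(xs)
    for order in range(1, n):
        # Walk backwards so lower-order entries are still available.
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton-form polynomial at x by nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


def impute_series(times, values, degree=2):
    """Fill None entries from the (degree + 1) observed points nearest in time.

    Assumption: time points are distinct, and a fixed degree replaces the
    paper's condition-order-degree rule, which the abstract does not specify.
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    filled = []
    for t, v in zip(times, values):
        if v is None:
            nearest = sorted(observed, key=lambda p: abs(p[0] - t))[:degree + 1]
            nearest.sort()  # restore time order for the interpolation nodes
            xs = [p[0] for p in nearest]
            ys = [p[1] for p in nearest]
            v = newton_eval(xs, divided_differences(xs, ys), t)
        filled.append(v)
    return filled


def nrmse(true_vals, imputed_vals):
    """RMSE divided by the population std of the true values (assumed form)."""
    mse = sum((a - b) ** 2 for a, b in zip(true_vals, imputed_vals)) / len(true_vals)
    return sqrt(mse) / pstdev(true_vals)
```

For example, `impute_series([0, 1, 2, 3, 4], [0, 1, None, 9, 16])` fits a quadratic through the three observed points closest to time 2 and fills the gap from that polynomial. A higher `degree` uses more neighbors but can oscillate between nodes, which is presumably why a rule for choosing the order matters in practice.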
