http://dx.doi.org/10.3745/JIPS.04.0007

Imputation of Medical Data Using Subspace Condition Order Degree Polynomials  

Silachan, Klaokanlaya (Department of Computing, Silpakorn University)
Tantatsanawong, Panjai (Department of Computing, Silpakorn University)
Publication Information
Journal of Information Processing Systems, v.10, no.3, 2014, pp. 395-411
Abstract
Temporal medical data are often collected during patient treatments that require per-patient (personal) analysis. Each observation recorded in temporal medical data is associated with measurements and treatment times. A major problem in the analysis of such data is missing values, which are caused, for example, by patients dropping out of a study before completion. The imputation of missing data is therefore an important pre-processing step that can provide useful information before the data are mined. For each patient and each variable, imputation replaces a missing value with a value drawn from an estimated distribution of that variable. In this paper, we propose a new method, called Newton's finite divided difference polynomial interpolation with condition order degree, for dealing with missing values in temporal medical data related to obesity. We compared the new imputation method with three existing subspace estimation techniques: k-nearest neighbor, local least squares, and natural cubic spline. The performance of each approach was evaluated using the normalized root mean square error and tests of statistical significance. The experimental results show that the proposed method provides the best fit, with the smallest error, and is more accurate than the other methods.
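As an illustration of the two building blocks named in the abstract, the following is a minimal Python sketch of plain Newton divided-difference interpolation and of a normalized root mean square error. It does not implement the paper's condition order degree rule, which the abstract does not specify; here the polynomial degree is simply the number of supplied points minus one. The function names, the normalization by the standard deviation of the true values, and the sample weights are all assumptions made for illustration, not values or definitions from the paper.

```python
import math
import statistics
from typing import Sequence


def divided_differences(xs: Sequence[float], ys: Sequence[float]) -> list[float]:
    """Return the Newton coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if len(set(xs)) != len(xs):
        raise ValueError("xs must be distinct")
    coeffs = [float(y) for y in ys]
    n = len(xs)
    for order in range(1, n):
        # Walk from the end so lower-order entries are still available.
        for i in range(n - 1, order - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - order])
    return coeffs


def newton_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the Newton interpolating polynomial at x (Horner-style)."""
    coeffs = divided_differences(xs, ys)
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result


def nrmse(imputed: Sequence[float], true: Sequence[float]) -> float:
    """RMSE divided by the population std of the true values.

    Assumption: this is one common normalization; the paper may use another.
    """
    if len(imputed) != len(true) or len(true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(imputed, true)) / len(true)
    return math.sqrt(mse) / statistics.pstdev(true)


# Hypothetical example: one patient's weight (kg) at three visits,
# with the week-8 measurement missing. The numbers are made up.
weeks = [0.0, 4.0, 12.0]
weight_kg = [92.0, 90.5, 88.0]
print(round(newton_interpolate(weeks, weight_kg, 8.0), 2))
```

In a per-patient setting, a sketch like this would be applied to each patient's own observed time points, so the interpolating polynomial is fitted only to that patient's measurements.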
Keywords
Imputation; Personal Temporal Data; Polynomial Interpolation