http://dx.doi.org/10.5391/JKIIS.2004.14.6.789

A Sparse Data Preprocessing Using Support Vector Regression  

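Jun, Sung-Hae (Department of Statistics, Cheongju University)
Park, Jung-Eun (Department of Computer Science, Sogang University)
Oh, Kyung-Whan (Department of Computer Science, Sogang University)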
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.14, no.6, 2004, pp. 789-792
Abstract
In fields such as web mining, bioinformatics, and statistical data analysis, missing values occur in many different forms. These missing values make the training data sparse. Most commonly, missing values are replaced by predicted values based on the mean or the mode. More advanced imputation methods, such as conditional mean imputation, tree-based methods, and the Markov chain Monte Carlo algorithm, can also be used. However, general imputation models share the property that their predictive accuracy decreases as the ratio of missing values in the training data increases. Moreover, the number of available imputations becomes limited as the missing ratio increases. To address this problem, we propose a preprocessing method for missing values based on statistical learning theory, specifically the support vector regression of Vapnik. The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI Machine Learning Repository.
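The idea of imputing missing values with support vector regression can be illustrated with a minimal sketch. The abstract does not specify the kernel, parameters, or training procedure, so everything below is an assumption made for illustration: a linear epsilon-insensitive SVR trained by subgradient descent, the hypothetical helper names `fit_linear_svr` and `impute_with_svr`, the parameter values, and the toy data. It also assumes that only the target column contains missing entries (marked as `None`). It is not the authors' implementation.

```python
import math


def fit_linear_svr(X, y, C=10.0, epsilon=0.1, lr0=0.05, epochs=2000):
    """Linear epsilon-SVR trained by subgradient descent (illustrative only).

    Minimises (0.5*||w||^2 + C * sum_i max(0, |y_i - (w.x_i + b)| - epsilon)) / n.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regulariser 0.5*||w||^2
        for xi, yi in zip(X, y):
            residual = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(residual) > epsilon:  # only points outside the tube contribute
                sign = 1.0 if residual > 0 else -1.0
                for j in range(d):
                    grad_w[j] -= C * sign * xi[j]
                grad_b -= C * sign
        lr = lr0 / math.sqrt(t + 1)  # decaying step size
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def impute_with_svr(rows, target):
    """Fill None entries in column `target` using an SVR fitted on complete rows."""
    def features(row):
        return [v for j, v in enumerate(row) if j != target]

    complete = [r for r in rows if r[target] is not None]
    w, b = fit_linear_svr([features(r) for r in complete],
                          [r[target] for r in complete])
    filled = []
    for r in rows:
        r = list(r)
        if r[target] is None:
            r[target] = sum(wj * xj for wj, xj in zip(w, features(r))) + b
        filled.append(r)
    return filled


# Toy data (hypothetical): y = 3x + 1, with three y values removed.
truth = [[i / 10, 3 * (i / 10) + 1] for i in range(20)]
missing = {3, 8, 15}
rows = [[x, None if i in missing else y] for i, (x, y) in enumerate(truth)]

filled = impute_with_svr(rows, target=1)

# Baseline: replace every missing value with the mean of the observed values.
observed = [r[1] for r in rows if r[1] is not None]
mean_value = sum(observed) / len(observed)

svr_mae = sum(abs(filled[i][1] - truth[i][1]) for i in missing) / len(missing)
mean_mae = sum(abs(mean_value - truth[i][1]) for i in missing) / len(missing)
print(f"SVR imputation MAE:  {svr_mae:.3f}")
print(f"Mean imputation MAE: {mean_mae:.3f}")
```

On data with a clear relationship between attributes, a regression-based imputation of this kind can track the missing values more closely than a single mean, which is the motivation stated in the abstract. A kernel SVR from an established library would be the more usual choice in practice.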
Keywords
Sparse data; Preprocessing; Support vector machine
References
1 D. B. Rubin, “Multiple Imputation for Nonresponse in Surveys”, John Wiley & Sons, 1987.
2 P. W. Lavori, R. Dawson, D. Shera, “A Multiple Imputation Strategy for Clinical Trials with Truncation of Patient Data”, Statistics in Medicine, vol. 14, 1913-1925, 1995.
3 G. Casella, R. L. Berger, “Statistical Inference”, Duxbury Press, 1990.
4 C. Cortes, V. Vapnik, “Support Vector Networks”, Machine Learning, vol. 20, 273-297, 1995.
5 J. Han, M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 2000.
6 D. C. Hoaglin, F. Mosteller, J. W. Tukey, “Understanding Robust and Exploratory Data Analysis”, John Wiley & Sons Inc., 2000.
7 V. N. Vapnik, “The Nature of Statistical Learning Theory”, Springer, 1995.
8 UCI Machine Learning Repository, www.ics.uci.edu/mlearn
9 R. J. A. Little, D. B. Rubin, “Statistical Analysis with Missing Data”, Wiley Interscience, 2002.
10 J. L. Schafer, “Analysis of Incomplete Multivariate Data”, Chapman and Hall, 1997.
11 V. N. Vapnik, “Statistical Learning Theory”, John Wiley & Sons, 1998.