http://dx.doi.org/10.5351/KJAS.2004.17.3.605

A Study on the Data Fusion for Data Enrichment  

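정성석 (Department of Mathematics and Statistical Information Science, Jeonbuk National University)
김순영 (Department of Mathematics and Statistical Information Science, Jeonbuk National University)
김현진 (Department of Mathematics and Statistical Information Science, Jeonbuk National University)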
Publication Information
The Korean Journal of Applied Statistics / v.17, no.3, 2004, pp. 605-617
Abstract
One of the most important factors in the data mining process is the quality of the data used. When mining is performed on high-quality data, the potential value of data mining increases. In this paper, we propose a data fusion technique for data enrichment, a phase of the KDD process that can improve data quality. We add the k-NN technique to the regression technique in order to improve fusion performance by reducing the loss of information. Simulations were performed to compare the proposed data fusion technique with the regression technique. The results show that the proposed data fusion technique is characterized by a low MSE for continuous fusion variables.
Keywords
Statistical matching; Data enrichment; Data mining; k-Nearest neighbor; Recipient file; Donor file
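
The abstract does not spell out how k-NN is combined with regression, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes a single common variable `x` shared by the donor and recipient files, an ordinary least-squares line fitted on the donor file, and a k-NN step that adds the mean regression residual of the `k` nearest donors to each regression prediction. The function names (`fit_line`, `fuse_regression`, `fuse_regression_knn`) and the residual-correction rule are hypothetical choices made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on the donor file."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def fuse_regression(donor_x, donor_y, recipient_x):
    """Baseline: impute the fusion variable with the regression prediction only."""
    a, b = fit_line(donor_x, donor_y)
    return [a + b * x for x in recipient_x]


def fuse_regression_knn(donor_x, donor_y, recipient_x, k=3):
    """Assumed variant: regression prediction plus the mean residual
    of the k donors closest to the recipient on the common variable."""
    a, b = fit_line(donor_x, donor_y)
    residuals = [y - (a + b * x) for x, y in zip(donor_x, donor_y)]
    fused = []
    for x in recipient_x:
        # Indices of the k donors nearest to this recipient record.
        nearest = sorted(range(len(donor_x)),
                         key=lambda i: abs(donor_x[i] - x))[:k]
        correction = mean(residuals[i] for i in nearest)
        fused.append(a + b * x + correction)
    return fused


def mse(predicted, actual):
    """Mean squared error between fused and true values."""
    return mean((p - t) ** 2 for p, t in zip(predicted, actual))


if __name__ == "__main__":
    # Toy donor file with a nonlinear relation that a straight line misses.
    donor_x = list(range(11))
    donor_y = [x ** 2 for x in donor_x]
    recipient_x = [0.5, 5.5, 9.5]
    truth = [x ** 2 for x in recipient_x]

    print("regression MSE:", mse(fuse_regression(donor_x, donor_y, recipient_x), truth))
    print("regression + k-NN MSE:", mse(fuse_regression_knn(donor_x, donor_y, recipient_x, k=2), truth))
```

On this toy data the local residual correction recovers structure that the global line discards, which is one way to read the "reduction of the loss of information" mentioned in the abstract. The paper's simulations may use a different combination rule, distance measure, or set of common variables.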