http://dx.doi.org/10.5351/CKSS.2005.12.2.509

A Study on a Statistical Matching Method Using Clustering for Data Enrichment  

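Kim Soon Y. (Department of Statistical Informatics, Chonbuk National University)
Lee Ki H. (School of Business Administration, Jeonju University)
Chung Sung S. (Department of Statistical Informatics, Chonbuk National University)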
Publication Information
Communications for Statistical Applications and Methods / v.12, no.2, 2005, pp. 509-520
Abstract
Data fusion is the process of combining data and information from different sources so that the useful information they contain can be used more effectively. In this paper, we propose a data fusion algorithm that uses the k-means clustering method for data enrichment, with the aim of improving data quality in the knowledge discovery in databases (KDD) process. An empirical study comparing the proposed technique with existing data fusion techniques shows that the proposed clustering-based technique yields a low mean squared error (MSE) for continuous fusion variables.
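The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of one way a clustering-based fusion could work, not the authors' method. It assumes a donor file holding common variables and a continuous fusion variable, and a recipient file holding only the common variables. Donors are clustered with k-means on the common variables, and each recipient receives the mean fusion value of its nearest cluster. All function names, the farthest-first initialisation, and the cluster-mean imputation rule are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def init_centroids(points, k):
    """Deterministic farthest-first initialisation (an assumed simplification)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, n_iter=50):
    """Plain Lloyd-style k-means; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

def cluster_fuse(donor_x, donor_y, recipient_x, k):
    """Impute the fusion variable for each recipient as the mean donor value
    of the cluster whose centroid is nearest on the common variables."""
    centroids, labels = kmeans(donor_x, k)
    overall = sum(donor_y) / len(donor_y)
    means = []
    for j in range(k):
        ys = [y for y, lab in zip(donor_y, labels) if lab == j]
        means.append(sum(ys) / len(ys) if ys else overall)
    return [means[nearest(x, centroids)] for x in recipient_x]

def mse(true, pred):
    """Mean squared error between observed and fused values."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)

# Toy data (made up for illustration): two well-separated donor groups.
donor_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
donor_y = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
recipient_x = [[0.1, 0.1], [5.1, 5.0]]
fused = cluster_fuse(donor_x, donor_y, recipient_x, k=2)
```

In an evaluation like the one the abstract describes, the fused values would be compared against held-out true values of the fusion variable with `mse`, and the same comparison would be repeated for a competing method such as k-nearest-neighbor matching.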
Keywords
Clustering; Data enrichment; Data fusion; Data mining; k-Nearest Neighbor; Statistical matching