http://dx.doi.org/10.9708/jksci.2011.16.6.119

A Data Cleansing Strategy for Improving Data Quality of National R&D Information - Case Study of NTIS  

Shin, Sung-Ho (Dept. of Information Technology Research, KISTI)
Yoon, Young-Jun (Dept. NTIS, KISTI)
Yang, Myung-Suk (Dept. NTIS, KISTI)
Kim, Jin-Man (Dept. NTIS, KISTI)
Shon, Kang-Ryul (Dept. NTIS, KISTI)
Abstract
From the standpoint of data quality management, data quality is influenced by quality policy, quality organization, business processes, and business rules. Business rules, which guide how data are manipulated, affect data quality directly. When an integrated database is built from distributed databases, defining business rules becomes even more important, because data integration must account for heterogeneous structures, heterogeneous codes, and data standardization. In addition, the same data value can take various forms depending on data type, unit, and transcription. Ultimately, both database-structure problems and data-value problems must be solved to improve data quality. To handle them, a database integration model must be designed and the data in the integrated database must be cleansed. NTIS (National Science and Technology Information Service) aims to provide users with all information on national R&D over the Internet, and to that end it maintains an integrated database built from several source databases. Through the NTIS case study, we show that a database integration model and data cleansing are needed to build a successful integrated database.
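The abstract names three kinds of heterogeneity that business rules must resolve during integration: source-specific codes, differing units, and differing transcriptions of the same value. The following Python sketch illustrates how such rules might be applied while loading records into an integrated database. It is not the NTIS rule set: the source names (`agency_a`, `agency_b`), the code table `CODE_MAP`, the unit names in `UNIT_FACTORS`, the date formats, and the record fields are all hypothetical stand-ins chosen for illustration.

```python
"""Hypothetical sketch of business-rule-driven cleansing during integration.

None of the sources, codes, units, or field names below come from NTIS;
they only illustrate code, unit, and transcription heterogeneity.
"""
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical rule 1: each source uses its own research-field codes,
# which are mapped onto a single standard code table.
CODE_MAP = {
    ("agency_a", "EE01"): "ICT",
    ("agency_a", "BT02"): "BIO",
    ("agency_b", "10"): "ICT",
    ("agency_b", "20"): "BIO",
}

# Hypothetical rule 2: amounts arrive in different units; normalize to won.
UNIT_FACTORS = {"won": 1, "thousand_won": 1_000, "million_won": 1_000_000}

# Hypothetical rule 3: dates are transcribed in several formats.
DATE_FORMATS = ("%Y-%m-%d", "%Y.%m.%d", "%Y%m%d")


@dataclass
class CleanRecord:
    source: str
    field_code: str   # standard code after mapping
    budget_won: int   # amount normalized to won
    start: date       # date parsed from any accepted transcription


def parse_date(text: str) -> date:
    """Try each accepted transcription format; reject anything else."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def parse_amount(text: str, unit: str) -> int:
    """Strip thousands separators and scale the value to won."""
    if unit not in UNIT_FACTORS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(text.replace(",", "").strip()) * UNIT_FACTORS[unit]


def cleanse(source: str, raw: dict) -> CleanRecord:
    """Apply every rule to one raw record; raise ValueError on a violation."""
    key = (source, raw["field"].strip())
    if key not in CODE_MAP:
        raise ValueError(f"no standard code for {key}")
    return CleanRecord(
        source=source,
        field_code=CODE_MAP[key],
        budget_won=parse_amount(raw["amount"], raw["unit"]),
        start=parse_date(raw["start"]),
    )


def integrate(batches: dict[str, list[dict]]):
    """Cleanse every source batch, collecting rejects instead of loading them."""
    clean, rejects = [], []
    for source, rows in batches.items():
        for raw in rows:
            try:
                clean.append(cleanse(source, raw))
            except (KeyError, ValueError) as err:
                rejects.append((source, raw, str(err)))
    return clean, rejects


if __name__ == "__main__":
    batches = {
        "agency_a": [{"field": "EE01", "amount": "1,500",
                      "unit": "thousand_won", "start": "2010.03.01"}],
        "agency_b": [{"field": "10", "amount": "1500000",
                      "unit": "won", "start": "20100301"},
                     {"field": "99", "amount": "5",
                      "unit": "million_won", "start": "2010-01-01"}],
    }
    clean, rejects = integrate(batches)
    for record in clean:
        print(record)
    for source, raw, reason in rejects:
        print("rejected:", source, reason)
```

In this example, the two differently coded, differently denominated, and differently transcribed records from `agency_a` and `agency_b` converge on the same standard record, while the record with an unmapped code is set aside. Collecting rejects for review, rather than dropping or guessing, is one plausible design choice for keeping rule violations visible; the paper's own procedure may differ.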
Keywords
Data Quality; Data Cleansing; Database Integration;