A Data Quality Measuring Tool

  • 양자영 (Department of Computer Science, Ewha Womans University)
  • 최병주 (Department of Computer Science, Ewha Womans University)
  • Published: 2003.06.01

Abstract

Software quality is affected by the quality of the data the software needs in order to operate. Assuring data quality is especially important in a knowledge-engineering system, which extracts meaningful knowledge from stored data. In this paper, we develop DAQUM, a tool that measures data quality. The paper presents 1) the main points of the DAQUM tool's implementation and 2) a case study showing that the tool detects dirty data and makes data quality quantitatively measurable from the end user's point of view. By enabling the measurement and control of data quality, the DAQUM tool can contribute to improving the quality of software products that mainly process data.

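The quality of the data required to run a software product affects the quality of that software. In particular, assuring the quality of the raw data is very important in a knowledge-engineering system that extracts meaningful knowledge from large volumes of data. In this paper, we design and implement the DAQUM tool, a data measurement tool. We describe the main points of the tool's design and implementation and show, through a case study, that the DAQUM tool finds erroneous data and makes it possible to measure data quality quantitatively from the data user's point of view. By enabling the measurement and control of data quality, the DAQUM tool can contribute to improving the quality of software products that mainly process data.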
