• Title/Summary/Keyword: Data Quality Validation

Development of 2D Data Quality Validation Techniques for Pipe-type Underground Facilities (2차원 관로형 지하시설물 정보 품질검증기술 개발)

  • Sang-Keun Bae;Sang-Min Kim;Eun-Jin Yoo;Keo-Bae Lim;Da-Woon Jeong
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.285-292 / 2023
  • As various accidents have occurred in underground spaces, we aim to improve the quality validation standards and methods specified in the Regulations on Producing Integrated Map of Underground Spaces, devised by the Ministry of Land, Infrastructure and Transport of the Republic of Korea, in order to obtain a high-quality integrated map of underground spaces. Specifically, we propose measures to improve the quality assurance of pipeline-type underground facilities, the so-called lifelines, given their importance for citizens' daily activities and their having the highest risk of accident among the 16 types of underground facilities. After implementing quality validation software based on the developed quality validation standards, we demonstrated the adequacy of the standards by testing them on data from two-dimensional water supply facilities in some areas of Busan, Korea. This work lays the foundation for reducing the time and manpower required for data quality inspection and for improving the reliability of data quality, by improving the current quality validation standards and developing software that extracts errors automatically.

Validation of Quality Control Algorithms for Temperature Data of the Republic of Korea (한국의 기온자료 품질관리 알고리즘의 검증)

  • Park, Changyong;Choi, Youngeun
    • Atmosphere / v.22 no.3 / pp.299-307 / 2012
  • This study aims to validate the errors in suspicious temperature data detected by various quality control procedures at 61 weather stations in the Republic of Korea. The quality control algorithms for temperature data consist of four main procedures (high-low extreme check, internal consistency check, temporal outlier check, and spatial outlier check). Whether a detected suspicious value is an error is judged by examining temperature data of nearby stations, surface weather charts, hourly temperature data, daily precipitation, and daily maximum wind direction. The internal consistency check and the spatial outlier check detected errors on 4 days (3 stations) and 7 days (5 stations), respectively. The effective and objective error validation methods developed in this study will help to reduce the manpower and time needed for quality management of temperature data.
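
The four procedures named in the abstract above can be sketched roughly as follows. This is only an illustration, not the authors' algorithm: the function names, the limits (-40 to 45 degrees Celsius) and the 10-degree thresholds are all assumptions chosen for the example.

```python
from statistics import mean

# All limits and thresholds below are illustrative assumptions,
# not values taken from the paper.

def extreme_check(tmax, tmin, low=-40.0, high=45.0):
    """High-low extreme check: flag values outside assumed plausible limits."""
    return not (low <= tmin <= high and low <= tmax <= high)

def internal_consistency_check(tmax, tmean, tmin):
    """Internal consistency check: flag a day unless tmin <= tmean <= tmax."""
    return not (tmin <= tmean <= tmax)

def temporal_outlier_check(series, index, window=3, max_jump=10.0):
    """Temporal outlier check: flag a value far from the mean of nearby days."""
    before = series[max(0, index - window):index]
    after = series[index + 1:index + 1 + window]
    neighbours = before + after
    if not neighbours:
        return False  # nothing to compare against
    return abs(series[index] - mean(neighbours)) > max_jump

def spatial_outlier_check(value, nearby_station_values, max_diff=10.0):
    """Spatial outlier check: flag a value far from nearby stations' mean."""
    return abs(value - mean(nearby_station_values)) > max_diff
```

A flagged value is only "suspicious"; as the abstract notes, it still has to be confirmed as an error against independent evidence such as nearby stations or weather charts.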

An Evaluation Study on Artificial Intelligence Data Validation Methods and Open-source Frameworks (인공지능 데이터 품질검증 기술 및 오픈소스 프레임워크 분석 연구)

  • Yun, Changhee;Shin, Hokyung;Choo, Seung-Yeon;Kim, Jaeil
    • Journal of Korea Multimedia Society / v.24 no.10 / pp.1403-1413 / 2021
  • In this paper, we investigate automated data validation techniques for artificial intelligence training and review open-source frameworks, such as Google's TensorFlow Data Validation (TFDV), that support automated data validation in the AI model development process. We also introduce an experimental study using public data sets to demonstrate the effectiveness of the open-source data validation framework. In particular, we present experimental results for the schema-testing data validation functions and discuss the limitations of current open-source frameworks for semantic data. Finally, we introduce recent studies on semantic data validation using machine learning techniques.
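
As a rough illustration of what schema testing means here, the sketch below infers a schema from reference rows and reports rows in a new batch that deviate from it. It is a simplified stand-in written in plain Python and does not use or reproduce the TFDV API; `infer_schema` and `validate` are hypothetical names.

```python
def infer_schema(rows):
    """Infer a mapping of column name -> set of observed type names."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, set()).add(type(value).__name__)
    return schema

def validate(rows, schema):
    """Return (row index, column, problem) tuples for schema violations."""
    anomalies = []
    for i, row in enumerate(rows):
        for column, allowed_types in schema.items():
            if column not in row:
                anomalies.append((i, column, "missing column"))
            elif type(row[column]).__name__ not in allowed_types:
                anomalies.append((i, column, "unexpected type"))
        for column in row:
            if column not in schema:
                anomalies.append((i, column, "unknown column"))
    return anomalies

reference = [{"age": 31, "city": "Busan"}, {"age": 45, "city": "Seoul"}]
schema = infer_schema(reference)
new_batch = [{"age": "31", "city": "Busan"}, {"age": 20}]
anomalies = validate(new_batch, schema)
```

A check of this kind only looks at structure and types. A row such as `{"age": 999, "city": "Busan"}` would pass, which is the sort of semantic limitation the abstract refers to.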

Case Study of BIM Quality Assurance (BIM 모델의 품질검증 사례연구)

  • Jeong, Yeon-Suk;Park, Sang-Il;Lee, Sang-Ho
    • Proceedings of the Computational Structural Engineering Institute Conference / 2010.04a / pp.379-382 / 2010
  • This study proposes a way to validate BIM data quality in BIM applications. Solibri Model Checker, which is based on the Java programming language, is adopted as the module development platform. The platform lets application developers implement BIM model checkers for their own purposes. This study has developed a BIM validation module for circulation analysis of building design. The validation module enables end users to automatically detect corrupted or undefined data. In case studies, the module found that an IFC file generated by a BIM software package contained incorrect relation information between a space and its boundary elements. A building model must satisfy the modeling requirements before domain users can obtain analysis results. A BIM data validation module needs to be developed for each BIM application domain.

Automatic Validation of the Geometric Quality of Crowdsourcing Drone Imagery (크라우드소싱 드론 영상의 기하학적 품질 자동 검증)

  • Dongho Lee;Kyoungah Choi
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.577-587 / 2023
  • The utilization of crowdsourced spatial data has been actively researched; however, issues stemming from the uncertainty of data quality have been raised. In particular, when low-quality data is mixed into drone imagery datasets, it can degrade the quality of spatial information output. In order to address these problems, the study presents a methodology for automatically validating the geometric quality of crowdsourced imagery. Key quality factors such as spatial resolution, resolution variation, matching point reprojection error, and bundle adjustment results are utilized. To classify imagery suitable for spatial information generation, training and validation datasets are constructed, and machine learning is conducted using a radial basis function (RBF)-based support vector machine (SVM) model. The trained SVM model achieved a classification accuracy of 99.1%. To evaluate the effectiveness of the quality validation model, imagery sets before and after applying the model to drone imagery not used in training and validation are compared by generating orthoimages. The results confirm that the application of the quality validation model reduces various distortions that can be included in orthoimages and enhances object identifiability. The proposed quality validation methodology is expected to increase the utility of crowdsourced data in spatial information generation by automatically selecting high-quality data from the multitude of crowdsourced data with varying qualities.

A Study on HVAC Parameter Monitoring System (Regarding Computer Validation) (HVAC 파라미터 모니터링 시스템에 대한 고찰 (Computer Validation 중심으로))

  • Kim, Jong-Gu
    • Proceedings of the SAREK Conference / 2008.06a / pp.90-95 / 2008
  • This article presents practical advice regarding the implementation and management of an impeccable Building Management System (BMS). The BMS has been introduced into the set of computerized systems covering manufacturing, storage, distribution, and quality control. The recently revised GMP regulation calls for improving the regulatory system for drug product quality through computer system validation. Quality is critical to guaranteeing the efficacy and safety of drugs and is approved in the evaluation process after the audit trail is applied. The HVAC parameter monitoring system records the identity of operators entering or confirming critical data. Authority to amend entered data should be restricted to nominated persons. Any alteration to an entry of critical data should be authorized in advance and recorded with the reason for the change.

Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

  • Yoon, Hoijin
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.549-555 / 2021
  • Machine learning (ML) splits data into three parts, usually 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative; the sets are not selected by any criterion, even though selection criteria are a very important concept for test data adequacy. ML measures a model's accuracy by applying a set of validation data, and revises the model until the validation accuracy reaches a certain level. After the validation process, the completed model is tested with the set of test data, which the model has not yet seen. If the set of test data covers the model's attributes well, the test accuracy will be close to the validation accuracy of the model. To check that ML's set of test data works adequately, we design an experiment to see whether the test accuracy of a model is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. Among the test and validation accuracies of the 600 cases, we find some unexpected cases in which the test accuracy is very different from the validation accuracy. Consequently, it is not always true that ML's set of test data is adequate to assure a model's quality.
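
The purely quantitative 60/20/20 split described in the abstract above might look like the following minimal sketch. The function name `split_60_20_20` and the fixed seed are assumptions for illustration; the paper does not specify how its splits were drawn.

```python
import random

def split_60_20_20(items, seed=0):
    """Shuffle the items and cut them into 60% / 20% / 20% by count only."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n = len(shuffled)
    n_train = n * 6 // 10  # integer arithmetic avoids float rounding
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, validation, test

train, validation, test = split_60_20_20(range(100))
```

Nothing in this procedure checks whether the test set covers the same attributes as the other two sets, which is why test accuracy can diverge from validation accuracy for some splits.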

A Study on Quality Checking of National Scholar Content DB

  • Kim, Byung-Kyu;Choi, Seon-Hee;Kim, Jay-Hoon;You, Beom-Jong
    • International Journal of Contents / v.6 no.3 / pp.1-4 / 2010
  • The national management and retrieval service of the national scholar Content DB is very important. High-quality content can improve users' utilization and satisfaction and can be a strong base for both citation index creation and the calculation of journal impact factors. Therefore, a system is needed to check data quality effectively. We have studied and developed a web-based data quality checking system that supports everything from raw digital data to its automatic validation as well as hands-on validation, all of which is discussed in this paper.

A Study on the Efficacy and Equivalence of D-antigen Quantitative Analysis through QbD6sigma Process (QbD6시그마 프로세스를 통한 D-항원 정량 시험법의 유효성과 동등성에 관한 연구)

  • Kim, Kang Hee;Hyun-jung, Kim
    • Journal of Korean Society for Quality Management / v.50 no.4 / pp.831-842 / 2022
  • Purpose: This study carried out the Quality by Design (QbD)6σ process to verify the effectiveness and equivalence of the finished D-antigen quantitative test method, and compared the OFAT-based method validation and test result acceptance criteria with the Analytical Quality by Design (AQbD)-based method validation and test method. By statistically analyzing the results, the study examines how to reduce the risk of delays in permit changes by increasing the reliability of the permit data for the existing method. Methods: With the QbD6σ process, the effectiveness and equivalence of the D-antigen quantitative test method were verified using data from the existing test method and the new test method. Results: Method validation tests are performed based on AQbD. Critical Method Parameters are identified through risk assessment, and single/combined actions are verified by designing and performing tests for the Critical Method Parameters (analysis of variance, full factorial design method). Method validation can be effectively accomplished with the QbD6σ process. Conclusion: QbD6σ can achieve satisfactory results for both pharmaceutical companies and regulators by applying appropriate statistical analysis methods to method validation, as required by regulatory agencies.

Estimation of Pollutant Load Using Genetic-algorithm and Regression Model (유전자 알고리즘과 회귀식을 이용한 오염부하량의 예측)

  • Park, Youn Shik
    • Korean Journal of Environmental Agriculture / v.33 no.1 / pp.37-43 / 2014
  • BACKGROUND: Water quality data are collected less frequently than flow data because of the cost of collection and analysis, yet water quality data corresponding to flow data are required to compute pollutant loads or to calibrate other hydrology models. Regression models can be used to interpolate water quality data corresponding to flow data. METHODS AND RESULTS: A regression model capable of considering flow and time variance was suggested, and its coefficients were calibrated with a genetic algorithm using various measured water quality data. Both LOADEST and the genetic-algorithm regression model were evaluated on 19 water quality data sets through calibration and validation. The genetic-algorithm regression model showed model behavior similar to that of LOADEST. The load estimates from both LOADEST and the genetic-algorithm regression model indicated that using a large proportion of the water quality data does not necessarily yield load estimates with smaller error relative to the measured load. CONCLUSION: Regression models need to be calibrated and validated before they are used to interpolate pollutant loads, by separating the water quality data into two data sets for calibration and validation.