• Title/Summary/Keyword: data quality diagnosis (데이터 품질 진단)

Search results: 86

Proposal of diagnosis rule mapping model to support public data quality diagnosis (공공데이터 품질진단 지원을 위한 진단규칙 매핑모델 제안)

  • Jeong, Ha-Na; Kim, Jae-Woong; Lee, Yun-Yeol; Chae, Yi-Geun; Chung, Young-Suk
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.127-128 / 2022
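  • The government aims to use the opening of public data as a tool for economic revitalization, including the creation of new industries and jobs. To hold high-quality public data, the government is improving public data quality through quality improvement activities. However, the results of public data quality management level diagnosis vary with the data expertise and understanding of the person conducting the diagnosis, which makes the reliability of the results difficult to guarantee. To support smooth quality diagnosis of public data, this paper proposes a quality diagnosis rule mapping model that improves the stability and reliability of public data quality diagnosis.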

A Study on Domain Discrimination Model for CSV Format Public Data Using Data Distribution Statistics (데이터 분포 통계를 이용한 CSV 형식의 공공데이터 도메인 판별 모델에 관한 연구)

  • Jeong, Ha-Na; Kim, Jae-Woong; Lee, Yun-Yeol; Chae, Yi-Geun; Chung, Young-Suk
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.79-80 / 2023
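  • To manage the quality of public data, the government conducts a public data quality management level evaluation. When diagnosing file-format public data, quality diagnosis staff manually determine the domain of each field across a large number of data files, relying on the field names and the data in each field. As a result, the accuracy of the diagnosis is hard to trust, and the diagnosis takes a long time. To secure the accuracy of quality diagnosis for file-format public data and shorten diagnosis time, this paper proposes a domain discrimination model for CSV-format public data that uses data distribution statistics. Applying the proposed model is expected to improve the accuracy of public data quality diagnosis and reduce the time it takes.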

Proposal of Public Data Quality Management Level Evaluation Domain Rule Mapping Model

  • Jeong, Ha-Na; Kim, Jae-Woong; Chung, Young-Suk
    • Journal of the Korea Society of Computer and Information / v.27 no.12 / pp.189-195 / 2022
  • The Korean government has made it a major national task to revitalize the creative economy, for example by creating new industries and jobs, by encouraging the opening of public data to the private sector and its use there. To retain high-quality public data, the government promotes public data quality improvement through activities such as the public data quality management level evaluation. However, diagnosis results differ depending on the data expertise and understanding of the users of the public data quality diagnosis tool, which makes it difficult to ensure that the results are accurate. This paper proposes a domain rule mapping model for the public data quality management level evaluation that can be applied to validity diagnosis, one of the data quality diagnosis criteria, in order to improve the stability and accuracy of public data quality diagnosis.
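
The mapping idea can be illustrated with a minimal Python sketch, assuming each field has already been assigned a domain and each domain maps to exactly one validity rule. The domain names (`date`, `yes_no`, `phone`), the rule functions, and `diagnose_field` are hypothetical; they are not the rule set of the paper or of the official diagnosis tool. Blank values are assumed to be handled by a separate completeness check.

```python
from __future__ import annotations

import re
from datetime import datetime
from typing import Callable


def _is_iso_date(value: str) -> bool:
    """True if value parses with the format %Y-%m-%d (rejects month 13, day 40, ...)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical domain -> validity rule mapping (illustrative only).
DOMAIN_RULES: dict[str, Callable[[str], bool]] = {
    "date": _is_iso_date,
    "yes_no": lambda v: v in {"Y", "N"},
    "phone": lambda v: re.fullmatch(r"0\d{1,2}-\d{3,4}-\d{4}", v) is not None,
}


def diagnose_field(domain: str, values: list[str]) -> dict:
    """Look up the rule mapped to `domain` and report the values that violate it."""
    rule = DOMAIN_RULES[domain]
    checked = [v.strip() for v in values if v.strip()]  # blanks left to a completeness check
    errors = [v for v in checked if not rule(v)]
    error_rate = len(errors) / len(checked) if checked else 0.0
    return {"domain": domain, "checked": len(checked), "errors": errors, "error_rate": error_rate}


if __name__ == "__main__":
    print(diagnose_field("date", ["2022-07-01", "2022/07/02", "", "2022-13-40"]))
```

Because the diagnostician only chooses the domain and the rule comes from the table, two people who agree on the domain will get the same error count, which is the consistency the abstract is after.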

A Study on the Domain Discrimination Model of CSV Format Public Open Data

  • Jeong, Ha-Na; Kim, Jae-Woong; Chung, Young-Suk
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.129-136 / 2023
  • The government of the Republic of Korea manages the quality of public open data through the public data quality management level evaluation. Public open data is provided in various open formats such as XML, JSON, and CSV, and CSV accounts for the majority. When diagnosing the quality of CSV-format public open data, the person in charge of quality diagnosis determines the domain of each field from the field name and the data in the field, and diagnoses it accordingly. Because this is done for large numbers of open data files, it takes a lot of time. In addition, for fields whose meaning is difficult to understand, the accuracy of the diagnosis depends on how well that person understands the data. This paper proposes a domain discrimination model for CSV-format public open data that uses field names and data distribution statistics, so that diagnosis results are consistent and accurate regardless of the capabilities of the person in charge, and so that diagnosis time can be shortened. Applying the model yielded a correct answer rate of about 77%, 2.8% higher than that of the file-format open data diagnostic tool provided by the Ministry of Public Administration and Security. We therefore expect the proposed model to improve accuracy when it is applied to diagnosing and evaluating the quality management level of public data.
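
A rough Python sketch of combining field-name hints with per-column distribution statistics is given below. It is not the authors' model: the statistics chosen, the keyword lists in `NAME_HINTS`, the 0.5 name bonus, and the 0.8 acceptance threshold are all assumptions made for illustration.

```python
from __future__ import annotations

import csv
import io
import re
import statistics

# Hypothetical field-name keywords per domain (Korean and English hints).
NAME_HINTS = {
    "date": ("일자", "날짜", "date"),
    "yes_no": ("여부", "_yn"),
    "amount": ("금액", "amount", "price"),
}
NAME_BONUS = 0.5  # assumed extra score when the field name contains a hint
MIN_SCORE = 0.8   # assumed minimum score to accept a domain


def column_stats(values: list[str]) -> dict[str, float]:
    """Distribution statistics over the non-blank values of one column."""
    vals = [v.strip() for v in values if v.strip()]
    n = len(vals) or 1
    return {
        "date_share": sum(bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v)) for v in vals) / n,
        "numeric_share": sum(bool(re.fullmatch(r"-?\d+(\.\d+)?", v)) for v in vals) / n,
        "distinct": len(set(vals)),
        "mean_length": statistics.fmean([len(v) for v in vals]) if vals else 0.0,
    }


def discriminate(field_name: str, values: list[str]) -> str:
    """Score each candidate domain from the data, add a bonus for name hints, pick the best."""
    s = column_stats(values)
    scores = {
        "date": s["date_share"],
        # at most two distinct one-character codes, e.g. Y/N
        "yes_no": 1.0 if s["distinct"] <= 2 and s["mean_length"] == 1 else 0.0,
        "amount": s["numeric_share"],
    }
    name = field_name.lower()
    for domain, hints in NAME_HINTS.items():
        if any(h in name for h in hints):
            scores[domain] += NAME_BONUS
    best = max(scores, key=scores.get)
    return best if scores[best] >= MIN_SCORE else "unknown"


def discriminate_csv(text: str) -> dict[str, str]:
    """Assign a domain to every column of a CSV text with a header row."""
    header, *rows = csv.reader(io.StringIO(text))
    return {name: discriminate(name, list(col)) for name, col in zip(header, zip(*rows))}


if __name__ == "__main__":
    sample = "등록일자,사용여부,금액\n2023-01-05,Y,1200\n2023-02-11,N,3500\n2023-03-09,Y,-40\n"
    print(discriminate_csv(sample))  # {'등록일자': 'date', '사용여부': 'yes_no', '금액': 'amount'}
```

Treating the field name as a bonus rather than a hard rule is one way to let the data statistics still drive the decision when a field name is uninformative, which is the case the abstract singles out.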

A Study on Automation of Big Data Quality Diagnosis Using Machine Learning (머신러닝을 이용한 빅데이터 품질진단 자동화에 관한 연구)

  • Lee, Jin-Hyoung
    • The Journal of Bigdata / v.2 no.2 / pp.75-86 / 2017
  • In this study, I propose a method to automate big data quality diagnosis. The motivation is that, as the Fourth Industrial Revolution has become a prominent issue, demand is growing for larger volumes of data to be generated and used, and data is growing rapidly. However, if diagnosing data quality takes a long time, use of the data may be delayed or its quality may degrade. Decisions or predictions made from such low-quality data will in turn point in the wrong direction. To solve this problem, I developed a model that uses machine learning to automate diagnosis for improving big data quality, so that data can be diagnosed and improved quickly. Machine learning is used to automate domain classification, preventing errors that can occur during domain classification and reducing work time. Building on these results, continued research on the importance of data conversion, learning methods for data not seen during training, and classification models for each domain can contribute to improving the quality of data used in big data applications.

A Study of Big Data Domain Automatic Classification Using Machine Learning (머신러닝을 이용한 빅데이터 도메인 자동 판별에 관한 연구)

  • Kong, Seongwon; Hwang, Deokyoul
    • The Journal of Bigdata / v.3 no.2 / pp.11-18 / 2018
  • This study addresses automatic domain classification for domain-based quality diagnosis, a key element of big data quality diagnosis. As the value and use of big data grow and the Fourth Industrial Revolution emerges, efforts are under way worldwide to create new value by applying big data in various IT-converged fields such as law, medicine, and finance. However, analysis based on low-reliability data causes critical problems in both the process and the results, and judgments based on such analysis are hard to trust. Although the need for highly reliable data has increased, research on data quality and its outcomes has been insufficient. The purpose of this study is to shorten work time by using machine learning to automate the domain classification work that was previously performed manually in domain-based quality diagnosis, a key element of diagnostic evaluation for improving data quality. The approach extracts information about the characteristics of the data stored in a database, identifies the domain, converts this information into features, and automates domain classification with machine learning. We intend to use it for big data quality diagnosis and thereby contribute to quality improvement.
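
The extract, featurize, and classify pipeline described above can be sketched in Python. The feature set, the toy training columns, and the nearest-centroid classifier are assumptions standing in for whatever features and machine learning models the study actually used; a real system would train on columns labeled by diagnosis staff.

```python
from __future__ import annotations

import math
import re


def featurize(values: list[str]) -> list[float]:
    """Hypothetical feature vector describing one column's values."""
    vals = [v.strip() for v in values if v.strip()]
    n = len(vals) or 1
    return [
        sum(v.isdigit() for v in vals) / n,                                  # share of all-digit values
        sum(bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v)) for v in vals) / n,  # share of YYYY-MM-DD values
        min(sum(len(v) for v in vals) / n / 20, 1.0),                        # mean length scaled to [0, 1]
        len(set(vals)) / n,                                                  # distinct-value ratio
    ]


class NearestCentroid:
    """Minimal classifier: predict the label whose mean feature vector is closest."""

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        groups: dict[str, list[list[float]]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, x: list[float]) -> str:
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


# Toy labeled columns (hypothetical), standing in for manually classified fields.
TRAIN = [
    (["2020-01-01", "2021-05-06", "2019-12-31"], "date"),
    (["2022-03-04", "2022-03-05"], "date"),
    (["Y", "N", "Y", "N"], "yes_no"),
    (["N", "N", "Y"], "yes_no"),
    (["1200", "35000", "780"], "amount"),
    (["5", "12000", "300", "42"], "amount"),
]
model = NearestCentroid().fit([featurize(v) for v, _ in TRAIN], [label for _, label in TRAIN])

if __name__ == "__main__":
    print(model.predict(featurize(["2023-07-01", "2023-07-02"])))  # date
```

Swapping `NearestCentroid` for a stronger classifier only changes the `fit`/`predict` step; the featurization of column contents is the part that replaces the manual judgment.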

Quality Diagnosis of Library-Related Open Government Data: Focused on Book Details API of Data for Library (도서관 공공데이터의 품질에 관한 연구: 도서관 정보나루의 도서 상세 조회 API를 중심으로)

  • Yang, Suwan
    • Journal of the Korean Society for Information Management / v.37 no.4 / pp.181-206 / 2020
  • With the spread of open government data, library-related open government data is also being opened to and used by the public. The purpose of this paper is to diagnose the quality of library-related open government data and to propose measures for improving it based on the diagnosis results. Diagnosing the completeness of the data revealed many blank values in bibliographic elements essential for identifying and searching for a book. Diagnosing the accuracy of the data identified bibliographic elements that do not comply with the data schema. Based on these results, the study suggests improving the data collection procedure, establishing a dataset schema, providing details on data collection and processing, and publishing the raw data.
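
The two checks can be expressed as simple ratios per bibliographic element: completeness as the share of non-blank values, and accuracy as the share of non-blank values that satisfy the schema. In this sketch the field names (`isbn`, `title`, `author`) are hypothetical rather than the actual Book Details API response elements, and the ISBN-13 check-digit rule is used only as an example schema constraint.

```python
from __future__ import annotations

from typing import Callable, Optional


def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def diagnose(records: list[dict], fields: list[str],
             validators: dict[str, Callable[[str], bool]]) -> dict:
    """Per field: completeness = non-blank share; accuracy = valid share of non-blank values."""
    report = {}
    for field in fields:
        values = [str(r.get(field) or "").strip() for r in records]
        filled = [v for v in values if v]
        completeness = len(filled) / len(values) if values else 0.0
        check = validators.get(field)
        accuracy: Optional[float] = None  # None when the field has no schema rule or no values
        if check is not None and filled:
            accuracy = sum(check(v) for v in filled) / len(filled)
        report[field] = {"completeness": completeness, "accuracy": accuracy}
    return report


# Hypothetical records, not real API output.
RECORDS = [
    {"isbn": "9780306406157", "title": "Sample A", "author": "Kim"},
    {"isbn": "9780306406158", "title": "Sample B", "author": ""},
    {"isbn": "", "title": "Sample C", "author": None},
    {"isbn": "978-0-306-40615-7", "title": "Sample D", "author": "Lee"},
]

if __name__ == "__main__":
    print(diagnose(RECORDS, ["isbn", "title", "author"], {"isbn": isbn13_is_valid}))
```

Keeping completeness and accuracy separate means a blank ISBN lowers completeness but does not count as a schema violation, so the two problems the study reports stay distinguishable.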

Quality Evaluation of Chest X-ray Open Dataset through Pixel Value Analysis by Region (영역별 화소값 분석을 통한 흉부 X선 오픈 데이터셋 품질 평가)

  • Choi, Hyeon-Jin; Bea, Su-Bin; Sun, Joo-Sung; Lee, Jung-Won
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.614-617 / 2022
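  • With advances in artificial intelligence, research on deep-learning-based disease diagnosis in medical imaging is active. The number and quality of training data are critical when developing such models, but in the medical field few datasets are accessible, and open datasets, which are distributed by different institutions or collected from the web, cannot be expected to have quality suitable for diagnosis. Moreover, existing studies use datasets without verifying whether their quality is suitable for training. This paper therefore proposes a chest X-ray image quality evaluation technique based on region-wise pixel value analysis, grounded in the image quality evaluation factors used in clinical practice. Applying the technique to the open datasets JSRT and Chest14 and to AUH, a dataset from a domestic hospital A, showed strong performance, with a sensitivity of 91.5% and a specificity of 96.1%.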
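
Sensitivity and specificity are the standard confusion-matrix ratios below, where TP, FN, TN, and FP are true positives, false negatives, true negatives, and false positives. The abstract does not say which class is treated as positive; assuming "positive" means an image judged unsuitable in quality, a sensitivity of 91.5% would mean the technique flags about 91.5% of unsuitable images, and a specificity of 96.1% would mean it passes about 96.1% of suitable ones.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
```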

Development of Management Technology for Information Collected from Aids to Navigation (항로표지 수집정보의 관리 기술 개발)

  • 양진홍; 한준희; 장준혁; 오세웅; 한윤석; 이예경; 정제한; 김민규; 김호준; 정성훈; 신상문
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.11a / pp.221-223 / 2022
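  • In managing information collected from aids to navigation, improving and diagnosing data quality is important. In this study, we compared and analyzed various data algorithms to improve and diagnose the quality of information collected from digital aids to navigation, and developed a data quality diagnosis index using the process capability index.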

A Study on Calculating a Data Quality Index for Aids-to-Navigation Data (항로표지 데이터 품질지수 산출에 관한 연구)

  • 정제한; 한윤석; 이예경; 다이리; 탕멍위엔; 장준혁; 신상문
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.06a / pp.100-102 / 2022
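  • Assessing data quality and selecting criteria for it play an important role in analyses such as those of maritime aids to navigation. In this study, to diagnose the quality of digital aids-to-navigation data in the maritime domain, we used the process capability index to calculate data quality quantitatively, clarified the criteria for judging the results, and presented a measure for assessing data quality.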
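
The process capability indices these studies build on are conventionally defined as Cp = (USL - LSL) / (6σ) and Cpk = min(USL - μ, μ - LSL) / (3σ), where LSL and USL are the lower and upper specification limits, μ is the mean, and σ is the standard deviation. The Python sketch below computes both for a series of collected values. The specification limits, the sample readings, and the 1.33 and 1.0 grading thresholds (common rules of thumb in manufacturing) are assumptions, not the judgment criteria defined in the papers.

```python
from __future__ import annotations

import statistics


def capability_indices(values: list[float], lsl: float, usl: float) -> tuple[float, float]:
    """Return (Cp, Cpk) using the sample mean and sample standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        raise ValueError("capability indices are undefined for zero spread")
    cp = (usl - lsl) / (6 * sigma)                 # spread of the spec window vs. data spread
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)    # also penalizes an off-center mean
    return cp, cpk


def grade(cpk: float) -> str:
    """Assumed rule-of-thumb thresholds, not the papers' judgment criteria."""
    if cpk >= 1.33:
        return "capable"
    if cpk >= 1.0:
        return "marginal"
    return "not capable"


if __name__ == "__main__":
    # Hypothetical readings from one sensor field with assumed limits 5..17.
    cp, cpk = capability_indices([10, 12, 11, 13, 9], lsl=5, usl=17)
    print(round(cp, 3), round(cpk, 3), grade(cpk))  # 1.265 1.265 marginal
```

Cp and Cpk coincide here because the sample mean sits exactly in the middle of the assumed limits; a mean drifting toward either limit lowers Cpk while leaving Cp unchanged.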
