Overseas Trends (해외동향)

  • Korea Electrical Manufacturers Association
    • NEWSLETTER 전기공업, no.98-11 s.204, pp.12-33, 1998

Application Method of Regular Expressions and Suffixes to Improve the Accuracy of Automatic Domain Identification of Public Data (공공데이터의 도메인 자동 판별 정확도 향상을 위한 정규표현식 및 접미사 적용 방법)

  • Kim, Seok-Kyoun; Lee, Kwanwoo
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.22 no.4, pp.81-86, 2022
  • This work proposes a method for automatically determining the domain of each column in CSV-formatted file data. New data can be generated by combining datasets, and the joined columns must remain consistent for the new data to be a useful resource. One way to measure data quality is domain-based quality diagnosis. The domain is the broadest indicator of the nature of a column, so a method for determining it automatically is needed. Whereas previous studies mainly addressed automatic domain identification for relational databases, this study develops a model that identifies domains automatically using the characteristics of file data. To specialize in file data, the data values were simplified into patterns using regular expressions, and the column names in the data header were analyzed so that the suffix of each name could serve as a derived variable. Adding the derived variables from regular expressions and suffixes yielded an automatic domain identification accuracy of 95%, compared with 87% for the existing method. By providing an automated methodology for the quality diagnosis of public data, this study is expected to reduce the time and personnel required for quality measurement.
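
The two derived variables described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual regular expressions, suffix list, or classifier, so everything below is an assumption made for illustration: the function names `value_pattern`, `header_suffix`, and `column_features` are hypothetical, digits are collapsed to `9` and letters to `A`, and the suffix is taken as the last underscore- or space-separated token of the column name.

```python
import csv
import io
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Simplify a cell value into a coarse pattern (assumed scheme).

    Runs of digits become "9" and runs of letters become "A";
    punctuation such as "-" or "/" is kept as-is.
    """
    pattern = re.sub(r"\d+", "9", value.strip())
    # [^\W\d_] matches any Unicode letter (word char that is not a digit or "_").
    pattern = re.sub(r"[^\W\d_]+", "A", pattern)
    return pattern


def header_suffix(header: str) -> str:
    """Return the last underscore- or space-separated token of a column name."""
    tokens = re.split(r"[_\s]+", header.strip().lower())
    return tokens[-1]


def column_features(csv_text: str) -> dict:
    """Build per-column features: the most frequent value pattern and the header suffix."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    columns = list(zip(*reader))  # transpose rows into columns
    features = {}
    for header, values in zip(headers, columns):
        counts = Counter(value_pattern(v) for v in values)
        most_common_pattern, _ = counts.most_common(1)[0]
        features[header] = {
            "pattern": most_common_pattern,
            "suffix": header_suffix(header),
        }
    return features


if __name__ == "__main__":
    sample = "open_date,store_name\n2022-04-01,Alpha\n2022-05-12,Beta\n"
    print(column_features(sample))
```

In a setup like this, the resulting `pattern` and `suffix` values would be passed as extra input features to whatever domain classifier is used; the choice of classifier is outside what the abstract states.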