http://dx.doi.org/10.20465/KIOTS.2022.8.1.037

Valid Data Conditions and Discrimination for Machine Learning: Case study on Dataset in the Public Data Portal  

Oh, Hyo-Jung (Dept. of Library & Information Science, Jeonbuk National University)
Yun, Bo-Hyun (Div. of Software Liberal Arts, Mokwon University)
Publication Information
Journal of Internet of Things and Convergence / v.8, no.1, 2022, pp. 37-43
Abstract
The fundamental basis of AI technology is learnable data. The types and amounts of data collected and produced by the government or private companies have recently been increasing exponentially; however, this growth has not yet translated into verified data that can actually be used for machine learning. This study discusses the conditions that data must meet to be usable for machine learning and identifies factors that degrade data quality through case studies. To this end, two representative cases of developing a prediction model from public big data were selected, and the data needed to solve the actual problems were collected from the public data portal. The cases show how the results differ once valid-data screening criteria and post-processing are applied. The ultimate purpose of this study is to argue for the importance of data quality management and the accumulation of valid data, which must precede the development of machine learning technology, the core of artificial intelligence.
Keywords
Valid Data; Machine Learning; Data Discrimination; Quality of Data; Public Big Data