• Title/Summary/Keyword: Actual data

Forecasting Open Government Data Demand Using Keyword Network Analysis (키워드 네트워크 분석을 이용한 공공데이터 수요 예측)

  • Lee, Jae-won
    • Informatization Policy / v.27 no.4 / pp.24-46 / 2020
  • This study proposes a way to forecast open government data (OGD) demand (i.e., OGD requests, search queries, etc.) in a timely manner by using keyword network analysis. According to the analysis results, most of the OGD belonging to high-demand topics is provided by the domestic OGD portal (data.go.kr), while the OGD related to users' actual needs, as predicted through topic association analysis, is rarely provided. This is because, when providing (or selecting) OGD, relevance to OGD topics takes precedence over relevance to users' OGD requests. The proposed keyword network analysis framework is expected to contribute to the establishment of OGD policies for public institutions, as it can quickly and easily forecast users' demand based on actual OGD requests.
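
The abstract does not spell out how the keyword network is built, so the following is only a minimal sketch of one common approach: counting keyword co-occurrences within each request and ranking keywords by weighted degree. The request data, the function names, and the choice of weighted degree as the demand score are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from itertools import combinations


def build_keyword_network(requests):
    """Count how often each pair of keywords co-occurs in the same request."""
    edges = Counter()
    for keywords in requests:
        # Sort and deduplicate so (a, b) and (b, a) map to one undirected edge.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum edge weights per keyword as a simple (assumed) demand score."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical OGD requests, each already reduced to a keyword list.
requests = [
    ["bus", "route", "real-time"],
    ["bus", "real-time", "arrival"],
    ["parking", "real-time"],
]
edges = build_keyword_network(requests)
degree = weighted_degree(edges)
top_keyword = degree.most_common(1)[0][0]
print(top_keyword, degree[top_keyword])
```

Keywords with a high weighted degree appear together with many other requested keywords, which is one plausible signal of a high-demand topic.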

A Study on Efficient AI Model Drift Detection Methods for MLOps (MLOps를 위한 효율적인 AI 모델 드리프트 탐지방안 연구)

  • Ye-eun Lee;Tae-jin Lee
    • Journal of Internet Computing and Services / v.24 no.5 / pp.17-27 / 2023
  • As AI (artificial intelligence) technology develops and becomes more practical, it is widely used in various real-life application fields. An AI model is trained on the statistical properties of its training data and then deployed to a system, but unexpected changes in rapidly changing data degrade the model's performance. In the security field in particular, finding drift signals in deployed models has become important for responding to new and unknown attacks that are constantly being created, so the need for lifecycle management of the entire model is gradually emerging. Drift can generally be detected through changes in the model's accuracy and error rate (loss), but this approach is limited in practice because it requires actual labels for the model's predictions and because the point at which drift actually occurs remains uncertain. The model's error rate is strongly influenced by external environmental factors, model selection and parameter settings, and new input data, so there are limits to precisely determining when drift in the data actually occurs from that value alone. Therefore, this paper proposes a method to detect when drift actually occurs through an anomaly analysis technique based on XAI (eXplainable Artificial Intelligence). In tests on a classification model that detects DGA (Domain Generation Algorithm), anomaly scores were extracted from the SHAP (SHapley Additive exPlanations) values of post-deployment data, confirming that efficient detection of the drift point was possible.
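
The idea of scoring post-deployment data by how unusual its feature attributions look, without needing labels, can be sketched as follows. This is an illustrative toy, not the paper's pipeline: it assumes a linear model, for which the SHAP value of feature i reduces to w_i * (x_i - E[x_i]) when features are treated as independent, and it uses a mean absolute z-score as a stand-in anomaly score. The weights, window size, threshold, and simulated stream are all hypothetical.

```python
import random
import statistics


def linear_shap(weights, baseline, x):
    """Per-feature attribution w_i * (x_i - E[x_i]) for a linear model.

    This closed form equals the SHAP value only under assumed feature independence.
    """
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, baseline)]


def fit_reference(attributions):
    """Per-feature mean and standard deviation of attributions on reference data."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*attributions)]


def anomaly_score(attribution, reference):
    """Mean absolute z-score of one attribution vector against the reference."""
    return statistics.fmean(
        abs(a - mean) / (std or 1.0) for a, (mean, std) in zip(attribution, reference)
    )


def first_drift_window(scores, window, threshold):
    """Start index of the first window whose mean anomaly score exceeds the threshold."""
    for start in range(0, len(scores) - window + 1, window):
        if statistics.fmean(scores[start:start + window]) > threshold:
            return start
    return None


rng = random.Random(0)
weights = [0.8, -0.5]  # hypothetical model coefficients
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
baseline = [statistics.fmean(col) for col in zip(*train)]
reference = fit_reference([linear_shap(weights, baseline, x) for x in train])

# Simulated deployment stream: the first feature shifts by +4 after sample 200.
stream = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
stream += [[rng.gauss(4, 1), rng.gauss(0, 1)] for _ in range(200)]
scores = [anomaly_score(linear_shap(weights, baseline, x), reference) for x in stream]
drift_start = first_drift_window(scores, window=50, threshold=1.5)
print(drift_start)
```

Because the score is computed from attributions alone, no ground-truth label is needed for the deployed data, which is the limitation of accuracy-based detection that the abstract points out.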

Valid Data Conditions and Discrimination for Machine Learning: Case study on Dataset in the Public Data Portal (기계학습에 유효한 데이터 요건 및 선별: 공공데이터포털 제공 데이터 사례를 통해)

  • Oh, Hyo-Jung;Yun, Bo-Hyun
    • Journal of Internet of Things and Convergence / v.8 no.1 / pp.37-43 / 2022
  • The fundamental basis of AI technology is learnable data. Recently, the types and amounts of data collected and produced by the government and private companies have been increasing exponentially; however, this has not yet led to verified data that can actually be used for machine learning. This study discusses the conditions that data must meet to be usable for machine learning and identifies factors that degrade data quality through case studies. To this end, two representative cases of developing a prediction model using public big data were selected, and data for actual problem solving was collected from the public data portal. Through these cases, the study shows how the results differ when valid-data screening criteria and post-processing are applied. The ultimate purpose of this study is to argue for the importance of data quality management and the accumulation of valid data, which must fundamentally precede the development of machine learning technology, the core of artificial intelligence.

Indoor Localization for Mobile Robot using Extended Kalman Filter (확장 칼만 필터를 이용한 로봇의 실내위치측정)

  • Kim, Jung-Min;Kim, Youn-Tae;Kim, Sung-Shin
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.5 / pp.706-711 / 2008
  • This paper presents an accurate localization scheme for mobile robots based on sensor fusion of an ultrasonic satellite (U-SAT) with an inertial navigation system (INS). Our aim is to achieve a localization error of less than 100 mm. The INS consists of a yaw gyro and two wheel encoders, and the U-SAT consists of four transmitters and a receiver. The localization method in this paper fuses these in an extended Kalman filter. The performance of the localization is verified by simulation and by two sets of actual driving data (straight and curved) gathered at about 0.5 m/s. The localization methods compared are general sensor fusion and sensor fusion through a Kalman filter using data from the INS. Through the simulation and actual data studies, the experiments show the effectiveness of the proposed method for autonomous mobile robots.
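
To make the fusion step concrete, here is a minimal extended Kalman filter sketch under stated assumptions: the state is a planar pose (x, y, heading), odometry (speed and yaw rate) drives the prediction, and each of four beacons supplies a scalar range that is applied as a separate update. The beacon layout, noise levels, and the range-based measurement model are hypothetical choices for illustration; the abstract does not give the paper's actual models.

```python
import math
import random


def mat_mul(a, b):
    """Multiply two small matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def predict(state, cov, v, omega, dt, q):
    """Propagate pose (x, y, heading) with odometry; q is assumed process noise."""
    x, y, th = state
    new_state = [x + v * dt * math.cos(th), y + v * dt * math.sin(th), th + omega * dt]
    # Jacobian of the motion model with respect to the state.
    f = [[1.0, 0.0, -v * dt * math.sin(th)],
         [0.0, 1.0, v * dt * math.cos(th)],
         [0.0, 0.0, 1.0]]
    f_t = [list(row) for row in zip(*f)]
    new_cov = mat_mul(mat_mul(f, cov), f_t)
    for i in range(3):
        new_cov[i][i] += q
    return new_state, new_cov


def update_range(state, cov, beacon, measured, r):
    """Correct the pose with one range measurement to a beacon at a known position."""
    dx, dy = state[0] - beacon[0], state[1] - beacon[1]
    expected = math.hypot(dx, dy)
    h = [dx / expected, dy / expected, 0.0]  # Jacobian of the range model
    ph = [sum(cov[i][j] * h[j] for j in range(3)) for i in range(3)]
    s = sum(h[i] * ph[i] for i in range(3)) + r  # scalar innovation variance
    gain = [p / s for p in ph]
    innovation = measured - expected
    new_state = [state[i] + gain[i] * innovation for i in range(3)]
    # P <- (I - K H) P; for symmetric P, (K H P)[i][j] equals gain[i] * ph[j].
    new_cov = [[cov[i][j] - gain[i] * ph[j] for j in range(3)] for i in range(3)]
    return new_state, new_cov


rng = random.Random(1)
beacons = [(0.0, 3.0), (6.0, 3.0), (0.0, -3.0), (6.0, -3.0)]  # hypothetical layout
truth = [0.0, 0.0, 0.0]
state = [0.3, -0.2, 0.0]  # deliberately wrong initial estimate
cov = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.01]]
v, omega, dt = 0.5, 0.0, 0.1  # straight run at 0.5 m/s

for _ in range(100):
    truth = [truth[0] + v * dt * math.cos(truth[2]),
             truth[1] + v * dt * math.sin(truth[2]),
             truth[2] + omega * dt]
    state, cov = predict(state, cov, v, omega, dt, q=1e-4)
    for beacon in beacons:
        z = math.hypot(truth[0] - beacon[0], truth[1] - beacon[1]) + rng.gauss(0, 0.01)
        state, cov = update_range(state, cov, beacon, z, r=1e-4)

position_error = math.hypot(state[0] - truth[0], state[1] - truth[1])
print(f"final position error: {position_error:.4f} m")
```

Applying the four ranges one at a time keeps every innovation scalar, so no matrix inversion is needed; a batch update with a 4x4 innovation covariance would be an equally valid design.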

Analysis and Visualization of Real Estate Market Price using Elasticsearch (Elasticsearch를 이용한 부동산 시장 가격 분석 및 시각화)

  • Seung-Yeon Hwang;Jeong-Joon Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.2 / pp.185-190 / 2024
  • In 2022, the real estate market in Korea declined. COVID-19 and the Russian invasion of Ukraine are cited as the biggest causes. These two problems ignited an economic recession, causing prices to fall and subsequently raising exchange rates and interest rates. Because of these problems, the number of actual transactions in the previously active real estate market has decreased, and high interest rates have led to a decline in the market. Data provided by the public data portal, KOSIS, and the Seoul Metropolitan Government were collected through Logstash and transferred to Elasticsearch, and inflation, exchange rates, and loan interest rates were visualized using the dashboard function provided by Kibana to analyze causes and derive results. In addition, three specific apartments in Nowon-gu and Jongno-gu, which have the highest number of actual transactions in Seoul, were selected, and their actual transaction prices, which change every month, are displayed in a Data Table.

Study on Developing Program for Efficient Landscape Woody Plants Management - Mainly Focused on the Development of a Tree Inventory System - (조경수목의 효율적 관리를 위한 프로그램 개발에 관한 연구 - 관리대장(Tree Inventory) 개발을 중심으로 -)

  • 조영환;곽행구
    • Journal of the Korean Institute of Landscape Architecture / v.24 no.4 / pp.1-22 / 1997
  • This paper focuses on the efficient management of landscape woody plants and concerns itself with their important role in the urban environment. Based on the philosophy that nothing can be done without an inventory, the purpose of this study was to develop an inventory system and apply it properly to a site in order to establish a management plan. Two approaches were used. The first was to build a newly structured inventory system by collecting, analyzing, and evaluating various types of inventories used in Korea, the U.S.A., and Japan. The second was to apply the newly designed inventory system to the case study area, using GIS as a tool of spatial analysis and statistics for making decisions. The results can be summarized as follows. 1. In Korea, most landscape woody plant inventories held data that recorded only the possession of trees and the work done on them in traditional ways. There was no data on the condition, management needs, and site conditions of individual trees, which is essential information for organizing an inventory system. 2. Balanced data covering both tree characteristics and site characteristics is needed; with such information, management needs can be adjusted properly. The inventory list described in this paper was determined by botanical identity, placement condition, condition of the tree, and the types of work for maintaining and improving the condition of each tree. One of the most important things was to determine the location data of each tree so that its data could be compared with that of other trees. The data gained from the field survey still had some problems because of the lack of a scientific method for supporting objective views and because of actual situations, especially in evaluating site conditions and management needs. All data should be revised to fit a computer data management system, if possible. 3. The GIS (Geographic Information System) application showed good performance in handling inventory data for decision making. All the data used for the GIS application was divided into location data and non-spatial data. Using the location data, it was easy to find the exact location of each tree, along with various attribute data, on the monitor and on computer-generated maps, even in the actual managed site. Therefore, the entire management plan should start from data on individual trees with their exact locations, in order to set concrete management goals through actual budget planning.

More Efficient k-Modes Clustering Algorithm

  • Kim, Dae-Won;Chae, Yi-Geun
    • Journal of the Korean Data and Information Science Society / v.16 no.3 / pp.549-556 / 2005
  • Hard-type centroids in conventional clustering algorithms such as the k-modes algorithm cannot preserve the uncertainty inherent in data sets for as long as possible before actual clustering decisions are made. Therefore, we propose the k-populations algorithm to extend clustering ability and to heed the data characteristics. This k-populations algorithm was found to give markedly better clustering results in various experiments.
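
For context, the conventional k-modes baseline that the abstract contrasts against can be sketched as below. This is the standard hard-centroid procedure (Hamming-distance assignment, per-attribute mode update), not the proposed k-populations algorithm, whose details the abstract does not give. The toy records and the seeding rule (first k distinct records) are assumptions chosen to keep the example deterministic.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(points):
    """Most frequent category per attribute: the hard-type centroid."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*points))


def k_modes(data, k, max_iter=20):
    # Assumption: seed with the first k distinct records; real implementations vary.
    modes = list(dict.fromkeys(data))[:k]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: hamming(row, modes[c])) for row in data]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                modes[c] = column_modes(members)
    return labels, modes


# Hypothetical categorical records (color, size, shape).
data = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
]
labels, modes = k_modes(data, k=2)
print(labels, modes)
```

Each mode keeps only the single most frequent category per attribute, so the minority categories in a cluster ("oval" and "medium" in the first cluster here) are discarded at every iteration; this is the loss of uncertainty that the abstract criticizes.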

A Study on the Principal Component Analysis of Anthropometric Data (인체계측치(人體計測値)의 주성분분석(主成分分析)에 관한 연구(硏究))

  • Lee, Sang-Do;Jeong, Jung-Hui;Kim, Geuk-Bae
    • Journal of the Ergonomics Society of Korea / v.2 no.1 / pp.3-11 / 1983
  • Anthropometric data are the most basic material in all studies related to anthropometry. Therefore, anthropometric data call not only for consideration of the state of variance but also for more varied analysis. This study selected 13 body parts that properly show the overall characteristics of the human body, and anthropometric data were obtained through actual measurements of male and female workers engaged in a production factory. To interpret the anthropometric data, principal component analysis, one of the multivariate analysis methods, was applied.
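
As a rough illustration of how principal component analysis condenses correlated body measurements, the sketch below standardizes the variables, forms the correlation matrix, and extracts the first component by power iteration. The three measurements and five records are hypothetical (the study used 13 body parts), and working from the correlation matrix rather than the covariance matrix is an assumption.

```python
import math
import statistics


def standardize(rows):
    """Convert each column to z-scores using the population standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / n for j in range(p)] for i in range(p)]


def first_component(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    projected = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    eigenvalue = sum(v * p for v, p in zip(vec, projected))
    return eigenvalue, vec


# Hypothetical measurements in cm: stature, sitting height, arm length.
measurements = [
    (170.0, 90.0, 74.0),
    (165.0, 88.0, 72.0),
    (180.0, 95.0, 79.0),
    (175.0, 92.0, 76.0),
    (160.0, 85.0, 70.0),
]
corr = correlation_matrix(standardize(measurements))
eigenvalue, loadings = first_component(corr)
# The trace of a correlation matrix equals the number of variables.
explained_ratio = eigenvalue / len(corr)
print(f"first component explains {explained_ratio:.1%} of the variance")
```

With strongly correlated length measurements, the first component has same-signed loadings on every variable and is usually read as an overall body-size factor.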

Parameter Estimation and Prediction for NHPP Software Reliability Model and Time Series Regression in Software Failure Data

  • Song, Kwang-Yoon;Chang, In-Hong
    • Journal of Integrative Natural Science / v.7 no.1 / pp.67-73 / 2014
  • We consider the mean value function for the NHPP (non-homogeneous Poisson process) software reliability model and a time series regression model for software failure data. We estimate the parameters of the proposed models from two data sets. The values of SSE and MSE are presented for the two data sets. We compare the predicted number of faults with the actual values in the two data sets using the mean value function and the regression curve.
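
The comparison of predicted and actual fault counts can be illustrated with a short sketch. The abstract does not name the specific NHPP model, so this example assumes the Goel-Okumoto mean value function m(t) = a(1 - e^(-bt)); it also assumes MSE = SSE / (n - k) with k fitted parameters, a common convention in this literature. The failure counts and parameter values are hypothetical and are not fitted here.

```python
import math


def mean_value(t, a, b):
    """Goel-Okumoto mean value function: expected cumulative faults by time t."""
    return a * (1.0 - math.exp(-b * t))


def sse(times, observed, a, b):
    """Sum of squared errors between observed and predicted cumulative faults."""
    return sum((y - mean_value(t, a, b)) ** 2 for t, y in zip(times, observed))


def mse(times, observed, a, b, n_params=2):
    """SSE divided by the degrees of freedom n - k (assumed definition)."""
    return sse(times, observed, a, b) / (len(times) - n_params)


times = [1, 2, 3, 4, 5, 6]
observed = [9, 17, 24, 29, 33, 37]  # hypothetical cumulative failure counts
a_hat, b_hat = 60.0, 0.16           # assumed parameter estimates, not fitted here

fit_sse = sse(times, observed, a_hat, b_hat)
fit_mse = mse(times, observed, a_hat, b_hat)
for t, y in zip(times, observed):
    print(t, y, round(mean_value(t, a_hat, b_hat), 2))
print(f"SSE={fit_sse:.3f} MSE={fit_mse:.3f}")
```

Smaller SSE and MSE indicate that the mean value function tracks the observed cumulative fault counts more closely; the same two criteria can be computed for a regression curve to compare the models on equal terms.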