• Title/Summary/Keyword: administration information dataset

Default Prediction for Real Estate Companies with Imbalanced Dataset

  • Dong, Yuan-Xiang;Xiao, Zhi;Xiao, Xue
    • Journal of Information Processing Systems / v.10 no.2 / pp.314-333 / 2014
  • In default prediction for real estate companies, non-defaulted cases greatly outnumber defaulted ones, which creates a two-class imbalance problem and lowers the ability of prediction models to identify the default sample. To avoid this sample selection bias and improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. Logistic regression, support vector machine (SVM) classification, and neural network (NN) classification trained on the imbalanced dataset serve as benchmarks for single prediction models trained on a balanced dataset corrected by the minority sample generation approach. Instead of prediction-oriented tests and overall accuracy, the true positive rate (TPR), the true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models on imbalanced datasets. An empirical experiment on a sample of 14 default and 315 non-default listed real estate companies in China shows that most single prediction models trained on the balanced dataset performed better than those trained on the imbalanced dataset.
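
The four measures named in this abstract can be illustrated with a minimal sketch. The function name and both confusion matrices below are hypothetical; only the class sizes (14 default, 315 non-default) come from the abstract. The example shows why overall accuracy is misleading on such data: a model that never predicts default still scores about 96% accuracy while its G-mean is zero.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Return TPR, TNR, G-mean and F-score from confusion-matrix counts.

    The positive class is the minority (default) class.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate (recall)
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(tpr * tnr)                     # geometric mean of TPR and TNR
    denom = precision + tpr
    f_score = 2 * precision * tpr / denom if denom else 0.0
    return {"tpr": tpr, "tnr": tnr, "g_mean": g_mean, "f_score": f_score}


# Hypothetical model that labels all 329 companies as non-default.
always_negative = imbalance_metrics(tp=0, fn=14, tn=315, fp=0)
accuracy_always_negative = (0 + 315) / (14 + 315)

# Hypothetical model trained on rebalanced data (illustrative counts only).
rebalanced = imbalance_metrics(tp=10, fn=4, tn=280, fp=35)

print(round(accuracy_always_negative, 3), always_negative["g_mean"])
print(round(rebalanced["tpr"], 3), round(rebalanced["tnr"], 3))
```

Because G-mean multiplies TPR and TNR, it collapses to zero whenever either class is ignored entirely, which is the property that makes it useful for imbalanced data.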

Improvement of Administration Information Dataset Transfer Tools 'SIARD_KR' (행정정보 데이터세트 이관도구 SIARD_KR의 개선방안)

  • Byeon, Woo-Yeong;Yim, Jin-Hee
    • Journal of the Korean Society for Information Management / v.39 no.1 / pp.195-217 / 2022
  • SIARD_KR is an administrative information dataset preservation tool. It is a partially modified version of SIARD, a technology for the long-term preservation of relational databases developed by the Swiss Federal Archives, adapted to suit Korea's situation. Previous studies have focused on how SIARD can extract all data contained in a relational database without loss. However, not all data contained in a database is meaningful information, that is, an administrative information dataset. This paper therefore starts from the question of whether SIARD_KR reflects the characteristics of the administrative information dataset. SIARD_KR is not only a tool for extracting data stored in the DB; we examine whether it can identify and extract only meaningful information and keep that information meaningful even when it is separated from the original system. The purpose of this paper is to analyze the structure of SIARD_KR, identify expected problems, and suggest improvement measures for them.

A Study on Managing Dataset in the Administration Information System of Closed Private Universities (폐교 사립대학 행정정보 데이터세트의 기록관리 방안 연구)

  • Lee, Jae-Young;Chung, Yeon-Kyoung
    • Journal of Korean Society of Archives and Records Management / v.21 no.1 / pp.75-95 / 2021
  • This study develops plans for managing the administrative information datasets that form part of the public records of closed universities. Drawing on reference materials and the institution's internal materials, we reviewed the theoretical discussion of datasets and examined how the closed university's dataset is currently managed. As measures for managing the data of the Comprehensive Information Management System, the plan covers selecting recording targets, determining retention periods, preparing administrative information dataset management standards, implementing the evaluation and deletion of administrative information datasets, and establishing comprehensive management systems for closed universities.

AraProdMatch: A Machine Learning Approach for Product Matching in E-Commerce

  • Alabdullatif, Aisha;Aloud, Monira
    • International Journal of Computer Science & Network Security / v.21 no.4 / pp.214-222 / 2021
  • Recently, the growth of e-commerce in Saudi Arabia has been exponential, bringing remarkable new challenges. A naive approach for product matching and categorization is needed to help consumers choose the right store to purchase a product. This paper presents a machine learning approach for product matching that combines deep learning techniques with standard artificial neural networks (ANNs). Existing methods focused on product matching, whereas our model compares products based on unstructured descriptions. We evaluated our model on an electronics dataset from three business-to-consumer (B2C) online stores by collecting the matched products into one dataset. The performance evaluation, based on k-mean classifier prediction on data from three real-world online stores, demonstrates that the proposed algorithm outperforms the benchmarked approach by 80% on average F1-measure.

STAR-24K: A Public Dataset for Space Common Target Detection

  • Zhang, Chaoyan;Guo, Baolong;Liao, Nannan;Zhong, Qiuyun;Liu, Hengyan;Li, Cheng;Gong, Jianglei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.2 / pp.365-380 / 2022
  • Supervised learning is the current mainstream approach to target detection, and a high-quality dataset is the prerequisite for good detection performance. The larger and higher-quality the dataset, the stronger the generalization ability of the model; that is, the dataset determines the upper limit of what the model can learn. A convolutional neural network optimizes its parameters under strong supervision: the error is calculated by comparing the predicted frame with the manually labeled real frame and is then passed back into the network for continuous optimization. Strongly supervised learning relies on a large number of images as examples for continuous learning, so the number and quality of images directly affect the learning results. This paper proposes STAR-24K (a dataset for Space TArget Recognition with more than 24,000 images) for detecting common targets in space. Since there is currently no publicly available dataset for space target detection, we extracted pictures from sources such as the pictures and videos released on the official websites of NASA (National Aeronautics and Space Administration) and ESA (the European Space Agency) and expanded them to 24,451 pictures. We evaluate popular object detection algorithms to build a benchmark. The STAR-24K dataset is publicly available at https://github.com/Zzz-zcy/STAR-24K.
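
The comparison between a predicted frame and a manually labeled frame can be illustrated with intersection over union (IoU), a widely used overlap measure in object detection. This is only a sketch: the abstract does not say which error or loss the authors use, and the function name and the `(x1, y1, x2, y2)` box convention are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and labeled frames.
predicted = (1.0, 1.0, 3.0, 3.0)
labeled = (0.0, 0.0, 2.0, 2.0)
overlap = iou(predicted, labeled)
```

An IoU of 1 means the predicted frame matches the label exactly and 0 means no overlap, so a quantity such as `1 - iou(...)` can act as an error signal.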

Data mining approach to predicting user's past location

  • Lee, Eun Min;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.97-104 / 2017
  • Location prediction has been used successfully to provide high-quality location-based services to customers in many applications. In its conventional form, location prediction forecasts future locations from a user's past movement history. However, as location prediction needs expand into much more complicated cases, it is frequently necessary to infer the locations that a target user visited in the past. Typical cases include identifying locations that infectious disease carriers may have visited, or that crime suspects may have stopped by on a certain day in a specific time band. Therefore, the primary goal of this study is to predict locations that users visited in the past. The information used for this purpose includes the user's demographic information and movement histories. Data mining classifiers such as Bayesian networks, neural networks, support vector machines, and decision trees were applied to a contextual dataset of 6868 records, and their performance was compared. The results show that the general Bayesian network is the most robust classifier.

Construction of a Spatio-Temporal Dataset for Deep Learning-Based Precipitation Nowcasting

  • Kim, Wonsu;Jang, Dongmin;Park, Sung Won;Yang, MyungSeok
    • Journal of Information Science Theory and Practice / v.10 no.spc / pp.135-142 / 2022
  • Recently, with the development of data processing technology and the increase in computational power, methods for solving social problems using Artificial Intelligence (AI) are in the spotlight, and AI technologies are replacing and supplementing traditional methods in various fields. Meanwhile, in Korea, heavy rain is one of the representative causes of natural disasters, bringing enormous economic damage and casualties every year. Accurate prediction of heavy rainfall over the Korean peninsula is very difficult because of its geographical features, located at mid-latitude between the Eurasian continent and the Pacific Ocean, and the influence of the summer monsoon. To deal with such problems, the Korea Meteorological Administration operates various state-of-the-art observation equipment and a newly developed global atmospheric model system. Nevertheless, precipitation nowcasting requires a separate system based on the extrapolation method because of intrinsic characteristics of operating numerical weather prediction models. Existing precipitation nowcasting is reliable in the early stage of forecasting, but its predictability decreases sharply as forecast lead time increases. AI technologies that handle the spatio-temporal features of data are expected to contribute greatly to overcoming these limitations. Thus, in this project, the dataset required to develop, train, and verify deep learning-based precipitation nowcasting models has been constructed in a regularized form. The dataset provides various variables obtained from multiple sources, and these variables are consistent with one another in their spatio-temporal specifications.

Video Retrieval Algorithm for Building a Dataset for Highlight Video Generation (하이라이트 비디오 생성을 위한 데이터셋 구축을 위한 비디오 탐색 알고리즘)

  • Gi-Yeon Song;Jaehwan Lee
    • Annual Conference of KIPS / 2024.05a / pp.517-518 / 2024
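  • This study proposes an algorithm that identifies which source video a given video clip was extracted from. We collected match videos and highlight videos from LCK, one of Korea's esports leagues, and used them to test the algorithm's performance. We expect the proposed algorithm to help build the video-to-highlight-clip dataset needed to develop highlight video extraction models.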

Implementation of a Web-Based Early Warning System for Meteorological Hazards (기상위험 조기경보를 위한 웹기반 표출시스템 구현)

  • Kong, In Hak;Kim, Hong Joong;Oh, Jai Ho;Lee, Yang Won
    • Journal of Korean Society for Geospatial Information Science / v.24 no.4 / pp.21-28 / 2016
  • Numerical weather prediction is important for preventing meteorological disasters such as heavy rain, heat waves, and cold waves. The Korea Meteorological Administration provides real-time special weather reports, and the Rural Development Administration provides 2-day warnings of agricultural disasters for farms in a few regions. To improve early warning systems for meteorological hazards, a nationwide high-resolution dataset for weather prediction should be combined with web-based GIS. This study aims to develop a web service prototype for early warning of meteorological hazards that integrates web GIS technologies with a weather prediction database at a temporal resolution of 1 hour and a spatial resolution of 1 km. The spatially and temporally high-resolution dataset for meteorological hazards, produced by downscaling GME, is served via web GIS. In addition to the current status of meteorological hazards, the proposed system provides hourly dong-level forecasts of meteorological hazards such as heavy rain, heat waves, and cold waves for the upcoming seven days. With future work to improve the accuracy of numerical weather predictions and the preprocessing time for raster and vector datasets, the system can serve as an operational information service for municipal governments in Korea.

A Case-Based Reasoning Method Improving Real-Time Computational Performances: Application to Diagnose for Heart Disease (대용량 데이터를 위한 사례기반 추론기법의 실시간 처리속도 개선방안에 대한 연구: 심장병 예측을 중심으로)

  • Park, Yoon-Joo
    • Information Systems Review / v.16 no.1 / pp.37-50 / 2014
  • Conventional case-based reasoning (CBR) does not perform efficiently on high-volume datasets because of case-retrieval time. To overcome this problem, some previous studies suggest clustering the case base into several small groups and retrieving neighbors only within the group that corresponds to the target case. However, this approach generally yields less accurate predictions than conventional CBR. This paper suggests a new hybrid case-based reasoning method that dynamically composes a search pool for each target case, and applies it to a heart disease diagnosis dataset. The results show that the suggested hybrid method achieves statistically the same level of predictive performance as the conventional CBR method at significantly lower computational cost, and that it outperforms the basic clustering-CBR (C-CBR) method.
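
The basic clustering-CBR idea mentioned in this abstract (retrieve neighbors only inside the cluster nearest to the target case) can be sketched as follows. This is not the paper's hybrid method, whose details the abstract does not give; the function name, the pre-computed clusters, the Euclidean distance, the majority vote, and the toy cases are all assumptions for illustration.

```python
import math
from collections import Counter


def c_cbr_predict(target, clusters, k=3):
    """Predict a label by k-nearest-neighbor retrieval within one cluster.

    clusters: list of (centroid, cases), where cases is a list of
    (feature_tuple, label) pairs. Only the cluster whose centroid is
    closest to the target is searched, which is what cuts retrieval time.
    """
    _, cases = min(clusters, key=lambda c: math.dist(target, c[0]))
    neighbors = sorted(cases, key=lambda case: math.dist(target, case[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical case base, already split into two clusters.
clusters = [
    ((1.0, 1.0), [((0.9, 1.0), "healthy"),
                  ((1.1, 1.2), "healthy"),
                  ((1.0, 0.8), "disease")]),
    ((5.0, 5.0), [((5.1, 4.9), "disease"),
                  ((4.8, 5.2), "disease"),
                  ((5.0, 5.0), "healthy")]),
]

prediction_near_b = c_cbr_predict((4.9, 5.0), clusters, k=3)
prediction_near_a = c_cbr_predict((1.0, 1.0), clusters, k=3)
```

Restricting the search to one cluster is also the source of the accuracy loss the abstract notes: a target near a cluster boundary never sees close cases that fall in the neighboring cluster.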