• Title/Abstract/Keywords: incomplete data

Search Result 721, Processing Time 0.035 seconds

Design of the Integrated Incomplete Information Processing System based on Rough Set

  • Jeong, Gu-Beom;Chung, Hwan-Mook;Kim, Guk-Boh;Park, Kyung-Ok
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.5 / pp.441-447 / 2001
  • In general, rough set theory is used for classification, inference, and decision analysis of incomplete data by applying the concept of an approximation space to an information system. An information system can include quantitative attribute values with interval characteristics, or incomplete data such as multiple or unknown (missing) values. Such incomplete data cause inconsistency in the information system and reduce the classification ability of systems based on rough sets. In this paper, we present the types of incomplete data that may occur in an information system and propose the INcomplete information Processing System (INiPS), which uses rough sets to convert an incomplete information system into a complete one.

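
The approximation-space idea mentioned in the abstract above can be illustrated with a minimal sketch of lower and upper approximations on a complete table. This is not INiPS itself; the toy table, its attribute names, and the function name `approximations` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def approximations(table, attributes, target):
    """Return (lower, upper) approximations of `target`, a set of object ids.

    table: dict mapping object id -> dict of attribute values.
    attributes: attribute names that define the indiscernibility relation.
    """
    # Group objects that are indiscernible on the chosen attributes.
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

# Hypothetical toy table (not taken from the paper).
table = {
    "x1": {"temp": "high", "cough": "yes"},
    "x2": {"temp": "high", "cough": "yes"},
    "x3": {"temp": "low", "cough": "no"},
    "x4": {"temp": "low", "cough": "yes"},
}
flu = {"x1", "x3"}
lower, upper = approximations(table, ["temp", "cough"], flu)
```

Here `x1` and `x2` are indiscernible, so `x1` cannot be placed in the lower approximation even though it belongs to the target set. This is the kind of inconsistency that reduces classification ability.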

A Study on the Incomplete Information Processing System(INiPS) Using Rough Set

  • Jeong, Gu-Beom;Chung, Hwan-Mook;Kim, Guk-Boh;Park, Kyung-Ok
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.11a / pp.243-251 / 2000
  • In general, rough set theory is used for classification, inference, and decision analysis of incomplete data by applying the concept of an approximation space to an information system. An information system can include quantitative attribute values with interval characteristics, or incomplete data such as multiple or unknown (missing) values. Such incomplete data cause inconsistency in the information system and reduce the classification ability of systems based on rough sets. In this paper, we present the types of incomplete data that may occur in an information system and propose the INcomplete information Processing System (INiPS), which uses rough sets to convert an incomplete information system into a complete one.


An Efficient Processing Method of Top-k(g) Skyline Group Queries for Incomplete Data (불완전 데이터를 위한 효율적 Top-k(g) 스카이라인 그룹 질의 처리 기법)

  • Park, Mi-Ra;Min, Jun-Ki
    • The KIPS Transactions:PartD / v.17D no.1 / pp.17-24 / 2010
  • Recently, there has been growing interest in skyline queries. Most work on skyline queries assumes that the data contain no null values. However, data entered through the Web or other tools are often incomplete and contain null values. As a result, several skyline processing techniques for incomplete data have been proposed. These techniques, however, handle incomplete data only and do not consider environments in which complete and incomplete data coexist. In this paper, we propose a novel skyline group processing technique that evaluates skyline queries in such environments. To do this, we introduce the top-k(g) skyline group query, which searches for g skyline groups according to the user's dimensional preference. Our experimental study shows the efficiency of the proposed technique.
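
To make the notion of a skyline over data with null values concrete, here is a minimal sketch. It assumes a common convention in which two points are compared only on the dimensions both have observed, and smaller values are better. It does not implement the top-k(g) skyline group query proposed in the paper; the function names and sample points are hypothetical.

```python
def dominates(p, q):
    """True if p dominates q on the dimensions both points have observed.

    Smaller values are assumed to be better; None marks a missing value.
    """
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not common:
        return False  # no shared dimension, so the points are incomparable
    return all(a <= b for a, b in common) and any(a < b for a, b in common)

def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical sample mixing complete and incomplete points.
points = [(1, None, 5), (2, 3, None), (None, 2, 4), (3, 4, 6)]
result = skyline(points)
```

Under this convention, dominance is not guaranteed to be transitive, which is one reason techniques designed for complete data do not carry over directly.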

Fuzzy Classification Method for Processing Incomplete Dataset

  • Woo, Young-Woon;Lee, Kwang-Eui;Han, Soo-Whan
    • Journal of information and communication convergence engineering / v.8 no.4 / pp.383-386 / 2010
  • Pattern classification is one of the most important topics in machine learning research. However, incomplete data appear frequently in real-world problems and lead to low learning rates in classification models. Much research has addressed the handling of such incomplete data, but most of it focuses on the training stage. In this paper, we propose two classification methods for incomplete data using triangular fuzzy membership functions. In the proposed methods, missing data in incomplete feature vectors are inferred, learned, and applied to the proposed classifier using triangular fuzzy membership functions. In the experiment, we verified that the proposed methods achieve a higher classification rate than a conventional method.
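
A triangular fuzzy membership function, as named in the abstract above, can be sketched as follows. The scoring rule that simply skips missing features is an assumption made for illustration and is not the inference procedure proposed in the paper; `class_score` and its parameters are hypothetical names.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

def class_score(sample, class_params):
    """Average membership over the observed features; None marks a missing value.

    class_params holds one (a, b, c) triple per feature for a single class.
    """
    scores = [triangular(x, *params)
              for x, params in zip(sample, class_params) if x is not None]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical class described by two features.
params = [(0.0, 5.0, 10.0), (0.0, 1.0, 2.0)]
score = class_score([2.5, None], params)
```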

Estimation of seismicity parameters of the seismic zones of the Korean Peninsula using incomplete and complete data files (불완전한 자료 및 완전한 자료 목록을 이용한 한반도 지진구들의 지진활동 매개변수 평가)

  • 이기화
    • Proceedings of the Earthquake Engineering Society of Korea Conference / 1998.04a / pp.23-30 / 1998
  • Seismic risk parameters were estimated for the seismic zones of the Korean Peninsula in order to calculate seismic hazard values. Seven seismic source zones were selected in consideration of the seismicity and geology of the Korean Peninsula. The seismicity parameters to be estimated are the maximum intensity, the activity rate, and the b value in the Gutenberg-Richter relation. To compute these parameters, the least squares method or the maximum likelihood method is applied to the earthquake data in two ways: one without maximum intensity and the other with maximum intensity. Earthquake data since the Choseon Dynasty are regarded as complete, and the parameters were estimated for these data in both ways. A recently published method estimates the seismicity parameters from mixed data containing large historical events and recent complete observations, so this method was applied to the whole earthquake data of the Korean Peninsula. The b value computed with maximum intensity is slightly lower than that computed without it, and it becomes still lower when the incomplete data prior to the Choseon Dynasty are used. The activity rates obtained without and with maximum intensity are similar, though they are lower when the incomplete data are used. The maximum intensities are usually lower when the incomplete data are considered. In the seismic source zone that includes the Yangsan Fault zone, however, they are higher when the incomplete data are considered.

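
For readers unfamiliar with the b value, the Gutenberg-Richter relation is log10 N = a - b M, where N is the number of events of size at least M. The sketch below uses the standard maximum likelihood estimator for a complete catalog without an upper bound (Aki's formula). It assumes continuous magnitudes rather than the intensities used in the paper, and it does not implement the mixed-data method the abstract refers to; the catalog values are hypothetical.

```python
import math

def b_value_mle(magnitudes, m_min):
    """Aki's maximum likelihood estimate of b for events with magnitude >= m_min.

    Assumes the catalog is complete above m_min and has no upper bound.
    """
    sample = [m for m in magnitudes if m >= m_min]
    mean_excess = sum(sample) / len(sample) - m_min
    return math.log10(math.e) / mean_excess

# Hypothetical catalog; events below the completeness threshold are ignored.
catalog = [3.1, 3.4, 3.0, 4.2, 3.7, 5.0, 3.3, 2.5]
b = b_value_mle(catalog, m_min=3.0)
```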

Metropolis-Hastings Expectation Maximization Algorithm for Incomplete Data (불완전 자료에 대한 Metropolis-Hastings Expectation Maximization 알고리즘 연구)

  • Cheon, Soo-Young;Lee, Hee-Chan
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.183-196 / 2012
  • Inference for incomplete data, such as missing data, truncated distributions, and censored data, is a problem that arises frequently in statistics. To solve it, the Expectation Maximization (EM), Monte Carlo Expectation Maximization (MCEM), and Stochastic Expectation Maximization (SEM) algorithms have long been used; however, they generally assume known distributions. In this paper, we propose the Metropolis-Hastings Expectation Maximization (MHEM) algorithm for unknown distributions. The performance of the proposed algorithm is investigated on simulated data and on a real dataset, KOSPI 200.
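
As background for the abstract above, the following sketch shows plain EM in the known-distribution setting that the paper contrasts itself with: estimating the rate of an exponential distribution from right-censored lifetimes. It is not the MHEM algorithm; the function name and the sample data are hypothetical.

```python
def em_censored_exponential(observed, censored, iterations=200):
    """Estimate the rate of an exponential distribution from right-censored data.

    observed: fully observed lifetimes.
    censored: censoring times (the true lifetime exceeds each value).
    """
    n = len(observed) + len(censored)
    rate = 1.0  # arbitrary starting value
    for _ in range(iterations):
        # E-step: by memorylessness, E[T | T > c] = c + 1 / rate.
        expected_total = sum(observed) + sum(c + 1.0 / rate for c in censored)
        # M-step: complete-data maximum likelihood estimate of the rate.
        rate = n / expected_total
    return rate

# Hypothetical lifetimes, two of them censored at time 3.0.
observed = [0.5, 1.2, 2.0, 0.8]
censored = [3.0, 3.0]
rate = em_censored_exponential(observed, censored)
```

In this simple case the iteration converges to the closed-form estimate, the number of uncensored events divided by the total observed time, which makes the result easy to check.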

Customer Classification Method for Household Appliances Industries with a Large Number of Incomplete Data (다수의 결측치가 존재하는 가전업 고객 데이터 활용을 위한 고객분류기법의 개발)

  • Chang, Young-Soon;Seo, Jong-Hyen
    • IE interfaces / v.19 no.1 / pp.86-96 / 2006
  • Customer data in some manufacturing industries contain a large number of incomplete records because customers purchase infrequently and the profile data gathered by sales representatives are limited. As a result, most sophisticated data analysis methods cannot be applied directly. This paper proposes a heuristic data analysis method to classify customers in the household appliance industry. The proposed PD (percent of difference) method can be used for discriminant analysis of incomplete customer data with simple mathematical calculations. The method consists of a variable distribution estimation step, PD measure and cluster score evaluation steps, a variable impact construction step, and a segment assignment step. A real example is also presented.

Predicting Personal Credit Rating with Incomplete Data Sets Using Frequency Matrix technique (Frequency Matrix 기법을 이용한 결측치 자료로부터의 개인신용예측)

  • Bae, Jae-Kwon;Kim, Jin-Hwa;Hwang, Kook-Jae
    • Journal of Information Technology Applications and Management / v.13 no.4 / pp.273-290 / 2006
  • This study suggests a frequency matrix technique to predict personal credit ratings more efficiently from incomplete data sets. First, the study tests multiple discriminant analysis (MDA) and logistic regression analysis for predicting personal credit ratings with incomplete data sets, where missing values are predicted with the mean imputation and regression imputation methods. An artificial neural network and the frequency matrix technique are also tested on their performance in predicting personal credit ratings. A data set of personal credit information on 8,234 customers of Bank A in 2004 was collected for the test. The performance of the frequency matrix technique is compared with that of the other methods. The experimental results show that the frequency matrix technique outperforms all the other models: MDA-mean, Logit-mean, MDA-regression, Logit-regression, and artificial neural networks.

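
Mean imputation, one of the baseline treatments of missing values named in the abstract above, can be sketched in a few lines. The column meanings and values below are hypothetical, and the frequency matrix technique itself is not shown.

```python
def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values."""
    columns = list(zip(*rows))
    means = []
    for col in columns:
        seen = [v for v in col if v is not None]
        # A column with no observed value cannot be imputed and stays None.
        means.append(sum(seen) / len(seen) if seen else None)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

# Hypothetical records: (age, monthly income); None marks a missing value.
rows = [[30, 2500.0], [None, 3100.0], [50, None]]
filled = mean_impute(rows)
```

Mean imputation keeps every record usable but shrinks the variance of the imputed column, which is one reason to compare it with regression imputation.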

Reject Inference of Incomplete Data Using a Normal Mixture Model

  • Song, Ju-Won
    • The Korean Journal of Applied Statistics / v.24 no.2 / pp.425-433 / 2011
  • Reject inference in credit scoring is a statistical approach to adjusting for nonrandom sample bias due to rejected applicants. Function estimation approaches assume that rejected applicants need not be included in the estimation when the missing data mechanism is missing at random. In contrast, the density estimation approach based on mixture models indicates that reject inference should include rejected applicants in the model. When mixture models are chosen for reject inference, the data are often assumed to follow a normal distribution. If the data include missing values, applying the normal mixture model to fully observed cases only may cause another sample bias due to the missing values. We extend reject inference based on a multivariate normal mixture model to handle incomplete characteristic variables. A simulation study shows that including incomplete characteristic variables outperforms the function estimation approaches.

Discrete HMM Training Algorithm for Incomplete Time Series Data (불완전 시계열 데이터를 위한 이산 HMM 학습 알고리듬)

  • Sin, Bong-Kee
    • Journal of Korea Multimedia Society / v.19 no.1 / pp.22-29 / 2016
  • The hidden Markov model (HMM) is one of the most successful and popular tools for modeling real-world sequential data. Real-world signals come in a variety of shapes and variabilities, among which temporal and spectral ones are the prime targets of the HMM. A new problem gaining increasing attention is characterizing missing observations in incomplete data sequences, which are incomplete in that they contain holes or omitted measurements. The standard HMM algorithms were developed for complete data with a measurement at each regular point in time. This paper presents a modified algorithm for a discrete HMM that allows a substantial number of omissions in the input sequence. It is essentially a variant of Baum-Welch that explicitly considers isolated omissions and runs of successive omissions. The algorithm has been tested on online handwriting samples expressed in direction codes. An extensive set of experiments shows that HMMs modeled in this way are highly flexible, with consistent and robust performance regardless of the number of omissions.
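
One common way to let a discrete HMM tolerate omitted measurements is to marginalize over the missing symbol, which amounts to using an emission probability of 1 at that time step. The sketch below applies this to the forward pass only. It is an illustration under that assumption, not the modified Baum-Welch procedure of the paper, and the two-state model parameters are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """Likelihood of a discrete observation sequence; None marks a missing symbol.

    start[i]: initial probability of state i.
    trans[i][j]: probability of moving from state i to state j.
    emit[i][k]: probability that state i emits symbol k.
    """
    n = len(start)

    def emission(state, symbol):
        # A missing symbol is marginalized out: the sum over all symbols is 1.
        return 1.0 if symbol is None else emit[state][symbol]

    alpha = [start[i] * emission(i, obs[0]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emission(j, symbol)
            for j in range(n)
        ]
    return sum(alpha)

# Hypothetical two-state, two-symbol model.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood([0, None, 1], start, trans, emit)
```

The state transition is still applied at the omitted step, so the gap contributes timing information even though it carries no symbol.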