• Title/Summary/Keyword: 결측값 대치법

Search Result 3, Processing Time 0.019 seconds

A Missing Data Imputation by Combining K Nearest Neighbor with Maximum Likelihood Estimation for Numerical Software Project Data (K-NN과 최대 우도 추정법을 결합한 소프트웨어 프로젝트 수치 데이터용 결측값 대치법)

  • Lee, Dong-Ho;Yoon, Kyung-A;Bae, Doo-Hwan
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.273-282
    • /
    • 2009
  • Missing data is one of the common problems in building analysis or prediction models using software project data. Missing imputation methods are known to be more effective missing data handling method than deleting methods in small software project data. While K nearest neighbor imputation is a proper missing imputation method in the software project data, it cannot use non-missing information of incomplete project instances. In this paper, we propose an approach to missing data imputation for numerical software project data by combining K nearest neighbor and maximum likelihood estimation; we also extend the average absolute error measure by normalization for accurate evaluation. Our approach overcomes the limitation of K nearest neighbor imputation and outperforms on our real data sets.

On the Use of Sequential Adaptive Nearest Neighbors for Missing Value Imputation (순차 적응 최근접 이웃을 활용한 결측값 대치법)

  • Park, So-Hyun;Bang, Sung-Wan;Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.6
    • /
    • pp.1249-1257
    • /
    • 2011
  • In this paper, we propose a Sequential Adaptive Nearest Neighbor(SANN) imputation method that combines the Adaptive Nearest Neighbor(ANN) method and the Sequential k-Nearest Neighbor(SKNN) method. When choosing the nearest neighbors of missing observations, the proposed SANN method takes the local feature of the missing observations into account as well as reutilizes the imputed observations in a sequential manner. By using a Monte Carlo study and a real data example, we demonstrate the characteristics of the SANN method and its potential performance.

Missing values imputation for time course gene expression data using the pattern consistency index adaptive nearest neighbors (시간경로 유전자 발현자료에서 패턴일치지수와 적응 최근접 이웃을 활용한 결측값 대치법)

  • Shin, Heyseo;Kim, Dongjae
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.3
    • /
    • pp.269-280
    • /
    • 2020
  • Time course gene expression data is a large amount of data observed over time in microarray experiments. This data can also simultaneously identify the level of gene expression. However, the experiment process is complex, resulting in frequent missing values due to various causes. In this paper, we propose a pattern consistency index adaptive nearest neighbors as a method of missing value imputation. This method combines the adaptive nearest neighbors (ANN) method that reflects local characteristics and the pattern consistency index that considers consistent degree for gene expression between observations over time points. We conducted a Monte Carlo simulation study to evaluate the usefulness of proposed the pattern consistency index adaptive nearest neighbors (PANN) method for two yeast time course data.