Search | Korea Science

Lee, Dong-Ho;Yoon, Kyung-A;Bae, Doo-Hwan
- Journal of KIISE: Software and Applications / v.36 no.4 / pp.273-282 / 2009
Missing data is a common problem when building analysis or prediction models from software project data. For small software project data sets, imputation methods are known to handle missing data more effectively than deletion methods. K nearest neighbor imputation is a suitable imputation method for software project data, but it cannot use the non-missing information in incomplete project instances. In this paper, we propose an approach to missing data imputation for numerical software project data that combines K nearest neighbor imputation with maximum likelihood estimation. We also extend the average absolute error measure with normalization for more accurate evaluation. Our approach overcomes this limitation of K nearest neighbor imputation and outperforms it on our real data sets.
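The limitation described in the abstract above can be seen in a minimal sketch of plain K nearest neighbor imputation, where only fully observed rows serve as donors and the observed values of other incomplete rows are ignored. This is an illustrative baseline, not the authors' combined method: the maximum likelihood step is not shown, and the function names `knn_impute` and `normalized_mae` are hypothetical. The abstract does not give the exact form of the normalized error measure, so dividing by a per-column range is an assumption.

```python
import numpy as np


def knn_impute(data, k=3):
    """Fill NaNs in each incomplete row using the k nearest complete rows."""
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    # Donors are restricted to fully observed rows (the limitation noted above).
    complete = data[~np.isnan(data).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("need at least one complete row")
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Euclidean distance over the columns observed in this row.
        diff = complete[:, observed] - row[observed]
        dist = np.sqrt((diff ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        # Impute each missing column with the mean of the neighbors' values.
        filled[i, missing] = complete[nearest][:, missing].mean(axis=0)
    return filled


def normalized_mae(true, imputed, mask, col_range):
    """Assumed normalization: absolute error divided by each column's range,
    averaged over the cells marked True in mask."""
    err = np.abs(np.asarray(true) - np.asarray(imputed)) / np.asarray(col_range)
    return err[np.asarray(mask)].mean()
```

Normalizing by column range keeps columns measured on large scales (for example, lines of code) from dominating the error over columns on small scales (for example, team size).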

Park, So-Hyun;Bang, Sung-Wan;Jhun, Myoung-Shic
- The Korean Journal of Applied Statistics / v.24 no.6 / pp.1249-1257 / 2011
In this paper, we propose a Sequential Adaptive Nearest Neighbor (SANN) imputation method that combines the Adaptive Nearest Neighbor (ANN) method and the Sequential k-Nearest Neighbor (SKNN) method. When choosing the nearest neighbors of missing observations, the proposed SANN method takes the local features of the missing observations into account and reuses the imputed observations in a sequential manner. Using a Monte Carlo study and a real data example, we demonstrate the characteristics of the SANN method and its potential performance.
https://doi.org/10.5351/KJAS.2011.24.6.1249
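The sequential reuse mentioned in the abstract above can be sketched as follows. This is a rough illustration of the sequential idea only, under the assumption that rows are processed from fewest to most missing values and that each imputed row then joins the donor pool. The adaptive choice of neighbors from the ANN method is not implemented here (a fixed `k` is used instead), and the function name `sequential_knn_impute` is hypothetical.

```python
import numpy as np


def sequential_knn_impute(data, k=3):
    """Impute rows in order of increasing missingness, reusing imputed rows."""
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    n_missing = np.isnan(data).sum(axis=1)
    # Start with fully observed rows as the donor pool.
    donors = [i for i in range(len(data)) if n_missing[i] == 0]
    if not donors:
        raise ValueError("need at least one complete row")
    # Process incomplete rows from fewest to most missing values.
    incomplete = [i for i in range(len(data)) if n_missing[i] > 0]
    for i in sorted(incomplete, key=lambda j: n_missing[j]):
        missing = np.isnan(data[i])
        pool = filled[donors]
        # Distance over the columns observed in the target row.
        diff = pool[:, ~missing] - data[i, ~missing]
        dist = np.sqrt((diff ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        filled[i, missing] = pool[nearest][:, missing].mean(axis=0)
        # The imputed row becomes a donor for the rows that follow.
        donors.append(i)
    return filled
```

Compared with using complete rows only, the growing donor pool matters most when few rows are fully observed, at the cost of propagating any early imputation error to later rows.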

Shin, Heyseo;Kim, Dongjae
- The Korean Journal of Applied Statistics / v.33 no.3 / pp.269-280 / 2020
Time course gene expression data are large data sets observed over time in microarray experiments. These data can also be used to identify gene expression levels simultaneously. However, the experimental process is complex, and missing values arise frequently from various causes. In this paper, we propose the pattern consistency index adaptive nearest neighbors (PANN) method for missing value imputation. This method combines the adaptive nearest neighbors (ANN) method, which reflects local characteristics, with a pattern consistency index that considers the degree of consistency in gene expression between observations across time points. We conducted a Monte Carlo simulation study on two yeast time course data sets to evaluate the usefulness of the proposed PANN method.
https://doi.org/10.5351/KJAS.2020.33.3.269