http://dx.doi.org/10.5351/KJAS.2020.33.3.269

Missing values imputation for time course gene expression data using the pattern consistency index adaptive nearest neighbors  

Shin, Heyseo (Department of Biomedicine·Health Science, The Catholic University of Korea)
Kim, Dongjae (Department of Biomedicine·Health Science, The Catholic University of Korea)
Publication Information
The Korean Journal of Applied Statistics / v.33, no.3, 2020, pp. 269-280
Abstract
Time course gene expression data are large-scale data observed over time in microarray experiments, which measure the expression levels of many genes simultaneously. However, the experimental process is complex, and missing values arise frequently from various causes. In this paper, we propose the pattern consistency index adaptive nearest neighbors (PANN) method for missing value imputation. The method combines the adaptive nearest neighbors (ANN) method, which reflects local characteristics of the data, with the pattern consistency index, which measures how consistently gene expression changes between observations across time points. We conducted a Monte Carlo simulation study on two yeast time course data sets to evaluate the usefulness of the proposed PANN method.
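The abstract does not give the formulas, so the following is only a rough sketch of the general idea and not the authors' PANN algorithm. It assumes a fixed number of neighbors k (plain k-nearest-neighbor imputation stands in for the adaptive neighborhood of ANN), and it uses the share of consecutive time steps in which two genes move in the same direction as an illustrative substitute for the pattern consistency index. The function names `consistency`, `distance`, and `impute` are hypothetical.

```python
import math


def consistency(a, b):
    """Share of consecutive time steps where a and b change in the same direction.

    Illustrative stand-in for a pattern consistency index (an assumption,
    not the published definition). Steps touching a missing value are skipped.
    """
    agree = total = 0
    for t in range(len(a) - 1):
        if any(v is None for v in (a[t], a[t + 1], b[t], b[t + 1])):
            continue
        total += 1
        da, db = a[t + 1] - a[t], b[t + 1] - b[t]
        if da * db > 0 or (da == 0 and db == 0):
            agree += 1
    return agree / total if total else 0.0


def distance(a, b):
    """Root mean squared difference over time points observed in both genes."""
    sq = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(sq) / len(sq)) if sq else math.inf


def impute(data, k=2):
    """Fill each missing value (None) from the k nearest genes.

    Neighbors are weighted by their consistency with the target gene.
    A fixed k is an assumption; ANN chooses the neighborhood adaptively.
    """
    filled = [row[:] for row in data]  # copy so the input is not modified
    for i, row in enumerate(data):
        for t, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors must have an observed value at time t.
            cands = [
                (distance(row, other), consistency(row, other), other[t])
                for j, other in enumerate(data)
                if j != i and other[t] is not None
            ]
            cands.sort(key=lambda c: c[0])
            nearest = cands[:k]
            if not nearest:
                continue  # nothing to borrow from; leave the value missing
            weight_sum = sum(c[1] for c in nearest)
            if weight_sum > 0:
                filled[i][t] = sum(c[1] * c[2] for c in nearest) / weight_sum
            else:
                # No neighbor shares the pattern: fall back to a plain mean.
                filled[i][t] = sum(c[2] for c in nearest) / len(nearest)
    return filled
```

Each row of `data` is one gene, each column one time point, and `None` marks a missing value. A call such as `impute(data, k=2)` returns a filled copy in which neighbors that rise and fall together with the target gene contribute more to the imputed value than neighbors that are merely close in level.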
Keywords
missing values imputation; adaptive nearest neighbors; pattern consistency index; time course gene expression data
Citations & Related Records
1 DeRisi, J. L., Iyer, V. R., and Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale, Science, 278, 680-686.
2 Jhun, M., Jeong, H., and Koo, J. (2007). On the use of adaptive nearest neighbors for missing value imputation, Communications in Statistics: Simulation and Computation, 36, 1275-1286.
3 Kim, K., Oh, M., and Son, Y. (2008). Missing values estimation for time course gene expression data using the sequential partial least squares regression fitting, The Korean Journal of Applied Statistics, 21, 275-290.
4 Kim, S. and Kim, D. (2018). Imputation method for missing data based on clustering and measure of property, The Korean Journal of Applied Statistics, 31, 29-40.
5 Park, J. and Lee, I. (2002). Utilization of BioInforMetics with high efficiency array biotech, News & Information for Chemical Engineers, 20, 431-440.
6 Son, Y. and Baek, J. (2005). A pattern consistency index for detecting heterogeneous time series in clustering time course gene expression data, The Korean Journal of Applied Statistics, 18, 371-379.
7 Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Molecular Biology of the Cell, 9, 3273-3297.
8 Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B. (2001). Missing value estimation methods for DNA microarrays, Bioinformatics, 17, 520-525.