A hybrid method to compose an optimal gene set for multi-class classification using mRMR and modified particle swarm optimization

Construction of an optimal gene set for multi-class classification using mRMR and a modified particle swarm method

  • Lee, Sunho (Division of Mathematics and Statistics, Sejong University)
  • Received : 2020.08.24
  • Accepted : 2020.10.19
  • Published : 2020.12.31

Abstract

The aim of this research is to find an optimal gene set that provides highly accurate multi-class classification with a minimum number of genes. A two-stage procedure is proposed. In the first stage, a data-filtering step based on the minimum redundancy and maximum relevance (mRMR) framework uses several statistics to rank differentially expressed genes and K-means clustering to reduce redundancy between genes. In the second stage, a modified particle swarm optimization selects a small subset of informative genes. Two well-known multi-class microarray data sets, ALL and SRBCT, are analyzed to demonstrate the effectiveness of this hybrid method.
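
The filtering stage can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the abstract says only that "several statistics" rank the genes, so the one-way ANOVA F statistic here is an assumed stand-in, and the rule of keeping the highest-ranked gene from each K-means cluster, the farthest-first initialization, the toy data, and the helper names (`f_statistic`, `kmeans`, `mrmr_filter`) are all hypothetical choices.

```python
from statistics import mean, pstdev


def f_statistic(values, labels):
    """One-way ANOVA F statistic of one gene across classes (needs >= 2 classes)."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # no within-class variation
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def standardize(profile):
    """z-score a gene profile across samples."""
    mu = mean(profile)
    sd = pstdev(profile) or 1.0  # guard against constant genes
    return [(v - mu) / sd for v in profile]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Lloyd's K-means with a deterministic farthest-first initialization (assumed)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        # Assign each point to its nearest center, then move centers to cluster means.
        assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return assign


def mrmr_filter(X, y, top_m, n_clusters):
    """Relevance: keep the top_m genes by F. Redundancy: keep the best gene per cluster."""
    genes = [list(col) for col in zip(*X)]  # X is samples x genes
    scores = [f_statistic(g, y) for g in genes]
    ranked = sorted(range(len(genes)), key=lambda j: scores[j], reverse=True)[:top_m]
    assign = kmeans([standardize(genes[j]) for j in ranked], n_clusters)
    best = {}
    for gene, cluster in zip(ranked, assign):
        if cluster not in best or scores[gene] > scores[best[cluster]]:
            best[cluster] = gene
    return sorted(best.values(), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: genes 0 and 1 are near-duplicates, gene 2 has a
# different class pattern, gene 3 is noise.
X_toy = [
    [1.0, 1.1, 5.0, 3.0],
    [1.2, 1.3, 5.2, 7.0],
    [5.0, 5.1, 1.0, 5.0],
    [5.2, 5.0, 1.1, 4.0],
    [9.0, 9.1, 5.1, 6.0],
    [9.2, 9.0, 5.0, 2.0],
]
y_toy = [0, 0, 1, 1, 2, 2]
print(mrmr_filter(X_toy, y_toy, top_m=3, n_clusters=2))  # [1, 2]
```

Gene profiles are z-scored before clustering, so the squared Euclidean distance between two genes equals 2n(1 - r) for n samples and Pearson correlation r. Highly correlated, and therefore redundant, genes land in the same cluster, and only one of them survives the filter; in the toy run, gene 0 is dropped in favor of its near-duplicate gene 1.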

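An optimal gene set for predicting the multi-class phenotype of samples is a collection of genes that predicts the phenotype accurately using only a small number of genes. Several statistics for detecting differentially expressed genes already exist, and combining them with K-means clustering makes it possible to select differentially expressed genes with low redundancy. Building on these, a modified particle swarm optimization method is proposed that constructs a gene set capable of accurate multi-class classification with few genes. Using the well-known ALL (248 cases) and SRBCT (83 cases) data sets, the proposed method is shown to find an optimal gene set.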
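
The second stage searches over subsets of the filtered genes. The sketch below uses a textbook binary particle swarm optimizer (inertia weight, velocity clamping, sigmoid transfer from velocity to bit probability), not the author's modified version, whose details the abstract does not give. The fitness function, leave-one-out accuracy of a nearest-centroid classifier minus an assumed penalty of 0.01 per selected gene, is likewise a hypothetical choice meant only to reward accurate classification with few genes; all names, parameter values, and the toy data are illustrative.

```python
import math
import random


def binary_pso(fitness, n_bits, n_particles=20, n_iter=100,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Textbook binary PSO that maximizes fitness(mask); mask[g] == 1 selects gene g."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(n_bits)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))         # clamp velocity
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))  # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def loo_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given genes."""
    if not genes:
        return 0.0
    correct = 0
    for i, (row, label) in enumerate(zip(X, y)):
        by_class = {}
        for j, (other, c) in enumerate(zip(X, y)):
            if j != i:
                by_class.setdefault(c, []).append([other[g] for g in genes])
        centroids = {c: [sum(col) / len(col) for col in zip(*rows)]
                     for c, rows in by_class.items()}
        pred = min(centroids, key=lambda c: sum(
            (row[g] - m) ** 2 for g, m in zip(genes, centroids[c])))
        correct += int(pred == label)
    return correct / len(X)


# Hypothetical toy data: rows are samples, columns are candidate genes.
X_toy = [
    [1.0, 1.1, 5.0, 3.0],
    [1.2, 1.3, 5.2, 7.0],
    [5.0, 5.1, 1.0, 5.0],
    [5.2, 5.0, 1.1, 4.0],
    [9.0, 9.1, 5.1, 6.0],
    [9.2, 9.0, 5.0, 2.0],
]
y_toy = [0, 0, 1, 1, 2, 2]
PENALTY = 0.01  # assumed cost per selected gene


def toy_fitness(mask):
    genes = [g for g, bit in enumerate(mask) if bit]
    return loo_centroid_accuracy(X_toy, y_toy, genes) - PENALTY * len(genes)


best_mask, best_fit = binary_pso(toy_fitness, n_bits=len(X_toy[0]), seed=1)
print(best_mask, round(best_fit, 3))
```

Because the swarm is stochastic, the selected mask depends on `seed`; fixing it makes runs reproducible. Raising `PENALTY` pushes the search toward smaller gene sets.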
