A hybrid method to compose an optimal gene set for multi-class classification using mRMR and modified particle swarm optimization

  • Lee, Sunho (Division of Mathematics and Statistics, Sejong University)
  • Received: 2020.08.24
  • Reviewed: 2020.10.19
  • Published: 2020.12.31

Abstract

An optimal gene set for predicting the multi-class phenotype of a sample is a set of genes that predicts the phenotype accurately with as few genes as possible. This study proposes a two-stage hybrid procedure for composing such a set. The first stage is a data-filtering step based on the minimum redundancy and maximum relevance (mRMR) framework: several existing statistics rank differentially expressed genes, and K-means clustering reduces the redundancy among the genes that are retained. In the second stage, a modified particle swarm optimization selects, from the filtered candidates, a small subset of informative genes that still classifies accurately. Two well-known multi-class microarray data sets, ALL (248 samples) and SRBCT (83 samples), are analyzed to show that the proposed hybrid method can find an optimal gene set.
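The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and several choices are assumptions because the abstract does not specify them: the one-way ANOVA F statistic stands in for the "several statistics", the K-means step is a plain Lloyd iteration, the search is the standard binary particle swarm of Kennedy and Eberhart (1997) rather than the paper's modified version, and the fitness is leave-one-out nearest-centroid accuracy minus a small penalty per gene. All function names and parameter values are hypothetical.

```python
import numpy as np

def f_statistic(X, y):
    """One-way ANOVA F statistic for each gene; X is samples x genes."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (k - 1)) / (within / (n - k) + 1e-12)

def filter_genes(X, y, n_top=50, n_clusters=10, n_iter=20, seed=0):
    """Stage 1 (assumed form): cluster the top-ranked genes with K-means
    and keep the best-ranked gene of each non-empty cluster."""
    rng = np.random.default_rng(seed)
    top = np.argsort(f_statistic(X, y))[::-1][:n_top]  # best gene first
    P = X[:, top].T                                     # one row per gene
    P = (P - P.mean(axis=1, keepdims=True)) / (P.std(axis=1, keepdims=True) + 1e-12)
    centers = P[rng.choice(len(P), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((P[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    keep = [top[labels == j][0] for j in range(n_clusters) if np.any(labels == j)]
    return np.array(sorted(keep))

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    classes = np.unique(y)
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        cents = np.array([X[mask & (y == c)].mean(axis=0) for c in classes])
        hits += classes[((cents - X[i]) ** 2).sum(axis=1).argmin()] == y[i]
    return hits / len(y)

def binary_pso(X, y, n_particles=8, n_iter=10, penalty=0.01, seed=0):
    """Stage 2 (standard binary PSO, not the paper's modification)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def fitness(bits):
        if not bits.any():
            return -np.inf                              # empty gene set is invalid
        return loo_accuracy(X[:, bits.astype(bool)], y) - penalty * bits.sum()

    pos = (rng.random((n_particles, d)) < 0.5).astype(int)
    vel = rng.normal(0.0, 1.0, (n_particles, d))
    pbest, pfit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest, gfit = pbest[pfit.argmax()].copy(), pfit.max()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        # A sigmoid of the velocity gives the probability that a bit is 1.
        pos = (rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p) for p in pos])
        better = fit > pfit
        pbest[better], pfit[better] = pos[better], fit[better]
        if pfit.max() > gfit:
            gbest, gfit = pbest[pfit.argmax()].copy(), pfit.max()
    return np.flatnonzero(gbest), gfit

# Synthetic demonstration data (not ALL or SRBCT): 3 classes, 200 genes,
# of which genes 0-5 carry a class-specific shift.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 15)
X = rng.normal(size=(45, 200))
for c in range(3):
    X[y == c, 2 * c:2 * c + 2] += 3.0

kept = filter_genes(X, y)                 # stage 1: candidate genes
idx, score = binary_pso(X[:, kept], y)    # stage 2: subset search
selected = kept[idx]
print("candidates:", kept, "selected:", selected, "fitness:", score)
```

The per-gene penalty is one simple way to express the stated goal of high accuracy with few genes; a real analysis would also need to keep gene selection inside the cross-validation loop to avoid selection bias.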
