Cancer subtype's classifier based on Hybrid Samples Balanced Genetic Algorithm and Extreme Learning Machine

  • Sachnev, Vasily (School of Information, Communication and Electronics Engineering, Catholic University) ;
  • Suresh, Sundaram (School of Computer Science and Engineering, Nanyang Technological University) ;
  • Choi, Yong Soo (Division of Liberal Arts & Teaching, Sungkyul University)
  • Received : 2016.12.12
  • Accepted : 2016.12.31
  • Published : 2016.12.31

Abstract

This paper presents a novel cancer subtype classifier based on a Hybrid Samples Balanced Genetic Algorithm combined with an Extreme Learning Machine (hSBGA-ELM). The classifier uses expression data for 16,063 genes from the open Global Cancer Map (GCM) database and classifies 14 cancer subtypes (breast, prostate, lung, colorectal, lymphoma, bladder, melanoma, uterus, leukemia, renal, pancreas, ovary, mesothelioma, and CNS). hSBGA-ELM unifies gene selection and cancer subtype classification in one framework. The Hybrid Samples Balanced Genetic Algorithm searches the 16,063 genes available in the GCM database for a reduced, robust set of genes that discriminates between cancer subtypes. The selected genes are then used to build the subtype classifier with an Extreme Learning Machine (ELM). As a result, the reduced set of robust genes gives the proposed classifier stable generalization performance. hSBGA-ELM identifies 95 genes that are probably associated with cancer. Comparison with existing cancer subtype classifiers clearly indicates the efficiency of the proposed method.

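The general idea of wrapping gene selection and ELM classification in one loop can be illustrated with a minimal sketch. This is not the authors' hSBGA-ELM: the abstract does not describe the sample-balancing or hybrid steps, so the code below uses a plain genetic algorithm (tournament selection, uniform crossover, bit-flip mutation, elitism) over boolean gene masks, with validation accuracy of a basic ELM as the fitness. The toy data, the helper names (`make_toy_data`, `ELM`, `fitness`, `ga_select`) and all hyperparameters are assumptions chosen for illustration; the GCM data are not used.

```python
import numpy as np

def make_toy_data(n_per_class=30, n_genes=60, n_informative=5, n_classes=3, seed=0):
    """Synthetic stand-in for expression data: only the first few 'genes' carry class signal."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = rng.normal(size=(y.size, n_genes))
    X[:, :n_informative] += 2.0 * y[:, None]  # class-dependent shift
    idx = rng.permutation(y.size)
    return X[idx], y[idx]

class ELM:
    """Basic ELM: random, fixed hidden layer; output weights solved by least squares."""
    def __init__(self, n_hidden=40, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Random input weights, scaled so tanh does not saturate too quickly.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden)) / np.sqrt(X.shape[1])
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]  # one-hot targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # Moore-Penrose solution
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation accuracy of an ELM trained on the genes selected by `mask`."""
    if not mask.any():
        return 0.0
    model = ELM(seed=0).fit(X_tr[:, mask], y_tr)
    return float(np.mean(model.predict(X_va[:, mask]) == y_va))

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=15, p_mut=0.02, seed=0):
    """Plain GA over boolean gene masks (illustrative, not the paper's hSBGA)."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, X_tr.shape[1])) < 0.2  # sparse initial masks
    best_mask, best_fit = None, -1.0
    for _ in range(generations):
        fits = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
        i = int(np.argmax(fits))
        if fits[i] > best_fit:
            best_fit, best_mask = float(fits[i]), pop[i].copy()
        # Tournament selection: the fitter of two random individuals becomes a parent.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover with a shuffled partner, then bit-flip mutation.
        partners = parents[rng.permutation(pop_size)]
        children = np.where(rng.random(parents.shape) < 0.5, parents, partners)
        children ^= rng.random(children.shape) < p_mut
        children[0] = best_mask  # elitism: carry the best mask forward
        pop = children
    return best_mask, best_fit

X, y = make_toy_data()
X_tr, y_tr, X_va, y_va = X[:60], y[:60], X[60:], y[60:]
mask, acc = ga_select(X_tr, y_tr, X_va, y_va)
print(f"selected {int(mask.sum())} of {mask.size} genes, validation accuracy {acc:.2f}")
```

Because the same validation split drives the search, the reported accuracy is optimistic; a realistic evaluation would hold out a separate test set or use nested cross-validation.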
