Gene Selection Based on Support Vector Machine using Bootstrap

An SVM-Based Gene Selection Method Using the Bootstrap

  • Published : 2007.11.30

Abstract

Recursive feature elimination for support vector machines (SVM-RFE) is known to be useful for selecting relevant genes. Because its criterion for choosing relevant genes is the absolute value of a coefficient, SVM-RFE may suffer from a scaling problem. We propose a modified version of the recursive feature elimination algorithm that uses the bootstrap. In our method, the criterion for determining relevant genes is the absolute value of a coefficient divided by its standard error, which accounts for the statistical variability of the coefficient. Through numerical examples, we illustrate that our method is effective in gene selection.

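The SVM-RFE algorithm, which has recently been used for gene selection, takes only the absolute value of each weight as its selection criterion and therefore fails to account for the variability of gene values. As a new criterion, this study proposes the B-RFE algorithm, which uses a modified statistic: the absolute value of the weight divided by its standard error. A simulation comparison of the two methods shows that the proposed B-RFE algorithm produces more meaningful rankings.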
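
The criterion described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the linear SVM solver (a Pegasos-style subgradient method with no bias term), the choice to divide the full-data weight by a bootstrap standard error, the removal of one gene per round, and all function names and parameter values below are assumptions made for illustration only.

```python
import random
import statistics


def fit_linear_svm(X, y, lam=0.01, epochs=20, rng=None):
    """Fit a linear SVM without a bias term using Pegasos-style subgradient steps.

    X is a list of rows, y holds labels in {-1, +1}. Returns the weight vector.
    (Assumed solver; the source does not say how the SVM is trained.)
    """
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink for the L2 penalty, then step on the hinge loss if violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def bootstrap_scores(X, y, features, n_boot, rng):
    """Return |weight| / bootstrap standard error for each active feature."""
    n = len(X)
    Xa = [[row[j] for j in features] for row in X]
    w_full = fit_linear_svm(Xa, y, rng=rng)
    boot = []
    for _ in range(n_boot):
        # Resample observations with replacement and refit.
        idx = [rng.randrange(n) for _ in range(n)]
        boot.append(fit_linear_svm([Xa[i] for i in idx], [y[i] for i in idx], rng=rng))
    scores = []
    for k in range(len(features)):
        se = statistics.stdev([b[k] for b in boot])
        scores.append(abs(w_full[k]) / (se + 1e-12))  # small constant avoids 0/0
    return scores


def bootstrap_rfe(X, y, n_boot=10, seed=0):
    """Rank features from most to least relevant, dropping one per round."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    eliminated = []
    while active:
        scores = bootstrap_scores(X, y, active, n_boot, rng)
        worst = min(range(len(active)), key=scores.__getitem__)
        eliminated.append(active.pop(worst))
    return eliminated[::-1]  # last eliminated feature comes first


def make_toy_data(n=40, seed=1):
    """Hypothetical data: gene 0 is informative, gene 1 is large-scale noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([label + rng.gauss(0, 0.5), rng.gauss(0, 5.0),
                  rng.gauss(0, 1.0), rng.gauss(0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
ranking = bootstrap_rfe(X, y, n_boot=10)
print("Genes ranked from most to least relevant:", ranking)
```

Replacing the score in `bootstrap_scores` with `abs(w_full[k])` alone gives the plain weight-magnitude criterion, which makes it easy to compare the two rankings on data where genes are measured on different scales.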
