A comparison study of classification methods based on SVM and data depth in microarray data

  • Published : 2009.03.31

Abstract

Jornsten (2004) proposed DDclust and DDclass, robust clustering and classification methods based on the L1 data depth. SVM-based classification works well in most situations but shows some weakness in the presence of outliers. Because microarray data contain many redundant genes, proper gene selection is important for classification. Selecting appropriate genes, or combining gene clustering with a classification method, can enhance overall classification performance. From this viewpoint, we compare the performance of the depth-based method with that of several SVM-based classification methods.
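
To make the depth-based idea concrete, the following is a minimal sketch of the L1 data depth of Vardi and Zhang (2000) with equal observation weights, followed by a simple maximum-depth classification rule. The function names `l1_depth` and `max_depth_classify` are hypothetical, and the rule is only in the spirit of DDclass; it is not claimed to reproduce the exact procedure of Jornsten (2004) or the experiments of this paper.

```python
import numpy as np


def l1_depth(y, X):
    """L1 data depth of point y with respect to the rows of X.

    Assumes equal weights 1/n on the observations. The depth is
    1 - max(0, ||e_bar|| - f), where e_bar is the weighted average of the
    unit vectors between y and every observation different from y,
    and f is the total weight of observations that coincide with y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = y - X                              # shape (n, p)
    dists = np.linalg.norm(diffs, axis=1)      # Euclidean distances to y
    coincide = dists == 0
    f = coincide.mean()                        # weight mass located at y
    if coincide.all():
        return 1.0
    units = diffs[~coincide] / dists[~coincide, None]
    e_bar = units.sum(axis=0) / len(X)
    return 1.0 - max(0.0, float(np.linalg.norm(e_bar)) - f)


def max_depth_classify(y, class_samples):
    """Assign y to the class in which it is deepest (illustrative rule)."""
    depths = {label: l1_depth(y, X) for label, X in class_samples.items()}
    best = max(depths, key=depths.get)
    return best, depths


if __name__ == "__main__":
    # Toy data: two small, well-separated classes (made up for illustration).
    class_a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
    class_b = class_a + 5.0
    label, depths = max_depth_classify([0.1, 0.0], {"A": class_a, "B": class_b})
    print(label, depths)
```

Each observation contributes only a unit vector, so a single extreme observation cannot shift the depth by more than its weight; this bounded influence is the usual argument for the robustness of depth-based rules to outliers.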

References

  1. Baek, S., Kim, J. and Hwang, J. (2006). A comparison study of classification methods using L1 distance and L1 data depth. The Korean Journal of Applied Statistics, 19, 183-193.
  2. Bishop, C. (2006). Pattern recognition and machine learning, Springer, New York.
  3. Christmann, A. (2002). Classification based on the support vector machine and on regression depth. In Statistical data analysis based on the L1-norm and related methods, Ed. Y. Dodge, 341-352, Birkhauser, Boston.
  4. Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J. and Caligiuri, M. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531-537. https://doi.org/10.1126/science.286.5439.531
  5. Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389-422. https://doi.org/10.1023/A:1012487302797
  6. Jornsten, R. (2004). Clustering and classification based on L1 data depth. Journal of Multivariate Analysis, 90, 67-89. https://doi.org/10.1016/j.jmva.2004.02.013
  7. Jornsten, R., Vardi, Y. and Zhang, C.-H. (2002). A robust clustering method and visualization tool based on data depth. In Statistical data analysis based on the L1-norm and related methods, Ed. Y. Dodge, 313-366, Birkhauser, Boston.
  8. Liu, R., Parelius, J. and Singh, K. (1999). Multivariate analysis by data depth: descriptive statistics, graphics and inference (with discussion). The Annals of Statistics, 27, 783-858.
  9. Seok, K. H. (2007). Semi-supervised learning using kernel estimation. Journal of Korean Data & Information Science Society, 18, 629-636.
  10. Seok, K. H., Hwang, C. H. and Cho, D. H. (2002). On approximate prediction intervals for support vector machine. Journal of Korean Data & Information Science Society, 13, 65-75.
  11. Shim, J., Sohn, I., Kim, S., Lee, J., Green, P. and Hwang, C. (2009). Selecting marker genes for cancer classification using supervised weighted kernel clustering and the support vector machine. Computational Statistics and Data Analysis, 53, 1736-1742. https://doi.org/10.1016/j.csda.2008.04.028
  12. Vardi, Y. and Zhang, C. (2000). The multivariate L1 median and associated data depth. Proceedings of the National Academy of Sciences, 97, 1423-1426. https://doi.org/10.1073/pnas.97.4.1423
  13. Yousef, M., Jung, S., Showe, L. and Showe, M. (2007). Recursive cluster elimination (RCE) for classification and feature selection from gene expression data. BMC Bioinformatics, 8, 144. https://doi.org/10.1186/1471-2105-8-144
  14. Zhang, X., Lu, X., Shi, Q., Xu, X., Leung, H., Harris, L., Iglehart, J., Miron, A., Liu, J. and Wong, W. (2006). Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics, 7, 197. https://doi.org/10.1186/1471-2105-7-197
  15. Zhou, X. and Tuck, D. P. (2007). MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Bioinformatics, 23, 1106-1114. https://doi.org/10.1093/bioinformatics/btm036