http://dx.doi.org/10.9709/JKSS.2013.22.4.139

Feature Selection for Classification of Mass Spectrometric Proteomic Data Using Random Forest  

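Ohn, Syng-Yup (Department of Computer Engineering, Korea Aerospace University)
Chi, Seung-Do (Department of Computer Engineering, Korea Aerospace University)
Han, Mi-Young (Korea Foundation for the Advancement of Science and Creativity)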
Abstract
This paper proposes a novel feature selection method for mass spectrometric proteomic data based on Random Forest. The method includes an effective preprocessing step that filters out a large number of redundant, highly correlated features, and then applies a tournament strategy to obtain an optimal feature subset. Experiments on three public datasets, Ovarian 4-3-02, Ovarian 7-8-02 and Prostate, show that the new method achieves high performance compared with widely used methods while keeping specificity and sensitivity balanced.
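The abstract names the two stages of the method but not their exact rules. The sketch below, assuming NumPy and scikit-learn, illustrates one plausible reading: a greedy Pearson-correlation filter that discards redundant features, followed by a tournament in which Random Forest importance decides which features advance. The grouping scheme, the correlation threshold of 0.95, all other default parameters, and the helper names `drop_correlated`, `rf_importance`, `tournament_select` and `sensitivity_specificity` are hypothetical and are not taken from the paper. Sensitivity and specificity are included because the abstract reports their balance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def drop_correlated(X, threshold=0.95):
    """Greedy redundancy filter (assumed rule): scan columns in order and keep
    a feature only if its absolute Pearson correlation with every feature kept
    so far is below `threshold`. Returns the kept column indices of X."""
    n = X.shape[0]
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all-zero and are always kept
    # One row of z-scores per feature, so corr(j, k) = Zt[j] @ Zt[k] / n.
    Zt = ((X - X.mean(axis=0)) / std).T
    kept_rows = np.empty_like(Zt)  # z-scores of the kept features, in order
    kept = []
    for j, z in enumerate(Zt):
        m = len(kept)
        if m == 0 or np.max(np.abs(kept_rows[:m] @ z)) / n < threshold:
            kept_rows[m] = z
            kept.append(j)
    return np.array(kept, dtype=int)


def rf_importance(X, y, n_trees=100, seed=0):
    """Impurity-based Random Forest importance of each column of X."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    return rf.fit(X, y).feature_importances_


def tournament_select(X, y, n_final, group_size=50, keep_per_group=10, seed=0):
    """Hypothetical tournament: survivors are shuffled into groups of
    `group_size`, a Random Forest is fit on each group, and the top
    `keep_per_group` features by importance advance. Rounds repeat until the
    field is small enough for one final ranking. Returns column indices of X."""
    if not 0 < keep_per_group < group_size:
        raise ValueError("need 0 < keep_per_group < group_size")
    rng = np.random.default_rng(seed)
    survivors = np.arange(X.shape[1])
    # Each round strictly shrinks the field: there are at least two groups,
    # and every full group loses group_size - keep_per_group features.
    while len(survivors) > max(group_size, n_final):
        survivors = rng.permutation(survivors)
        winners = []
        for start in range(0, len(survivors), group_size):
            group = survivors[start:start + group_size]
            imp = rf_importance(X[:, group], y, seed=seed)
            winners.extend(group[np.argsort(-imp)[:keep_per_group]])
        survivors = np.array(winners)
    # Final round: rank all remaining features together and keep the best.
    imp = rf_importance(X[:, survivors], y, seed=seed)
    return survivors[np.argsort(-imp)[:n_final]]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Binary labels: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos = y_true == positive
    sens = np.mean(y_pred[pos] == positive)   # fraction of positives detected
    spec = np.mean(y_pred[~pos] != positive)  # fraction of negatives rejected
    return float(sens), float(spec)
```

In use, the filter would run first and the tournament's indices would be mapped back to the original columns, e.g. `kept = drop_correlated(X)` followed by `selected = kept[tournament_select(X[:, kept], y, n_final=10)]`. Scanning columns greedily in m/z order avoids building the full feature-by-feature correlation matrix, which would be large for spectra with many thousands of m/z points.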
Keywords
Feature Selection; Bioinformatics; Pattern Recognition; SELDI-TOF; Proteome; Spectrum; Random Forest; Pearson Correlation