[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.9709/JKSS.2013.22.4.139

Feature Selection for Classification of Mass Spectrometric Proteomic Data Using Random Forest

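Ohn, Syng-Yup (Department of Computer Engineering, Korea Aerospace University)
Chi, Seung-Do (Department of Computer Engineering, Korea Aerospace University)
Han, Mi-Young (Korea Foundation for the Advancement of Science and Creativity)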

Publication Information

Journal of the Korea Society for Simulation / v.22, no.4, 2013, pp. 139-147

Abstract

This paper proposes a novel feature selection method for mass spectrometric proteomic data based on Random Forest. The method first applies a preprocessing step that filters out the large number of redundant, highly correlated features, and then uses a tournament strategy to obtain an optimal feature subset. Experiments on three public datasets (Ovarian 4-3-02, Ovarian 7-8-02, and Prostate) show that the new method achieves high performance compared with widely used methods, with balanced specificity and sensitivity.
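The abstract names the pipeline's stages but not their details, so the following is only a minimal sketch of one plausible reading: a greedy Pearson-correlation filter to drop redundant features, followed by a tournament in which groups of features compete on Random Forest importance. The correlation threshold, group size, number of winners per group, the toy data, and all function names are assumptions made for illustration, not the authors' settings or code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def correlation_filter(X, threshold=0.95):
    """Greedy redundancy filter: keep a feature only if its absolute Pearson
    correlation with every already-kept feature is below `threshold` (assumed value)."""
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))  # NaN (constant column) -> 0
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept


def tournament_select(X, y, candidates, group_size=8, winners=2, seed=0):
    """Hypothetical tournament: shuffle the candidates into groups, fit a forest
    on each group, and let only the most important features advance."""
    if not 0 < winners < group_size:
        raise ValueError("winners must be positive and smaller than group_size")
    rng = np.random.default_rng(seed)
    candidates = list(candidates)
    while len(candidates) > group_size:
        order = rng.permutation(len(candidates))
        candidates = [candidates[i] for i in order]
        survivors = []
        for start in range(0, len(candidates), group_size):
            group = candidates[start:start + group_size]
            if len(group) <= winners:  # group too small to compete; it advances whole
                survivors.extend(group)
                continue
            forest = RandomForestClassifier(n_estimators=50, random_state=seed)
            forest.fit(X[:, group], y)
            top = np.argsort(forest.feature_importances_)[::-1][:winners]
            survivors.extend(int(group[i]) for i in top)
        candidates = survivors
    return sorted(candidates)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), treating label 1 as the positive class.
    Assumes both classes occur in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(tp / (tp + fn)), float(tn / (tn + fp))


def make_toy_data(n_samples=120, n_features=40, seed=0):
    """Synthetic stand-in for spectra: feature 0 is informative, feature 1 is a
    near-copy of feature 0 (redundant), and the remaining features are noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(size=(n_samples, n_features))
    X[:, 0] += 4.0 * y
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n_samples)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    train, test = np.arange(0, len(y), 2), np.arange(1, len(y), 2)
    # Select features on the training half only, then evaluate on the held-out half.
    kept = correlation_filter(X[train])
    selected = tournament_select(X[train], y[train], kept)
    forest = RandomForestClassifier(n_estimators=50, random_state=0)
    forest.fit(X[train][:, selected], y[train])
    sens, spec = sensitivity_specificity(y[test], forest.predict(X[test][:, selected]))
    print("selected features:", selected)
    print("sensitivity:", sens, "specificity:", spec)
```

Running the filter before the tournament keeps each forest small, which matters for spectra with thousands of strongly correlated m/z bins. Reporting sensitivity and specificity separately, rather than accuracy alone, is what allows the balance between the two that the abstract mentions to be checked.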

Keywords

Feature Selection; Bioinformatics; Pattern Recognition; SELDI-TOF; Proteome; Spectrum; Random Forest; Pearson Correlation

