Removing Non-informative Features by Robust Feature Wrapping Method for Microarray Gene Expression Data

Eliminating Redundant Features in Microarray Data Using a Genetic Algorithm and Feature Wrapping

  • Published : 2008.08.15

Abstract

Because microarray gene expression datasets are high dimensional, machine learning algorithms typically rely on feature selection to classify them effectively. However, the number of features is far larger than the number of samples, which makes feature selection computationally prohibitive and prone to error. A traditional approach is feature filtering, which scores one gene at a time. Filtering is therefore a univariate approach and cannot account for correlations among genes. In this paper, we propose a function that measures both class separability and correlation among features. With this approach, we address this limitation of feature filtering.

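In this paper, for microarray data in which genes are highly correlated with one another, the proposed algorithm builds subsets of weakly correlated genes and evaluates them with a fitness function, overcoming the limitations of existing methods. Existing methods remove redundant features by evaluating each feature individually; because they do not consider correlation, the gene subsets they select remain highly correlated. The proposed algorithm therefore uses a feature wrapping technique that evaluates the relationships among features, so that the genes in the extracted subset have low mutual correlation and high class-discriminating power.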
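The abstract does not give the form of the fitness function, so the following is only a minimal sketch of the general idea: score a gene subset by its class separability and penalize correlation among its genes. The Fisher-like separability ratio, the mean absolute Pearson correlation as the redundancy term, the `penalty` weight, and all function and variable names are assumptions made for illustration, not the authors' definitions.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def separability(values, labels):
    """Assumed two-class score: squared gap between class means over summed class variances."""
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    spread = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / (spread + 1e-12)


def subset_fitness(genes, labels, subset, penalty=1.0):
    """Mean separability of the subset minus penalty * mean |pairwise correlation|."""
    sep = mean(separability(genes[g], labels) for g in subset)
    pairs = list(combinations(subset, 2))
    redundancy = mean(abs(pearson(genes[i], genes[j])) for i, j in pairs) if pairs else 0.0
    return sep - penalty * redundancy


# Toy data (hypothetical): g2 is an exact rescaling of g1, g3 is not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "g1": [1, 2, 1, 2, 5, 6, 5, 6],
    "g2": [2, 4, 2, 4, 10, 12, 10, 12],
    "g3": [2, 1, 1, 2, 6, 5, 5, 6],
}
redundant_pair = subset_fitness(genes, labels, ["g1", "g2"])
diverse_pair = subset_fitness(genes, labels, ["g1", "g3"])
```

In this toy data all three genes have the same individual separability, so a univariate filter cannot distinguish them, while the subset score ranks `["g1", "g3"]` above the fully redundant `["g1", "g2"]`. A search procedure such as a genetic algorithm could use a score of this kind to compare candidate subsets; the relative scale of the two terms would need tuning through `penalty`.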
