Removing Non-informative Features by Robust Feature Wrapping Method for Microarray Gene Expression Data

Lee, Jae-Sung; Kim, Dae-Won

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 35, Issue 8, pp. 463-478, 2008. ISSN 1229-6848 (print)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

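Korean title: 유전자 알고리즘과 Feature Wrapping을 통한 마이크로어레이 데이타 중복 특징 소거법 (Redundant Feature Elimination in Microarray Data Using a Genetic Algorithm and Feature Wrapping)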

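Lee, Jae-Sung (Department of Computer Engineering, Chung-Ang University);
Kim, Dae-Won (Department of Computer Engineering, Chung-Ang University)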

Published: 2008.08.15

Abstract

Because microarray gene expression datasets are high-dimensional, machine learning algorithms typically rely on feature selection to classify them effectively. However, the number of features far exceeds the number of samples, which makes feature selection computationally prohibitive and error-prone. A traditional approach is feature filtering, which evaluates one gene at a time. Because feature filtering is univariate, it cannot account for correlations among genes. In this paper, we propose a function that measures both class separability and inter-gene correlation, and we use it to overcome this limitation of feature filtering.
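To make the contrast between univariate filtering and a subset-level score concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not give the paper's function, so the correlation-based merit from Hall's CFS (correlation-based feature selection) stands in as one well-known score that rewards class separability and penalizes inter-gene correlation. Relevance is assumed to be the absolute Pearson correlation between a gene and a 0/1 class label; `filter_top_k`, `cfs_merit`, and the toy expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (epsilon guards constant genes)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sqrt(sxx * syy) + 1e-12)


def relevance(gene, labels):
    """Univariate class separability: |correlation| between one gene and the 0/1 label."""
    return abs(pearson(gene, labels))


def filter_top_k(genes, labels, k):
    """Feature filtering: score every gene in isolation and keep the k best indices."""
    ranked = sorted(range(len(genes)), key=lambda g: relevance(genes[g], labels), reverse=True)
    return sorted(ranked[:k])


def cfs_merit(genes, labels, subset):
    """CFS-style subset score: k * mean relevance / sqrt(k + k(k-1) * mean redundancy)."""
    k = len(subset)
    r_cf = mean(relevance(genes[g], labels) for g in subset)
    pairs = list(combinations(subset, 2))
    r_ff = mean(abs(pearson(genes[i], genes[j])) for i, j in pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 1, 0, 1, 2, 3, 2, 3],          # gene 0: strongly class-separating
    [0, 1, 0, 1, 2, 3, 2, 3],          # gene 1: redundant copy of gene 0
    [0, 0, 1.5, 1.5, 2, 2, 3.5, 3.5],  # gene 2: a little weaker, but not a copy
]
print(filter_top_k(genes, labels, 2))              # [0, 1]
print(round(cfs_merit(genes, labels, [0, 1]), 3))  # 0.894
print(round(cfs_merit(genes, labels, [0, 2]), 3))  # 0.915
```

The filter keeps genes 0 and 1 because each scores well on its own, even though gene 1 adds no information. The subset score sees the perfect correlation between them and prefers the pair {0, 2}.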

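In this paper, for microarray data in which genes are highly correlated with one another, the proposed algorithm constructs subsets of genes with low mutual correlation and evaluates them with a fitness function, overcoming a limitation of existing methods. Existing methods remove redundant features by evaluating each feature individually; because they do not consider correlation, the gene subsets they select tend to have high inter-gene correlation. The proposed algorithm therefore uses a feature wrapping technique, which evaluates the relationships among features, so that the genes in the extracted subset have low mutual correlation and high class separability.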
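The paper's Korean title names a genetic algorithm as the search procedure for feature wrapping. The following Python sketch shows one generic way such a search could be organized, under stated assumptions: each chromosome is a 0/1 mask over genes, fitness is a CFS-style correlation merit (Hall's correlation-based feature selection) used as a stand-in rather than the paper's fitness function, and binary tournament selection, one-point crossover, bit-flip mutation, and elitism are standard GA choices picked for illustration. `ga_wrapper`, `merit`, and all parameter values (population 20, 30 generations, mutation rate 1/n) are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (epsilon guards constant genes)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sqrt(sxx * syy) + 1e-12)


def merit(mask, genes, labels):
    """Hypothetical fitness: CFS-style merit of the genes switched on in a 0/1 mask."""
    subset = [g for g, bit in enumerate(mask) if bit]
    if not subset:
        return float("-inf")  # an empty gene subset is never acceptable
    k = len(subset)
    r_cf = mean(abs(pearson(genes[g], labels)) for g in subset)
    pairs = list(combinations(subset, 2))
    r_ff = mean(abs(pearson(genes[i], genes[j])) for i, j in pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def tournament(pop, score, rng):
    """Binary tournament: the fitter of two random individuals becomes a parent."""
    a, b = rng.sample(pop, 2)
    return a if score(a) >= score(b) else b


def ga_wrapper(genes, labels, pop_size=20, generations=30, seed=0):
    """Evolve gene-subset masks; every candidate is scored as a whole subset."""
    rng = random.Random(seed)
    n = len(genes)  # assumes n >= 2 so one-point crossover has a cut point

    def score(mask):
        return merit(mask, genes, labels)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    best_fit, history = score(best), []
    for _ in range(generations):
        children = [best[:]]  # elitism: keep the best subset found so far
        while len(children) < pop_size:
            p1, p2 = tournament(pop, score, rng), tournament(pop, score, rng)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation with rate 1/n per gene
            child = [1 - bit if rng.random() < 1.0 / n else bit for bit in child]
            children.append(child)
        pop = children
        cand = max(pop, key=score)
        if score(cand) > best_fit:
            best, best_fit = cand[:], score(cand)
        history.append(best_fit)  # best-so-far fitness, never decreases
    return best, best_fit, history


labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 0, 0, 0, 1, 1, 0, 0],  # gene 0: marks half of class 1
    [0, 0, 0, 0, 1, 1, 0, 0],  # gene 1: redundant copy of gene 0
    [0, 0, 0, 0, 0, 0, 1, 1],  # gene 2: marks the other half of class 1
]
mask, fit, history = ga_wrapper(genes, labels)
print(mask, round(fit, 3))
```

Genes 0 and 2 each flag half of class 1, so together they reproduce the label exactly, while gene 1 only duplicates gene 0. Enumerating all seven non-empty subsets shows that the highest merit under this score, about 0.707, belongs to {0, 2} and {1, 2}. With three genes, exhaustive search is cheaper than any GA; the GA pays off only when the number of genes makes enumeration infeasible. Substituting a classifier's cross-validated accuracy for `merit` would turn this into a wrapper in the classical sense.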
