Unsupervised Feature Selection Method Based on Principal Component Loading Vectors

Park, Young Joon; Kim, Seoung Bum

doi:10.7232/JKIIE.2014.40.3.275

Journal of Korean Institute of Industrial Engineers (대한산업공학회지)

Volume 40, Issue 3, Pages 275-282, 2014
1225-0988 (pISSN) / 2234-6457 (eISSN)

Korean Institute of Industrial Engineers (대한산업공학회)

주성분 분석 로딩 벡터 기반 비지도 변수 선택 기법

Park, Young Joon (School of Industrial Management Engineering, Korea University) ;
Kim, Seoung Bum (School of Industrial Management Engineering, Korea University)

Received : 2013.12.26
Accepted : 2014.05.17
Published : 2014.06.15

Abstract

Principal component analysis (PCA) is one of the most widely used methods for dimensionality reduction. However, the principal components it produces are hard to interpret in terms of the original features, because each component is a linear combination of a large number of them. This interpretation problem can be overcome by feature selection approaches, which identify the best subset of the given features. In this study, we propose an unsupervised feature selection method based on the geometrical information of the PCA loading vectors. Experimental results from a simulation study demonstrated the efficiency and usefulness of the proposed method.
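The abstract does not spell out the proposed geometric criterion, so what follows is only a minimal sketch of the general idea it builds on. It computes the PCA loading vectors, treats each original feature as a point whose coordinates are its loadings on the retained components, and ranks features by how far that point lies from the origin. The score used here, s_i = sqrt(sum_j w_j v_ij^2), where v_ij is the loading of feature i on component j and w_j is the share of retained variance explained by component j, is a common loading-magnitude baseline chosen for illustration, not the authors' method. The function names `pca_loadings` and `rank_features_by_loadings` and the synthetic data are hypothetical, and the sketch assumes NumPy is available.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Return the top eigenvalues and loading vectors of the sample covariance.

    X has shape (n_samples, n_features). The returned loadings matrix has
    shape (n_features, n_components); column j is the loading vector of the
    j-th principal component (unit length, sign arbitrary).
    """
    cov = np.cov(X, rowvar=False)            # np.cov centers each column internally
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


def rank_features_by_loadings(X, n_components):
    """Hypothetical baseline, not the paper's criterion.

    Each feature i is treated as the point loadings[i, :] in the space of the
    retained components; its score is that point's distance from the origin,
    with each axis weighted by the share of retained variance it explains.
    """
    eigvals, loadings = pca_loadings(X, n_components)
    weights = eigvals / eigvals.sum()
    scores = np.sqrt((loadings ** 2 * weights).sum(axis=1))
    return np.argsort(scores)[::-1], scores   # feature indices, best first


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    latent = rng.normal(size=n)
    X = np.column_stack([
        latent + 0.1 * rng.normal(size=n),   # features 0 and 1 share a latent factor
        latent + 0.1 * rng.normal(size=n),
        0.1 * rng.normal(size=n),            # features 2 and 3 are low-variance noise
        0.1 * rng.normal(size=n),
    ])
    order, scores = rank_features_by_loadings(X, n_components=1)
    print("ranking:", order.tolist())        # features 0 and 1 should come first
    print("scores:", np.round(scores, 3).tolist())
```

Each loading vector is generally nonzero on every feature, which is why the components themselves are hard to read; ranking the original features instead yields a subset that can be interpreted directly. A complete selection method would also need a rule for how many components and how many features to keep, which this sketch leaves as parameters.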
