http://dx.doi.org/10.5012/bkcs.2012.33.5.1527

A New Variable Selection Method Based on Mutual Information Maximization by Replacing Collinear Variables for Nonlinear Quantitative Structure-Property Relationship Models  

Ghasemi, Jahan B. (Chemistry Department, Faculty of Sciences, K.N. Toosi University of Technology)
Zolfonoun, Ehsan (Chemistry Department, Faculty of Sciences, K.N. Toosi University of Technology)
Abstract
Selecting the most informative molecular descriptors from the original data set is a key step in developing quantitative structure-activity/property relationship (QSAR/QSPR) models. Mutual information (MI) has recently gained increasing attention in feature selection problems. This paper presents an effective MI-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear QSPR models. The proposed variable selection method was applied to three different QSPR datasets: the soil degradation half-lives of 47 organophosphorus pesticides, the GC-MS retention times of 85 volatile organic compounds, and the water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds. The results revealed that using MIMRCV as the feature selection method improves the predictive quality of the developed models compared with conventional MI-based variable selection algorithms.
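The abstract does not spell out the MIMRCV procedure, so the sketch below illustrates only the kind of conventional MI-based baseline it is compared against, not the authors' method. It assumes a simple histogram (plug-in) MI estimator and Battiti's greedy MIFS criterion, which scores a candidate descriptor f as I(f; y) - beta * (sum of I(f; s) over already selected descriptors s). The function names `mutual_information` and `mifs_select`, the bin count of 8, beta = 0.5 and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of I(X;Y) in nats for two 1-D arrays.

    Assumption: simple equal-width binning, chosen for brevity; the paper
    may use a different MI estimator.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))

def mifs_select(X, y, n_select, beta=0.5, bins=8):
    """Greedy MIFS-style selection (a conventional baseline, NOT MIMRCV).

    At each step, pick the descriptor maximizing
    I(f; y) - beta * sum_{s in selected} I(f; s).
    """
    n_features = X.shape[1]
    relevance = np.array(
        [mutual_information(X[:, j], y, bins) for j in range(n_features)]
    )
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in remaining:
            # redundancy with descriptors that are already in the model
            redundancy = sum(
                mutual_information(X[:, j], X[:, s], bins) for s in selected
            )
            score = relevance[j] - beta * redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Hypothetical synthetic "descriptors": x1 is a near-copy (collinear) of x0.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + 0.01 * rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)                    # pure noise
X = np.column_stack([x0, x1, x2, x3])
y = np.sin(x0) + 0.5 * x2**2 + 0.05 * rng.normal(size=n)  # nonlinear property

selected = mifs_select(X, y, n_select=2)
print(selected)
```

The redundancy term is what keeps a greedy MI selector from spending two slots on collinear descriptors such as `x0` and `x1`. Plain MI ranking, which uses relevance alone, has no such penalty.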
Keywords
Mutual information; Variable selection; Quantitative structure-property relationship