References
- T. Seth and V. Chaudhary, "Big data in finance," in Big Data: Algorithms, Analytics, and Applications. Boca Raton, FL: CRC Press, 2015, pp. 329-356.
- I. Taleb, R. Dssouli, and M. A. Serhani, "Big data pre-processing: a quality framework," in Proceedings of 2015 IEEE International Congress on Big Data, New York, NY, 2015, pp. 191-198.
- J. Li, K. Cheng, S. Wang, F. Morstatter, R. P. Trevino, J. Tang, and H. Liu, "Feature selection: a data perspective," ACM Computing Surveys, vol. 50, no. 6, article no. 94, 2018.
- B. Arguello, "A survey of feature selection methods: algorithms and software," PhD dissertation, University of Texas at Austin, TX, 2015.
- A. Krause, "SFO: a toolbox for submodular function optimization," Journal of Machine Learning Research, vol. 11, pp. 1141-1144, 2010.
- M. A. Fattah, "A novel statistical feature selection approach for text categorization," Journal of Information Processing Systems, vol. 13, no. 5, pp. 1397-1409, 2017. https://doi.org/10.3745/JIPS.02.0076
- K. Kira and L. A. Rendell, "A practical approach to feature selection," in Machine Learning Proceedings 1992. St. Louis, MO: Elsevier, 1992, pp. 249-256.
- S. Fallahpour, E. N. Lakvan, and M. H. Zadeh, "Using an ensemble classifier based on sequential floating forward selection for financial distress prediction problem," Journal of Retailing and Consumer Services, vol. 34, pp. 159-167, 2017. https://doi.org/10.1016/j.jretconser.2016.10.002
- E. Wright, Q. Hao, K. Rasheed, and Y. Liu, "Feature selection of post-graduation income of college students in the United States," 2018; https://arxiv.org/abs/1803.06615.
- S. D. Kim, "A feature selection technique based on distributional differences," Journal of Informaion Processing System, vol. 2, no. 1, pp. 23-27, 2006. https://doi.org/10.3745/JIPS.2006.2.1.023
- S. Maldonado, J. Perez, and C. Bravo, "Cost-based feature selection for support vector machines: an application in credit scoring," European Journal of Operational Research, vol. 261, no. 2, pp. 656-665, 2017. https://doi.org/10.1016/j.ejor.2017.02.037
- A. Krause and V. Cevher, "Submodular dictionary selection for sparse representation," in Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 2010, pp. 567-574.
- Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen, and H. Greenspan, "Chest pathology identification using deep feature selection with non-medical training," Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, vol. 6, no. 3, pp. 259-263, 2018. https://doi.org/10.1080/21681163.2016.1138324
- R. Iyer, S. Jegelka, and J. Bilmes, "Fast semidifferential-based submodular function optimization," Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, 2013, pp. 855-863.
- K. Wei, Y. Liu, K. Kirchhoff, and J. Bilmes, "Using document summarization techniques for speech data subset selection," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, 2013, pp. 721-726.
- A. Krause and C. Guestrin, "A note on the budgeted maximization of submodular functions," Carnegie Mellon University, Technical Report No. CMU-CALD-05-103, 2005.
- D. Kempe, J. Kleinberg, and E. Tardos, "Maximizing the spread of influence through a social network," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, 2003, pp. 137-146.
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, "An analysis of approximations for maximizing submodular set functions - I," Mathematical Programming, vol. 14, no. 1, pp. 265-294, 1978. https://doi.org/10.1007/BF01588971
- M. A. Hall, "Correlation-based feature selection for machine learning," PhD dissertation, The University of Waikato, Hamilton, New Zealand, 1999.
- A. Pouramirarsalani, M. Khalilian, and A. Nikravanshalmani, "Fraud detection in E-banking by using the hybrid feature selection and evolutionary algorithms," International Journal of Computer Science and Network Security, vol. 17, no. 8, pp. 271-279, 2017.
- Y. Wang, W. Ke, and X. Tao, "A feature selection method for large-scale network traffic classification based on spark," Information, vol. 7, article no. 6, 2016.
- H. D. Gangurde, "Feature selection using clustering approach for big data," International Journal of Computer Applications, vol. 2014, no. 4, pp. 1-3, 2014.
- P. Sarlin, "Data and dimension reduction for visual financial performance analysis," Information Visualization, vol. 14, no. 2, pp. 148-167, 2015. https://doi.org/10.1177/1473871613504102
- H. S. Bhat and D. Zaelit, "Forecasting retained earnings of privately held companies with PCA and L1 regression," Applied Stochastic Models in Business and Industry, vol. 30, no. 3, pp. 271-293, 2014. https://doi.org/10.1002/asmb.1972
- I. Pisica, G. Taylor, and L. Lipan, "Feature selection filter for classification of power system operating states," Computers &Mathematics with Applications, vol. 66, no. 10, pp. 1795-1807, 2013. https://doi.org/10.1016/j.camwa.2013.05.033
- H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. New York, NY: Springer Science & Business Media, 2012.
- M. Dash, "Feature selection via set cover," in Proceedings 1997 IEEE Knowledge and Data Engineering Exchange Workshop, Newport Beach, CA, 1997, pp. 165-171.
- A. Arauzo-Azofra, J. M. Benitez, and J. L. Castro, "A feature set measure based on relief," in Proceedings of the 5th International Conference on Recent Advances in Soft Computing, Nottingham, UK, 2004, pp. 104-109.
- X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, et al., "MLlib: machine learning in Apache Spark," The Journal of Machine Learning Research, vol. 17, pp. 1-7, 2016.
- K. Noyes, "Five things you need to know about Hadoop v. Apache Spark," 2015; https://www.infoworld.com/article/3014440/five-things-you-need-to-know-about-hadoop-vapache- spark.html.
- P. Paakkonen and D. Pakkala, "Reference architecture and classification of technologies, products and services for big data systems," Big Data Research, vol. 2, no. 4, pp. 166-186, 2015. https://doi.org/10.1016/j.bdr.2015.01.001
- A. Abdiansah and R. Wardoyo, "Time complexity analysis of support vector machines (SVM) in LibSVM," International Journal Computer and Application, vol. 128, no. 3, pp. 28-34, 2015. https://doi.org/10.5120/ijca2015906480
- J. Giersdorf and M. Conzelmann, "Analysis of feature-selection for LASSO regression models," 2017; https://www.ni.tu-berlin.de/fileadmin/fg215/teaching/nnproject/Lasso_Project.pdf.
- V. Fonti and E. Belitser, "Feature selection using lasso," VU Amsterdam Research Paper in Business Analytics, 2017; https://beta.vu.nl/nl/Images/werkstuk-fonti_tcm235-836234.pdf