Recent deep learning methods for tabular data

  • Yejin Hwang (Department of Statistics, Ewha Womans University)
  • Jongwoo Song (Department of Statistics, Ewha Womans University)
  • Received : 2022.10.06
  • Accepted : 2022.12.19
  • Published : 2023.03.31

Abstract

Deep learning has made great strides on unstructured data such as text, images, and audio. For tabular data analysis, however, machine learning algorithms such as ensemble methods still outperform deep learning. To close this gap, several deep learning methods for tabular data have been proposed recently. In this paper, we review the latest deep learning models for tabular data and compare the performance of these models on several datasets. In addition, we compare the latest boosting methods with these deep learning methods and suggest guidelines for users who analyze tabular datasets. For regression problems, the machine learning methods are better than the deep learning methods, but for classification problems, the deep learning methods perform better than the machine learning methods in some cases.
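The sketch below illustrates the kind of comparison described in the abstract; it is not the authors' experimental code. It fits a boosting model (XGBoost; Chen and Guestrin, 2016) and a deep learning model for tabular data (TabNet; Arik and Pfister, 2020) on the California housing regression data (Pace and Barry, 1997) and compares test mean squared errors. It assumes the scikit-learn, xgboost, and pytorch-tabnet Python packages are installed, and all hyperparameters are illustrative defaults rather than the settings used in the paper.

from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from pytorch_tabnet.tab_model import TabNetRegressor

# California housing regression data (Pace and Barry, 1997)
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Boosting baseline (XGBoost); hyperparameters are illustrative only
xgb = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
xgb.fit(X_train, y_train)
print("XGBoost test MSE:", mean_squared_error(y_test, xgb.predict(X_test)))

# Deep learning model for tabular data (TabNet)
# TabNetRegressor expects 2-D targets, hence the reshape
tabnet = TabNetRegressor(seed=0)
tabnet.fit(
    X_train, y_train.reshape(-1, 1),
    eval_set=[(X_test, y_test.reshape(-1, 1))],
    max_epochs=100, patience=20,
)
print("TabNet test MSE:", mean_squared_error(y_test, tabnet.predict(X_test).ravel()))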

References

  1. Arik SO and Pfister T (2020). TabNet: Attentive interpretable tabular learning, Proceedings of the AAAI Conference on Artificial Intelligence, 35, 6679-6687.
  2. Bansal S (2018). Historical data science trends on kaggle, Available from: https://www.kaggle.com/code/shivamb/data-science-trends-on-kaggle/notebook
  3. Badirli S, Liu X, Xing Z, Bhowmik A, Doan K, and Keerthi S (2020). Gradient Boosting Neural Networks: GrowNet, Available from: arXiv:2002.07971v2
  4. Chen T and Guestrin C (2016). XGBoost: A scalable tree boosting system, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794.
  5. Chen T and Guestrin C (2016). dmlc/xgboost/demo, Available from: https://github.com/dmlc/xgboost/tree/master/demo
  6. Dgomonov (2019). New York City Airbnb Open Data, Available from: https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data
  7. Friedman JH (2001). Greedy function approximation: A gradient boosting machine, The Annals of Statistics, 29, 1189-1232. https://doi.org/10.1214/aos/1013203450
  8. Fanaee-T H and Gama J (2013). Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence, 2, 1-15. https://doi.org/10.1007/s13748-012-0035-5
  9. Hofmann H (1994). Statlog (German Credit Data) Data Set [https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29]. Irvine, CA: University of California, School of Information and Computer Science.
  10. Jana S (2020). Airlines customer satisfaction, Available from: https://www.kaggle.com/datasets/sjleshrac/airlines-customer-satisfaction?select=InvisticoAirline.csv
  11. Krizhevsky A, Sutskever I, and Hinton GE (2012). ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 2, 1106-1114.
  12. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, and Liu T (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree, NIPS.
  13. Kohavi R (1994). Bottom-up induction of oblivious read-once decision graphs: Strengths and limitations, AAAI.
  14. Koklu M and Ozkan IA (2020). Multiclass classification of dry beans using computer vision and machine learning techniques, Computers and Electronics in Agriculture, 174, 105507.
  15. Liang X, Zou T, Guo B, Li S, Zhang H, Zhang S, Huang H, and Chen SX (2015). Assessing Beijing's PM2.5 pollution: Severity, weather impact, APEC and winter heating, Proceedings: Mathematical, Physical and Engineering Sciences, 471, 1-20.
  16. Lou Y and Obukhov M (2017). BDT: Gradient boosted decision tables for high accuracy and scoring efficiency, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1893-1901.
  17. Microsoft (2020). Microsoft/LightGBM/examples, Available from: https://github.com/microsoft/LightGBM/tree/master/examples
  18. Martins A and Astudillo R (2016). From softmax to sparsemax: A sparse model of attention and multi-label classification, Proceedings of the 33rd International Conference on Machine Learning, 48, 1614-1623.
  19. Pace RK and Barry R (1997). Sparse spatial autoregressions, Statistics & Probability Letters, 33, 291-297. https://doi.org/10.1016/S0167-7152(96)00140-X
  20. Peters B, Niculae V, and Martins A (2019). Sparse sequence-to-sequence models, In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 1504-1519.
  21. Pasedko S (2019). Belarus Used Cars Prices, Available from: https://www.kaggle.com/datasets/slavapasedko/belarus-used-cars-prices
  22. Popov S, Morozov S, and Babenko A (2019). Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data, Available from: arXiv:1909.06312v2
  23. Prokhorenkova L, Gusev G, Vorobev A, Dorogush A, and Gulin A (2017). CatBoost: unbiased boosting with categorical features, NeurIPS.
  24. Sakar CO, Polat SO, Katircioglu M, and Kastro Y (2018). Real-time prediction of online shoppers' purchasing intention using multilayer perceptron and LSTM recurrent neural networks, Neural Computing & Applications, 31, 6893-6908.
  25. Sharma A (2017). Mobile price classification, Available from: https://www.kaggle.com/datasets/iabhishekofficial/mobile-price-classification
  26. Sherstinsky A (2021). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network, Physica D: Nonlinear Phenomena, 404, 132306.
  27. Song W, Shi C, Xiao Z, Duan Z, Xu Y, Zhang M, and Tang J (2019). AutoInt: Automatic feature interaction learning via self-attentive neural networks, Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM), 1161-1170.
  28. Somepalli G, Goldblum M, Schwarzschild A, Bruss C, and Goldstein T (2021). SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training, Available from: arXiv:2106.01342
  29. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, and Polosukhin I (2017). Attention is all you need, NIPS.
  30. Yun S, Han D, Oh S, Chun S, Choe J, and Yoo Y (2019). Cutmix: Regularization strategy to train strong classifiers with localizable features, In Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 6022-6031.
  31. Zhang H, Cisse M, Dauphin Y, and Lopez-Paz D (2017). Mixup: Beyond empirical risk minimization, International Conference on Learning Representations (ICLR).