
http://dx.doi.org/10.3837/tiis.2022.02.014

Two Stage Deep Learning Based Stacked Ensemble Model for Web Application Security

Sevri, Mehmet (Informatics Institute, Gazi University)
Karacan, Hacer (Computer Engineering Department, Faculty of Engineering, Gazi University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.16, no.2, 2022, pp. 632-657

Abstract

Detecting web attacks is a major challenge, and simple models tend to suffer from low sensitivity or high false positive rates. In this study, we aim to develop a robust two-stage, deep learning based stacked ensemble web application firewall (WAF). The first stage of the proposed WAF model classifies traffic as normal or abnormal. Classification of the types of abnormal traffic is deferred to the second stage and carried out using an integrated stacked ensemble model. In this way, clients' requests can be served without delay, and attack types can be detected with high sensitivity. In addition to the high accuracy of the proposed model, statistical similarity and diversity analyses are used to achieve high generalization for the ensemble model. As part of the study, a comprehensive, up-to-date, and robust multi-class web anomaly dataset named GAZI-HTTP is created to reflect real-world conditions. The performance of the proposed WAF model is compared with state-of-the-art deep learning models and previous studies using the benchmark dataset. The proposed two-stage model achieved multi-class detection rates of 97.43% and 94.77% on GAZI-HTTP and ECML-PKDD, respectively.
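
The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the deep learning models are replaced by toy scoring functions, and the fixed weights stand in for a meta-learner that would normally be trained on the base learners' outputs. All names (`stage1_is_abnormal`, `stage2_attack_type`, `ATTACK_TYPES`, `META_WEIGHTS`) and the token lists are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

# Hypothetical attack labels; the paper's datasets define their own classes.
ATTACK_TYPES = ["sql_injection", "xss", "path_traversal"]

# Stage 1: a fast binary gate. A toy token rule stands in for a trained model.
SUSPICIOUS_TOKENS = ("'", "<", "../", "--")


def stage1_is_abnormal(request: str) -> bool:
    """Return True if the request looks abnormal (toy rule, an assumption)."""
    return any(token in request for token in SUSPICIOUS_TOKENS)


def _normalize(scores: List[float]) -> List[float]:
    """Turn raw scores into a distribution; fall back to uniform if all zero."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


# Stage 2 base learners: each returns one score per entry in ATTACK_TYPES.
def base_keyword(request: str) -> List[float]:
    r = request.lower()
    return _normalize([
        r.count("select") + r.count("union") + r.count("'"),
        r.count("<script") + r.count("onerror"),
        r.count("../"),
    ])


def base_charset(request: str) -> List[float]:
    return _normalize([
        request.count("'") + request.count("-"),
        request.count("<") + request.count(">"),
        request.count("/") + request.count("."),
    ])


BASE_LEARNERS: List[Callable[[str], List[float]]] = [base_keyword, base_charset]

# Placeholder for a trained meta-learner: fixed weights over base outputs.
META_WEIGHTS = [0.7, 0.3]


def stage2_attack_type(request: str) -> str:
    """Combine base-learner outputs and return the highest-scoring attack type."""
    outputs = [learner(request) for learner in BASE_LEARNERS]
    combined = [
        sum(w * out[i] for w, out in zip(META_WEIGHTS, outputs))
        for i in range(len(ATTACK_TYPES))
    ]
    return ATTACK_TYPES[combined.index(max(combined))]


def classify(request: str) -> str:
    """Normal traffic returns after stage 1; only abnormal traffic reaches stage 2."""
    if not stage1_is_abnormal(request):
        return "normal"
    return stage2_attack_type(request)
```

The point of the split is that a request judged normal returns right after the inexpensive first check, so the heavier ensemble runs only on traffic already flagged as abnormal.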

Keywords

Anomaly detection; deep learning; ensemble learning; web application firewall; web security

