http://dx.doi.org/10.22937/IJCSNS.2022.22.2.34

Accuracy of Phishing Websites Detection Algorithms by Using Three Ranking Techniques

Mohammed, Badiea Abdulkarem (University of Ha'il, Computer Science and Engineering)
Al-Mekhlafi, Zeyad Ghaleb (University of Ha'il, Computer Science and Engineering)

Publication Information

International Journal of Computer Science & Network Security, v.22, no.2, 2022, pp. 272-282

Abstract

According to the FBI's Internet Crime Complaint Center, the US lost more than 2.1 billion USD to phishing attacks between 2014 and 2019, and COVID-19 scam complaints totaled more than 1,200. These figures reflect the severe impact of phishing attacks. Approaches to detecting phishing websites (PWs) appear throughout the literature. Earlier methods maintained a centralized, manually updated blacklist, which cannot detect newly created phishing sites. Several recent studies have applied supervised machine learning (SML) algorithms and schemes, including URL extraction-based ones, to the PW detection problem. These studies show that different classification algorithms are more effective on different datasets; however, no classifier is widely recognized as the best for the phishing site detection problem. This study aims to identify the SML features and schemes that work best against PWs across all publicly available phishing datasets. Eight widely used traditional SML classification algorithms from the scikit-learn library were configured and evaluated on three publicly accessible phishing datasets. Their classification accuracies on the three datasets were then compared for statistically significant differences using the Welch t-test, and the results are reported in terms of classification accuracy and classifier ranking, as shown in Tables 4 and 8. In this study, ensembles and neural networks outperform classical algorithms. On severely imbalanced datasets, there were classifiers that obtained higher than 99.0 percent classification accuracy. Finally, the results suggest that the approach can be adapted and that it outperforms conventional techniques with good precision.
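The abstract mentions comparing classifier accuracies for statistically significant differences with the Welch t-test. As a rough illustration only, the following minimal sketch computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for two sets of accuracy scores. The function name `welch_t_test` and the accuracy values are hypothetical; they are not taken from the paper, and the paper's actual procedure may differ.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Unlike Student's t-test, this does not assume equal variances.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean;
    # statistics.variance uses the unbiased (n - 1) denominator.
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-fold accuracies for two classifiers
# (illustrative values only, not results from the paper).
ensemble_acc = [0.97, 0.96, 0.98, 0.97, 0.96]
baseline_acc = [0.93, 0.95, 0.92, 0.94, 0.93]

t_stat, dof = welch_t_test(ensemble_acc, baseline_acc)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A large absolute t value relative to the t distribution with `dof` degrees of freedom would indicate a significant difference in mean accuracy. Obtaining a p-value requires the t distribution's CDF, which is omitted here to keep the sketch dependency-free.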

Keywords

Phishing websites; Supervised machine learning; Scikit Learn library; Deep learning; Classifiers;
