URL Phishing Detection System Utilizing Catboost Machine Learning Approach

Fang, Lim Chian; Ayop, Zakiah; Anawar, Syarulnaziah; Othman, Nur Fadzilah; Harum, Norharyati; Abdullah, Raihana Syahirah

doi:10.22937/IJCSNS.2021.21.9.39

International Journal of Computer Science & Network Security

Volume 21, Issue 9, pp. 297-302, 2021
pISSN 1738-7906

Affiliation (all authors): Information Security Forensics and Computer Networking (INSFORNET), Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka (UTeM)

Received: 2021.09.05
Published: 2021.09.30

Abstract

The proliferation of phishing websites enables attackers to obtain confidential personal and financial data, which in turn erodes trust in e-business. This paper compared phishing detection techniques that rely on URL-based features. To analyze and compare the performance of supervised machine learning classifiers, the classifiers were trained on more than 11,005 phishing and legitimate URLs, from which 30 features were extracted to classify each URL as phishing or legitimate. The performance of Logistic Regression, Random Forest, and CatBoost classifiers was then evaluated. The results show that CatBoost outperformed both Random Forest and Logistic Regression, achieving a detection accuracy of up to 96%.
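The abstract does not list the 30 URL features, so the following is only a minimal sketch of what URL-based feature extraction can look like. It assumes a handful of lexical rules of the kind used in the public UCI Phishing Websites data set, which encodes each feature as 1 (legitimate), 0 (suspicious) or -1 (phishing). The function names, the 54/75-character length thresholds and the simplified subdomain and HTTPS rules are assumptions for illustration, not the authors' feature definitions.

```python
import ipaddress
from urllib.parse import urlparse

# Assumed encoding, following the UCI Phishing Websites convention:
# 1 = legitimate, 0 = suspicious, -1 = phishing.
LEGIT, SUSPICIOUS, PHISHING = 1, 0, -1


def _host(url: str) -> str:
    """Return the lower-cased hostname, or an empty string if absent."""
    return (urlparse(url).hostname or "").lower()


def having_ip_address(url: str) -> int:
    """Flag URLs whose host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(_host(url))
        return PHISHING
    except ValueError:
        return LEGIT


def url_length(url: str) -> int:
    """Long URLs can hide the suspicious part (thresholds are assumptions)."""
    n = len(url)
    if n < 54:
        return LEGIT
    if n <= 75:
        return SUSPICIOUS
    return PHISHING


def having_at_symbol(url: str) -> int:
    """An '@' can make browsers ignore the text before it in the authority."""
    return PHISHING if "@" in url else LEGIT


def double_slash_redirecting(url: str) -> int:
    """A '//' beyond the scheme separator can indicate a redirect."""
    return PHISHING if url.rfind("//") > 7 else LEGIT


def prefix_suffix(url: str) -> int:
    """Dashes in the hostname, e.g. 'paypal-secure.com'."""
    return PHISHING if "-" in _host(url) else LEGIT


def having_sub_domain(url: str) -> int:
    """Count dots in the host after stripping 'www.' (simplified: no ccTLD handling)."""
    host = _host(url)
    if host.startswith("www."):
        host = host[4:]
    dots = host.count(".")
    if dots <= 1:
        return LEGIT
    if dots == 2:
        return SUSPICIOUS
    return PHISHING


def https_scheme(url: str) -> int:
    """Crude stand-in for an SSL feature: checks only the scheme, not the certificate."""
    return LEGIT if urlparse(url).scheme == "https" else PHISHING


# Hypothetical feature set; the paper's actual 30 features are not reproduced here.
FEATURES = {
    "having_ip_address": having_ip_address,
    "url_length": url_length,
    "having_at_symbol": having_at_symbol,
    "double_slash_redirecting": double_slash_redirecting,
    "prefix_suffix": prefix_suffix,
    "having_sub_domain": having_sub_domain,
    "https_scheme": https_scheme,
}


def extract_features(url: str) -> dict[str, int]:
    """Map a URL to a dict of feature name -> {1, 0, -1}."""
    return {name: fn(url) for name, fn in FEATURES.items()}
```

A vector such as `list(extract_features(url).values())` is the kind of numeric input that Logistic Regression, Random Forest or CatBoost would then be trained on. A real pipeline would also need host-, page- and domain-registration features that cannot be derived from the URL string alone.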


Acknowledgement

This publication was supported by the Center of Research and Innovation Management (CRIM), Universiti Teknikal Malaysia Melaka (UTeM). The authors thank UTeM and the INSFORNET research group members for their support.
