Journal of Applied Science and Engineering

Published by Tamkang University Press

Impact Factor: 1.30

CiteScore: 2.10

Bo Hu and SaiNan Zhang

Center of Information Construction and Management, Nanjing Normal University of Special Education, Nanjing 210038, China


 

 

Received: April 24, 2024
Accepted: September 1, 2024
Publication Date: October 7, 2024

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.6180/jase.202507_28(7).0011


Web phishing attacks have emerged as a significant threat to online security, enabling phishers to steal sensitive financial information and commit fraud. To combat this threat, many anti-phishing systems have been developed that focus on detecting phishing content in online communications. This study applies machine learning techniques to improve phishing detection. Three single models were analyzed: the Random Forest Classifier (RFC), Adaptive Boosting Classification (ADAC), and the Naïve Bayes Classification algorithm (NBC). Each was optimized using Artificial Rabbits Optimization (ARO), yielding the hybrid models RFAR, ADAR, and NBAR. The analysis indicates that the RFAR hybrid model outperforms the single models and the other hybrid models. RFAR achieved precision scores of 0.950 for phishing websites, 0.954 for suspicious websites, and 0.872 for legitimate websites, with corresponding recall values of 0.929, 0.954, and 0.990. The ADAR model, by comparison, was notably effective at classifying legitimate websites, with a precision of 0.896. The study's novelty lies in integrating ARO with traditional classifiers to create hybrid models that improve classification accuracy.


Keywords: Phishing; Cyber Attacks; Classification; Data Mining; Optimization Algorithms; Phishing Websites Prediction; Artificial Intelligence
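
The abstract reports precision and recall separately for the phishing, suspicious, and legitimate classes. As a reading aid, the following is a minimal sketch of how such per-class scores are typically computed from true and predicted labels. The function name `per_class_precision_recall` and the toy labels are assumptions made up for illustration; they are not the authors' code or data, and the sketch does not implement ARO or any of the classifiers.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred, labels):
    """Return {label: (precision, recall)} for a multi-class problem.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        true_positives = pairs[(label, label)]
        # All samples predicted as this class (column of the confusion matrix).
        predicted = sum(n for (_, p), n in pairs.items() if p == label)
        # All samples that truly belong to this class (row of the confusion matrix).
        actual = sum(n for (t, _), n in pairs.items() if t == label)
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Toy labels invented for this example (NOT the study's dataset).
LABELS = ["phishing", "suspicious", "legitimate"]
y_true = ["phishing", "phishing", "suspicious",
          "legitimate", "legitimate", "legitimate"]
y_pred = ["phishing", "legitimate", "suspicious",
          "legitimate", "legitimate", "phishing"]

scores = per_class_precision_recall(y_true, y_pred, LABELS)
for label in LABELS:
    precision, recall = scores[label]
    print(f"{label}: precision={precision:.3f} recall={recall:.3f}")
```

Read this way, a legitimate-class precision of 0.872 alongside a recall of 0.990 would mean that nearly all legitimate sites are recognized, while roughly one in eight sites labeled legitimate belongs to another class.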

