Phishing Website Detection using Multilayer Perceptron

Blessing Obianuju Emedolu, Godwin Thomas, Nentawe Y. Gurumdimma
University of Jos, Bauchi Road, Jos, Plateau State, Nigeria
Received: 01 July 2023; Revised: 13 July 2023; Accepted: 17 July 2023; Published: 25 August 2023


Abstract: Phishing attacks pose a significant threat in the cyber world, exploiting unsuspecting users through deceptive emails that lead them to malicious websites. To combat this challenge, various deep-learning-based anti-phishing techniques have been developed. However, these models often suffer from high false positive rates or low accuracy. In this study, we evaluate the performance of two neural networks, the Autoencoder and the Multilayer Perceptron (MLP), on a publicly available dataset to build an efficient phishing detection model. Feature selection was performed through correlation analysis. The Autoencoder achieved an accuracy of 94.17%, while the MLP achieved 96%. Hyperparameters were tuned using GridSearchCV, yielding a False Positive Rate (FPR) of 1.3% for the MLP, lower than the Autoencoder’s 4.1%. The MLP model was then deployed to determine the legitimacy of websites from input URLs, demonstrating its usability in real-world scenarios. This research contributes to the development of effective phishing detection models and emphasizes the importance of optimizing neural network architecture for improved accuracy and fewer false positives.

Keywords: Phishing Website, MLP, Cybersecurity, Deep Learning

I. Introduction

With the increasing prevalence of cyber attacks, phishing has emerged as a significant concern in today’s digital landscape [1]. Phishing attacks employ deceptive tactics, such as misleading emails and fraudulent websites, to dupe unsuspecting users into divulging sensitive information [2]. These malicious activities not only compromise personal data but also pose a substantial risk to online security and financial transactions [3]. As a result, the development of effective detection strategies for phishing has become a crucial area of research.

In this paper, we focus on harnessing the potential of deep learning techniques to enhance phishing detection. Deep learning, a subset of machine learning, uses neural networks to automatically learn and extract intricate patterns and features from complex datasets [4]. This enables us to analyze and classify phishing instances based on their distinguishing characteristics.

Our research objectives revolve around evaluating the performance of two specific deep learning models: the Autoencoder and Multilayer Perceptron (MLP). We aim to assess their effectiveness in detecting phishing attacks and compare their respective accuracies and false positive rates. To achieve this, we utilize a publicly available dataset specifically designed for phishing detection research, ensuring a standardized and reliable evaluation environment.
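For reference, the two metrics compared in this study can be computed from confusion-matrix counts. The following is a minimal sketch, not the evaluation code used in this work: the function names and the sample labels are hypothetical, and it assumes that phishing is the positive class (label 1), so a false positive is a legitimate website flagged as phishing.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` as the phishing label (assumption)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1  # legitimate site wrongly flagged as phishing
        elif pred != positive and truth == positive:
            fn += 1  # phishing site missed
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp, tn):
    """Fraction of legitimate sites that are wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn)


# Illustrative labels only; these are not results from the study.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), false_positive_rate(fp, tn))
```

A model can score well on accuracy while still flagging many legitimate sites, which is why the FPR is reported alongside it.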

To guide our analysis, we employ feature selection and hyperparameter optimization. Through careful selection and fine-tuning of key model parameters, we aim to enhance the performance and robustness of our deep learning models in identifying phishing attempts accurately.
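As an illustration of these two steps, the sketch below filters features by their Pearson correlation with the class label and then exhaustively searches a small hyperparameter grid. It rests on several assumptions that are not taken from this study: the correlation threshold, the feature names, the hyperparameter names, and the scoring function are all hypothetical. The `grid_search` helper is a simplified stand-in that scores each combination with a user-supplied function, whereas a tool such as GridSearchCV scores each combination by cross-validation.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.1):
    """Keep features whose absolute correlation with the label meets the threshold.

    The threshold rule is an assumption made for this sketch.
    """
    return [name for name, values in columns.items()
            if abs(pearson(values, labels)) >= threshold]


def grid_search(score_fn, grid):
    """Try every combination in `grid` and return the best (params, score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data with hypothetical feature names; not the dataset used in this study.
columns = {"has_ip": [1, 1, 0, 0], "constant": [1, 1, 1, 1], "noise": [0, 1, 0, 1]}
labels = [1, 1, 0, 0]
print(select_features(columns, labels))

# Hypothetical scoring function standing in for a cross-validated model score.
toy_score = lambda p: -(p["lr"] - 0.01) ** 2 - (p["hidden"] - 64) ** 2
print(grid_search(toy_score, {"lr": [0.1, 0.01, 0.001], "hidden": [32, 64, 128]}))
```

In practice the scoring function would train the model with the given parameters and return a validation metric, so the cost of the search grows with the product of the grid sizes.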

The contributions of this study are twofold. Firstly, we present a comprehensive evaluation of the Autoencoder and MLP models, shedding light on their respective strengths and limitations in phishing detection. Secondly, we demonstrate the potential of deep learning techniques for bolstering cybersecurity measures and combating the ever-evolving threat landscape of phishing attacks.

The remainder of this paper is organized as follows. Section II reviews related work on phishing detection, highlighting existing challenges and gaps. Section III presents our methodology, detailing the dataset, the preprocessing techniques, and the architectures of the Autoencoder and MLP models. Section IV presents the experimental results and analysis, including the performance metrics of our models. Section V discusses the implications and significance of our findings. Finally, Section VI concludes the paper, summarizing the key contributions and suggesting avenues for future research in phishing detection.