Phishing Detection System through Hybrid Machine Learning
Keywords:
Phishing, LSD Model, GridSearchCV, Canopy Feature Selection.
Abstract
The internet is now used to coordinate a wide variety of cybercrimes, and this study focuses on one of them: phishing attacks. Phishing relies fundamentally on spoofed, deceptive emails. These messages lure victims to fake websites that collect sensitive information from them. Various studies have addressed the prevention and detection of these attacks and user awareness of them. Even so, there is currently no complete and effective method for preventing phishing. Machine learning therefore plays an essential role in the fight against online crimes such as phishing. This study is based on a phishing URL dataset that contains features of phishing and legitimate URLs collected from more than 11,000 websites. After preprocessing, several machine learning methods are applied to detect phishing URLs and protect users. The models used include decision trees, linear regression, random forests, naive Bayes, gradient boosting classifiers, support vector classifiers, and a proposed hybrid LSD model. The LSD model combines decision trees, support vector machines, and logistic regression through both soft and hard voting to defend against phishing attacks effectively and accurately. It uses GridSearchCV for hyperparameter optimization and canopy feature selection, together with k-fold cross-validation.
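The abstract mentions that the LSD model combines its three base classifiers with both soft and hard voting. The following minimal, dependency-free sketch illustrates how the two schemes differ for a single URL. The probability values, model names, and helper functions (`predicted_label`, `hard_vote`, `soft_vote`) are hypothetical. They are not taken from the study and were chosen only so that the two schemes reach different decisions.

```python
from collections import Counter
from typing import List, Sequence

# Class indices used throughout: 0 = legitimate, 1 = phishing.


def predicted_label(proba: Sequence[float]) -> int:
    """Return the class with the highest probability (first index wins ties)."""
    return max(range(len(proba)), key=lambda k: proba[k])


def hard_vote(labels: Sequence[int]) -> int:
    """Majority vote over predicted labels; ties go to the smallest label."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def soft_vote(probas: Sequence[Sequence[float]]) -> int:
    """Average class probabilities across models, then take the argmax."""
    n_models = len(probas)
    n_classes = len(probas[0])
    avg: List[float] = [sum(p[k] for p in probas) / n_models for k in range(n_classes)]
    return predicted_label(avg)


# Hypothetical (NOT from the study) probability outputs for one URL, in the
# order [P(legitimate), P(phishing)], from the three LSD base models.
EXAMPLE_PROBAS = {
    "decision_tree": [0.45, 0.55],
    "svm": [0.40, 0.60],
    "logistic_regression": [0.95, 0.05],
}

if __name__ == "__main__":
    probas = list(EXAMPLE_PROBAS.values())
    labels = [predicted_label(p) for p in probas]
    print("individual labels:", labels)    # [1, 1, 0]
    print("hard vote:", hard_vote(labels))  # 1 (phishing)
    print("soft vote:", soft_vote(probas))  # 0 (legitimate)
```

Hard voting gives each model one vote, so the two moderately confident models outvote the highly confident logistic regression. Soft voting averages the probabilities (0.6 legitimate versus 0.4 phishing here) and so reaches the opposite decision. In scikit-learn, the same combination is available through `VotingClassifier` with `voting="hard"` or `voting="soft"`. It can be tuned with `GridSearchCV`, whose `cv` parameter sets the number of cross-validation folds. Soft voting with an SVM requires `SVC(probability=True)`. The abstract does not specify how the study configured these components.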