URL-Based Phishing Website Detection
Abstract
In today's digitally connected world, phishing attacks have emerged as one of the most common and dangerous forms of cybercrime. Phishing is a fraudulent practice in which attackers impersonate legitimate organizations or services to trick users into revealing sensitive personal or financial information. Most phishing attacks reach victims through websites and emails that embed deceptive links in the form of Uniform Resource Locators (URLs). Conventional anti-phishing strategies such as blacklisting and rule-based systems exist, but they struggle to keep up with the evolving tactics of attackers: a blacklist, for example, can only flag a URL after it has already been reported. In light of these limitations, machine learning (ML) techniques have gained traction for automating and improving the detection of phishing threats.
In the initial phase of our project, we implemented a URL-based phishing website detection system using supervised ML algorithms: Random Forest (RF), Decision Tree (DT), and Support Vector Machine (SVM). The system analyzes URLs using several lexical and heuristic features (the presence of “@” symbols, URL length, redirection count, domain age, and HTTPS usage) to classify websites as either phishing or legitimate. The models were trained on labeled datasets and evaluated using accuracy, precision, recall, and F1-score. Our system achieved over 95% accuracy in identifying phishing websites in real-time web applications, establishing a strong baseline for further enhancement.
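To make the feature description concrete, the following is a minimal sketch of how some of the lexical features named above could be computed from a raw URL string. It is an illustration under stated assumptions, not the project's actual implementation: the function name `extract_lexical_features`, the feature keys, and the use of a second "//" in the URL as a stand-in for redirection are all hypothetical. Domain age and a true redirection count require network lookups (for example WHOIS queries) and are left out.

```python
from urllib.parse import urlparse


def extract_lexical_features(url: str) -> dict:
    """Compute a few offline lexical features for one URL.

    Hypothetical helper: the name and feature keys are assumptions
    made for illustration, not taken from the original system.
    """
    parsed = urlparse(url)
    return {
        # "@" in a URL can hide the real host behind a fake prefix.
        "has_at_symbol": int("@" in url),
        # Phishing URLs are often unusually long.
        "url_length": len(url),
        # 1 if the scheme is HTTPS, 0 otherwise.
        "uses_https": int(parsed.scheme.lower() == "https"),
        # Assumed proxy for redirection: a "//" that appears after the
        # scheme separator ("http://" ends at index 6, "https://" at 7).
        "embedded_double_slash": int(url.rfind("//") > 7),
    }


if __name__ == "__main__":
    sample = "http://example.com@evil.test//login"
    print(extract_lexical_features(sample))
```

Each dictionary can be turned into one row of a feature matrix and, together with a phishing/legitimate label, passed to a supervised classifier such as a Random Forest.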