AI BASED FEATURE SELECTION WITH UNSUPERVISED LEARNING FOR EFFICIENT SPAM AND PHISHING EMAIL CLASSIFICATION

Authors

  • Dr. S. Kavitha, Professor, Department of Computer Science & Engineering, Kommuri Pratap Reddy Institute of Technology, Ghatkesar, Hyderabad, Telangana
  • Rahul Adugani, Ajay Raja, P Kishore Kumar, UG Scholars, Department of Computer Science & Engineering, Kommuri Pratap Reddy Institute of Technology, Ghatkesar, Hyderabad, Telangana

Keywords:

malware, phishing, decreased productivity, spam filtering, content-based methods, artificial neural networks, PCA (principal component analysis), UCI, CSDMC, Spam Assassin, SPAM, HAM, phishing emails

Abstract

Email has become one of the most important forms of communication. In 2014, there were an estimated 4.1 billion email accounts worldwide, and about 196 billion emails were sent each day. Spam is one of the major threats facing email users: in 2013, 69.6% of all email traffic was spam. Links in spam emails may lead users to websites hosting malware or phishing schemes, which can access and disrupt the receiver's computer system. These sites can also harvest sensitive information from users. Additionally, spam costs businesses around $2,000 per employee per year in lost productivity. An effective spam filtering technology is therefore a significant contribution to the sustainability of cyberspace and to society. Current spam filtering techniques could be paired with content-based filtering methods, which analyze the content of an email to determine whether it is spam, to increase effectiveness. Accordingly, this project employs artificial neural networks to detect SPAM, HAM, and phishing emails, using principal component analysis (PCA) as the feature selection algorithm. Existing approaches detected only SPAM and HAM emails, whereas the proposed algorithm is designed to detect three classes: SPAM, HAM, and phishing. To implement this project, we combined three datasets, UCI, CSDMC, and SpamAssassin: the UCI and CSDMC datasets provided SPAM and HAM emails, and the SpamAssassin dataset provided phishing emails. All emails were processed to extract features characteristic of spam and phishing emails, such as embedded JavaScript, HTML tags, and alluring URLs designed to attract users.
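The abstract outlines a pipeline (extract features such as JavaScript, HTML tags, and URLs; reduce them with PCA; classify with an artificial neural network) but gives no implementation details. The following is a minimal, standard-library-only Python sketch of the first two stages under stated assumptions: the three count features, their regular expressions, and the helper names `extract_features`, `pca`, and `project` are hypothetical, and PCA is computed here by power iteration with deflation on the sample covariance matrix, which is not necessarily how the authors computed it. The neural-network classifier that would consume the projected features is not shown.

```python
import math
import re

# Hypothetical features: the abstract names JavaScript, HTML tags and URLs,
# but these three counts and their regular expressions are assumptions.
SCRIPT_RE = re.compile(r"<script\b", re.IGNORECASE)
TAG_RE = re.compile(r"<[a-zA-Z][^>]*>")
URL_RE = re.compile(r"https?://[^\s'\"<>]+", re.IGNORECASE)


def extract_features(raw_email: str) -> list[float]:
    """Return [script_tags, opening_html_tags, urls] counted in one raw email."""
    return [
        float(len(SCRIPT_RE.findall(raw_email))),
        float(len(TAG_RE.findall(raw_email))),
        float(len(URL_RE.findall(raw_email))),
    ]


def pca(rows: list[list[float]], k: int, iters: int = 200):
    """Fit PCA with power iteration plus deflation on the sample covariance.

    Returns (column means, k unit-length principal components).
    Requires at least two rows.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix, d x d.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d  # unit-length starting vector
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the variance captured by this component.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next-largest component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return means, components


def project(row: list[float], means: list[float],
            components: list[list[float]]) -> list[float]:
    """Center one feature row and project it onto the fitted components."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]
```

In use, one would call `extract_features` on every email in the combined UCI, CSDMC, and SpamAssassin corpus, fit `pca` on the training rows only, and pass the `project` outputs to the classifier. Fitting the projection on training data alone keeps test-set statistics from leaking into the reduced features.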

Published

2025-07-31

Section

Articles

How to Cite

AI BASED FEATURE SELECTION WITH UNSUPERVISED LEARNING FOR EFFICIENT SPAM AND PHISHING EMAIL CLASSIFICATION. (2025). International Journal of Engineering and Science Research, 14(2s), 367-373. https://www.ijesr.org/index.php/ijesr/article/view/866