AI BASED FEATURE SELECTION WITH UNSUPERVISED LEARNING FOR EFFICIENT SPAM AND PHISHING EMAIL CLASSIFICATION

Dr. S. Kavitha; Rahul Adugani, Ajay Raja, P Kishore Kumar

Authors

Dr. S. Kavitha, Professor, UG Scholar, Department of Computer Science & Engineering, Kommuri Pratap Reddy Institute of Technology, Ghatkesar, Hyderabad, Telangana
Rahul Adugani, Ajay Raja, P Kishore Kumar, UG Scholars, Department of Computer Science & Engineering, Kommuri Pratap Reddy Institute of Technology, Ghatkesar, Hyderabad, Telangana

Keywords:

malware, phishing, decreased productivity, spam filtering, content-based methods, artificial neural networks, PCA (principal component analysis), UCI, CSDMC, Spam Assassin, SPAM, HAM, phishing emails

Abstract

Email has become one of the most important forms of communication. In 2014, there were an estimated 4.1 billion email accounts worldwide, and about 196 billion emails were sent each day. Spam is one of the major threats to email users: in 2013, 69.6% of all email traffic was spam. Links in spam emails may lead users to websites with malware or phishing schemes, which can access and disrupt the receiver's computer system. These sites can also gather sensitive information from users. Additionally, spam costs businesses around $2,000 per employee per year due to decreased productivity. Effective spam filtering technology is therefore a significant contribution to the sustainability of cyberspace and to society. Current spam filtering techniques could be paired with content-based methods, which analyze the content of an email to determine whether it is spam, to increase effectiveness. This project employs artificial neural networks to detect SPAM, HAM, and Phishing emails after applying principal component analysis (PCA) as a feature selection algorithm. Existing algorithms detected only SPAM and HAM emails, whereas the proposed algorithm is designed to detect three classes: SPAM, HAM, and Phishing. To implement this project, we combined three datasets: UCI, CSDMC, and Spam Assassin. The UCI and CSDMC datasets provided the SPAM and HAM emails, and the Spam Assassin dataset provided the Phishing emails. All emails were processed to extract important features found in spam and phishing emails, such as JavaScript, HTML tags, and alluring URLs meant to attract users.
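
The content-based feature extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `extract_features`, the three feature names, and the URL pattern are all assumptions introduced here, and the abstract does not specify how the features were actually computed.

```python
import re
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts HTML start tags and <script> tags in an email body."""

    def __init__(self):
        super().__init__()
        self.html_tags = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        # Every opening tag counts as an HTML tag; <script> is tracked separately.
        self.html_tags += 1
        if tag == "script":
            self.script_tags += 1


# Simple URL pattern (an assumption, not the paper's definition of a URL).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_features(body: str) -> dict:
    """Return illustrative content-based counts for one email body."""
    counter = _TagCounter()
    counter.feed(body)
    counter.close()
    return {
        "html_tags": counter.html_tags,
        "script_tags": counter.script_tags,
        "urls": len(URL_RE.findall(body)),
    }


# Hypothetical email body used only to exercise the function.
sample = (
    '<html><body><a href="http://example.com/login">Verify now</a>'
    "<script>track()</script></body></html>"
)
features = extract_features(sample)
print(features)
```

In a pipeline like the one the abstract outlines, per-email counts of this kind would form the feature matrix that is reduced with PCA before being passed to the neural network classifier.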
