Social Media Forensics: Cyberbullying & Hate Speech Analysis Using Machine Learning And NLP
Keywords:
Cyberbullying Detection, Hate Speech Classification, Natural Language Processing, TF-IDF, Support Vector Machine, Social Media Forensics, Text Classification, Flask, Affective Computing
Abstract
The exponential proliferation of user-generated content on social media platforms has created an urgent need for
automated systems capable of identifying cyberbullying, hate speech, and offensive language at scale. This paper
presents a comprehensive machine learning-based web application — Social Media Forensics (SMF) — that classifies
social media text into three categories: Hate Speech, Offensive Language, and Clean Content. The system employs
Natural Language Processing (NLP) preprocessing pipelines (tokenization, stop-word removal, lemmatization)
combined with TF-IDF (Term Frequency–Inverse Document Frequency) vectorization for feature extraction. Six
supervised machine learning classifiers, namely Logistic Regression, Naïve Bayes, Support Vector Machine (SVM),
K-Nearest Neighbors (KNN), Random Forest, and Gradient Boosting, are systematically trained, evaluated, and
compared. The best-performing model achieves approximately 94% classification accuracy. The full-stack web
application is developed using Python Flask, SQLite, Bootstrap 5, and Docker containerization, incorporating user
authentication, real-time prediction with confidence scoring, analysis history tracking, and a multi-chart analytics
dashboard. Mathematical formulations of TF-IDF, Bayes' theorem, SVM hyperplane optimization, and
information-gain-based ensemble methods are derived. System architecture, algorithmic pseudocode, UML diagrams, and a
comprehensive performance comparison across all six classifiers are presented.
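
As an illustration of the feature-extraction step described above, the following is a minimal sketch of preprocessing followed by TF-IDF weighting, written with the Python standard library only. It is not the SMF implementation: the stop-word list, the sample texts, and the function names `preprocess` and `tfidf` are assumptions made for this example, lemmatization is omitted, and the weighting uses the plain form tf(t, d) · log(N / df(t)), which may differ from the smoothing or normalization applied in the actual system.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the paper's list is not given here).
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to", "and", "of"}


def preprocess(text):
    """Lowercase, tokenize on letter runs, and drop stop words (no lemmatization)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf(t, d) * log(N / df(t))."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical sample texts, used only to exercise the functions.
docs = [
    "you are a wonderful friend",
    "you are a horrible idiot",
    "the weather is wonderful",
]
vectors = tfidf(docs)
```

In this toy corpus, a term that occurs in only one document ("friend") receives a higher weight than one shared by two documents ("wonderful"), which is the property that lets a downstream classifier focus on distinctive vocabulary.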
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.










