OFFENSIVE LANGUAGE DETECTION USING TEXT CLASSIFICATION
Abstract
Offensive language in user-generated content on social platforms is rising at a concerning rate. Such language can bully or hurt an individual or a community. Recently, the research community has investigated and developed various supervised approaches and training datasets to automatically detect or prevent offensive monologues and dialogues. In this study, we propose a text classification model consisting of a modular cleaning phase and tokenizer, three embedding methods, and eight classifiers. Our experiments show promising results for detecting offensive language on our dataset obtained from Twitter.
With hyperparameter optimization, three classifiers (AdaBoost, SVM, and MLP) achieved the highest average F1-score with the widely used TF-IDF embedding method.
Index Terms: offensive language detection, social media, machine learning, text mining.
This paper reviews text classification methods for offensive language detection on online platforms. It covers algorithms such as Naive Bayes, SVMs, and neural networks, along with feature engineering techniques and evaluation metrics, and offers insights into current research and future directions.
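The stages named in the abstract (cleaning, tokenization, TF-IDF weighting, and F1-score evaluation) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names, the cleaning rules, the sample tweets, and the smoothed IDF formula (one common variant) are all assumptions made for illustration, and the classifiers themselves are omitted.

```python
import math
import re
from collections import Counter


def clean(text):
    """Hypothetical cleaning phase: lowercase, drop URLs, mentions, non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    return text


def tokenize(text):
    """Whitespace tokenizer applied to the cleaned text."""
    return clean(text).split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count times a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example tweets, for illustration only.
sample_docs = [
    "You are awful @user http://t.co/x",
    "have a nice day",
    "awful awful day",
]
sample_vectors = tfidf(sample_docs)
```

In a full pipeline, the per-document weight dictionaries would be converted to fixed-length vectors and passed to a classifier such as AdaBoost, SVM, or MLP, with `f1_score` computed on held-out predictions.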