EVALUATING MACHINE LEARNING TECHNIQUES FOR DETECTING OFFENSIVE AND HATE SPEECH IN CHATBOTS
Keywords:
Machine learning, chatbot, hate speech, offensive speech.
Abstract
In recent years, chatbots have seen a surge of offensive and hate speech, along with racially and ethnically
charged content. English is the most widely used language among them. Although machine learning has been
successfully used to detect offensive and hate speech in several English contexts, the distinctiveness of chatbots and
the similarities among offensive speech, hate speech, and free speech call for a domain-specific English corpus and
techniques to detect offensive and hate speech. We therefore developed an English corpus from chatbots and evaluated
different machine learning techniques to detect offensive and hate speech. Character n-gram, word n-gram, negative
sentiment, and syntactic-based features, together with their hybrid, were extracted and analyzed using hyper-parameter
optimization, ensemble, and multi-tier meta-learning models of support vector machine, logistic regression, random
forest, and gradient boosting algorithms. The results showed that an optimized support vector machine with character
n-grams performed best in the detection of hate speech, while optimized gradient boosting with word n-grams
performed best in the detection of offensive speech.
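The two best-performing feature types above, character n-grams and word n-grams, can be illustrated with a minimal sketch using only the Python standard library. The n-gram ranges (2 to 4 for characters, 1 to 2 for words), the lower-casing, the whitespace tokenization, and the helper names are assumptions chosen for illustration; the abstract does not specify the settings the study used. The sentiment and syntactic features are not covered here.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of lengths n_min..n_max (assumed range), case-folded."""
    s = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the whole string, spaces included.
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts


def word_ngrams(text, n_min=1, n_max=2):
    """Count word n-grams of lengths n_min..n_max using naive whitespace tokens (assumed)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def hybrid_features(text):
    """Hypothetical hybrid: merge both feature sets, prefixing keys so they cannot collide."""
    feats = Counter()
    for gram, count in char_ngrams(text).items():
        feats["c:" + gram] = count
    for gram, count in word_ngrams(text).items():
        feats["w:" + gram] = count
    return feats


if __name__ == "__main__":
    print(word_ngrams("you are so rude"))
    print(len(hybrid_features("go away")))
```

Character n-grams capture sub-word cues such as deliberate misspellings or obfuscated slurs, which is one plausible reason they suit hate speech detection, while word n-grams keep whole-token context. In practice these sparse counts would be vectorized and passed to classifiers such as the support vector machine or gradient boosting models named above.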