Concept Drift in Online Fake Reviews
Abstract
Online reviews significantly influence consumer
behavior and business reputation, especially in
sectors like e-commerce and hospitality. However,
the open nature of review platforms makes them
vulnerable to manipulation through fake or deceptive
reviews, which may be generated by bots, paid
users, or malicious actors. These fake reviews can
mislead potential buyers and create unfair market
advantages. To address this issue, this research
explores the use of machine learning models for
detecting fake online reviews, with a particular focus
on comparing supervised and semi-supervised
learning approaches.
The study uses publicly available datasets of
hotel and product reviews, comprising both
labeled and unlabeled data. Text preprocessing
techniques such as tokenization, stop word removal,
stemming, and TF-IDF vectorization were applied to
prepare the data. Several machine learning
algorithms were implemented: supervised models
including Support Vector Machines (SVM), Logistic
Regression, Random Forest, and Naïve Bayes; and
semi-supervised models including Self-Training,
Label Propagation, and Label Spreading.
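
The supervised versus semi-supervised comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn and is not the study's actual implementation: the toy reviews, the labels, the 0.6 confidence threshold, and the helper names build_supervised and build_self_training are all hypothetical. Only one model from each family is shown (Logistic Regression and Self-Training), and stemming is omitted because scikit-learn's TfidfVectorizer does not provide it.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy reviews invented for illustration only.
# Label convention assumed here: 1 = fake, 0 = genuine, -1 = unlabeled.
reviews = [
    "Best hotel ever amazing amazing amazing buy now",
    "Room was clean but the shower pressure was weak",
    "Perfect product best price best seller ever",
    "Delivery took five days and the box was dented",
    "Amazing best ever perfect perfect",
    "The staff helped us find parking near the lobby",
]
labels = np.array([1, 0, 1, 0, -1, -1])


def build_supervised():
    # Tokenization, English stop word removal, and TF-IDF weighting,
    # followed by a supervised Logistic Regression classifier.
    return make_pipeline(TfidfVectorizer(stop_words="english"),
                         LogisticRegression())


def build_self_training():
    # Same features, but the classifier is wrapped in Self-Training so that
    # confidently predicted unlabeled reviews are added as pseudo-labels.
    return make_pipeline(
        TfidfVectorizer(stop_words="english"),
        SelfTrainingClassifier(LogisticRegression(), threshold=0.6),
    )


# The supervised model only sees the labeled subset.
mask = labels != -1
labeled_reviews = [r for r, keep in zip(reviews, mask) if keep]
supervised = build_supervised().fit(labeled_reviews, labels[mask])

# The semi-supervised model sees every review, including unlabeled ones.
semi_supervised = build_self_training().fit(reviews, labels)

new_reviews = ["Amazing perfect best hotel ever"]
sup_pred = supervised.predict(new_reviews)
semi_pred = semi_supervised.predict(new_reviews)
print("supervised:", sup_pred, "semi-supervised:", semi_pred)
```

On a real dataset the two pipelines would be scored on a held-out labeled test set; with six toy reviews the predictions only demonstrate that both pipelines run end to end.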