Celebrity Insight Generation Using Word Embedding
Keywords:
Text Analysis, Word Embeddings, Word2Vec, FastText, Naïve Bayes Multinomial, Support Vector Machine (SVM), Random Forest
Abstract
Celebrity profiling is a specialized branch of author profiling focused on identifying attributes like gender, birth
year, fame, and occupation through textual analysis. Social media has become a platform for celebrities to share
interests and engage with fans, but it has also enabled impersonation. To address this, researchers are
developing methods to verify whether texts were genuinely written by celebrities and to determine the authors' profile
characteristics. In 2019, the PAN competition introduced a celebrity profiling task, challenging participants to
predict celebrity attributes based on written texts. Researchers employed various stylistic features and machine
learning techniques for this task. Our approach leverages word embedding techniques like Word2Vec and
FastText to represent words as vectors, capturing semantic relationships. These word vectors were aggregated to
create document-level representations, which were then classified using Naïve Bayes Multinomial, Support
Vector Machine (SVM), and Random Forest algorithms. The results showed that combining Word2Vec with
Random Forest achieved the highest accuracy for predicting fame and occupation, demonstrating the effectiveness
of word embeddings paired with robust machine learning classifiers for celebrity profiling.
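The aggregation step, turning per-word vectors into one fixed-length vector per document, can be illustrated with a minimal sketch. The abstract does not state which pooling function was used; this sketch assumes mean pooling (averaging the vectors of in-vocabulary words), a common choice. The embedding table is a hypothetical toy dictionary standing in for a trained Word2Vec or FastText model, and the names `document_vector` and `build_feature_matrix` are illustrative, not taken from the authors' code.

```python
from typing import Dict, List

# Hypothetical stand-in for a trained Word2Vec/FastText model: word -> vector.
Embeddings = Dict[str, List[float]]


def document_vector(tokens: List[str], embeddings: Embeddings, dim: int) -> List[float]:
    """Mean-pool the vectors of in-vocabulary tokens into one document vector.

    Assumption: out-of-vocabulary tokens are skipped. (A trained FastText model
    could instead build vectors for unseen words from character n-grams.)
    A document with no known tokens maps to the zero vector, so every document
    still yields a feature row of length `dim`.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    # Average each dimension across all known word vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def build_feature_matrix(
    documents: List[List[str]], embeddings: Embeddings, dim: int
) -> List[List[float]]:
    """Stack one pooled vector per tokenized document into a feature matrix."""
    return [document_vector(doc, embeddings, dim) for doc in documents]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings (hypothetical values for illustration only).
    toy_embeddings = {
        "match": [0.9, 0.1, 0.0],
        "goal": [0.8, 0.2, 0.1],
        "album": [0.1, 0.9, 0.2],
        "tour": [0.2, 0.7, 0.3],
    }
    tweets = [
        ["great", "match", "goal"],  # "great" is out of vocabulary
        ["new", "album", "tour"],
        ["hello", "world"],          # no known tokens -> zero vector
    ]
    X = build_feature_matrix(tweets, toy_embeddings, dim=3)
    for row in X:
        print([round(x, 3) for x in row])
```

Each row of the resulting matrix can then be passed to a standard classifier such as Random Forest or SVM. Multinomial Naïve Bayes expects non-negative features, whereas averaged embeddings usually contain negative components, so some rescaling is typically needed before that classifier is applied; the abstract does not say how this was handled.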