Comparing Human and LLM Annotations in Low-Resource Language NLP Tasks
Abstract
In Natural Language Processing (NLP), annotated datasets play a crucial role in training and evaluating machine learning models. In low-resource languages, however, the availability of high-quality annotated data is extremely limited due to linguistic complexity, lack of standardization, and a scarcity of expert annotators. With the rise of Large Language Models (LLMs), such as GPT and similar models, there is growing interest in using these models to generate annotations automatically. This study compares human-generated annotations with those produced by LLMs for NLP tasks including part-of-speech tagging, named entity recognition, and sentiment analysis in low-resource languages. The comparison is based on precision, recall, and F1-score, complemented by qualitative analysis. Our findings show that while LLMs can provide reasonable annotations in many cases, human annotations still outperform them in linguistic nuance, contextual understanding, and domain specificity. However, LLMs show potential for speeding up the annotation process and supporting human annotators through pre-annotation. This research highlights the complementary strengths of humans and LLMs and proposes a hybrid annotation workflow for building better NLP resources in low-resource settings.