ENHANCING TEXT CLASSIFICATION WITH LIDA AND ENSEMBLE METHODS
Keywords:
Data augmentation, low-resource language, text classification

Abstract
Developing a high-performance text classification model in a low-resource language is challenging
due to the lack of labeled data. At the same time, collecting large amounts of labeled data is cost-inefficient. One approach
to increase the amount of labeled data is to create synthetic data using data augmentation techniques. However, most
of the available data augmentation techniques are designed for English data and are highly language-dependent, as they
operate at the word or sentence level, for example by replacing certain words or paraphrasing a sentence. We present
Language-independent Data Augmentation (LiDA), a technique that utilizes a multilingual language model to create
synthetic data from the available training dataset. Unlike other methods, our approach operates at the sentence
embedding level and is therefore independent of any particular language. We evaluated LiDA on three languages using various fractions
of the training data, and the results showed improved performance for both LSTM and BERT models. Furthermore, we
conducted an ablation study to determine the impact of the components in our method on overall performance. The
source code of LiDA is available at https://github.com/yest/LiDA.
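To illustrate what augmentation at the sentence embedding level can look like, the minimal sketch below encodes sentences with a multilingual encoder and creates noisy copies of each embedding vector. The encoder name, noise scheme, and scale are assumptions chosen for demonstration and are not taken from the paper; the authors' actual method is in the linked repository.

```python
# Illustrative sketch only: language-independent augmentation by perturbing
# multilingual sentence embeddings. All specifics (model, noise type, scale)
# are assumptions, not the LiDA implementation.
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed multilingual sentence encoder.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def augment_embeddings(sentences, n_copies=2, noise_scale=0.01, seed=0):
    """Encode sentences and create noisy synthetic copies of each embedding."""
    rng = np.random.default_rng(seed)
    embeddings = encoder.encode(sentences)  # shape (n, d) sentence vectors
    synthetic = np.stack([
        emb + rng.normal(0.0, noise_scale, size=emb.shape)
        for emb in embeddings
        for _ in range(n_copies)
    ])
    return embeddings, synthetic

# Usage: synthetic vectors inherit the labels of their source sentences and can
# be added to the training set of a classifier that consumes sentence vectors.
real, synth = augment_embeddings(["This is a labeled training example."])
```

Because the perturbation happens in the shared embedding space rather than on the surface text, the same procedure applies to any language the encoder supports.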