Utilizing Pre-trained Language Models for Data Augmentation Task of Event Causality Identification
Abstract
Event Causality Identification (ECI) is a core NLP task that extracts causal relationships between events in text. Because annotating causality requires significant time and resources, we apply various data augmentation techniques to improve data efficiency and the model's classification performance. In this setting, preserving the causal relation by maintaining the sentence's structure and context is crucial. We therefore propose an augmentation method that leverages pre-trained language models (PLMs), which learn context through masked-token prediction. RoBERTa and T5 are used as the PLMs, and alternative augmentation techniques such as EDA, POS-tagging-based replacement, noise-based methods, and LLM-based generation are applied for comparison. The PLM-based method achieved the highest performance and remained effective across settings, including class-imbalanced scenarios. Our experiments confirm that PLM-based data augmentation yields meaningful gains on the ECI task and underscore the importance of preserving context and structure when predicting causality in sentences.
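The abstract describes augmentation via a PLM's masked-token prediction. The sketch below is an illustrative assumption, not the authors' exact pipeline: it masks a non-trigger token and lets a RoBERTa fill-mask model propose in-context replacements, so the event triggers and overall sentence structure stay intact. The helper `augment` and the example trigger words are hypothetical.

```python
# Minimal sketch (assumption, not the paper's exact method) of PLM-based
# augmentation for ECI: mask one non-trigger token and let RoBERTa propose
# context-aware replacements, keeping the causal event pair untouched.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

def augment(sentence: str, trigger_words: set[str], n_new: int = 2) -> list[str]:
    """Return augmented sentences with one non-trigger token replaced by the PLM."""
    tokens = sentence.split()
    # Candidate positions exclude event triggers so the causal relation is preserved.
    candidates = [i for i, t in enumerate(tokens) if t.lower() not in trigger_words]
    if not candidates:
        return []
    idx = random.choice(candidates)
    masked = " ".join(tokens[:idx] + [fill_mask.tokenizer.mask_token] + tokens[idx + 1:])
    predictions = fill_mask(masked, top_k=n_new)
    return [p["sequence"] for p in predictions]

print(augment("The earthquake caused severe damage to the bridge.",
              trigger_words={"earthquake", "damage"}))
```

Keeping the trigger tokens fixed while varying only surrounding words is one plausible way to realize the "preserve context and structure" requirement stated above; a T5-based variant could instead regenerate a masked span rather than a single token.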