Utilizing Pre-trained Language Models for Data Augmentation Task of Event Causality Identification
Abstract
Event Causality Identification (ECI) is a core NLP task that extracts causal relationships between events in text. Because annotating causality requires significant time and resources, we apply various data augmentation techniques to improve data efficiency and the model's classification performance. In this setting, preserving the causal relation by maintaining the sentence's structure and context is crucial. We therefore propose an augmentation method that leverages pre-trained language models (PLMs), which learn context through masked-token prediction. RoBERTa and T5 are used as the PLMs, and alternative augmentation techniques such as EDA, POS-tagging-based replacement, noise-based methods, and LLM-based generation are applied for comparison. The PLM-based method achieved the highest performance and remained effective across settings, including class-imbalanced scenarios. Our experiments confirm that PLM-based data augmentation yields meaningful gains on the ECI task and underscore the importance of preserving context and structure when predicting causality in sentences.
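The abstract describes augmentation via a PLM's masked-token prediction. The sketch below is an illustrative assumption, not the authors' exact pipeline: it masks a non-trigger token and lets a RoBERTa fill-mask model propose in-context replacements, so the event triggers and overall sentence structure stay intact. The helper `augment` and the example trigger words are hypothetical.

```python
# Minimal sketch (assumption, not the paper's exact method) of PLM-based
# augmentation for ECI: mask one non-trigger token and let RoBERTa propose
# context-aware replacements, keeping the causal event pair untouched.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

def augment(sentence: str, trigger_words: set[str], n_new: int = 2) -> list[str]:
    """Return augmented sentences with one non-trigger token replaced by the PLM."""
    tokens = sentence.split()
    # Candidate positions exclude event triggers so the causal relation is preserved.
    candidates = [i for i, t in enumerate(tokens) if t.lower() not in trigger_words]
    if not candidates:
        return []
    idx = random.choice(candidates)
    masked = " ".join(tokens[:idx] + [fill_mask.tokenizer.mask_token] + tokens[idx + 1:])
    predictions = fill_mask(masked, top_k=n_new)
    return [p["sequence"] for p in predictions]

print(augment("The earthquake caused severe damage to the bridge.",
              trigger_words={"earthquake", "damage"}))
```

Keeping the trigger tokens fixed while varying only surrounding words is one plausible way to realize the "preserve context and structure" requirement stated above; a T5-based variant could instead regenerate a masked span rather than a single token.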