Natural Language Processing of ESG Disclosures with FinBERT and AraBERT: Insights into Retail Investor Flows in the Abu Dhabi Securities Exchange (ADX)

Abstract

This study introduces a novel computational framework for understanding how the credibility of environmental, social, and governance (ESG) disclosures shapes retail investor behavior in emerging markets. Focusing on 125 firms listed on the Abu Dhabi Securities Exchange (ADX) from 2021 to 2025, the research develops two proprietary indices, ESG_MKT (marketing intensity) and ESG_AI (AI-based disclosure credibility), derived through a multilingual natural language processing (NLP) pipeline that integrates FinBERT for English texts and AraBERT for Arabic texts. This bilingual design represents one of the first large-scale applications of transformer models to sustainability reporting in the Gulf region. By combining computational linguistics with behavioral finance, the study bridges the gap between symbolic communication and substantive disclosure, offering a new lens through which to examine signaling credibility in capital markets. A two-way fixed effects (TWFE) model and an event-study design are applied to a balanced panel of over 80,000 firm–day observations. The results show that high-credibility ESG signals, meaning those supported by quantifiable evidence and external verification, generate statistically significant positive abnormal retail flows. Conversely, symbolic ESG marketing campaigns lacking verifiable content produce muted or even negative investor reactions. Robustness tests using alternative sentiment frameworks (RoBERTa, VADER), ownership stratification, and dynamic panel estimation confirm the persistence of these effects. The findings provide evidence that costly, verifiable signals enhance market trust while inexpensive, reputational messages erode it, lending empirical support to signaling theory (Spence, 1973) in a computational finance context. This research contributes to the intersection of NLP, ESG analytics, and market microstructure, establishing a reproducible methodological foundation for measuring credibility in sustainability communication. Beyond its empirical findings, it advances the discourse on algorithmic transparency, linguistic asymmetry, and investor cognition, positioning the UAE as a testbed for the future of AI-driven sustainable finance in emerging economies.
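
The abstract does not state the regression specification, so the following is only a minimal sketch of how a two-way fixed effects slope can be estimated on a balanced panel: demean the outcome and the regressor by firm and by day (the within transformation), then run OLS on the demeaned values. The single-regressor form, the function names `within_transform` and `twfe_slope`, and the synthetic, noise-free data are all assumptions made for illustration; they are not the study's model, data, or results.

```python
import random


def within_transform(panel):
    """Two-way demeaning for a balanced panel given as a list of rows.

    Each row is one firm, each column one day. The transform subtracts the
    firm mean and the day mean, then adds back the grand mean.
    """
    n, t = len(panel), len(panel[0])
    row_means = [sum(row) / t for row in panel]
    col_means = [sum(panel[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row_means) / n
    return [
        [panel[i][j] - row_means[i] - col_means[j] + grand for j in range(t)]
        for i in range(n)
    ]


def twfe_slope(y, x):
    """OLS slope of y on x after removing firm and day fixed effects."""
    y_w, x_w = within_transform(y), within_transform(x)
    num = sum(a * b for ry, rx in zip(y_w, x_w) for a, b in zip(ry, rx))
    den = sum(b * b for rx in x_w for b in rx)
    return num / den


# Synthetic illustration only: none of these numbers come from the study.
random.seed(0)
n_firms, n_days, true_beta = 20, 50, 0.5  # hypothetical panel size and effect
firm_fe = [random.gauss(0, 1) for _ in range(n_firms)]
day_fe = [random.gauss(0, 1) for _ in range(n_days)]

# Hypothetical credibility score, deliberately correlated with the firm effect.
esg_ai = [[random.gauss(0, 1) + firm_fe[i] for _ in range(n_days)]
          for i in range(n_firms)]

# Hypothetical retail flow: slope times score plus both fixed effects, no noise.
flow = [[true_beta * esg_ai[i][j] + firm_fe[i] + day_fe[j]
         for j in range(n_days)] for i in range(n_firms)]

beta_hat = twfe_slope(flow, esg_ai)
print(round(beta_hat, 6))
```

Because the synthetic outcome contains no noise, the within transformation removes both sets of fixed effects exactly and the estimate recovers the assumed slope even though the regressor is correlated with the firm effect. An applied analysis would add controls and clustered standard errors, which this sketch omits.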
