Governing Synthetic Data in the Financial Sector

Taylor C. Spears
Kristian Bondo Hansen
Ruowen Xu
Yuval Millo

Abstract

Synthetic datasets, artificially generated to mimic real-world data while maintaining anonymization, have emerged as a promising technology in the financial sector, attracting support from regulators and market participants as a solution to data privacy and scarcity challenges limiting machine learning deployment. This paper argues that synthetic data's effects on financial markets depend critically on how these technologies are embedded within existing machine learning infrastructural "stacks" rather than on their intrinsic properties. We identify three key tensions that will determine whether adoption proves beneficial or harmful: (1) data circulability versus opacity, particularly the "double opacity" problem arising from stacked machine learning systems; (2) model-induced scattering versus model-induced herding in market participant behaviour; and (3) flattening versus deepening of data platform power. These tensions directly correspond to core regulatory priorities around model risk management, systemic risk, and competition policy. Using financial audit as a case study, we demonstrate how these tensions interact in practice and propose governance frameworks, including a synthetic data labelling regime to preserve contextual information when datasets cross organizational boundaries.

Version published to 10.31235/osf.io/ruxkh_v1 on OSF Preprints
Sep 8, 2025

