Simulating Multi-Model Data Evolution for Benchmarking Big Data Systems
Abstract
This paper addresses the challenge of benchmarking multi-model data management systems capable of handling diverse and evolving data. Existing benchmarks are typically static, limited to specific data models, and insufficient for evaluating cross-model interoperability or schema evolution. To overcome these limitations, we introduce TransforMMer, a tool that generates dynamic, customizable benchmarks from heterogeneous datasets. The tool combines schema inference, editing, transformation, and export within a unified graphical interface. It supports multiple data models and schema versions, facilitating comparative performance evaluation across systems. Experimental results on real-world datasets demonstrate its effectiveness and adaptability. To promote reproducibility and community adoption, we also provide DaRe, a curated repository of benchmark datasets.