EADF: An Environment-Aware Deployment Design Pattern for Multi-Cloud Data Engineering CI/CD Pipelines


Abstract

Continuous Integration and Continuous Deployment (CI/CD) have transformed software engineering by enabling rapid, reliable, and automated delivery of code. Data engineering projects, however, face challenges that traditional CI/CD pipelines do not address well: environment-specific configurations, data versioning, schema drift, and multi-cloud heterogeneity. Existing tools for data engineering offer partial solutions but lack a standardized, reusable approach for promoting data components (e.g., pipelines, models, transformation logic) consistently across development, testing, and production environments. This paper introduces the Environment-Aware Deployment Framework (EADF), a CI/CD design pattern for data engineering that decouples deployment logic from environment and cloud specifics using minimal, parameterized configurations. EADF enables teams to "write once, reuse everywhere" by late-binding configurations at runtime, with the aim of reproducible, auditable, and portable deployments across multi-cloud platforms (e.g., Azure, AWS, GCP). Unlike ad-hoc scripts or vendor-specific pipelines, EADF addresses a gap in the literature: the absence of a standardized, composable, and environment-aware CI/CD design pattern for data projects. The framework supports incremental adoption, reduces configuration duplication, and strengthens governance, making it suitable for both data engineering and MLOps workflows. We provide open, reusable specifications and implementation guidance to facilitate real-world adoption.
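
To make the late-binding idea concrete, the following is a minimal sketch and not the EADF specification itself. It assumes a single pipeline definition whose placeholders are resolved against an environment parameter set at deploy time. All names here (`ENVIRONMENTS`, `PIPELINE_TEMPLATE`, `bind`, the storage roots, and the compute sizes) are hypothetical and chosen only for illustration.

```python
from string import Template

# Hypothetical per-environment parameters. In practice these would live in
# small, separately versioned configuration files, one per environment.
ENVIRONMENTS = {
    "dev": {"storage_root": "abfss://dev-data", "compute_size": "small"},
    "test": {"storage_root": "s3://test-data", "compute_size": "medium"},
    "prod": {"storage_root": "gs://prod-data", "compute_size": "large"},
}

# A single, environment-agnostic pipeline definition ("write once").
# The ${...} placeholders stay unresolved until deployment.
PIPELINE_TEMPLATE = {
    "name": "orders_ingest",
    "source": "${storage_root}/raw/orders",
    "target": "${storage_root}/curated/orders",
    "compute": "${compute_size}",
}


def bind(template: dict, env_name: str) -> dict:
    """Resolve placeholders against one environment's parameters.

    Raises KeyError if the environment is unknown or if the template
    references a parameter the environment does not define.
    """
    params = ENVIRONMENTS[env_name]
    return {key: Template(value).substitute(params) for key, value in template.items()}


if __name__ == "__main__":
    # The same template is reused for every environment ("reuse everywhere").
    for env in ENVIRONMENTS:
        print(env, bind(PIPELINE_TEMPLATE, env))
```

Because `Template.substitute` raises on a missing parameter rather than leaving the placeholder in place, an incomplete environment definition fails at bind time instead of producing a half-configured deployment. This is one plausible way to support the reproducibility and auditability goals described above.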
