A Secure-by-Design Approach to Big Data Analytics Using Databricks and Format-Preserving Encryption

Abstract

Managing and analyzing data in data lakes for big data environments requires robust protocols that ensure security, scalability, and compliance with privacy regulations. The growing need to process sensitive data underscores the relevance of secure-by-design approaches that integrate encryption techniques and governance frameworks to protect personal and confidential information. This study proposes a protocol that combines the capabilities of Databricks and format-preserving encryption to improve data security and accessibility in data lakes without compromising usability or structure. The protocol was developed using a design science methodology, incorporating findings from a systematic literature review, and was validated through expert feedback and proof-of-concept experiments in banking environments. The proposed solution integrates four layers (data ingestion, persistence, access, and consumption), leveraging the processing capabilities of Databricks and format-preserving encryption to enable secure data management and governance. Validation results indicate that the protocol is effective in protecting sensitive data, with promising applicability in regulated industries. This work contributes to addressing key challenges in big data security and lays the groundwork for future developments in data governance and encryption techniques.
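
To illustrate what "without compromising usability or structure" means for format-preserving encryption, the following is a minimal sketch of a toy Feistel construction over decimal digit strings. It is not the protocol evaluated in the study and not a standardized scheme such as NIST FF1; the function names (`fpe_encrypt`, `fpe_decrypt`, `_prf`), the key, and the round count are assumptions made purely for illustration, and the code should not be used to protect real data.

```python
import hashlib
import hmac


def _prf(key: bytes, round_no: int, data: str, modulus: int) -> int:
    """Hypothetical round function: HMAC-SHA256 reduced modulo `modulus`."""
    msg = bytes([round_no]) + data.encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def fpe_encrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Toy Feistel encryption: digit string in, digit string of equal length out."""
    if not (digits.isdigit() and len(digits) >= 2):
        raise ValueError("input must be a string of at least two decimal digits")
    if rounds % 2 != 0:
        raise ValueError("this sketch assumes an even number of rounds")
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for i in range(rounds):
        width = len(left)
        mixed = (int(left) + _prf(key, i, right, 10 ** width)) % 10 ** width
        # Swap halves; zero-pad so the half keeps its original width.
        left, right = right, str(mixed).zfill(width)
    return left + right


def fpe_decrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Inverse of fpe_encrypt: runs the rounds backwards."""
    if not (digits.isdigit() and len(digits) >= 2):
        raise ValueError("input must be a string of at least two decimal digits")
    if rounds % 2 != 0:
        raise ValueError("this sketch assumes an even number of rounds")
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for i in reversed(range(rounds)):
        width = len(right)
        prev_right = left
        prev_left = (int(right) - _prf(key, i, prev_right, 10 ** width)) % 10 ** width
        left, right = str(prev_left).zfill(width), prev_right
    return left + right


if __name__ == "__main__":
    demo_key = b"demo-key-not-for-production"  # assumption: illustrative key only
    account = "4111111111111111"               # made-up 16-digit value
    token = fpe_encrypt(demo_key, account)
    print(token, len(token) == len(account), fpe_decrypt(demo_key, token) == account)
```

Because the ciphertext is still a digit string of the same length, a column holding such values could keep its existing type and length constraints, which is the property the abstract refers to. In a Databricks setting, a function like this could in principle be wrapped as a user-defined function and applied per column, although the study's actual integration is not described here.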
