A Secure by Design Approach to Big Data Analytics Using Databricks and Format Preserving Encryption
Abstract
Context: Managing and analyzing data in Data Lakes for Big Data environments requires robust protocols to ensure security, scalability, and compliance with privacy regulations. The rapid growth of sensitive data processing highlights the importance of secure-by-design approaches that integrate encryption techniques and governance frameworks to safeguard personal and sensitive information.

Aim: This study aims to design and validate a protocol that combines Databricks and Format-Preserving Encryption (FPE) to enhance data security and accessibility in Data Lakes, ensuring data privacy without compromising the usability or structure of the data.

Method: The study uses Design Science as its main methodology. A systematic literature review was conducted to identify existing encryption techniques and challenges in data governance. The proposed protocol was then developed using a secure-by-design framework, validated through expert feedback, and tested in proof-of-concept experiments in banking environments.

Results: The protocol comprises four layers (data ingestion, persistence, access, and consumption) that integrate Databricks’ processing capabilities with FPE. It enables secure data ingestion, controlled access, and efficient governance while maintaining data usability. The validation demonstrated its effectiveness in safeguarding sensitive data, with potential for broader application in regulated industries.

Conclusions: This work addresses critical gaps in Big Data security and proposes a scalable, secure framework for Data Lake management. It lays the foundation for future advances in data governance and encryption methodologies across diverse domains.
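To make the idea of format preservation concrete, the following is a minimal sketch of a toy Feistel construction over decimal strings. It is not the protocol described in this study, and it is not a standardized scheme such as NIST FF1 or FF3-1. The function names (`toy_fpe_encrypt`, `toy_fpe_decrypt`), the round count, and the HMAC-based round function are all assumptions chosen for illustration. It only shows the property the abstract relies on: the ciphertext has the same length and character set as the plaintext, so the shape of the field is unchanged.

```python
import hashlib
import hmac

ROUNDS = 10  # illustrative value only, not a vetted security parameter


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Derive a pseudo-random integer in [0, modulus) from one half of the input."""
    msg = f"{round_no}:{half}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> None:
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two ASCII digits")


def toy_fpe_encrypt(key: bytes, digits: str) -> str:
    """Encrypt a digit string into a digit string of the same length (toy example)."""
    _check(digits)
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for r in range(ROUNDS):
        # Feistel step: mix the left half with a keyed function of the right half,
        # working modulo 10^len(left) so the result is still len(left) digits.
        modulus = 10 ** len(left)
        mixed = (int(left) + _round_value(key, r, right, modulus)) % modulus
        left, right = right, str(mixed).zfill(len(left))
    return left + right


def toy_fpe_decrypt(key: bytes, digits: str) -> str:
    """Invert toy_fpe_encrypt by running the Feistel rounds backwards."""
    _check(digits)
    half = len(digits) // 2
    # After an odd number of rounds the two halves have swapped lengths.
    split = half if ROUNDS % 2 == 0 else len(digits) - half
    left, right = digits[:split], digits[split:]
    for r in reversed(range(ROUNDS)):
        modulus = 10 ** len(right)
        prev_left = (int(right) - _round_value(key, r, left, modulus)) % modulus
        left, right = str(prev_left).zfill(len(right)), left
    return left + right


if __name__ == "__main__":
    demo_key = b"demo-key"           # hypothetical key, for illustration only
    account = "4111111111111111"     # made-up 16-digit value
    token = toy_fpe_encrypt(demo_key, account)
    print(token, len(token) == len(account), token.isdigit())
    print(toy_fpe_decrypt(demo_key, token) == account)
```

Because the output is still a 16-digit string, a column holding such values keeps its type, length constraints, and validation rules. Production use would call for a vetted FPE implementation and proper key management rather than this sketch; the abstract does not specify which FPE scheme or library the protocol uses.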