dsTidyverse: An implementation of Tidyverse within the DataSHIELD ecosystem
Abstract
This paper introduces dsTidyverse, an R package designed to improve data handling within the federated analysis platform DataSHIELD. DataSHIELD enables multi-site analysis without direct data sharing, a capability that is crucial for privacy-sensitive research. Although DataSHIELD supports complex analyses, it lacks user-friendly data manipulation tools. dsTidyverse fills this gap by implementing selected functions from the “Tidyverse” ecosystem within DataSHIELD’s client-server architecture. The package provides functionality for selecting, renaming, and creating columns; conditional recoding; combining data frames; filtering rows; grouping data; and converting data frames to tibbles. Rigorous disclosure checks are implemented to prevent leakage of individual-level data. Through examples, the paper demonstrates how dsTidyverse simplifies common data manipulation tasks, improving user experience and analysis efficiency within DataSHIELD. The package is open source, freely available on CRAN and GitHub, and open to further development. See https://github.com/molgenis/ds-tidyverse
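The abstract says that dsTidyverse operates within DataSHIELD's client-server architecture and applies disclosure checks so that individual-level data do not leak. To illustrate the general pattern for one of the listed operations, row filtering, here is a minimal Python sketch. It is not dsTidyverse's code (the package is written in R) and not its API. Every name in it is hypothetical: `Server`, `assign_filter`, `federated_filter_count`, `DisclosureError`, and the `MIN_SUBSET` threshold of 3, which is only loosely modelled on DataSHIELD's configurable `nfilter.subset` disclosure option. The specific rule assumed here (both the retained rows and the removed rows must be either zero or at least `MIN_SUBSET`) is one plausible check, not necessarily the one dsTidyverse applies. Real DataSHIELD clients send R expressions to remote servers rather than passing Python callables in-process.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]

# Hypothetical threshold, loosely modelled on DataSHIELD's "nfilter.subset"
# setting; in a real deployment each server configures its own value.
MIN_SUBSET = 3


class DisclosureError(Exception):
    """Raised when an operation could reveal individual-level data."""


@dataclass
class Server:
    """One data-holding site (hypothetical). Rows never leave this object."""
    tables: dict[str, list[Row]] = field(default_factory=dict)

    def assign_filter(self, src: str, dst: str,
                      keep: Callable[[Row], bool]) -> None:
        """Filter rows server-side and store the result under a new name.

        The kept subset and the removed complement must each be empty or
        contain at least MIN_SUBSET rows, so that comparing counts before
        and after filtering cannot single out a very small group.
        """
        rows = self.tables[src]
        subset = [r for r in rows if keep(r)]
        removed = len(rows) - len(subset)
        for n in (len(subset), removed):
            if 0 < n < MIN_SUBSET:
                raise DisclosureError(f"filter would create a group of {n} rows")
        # The filtered table stays on the server; nothing is returned.
        self.tables[dst] = subset

    def count(self, name: str) -> int:
        """Return only an aggregate (the row count), never the rows."""
        return len(self.tables[name])


def federated_filter_count(servers: list[Server], src: str, dst: str,
                           keep: Callable[[Row], bool]) -> int:
    """Client side: send the same filter to every site, then pool counts."""
    for server in servers:
        server.assign_filter(src, dst, keep)
    return sum(server.count(dst) for server in servers)


if __name__ == "__main__":
    site_a = Server({"D": [{"age": a} for a in (25, 31, 34, 41, 52, 67, 70)]})
    site_b = Server({"D": [{"age": a} for a in (19, 22, 38, 45, 58, 61, 63)]})
    # Each site keeps 4 rows and removes 3, so both checks pass.
    print(federated_filter_count([site_a, site_b], "D", "D_over40",
                                 lambda r: r["age"] > 40))  # 8
    try:
        # Only two rows at site A are over 65, so the filter is refused.
        site_a.assign_filter("D", "D_over65", lambda r: r["age"] > 65)
    except DisclosureError as err:
        print("refused:", err)  # refused: filter would create a group of 2 rows
```

The design choice being illustrated is that the filtered table is stored on each server and only an aggregate count crosses the client-server boundary. In this sketch, if a later site refuses a filter, earlier sites keep their new table; a fuller implementation might need to handle that partial state.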