BioDH: A reversible data hiding framework for securely hiding sensitive data in biological data
Abstract
With the rapid advancement of single-cell RNA sequencing (scRNA-seq) technologies, the open sharing of massive high-dimensional biological data has become a key driver of precision medicine and fundamental research. However, protecting sensitive data effectively while keeping the data accessible has emerged as a critical challenge in bioinformatics. To address this issue, we propose the BioDH (Biological Data Hiding) framework, a reversible information hiding framework tailored for biological data that aims to ensure data security and preserve scientific utility at the same time. The framework employs optimal lossless compression and strictly controls numerical perturbations to maintain precision. Upon extraction, both the secret information and the original data are perfectly recovered, achieving true reversibility. Validation on real-world datasets shows exceptional fidelity: at an embedding capacity of 1.25 bits per byte, the maximum and mean absolute errors are 8.54E-04 and 8.90E-11, respectively; PCA reveals a correlation of 1.0; UMAP shows no structural distortion; and Mantel tests and clustering analyses (ARI = 1.0) confirm that high-dimensional topology and cell subpopulations are preserved. All metrics surpass biological compatibility thresholds, indicating no detectable impact on downstream analyses.
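The fidelity figures quoted in the abstract (maximum and mean absolute error, PCA correlation) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `fidelity_metrics`, `first_pc_scores`, and `pc1_correlation` are hypothetical, the inputs are assumed to be an original and a data-carrying expression matrix of the same shape (cells by genes), and the PCA comparison is assumed here to be a Pearson correlation of first-principal-component scores, which the abstract does not specify.

```python
import numpy as np


def fidelity_metrics(original, marked):
    """Maximum and mean absolute error between two same-shaped matrices."""
    original = np.asarray(original, dtype=np.float64)
    marked = np.asarray(marked, dtype=np.float64)
    if original.shape != marked.shape:
        raise ValueError("matrices must have the same shape")
    abs_err = np.abs(original - marked)
    return {
        "max_abs_error": float(abs_err.max()),
        "mean_abs_error": float(abs_err.mean()),
    }


def first_pc_scores(matrix):
    """Scores of each row (cell) on the first principal component."""
    matrix = np.asarray(matrix, dtype=np.float64)
    centered = matrix - matrix.mean(axis=0)  # center each column (gene)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def pc1_correlation(original, marked):
    """Absolute Pearson correlation of PC1 scores.

    The absolute value is taken because the sign of a principal
    component is arbitrary. (Assumed comparison, not from the paper.)
    """
    a = first_pc_scores(original)
    b = first_pc_scores(marked)
    return abs(float(np.corrcoef(a, b)[0, 1]))
```

A typical use would be to call `fidelity_metrics(original, marked)` and `pc1_correlation(original, marked)` on the matrix before and after embedding; a cluster-level check such as ARI would additionally need cluster labels from a clustering step, which this sketch leaves out.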