Privacy-Preserving Differential Expression Analysis via Fully Homomorphic Encryption: A Systematic Tradeoff Evaluation of BFV and CKKS on Cancer RNA-Seq Datasets
Abstract
Background: Cloud-based genomic analysis increasingly exposes sensitive RNA-sequencing data to external computational infrastructure, raising critical privacy concerns for differential expression (DE) studies. Fully homomorphic encryption (FHE) enables computation directly on encrypted data without decryption, offering a principled way to mitigate privacy risks in genomic analysis pipelines. However, practical deployment is constrained by limited empirical understanding of the performance and accuracy tradeoffs between leading FHE schemes. Results: Here, two widely used FHE schemes, BFV and CKKS, are systematically benchmarked on DE analysis of two cancer RNA-seq datasets: the UCI Gene Expression RNA-Seq dataset (801 samples, five cancer types, ten pairwise comparisons) and the TCGA LUSC+LUAD dataset (1,129 samples, one pairwise comparison). Experiments were run across polynomial modulus degrees N ∈ {4096, 8192, 16384} and three cohort sizes, with ten independent runs per configuration under parameter settings compliant with 128-bit security, totalling 300 runs. Performance and accuracy were evaluated using encryption, execution, and decryption latency, ciphertext storage size, mean absolute error (MAE), and Spearman rank correlation of DE gene rankings relative to plaintext baselines. Conclusions: BFV achieved 3.5 to 7.5 times lower total latency than CKKS across all configurations. Conversely, CKKS produced ciphertexts approximately 2.66 times smaller per sample at N = 16384, revealing a clear latency-storage tradeoff with no universally dominant configuration. Execution cost scaled primarily with the number of pairwise class comparisons rather than with sample count, identifying a computational driver not previously isolated in FHE benchmarking studies.
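Why the number of class comparisons, rather than sample count, can drive cost is easiest to see in a small plaintext sketch. The abstract does not say which DE statistic is computed, so the per-gene difference of class means used here is only a stand-in. The helper `pairwise_mean_differences` is hypothetical. The sketch assumes per-class aggregates are formed once. After that, the comparison loop runs k(k-1)/2 times for k classes, whatever the cohort size: 10 times for five cancer types and once for LUSC versus LUAD.

```python
from itertools import combinations
from statistics import fmean


def pairwise_mean_differences(expr, labels):
    """Per-gene difference of class means for every unordered pair of classes.

    Hypothetical stand-in for a DE statistic (not the preprint's method).
    expr:   list of samples, each a list of per-gene expression values
    labels: class label of each sample (same order as expr)
    Returns {(class_a, class_b): [mean_a - mean_b for each gene]}.
    """
    classes = sorted(set(labels))
    n_genes = len(expr[0])
    # Class means are computed once per class; every sample is visited here.
    means = {
        c: [fmean(s[g] for s, lab in zip(expr, labels) if lab == c)
            for g in range(n_genes)]
        for c in classes
    }
    # The comparison loop runs k * (k - 1) / 2 times for k classes,
    # independent of how many samples each class contains.
    return {
        (a, b): [ma - mb for ma, mb in zip(means[a], means[b])]
        for a, b in combinations(classes, 2)
    }


# Five classes give 10 comparisons; two classes give a single comparison.
five = pairwise_mean_differences([[float(i)] for i in range(5)],
                                 ["A", "B", "C", "D", "E"])
two = pairwise_mean_differences([[1.0], [3.0], [5.0]],
                                ["LUAD", "LUAD", "LUSC"])
print(len(five), len(two), two[("LUAD", "LUSC")])  # prints: 10 1 [-3.0]
```

Under encryption, each subtraction would presumably become a homomorphic operation on packed ciphertexts. If the per-comparison statistic is the expensive step, adding samples mainly lengthens the one-off aggregation, while adding classes multiplies the comparison work.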
In terms of accuracy, CKKS error increased at higher polynomial modulus degrees due to scale-induced rescaling noise, whereas BFV approximation error decreased with increasing cohort size as quantisation noise averaged out. Both schemes preserved gene ranking fidelity (Spearman ρ > 0.999) in all configurations. These results provide practical parameter-selection guidance for privacy-preserving genomic analysis pipelines and establish a reproducible benchmarking framework for encrypted differential expression analysis.
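The BFV quantisation effect and the two accuracy metrics can be illustrated in plaintext. BFV operates on integers, so real-valued expression must be fixed-point encoded before encryption. The sketch below rests on several assumptions: a hypothetical scale factor of 10, an integer sum computed under encryption, division by n·scale after decryption, and a plaintext modulus large enough that the sum never wraps. Each rounding error is at most 0.5/scale. Averaging n roughly independent errors shrinks the error of the mean roughly as 1/√n, which mirrors the reported decrease of BFV error with cohort size. The sketch also computes MAE and Spearman's ρ against the plaintext means, with ρ taken as the Pearson correlation of average-tie ranks. It uses synthetic data and does not model CKKS rescaling noise.

```python
import random
from statistics import fmean


def quantised_mean(values, scale):
    """Mean after fixed-point encoding to integers (BFV needs integer plaintexts).

    Each real value becomes round(x * scale). The integer sum stands in for
    what an encrypted pipeline would add homomorphically (assumed not to wrap
    modulo the plaintext modulus); dividing by n * scale is assumed to happen
    after decryption.
    """
    return sum(round(v * scale) for v in values) / (len(values) * scale)


def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mae_for_cohort(n_samples, n_genes=200, scale=10, seed=0):
    """MAE and Spearman rho of quantised vs exact per-gene means (synthetic data)."""
    rng = random.Random(seed)
    genes = [[rng.uniform(0, 15) for _ in range(n_samples)] for _ in range(n_genes)]
    exact = [fmean(g) for g in genes]
    approx = [quantised_mean(g, scale) for g in genes]
    mae = fmean(abs(a - e) for a, e in zip(approx, exact))
    return mae, spearman(exact, approx)


for n in (10, 100, 1000):
    mae, rho = mae_for_cohort(n)
    print(f"n={n:5d}  MAE={mae:.2e}  rho={rho:.6f}")
```

With these assumptions, MAE falls as the cohort grows while ρ stays close to 1. The ranking metric is insensitive to small, roughly uniform perturbations of the per-gene scores.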