Revisiting VERTIGO and VERTIGO-CI: Identifying confidentiality breaches and introducing a statistically sound, efficient alternative

Marie-Pier Domingue
Jean-François Ethier
Jean-Philippe Morissette
Simon Lévesque
Anita Burgun
Félix Camirand Lemyre

Abstract

Background: Health Data Research Network Canada is tasked with facilitating large-scale health data research, including statistical analyses that integrate, within a single model, data collected by different organizations. When each organization holds a distinct subset of features for the same individuals, the data form a vertical partition. To support logistic regression analyses in this setting, we assessed two recently proposed algorithms, VERTIGO and VERTIGO-CI, which enable parameter estimation and confidence interval computation, respectively. The assessment covered three aspects: the risk of re-identifying patient feature data, communication efficiency, and the extent to which model interpretability is preserved. This study has three main objectives: (1) highlighting confidentiality issues that arise with VERTIGO-CI, as well as those that may occur with VERTIGO when a data node holds only binary covariates; (2) reducing the number of required communication rounds; and (3) proposing an alternative to VERTIGO (RidgeLog-V) that excludes the intercept from the penalty term, which VERTIGO includes.

Methods: We inspected the quantities exchanged in the original algorithms and used linear algebra to identify reverse-engineering procedures that the coordinating center could employ to reconstruct raw data. We also analyzed the objective function of the optimization problem, leading to the proposal of an alternative formulation that requires only a single round of communication while allowing the intercept to be excluded from the penalty term.

Results: We showed that, when the VERTIGO-CI algorithm is executed, the coordinating center can reconstruct all individual-level data using simple vector-matrix operations. When the VERTIGO algorithm is executed and a data node holds binary covariates only, the coordinating center may be able to recover individual data when parameter estimates are shared. We adapted the VERTIGO algorithm to reduce the number of communications and proposed a variant that excludes the intercept from the penalty term.

Conclusions: Although neither VERTIGO-CI nor VERTIGO with binary covariates involves directly sharing raw data, confidentiality breaches may arise through reverse-engineering, illustrating that the distributed nature of an algorithm does not inherently guarantee data privacy. This work also proposes a new algorithm (RidgeLog-V) that reduces operational costs and enhances model interpretability.
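
The interpretability point about the intercept can be illustrated with a small, centralized sketch. This is not VERTIGO or RidgeLog-V and involves no vertical partitioning or communication. It assumes a standard ridge penalty of the form (lam/2) times the sum of squared coefficients, which the abstract does not spell out, and the function and variable names (`fit_ridge_logistic`, `penalize_intercept`, `mean_fitted`) are hypothetical. Under those assumptions, it fits the same simulated data twice, once with the intercept inside the penalty and once with it excluded.

```python
import numpy as np


def fit_ridge_logistic(X, y, lam, penalize_intercept, n_iter=50):
    """Newton's method for ridge-penalized logistic regression (centralized sketch).

    X: (n, p) covariates without an intercept column; y: (n,) labels in {0, 1}.
    Returns coefficients of length p + 1, intercept first.
    """
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])  # prepend the intercept column
    # Penalty weights: 1 for every slope; 0 or 1 for the intercept.
    d = np.ones(p + 1)
    if not penalize_intercept:
        d[0] = 0.0
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))           # fitted probabilities
        grad = Z.T @ (mu - y) + lam * d * beta           # gradient of penalized loss
        w = mu * (1.0 - mu)
        hess = Z.T @ (Z * w[:, None]) + lam * np.diag(d)  # Hessian of penalized loss
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def mean_fitted(X, beta):
    """Average fitted probability over the sample."""
    return float(np.mean(1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))))


# Simulated data (assumption: values chosen only for illustration).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
true_beta = np.array([-2.0, 1.0, -0.5, 0.25])  # intercept far from zero
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = (rng.uniform(size=n) < p_true).astype(float)

beta_pen = fit_ridge_logistic(X, y, lam=50.0, penalize_intercept=True)
beta_free = fit_ridge_logistic(X, y, lam=50.0, penalize_intercept=False)

print("observed prevalence:        ", y.mean())
print("mean fitted, intercept pen.:", mean_fitted(X, beta_pen))
print("mean fitted, intercept free:", mean_fitted(X, beta_free))
print("intercepts (pen., free):    ", beta_pen[0], beta_free[0])
```

When the intercept is left out of the penalty, the first-order condition for the intercept forces the average fitted probability to equal the observed prevalence. When the intercept is penalized, it is shrunk toward zero and that calibration is lost, which is one reason an unpenalized intercept is easier to interpret.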

Version published to 10.21203/rs.3.rs-6933988/v1 on Research Square
Jun 23, 2025
