Revisiting VERTIGO and VERTIGO-CI: Identifying confidentiality breaches and introducing a statistically sound, efficient alternative

Marie-Pier Domingue
Jean-François Ethier
Jean-Philippe Morissette
Simon Lévesque
Anita Burgun
Félix Camirand Lemyre


Abstract

Background

Health Data Research Network Canada is tasked with facilitating large-scale health data research, such as statistical analyses that integrate, within a single model, data collected by different organizations, each holding distinct subsets of features corresponding to the same individuals, thereby forming a vertical data partition. To support logistic regression analyses in this setting, we assessed two recently proposed algorithms, VERTIGO and VERTIGO-CI, which enable parameter estimation and confidence interval computation, respectively. The assessment covered three aspects: the risk of re-identifying patient feature data, communication efficiency, and the extent to which model interpretability is preserved. This study has three main objectives: (1) highlighting confidentiality issues that arise with VERTIGO-CI, as well as those that may occur with VERTIGO when a data node holds only binary covariates; (2) reducing the number of required communication rounds; and (3) proposing an alternative to VERTIGO (RidgeLog-V) that excludes the intercept from the penalty term, which VERTIGO otherwise includes.

Methods

We inspected the quantities exchanged in the original algorithms and used linear algebra to identify reverse-engineering procedures that the coordinating center could employ to reconstruct raw data. We also analyzed the objective function of the optimization problem, leading to the proposal of an alternative formulation that requires only a single round of communication while allowing the intercept to be excluded from the penalty term.
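To illustrate what excluding the intercept from the penalty term means in practice, here is a minimal sketch of ridge-penalized logistic regression fitted on pooled data with Newton's method. It is not the RidgeLog-V algorithm or either VERTIGO variant, and it is not distributed: the function name `fit_ridge_logistic`, the flag `penalize_intercept`, the simulated data, and the penalty weight `lam` are all assumptions made for this example.

```python
import numpy as np

def fit_ridge_logistic(X, y, lam, penalize_intercept=False, n_iter=50):
    """Newton's method for L2-penalized logistic regression (illustrative only).

    Minimizes  -loglik(beta) + (lam / 2) * sum_j d_j * beta_j**2,
    where d_0 = 0 when the intercept is left out of the penalty.
    """
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])      # prepend the intercept column
    d = np.ones(p + 1)
    if not penalize_intercept:
        d[0] = 0.0                           # intercept is not shrunk
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (prob - y) + lam * d * beta
        w = prob * (1.0 - prob)
        hess = Z.T @ (Z * w[:, None]) + lam * np.diag(d)
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def mean_fitted_prob(X, beta):
    """Average predicted probability over the sample."""
    Z = np.hstack([np.ones((X.shape[0], 1)), X])
    return float(np.mean(1.0 / (1.0 + np.exp(-Z @ beta))))

# Simulated data (hypothetical): three covariates and a large true intercept.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
eta = 2.0 + X @ np.array([1.0, -1.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

lam = 5.0
beta_free = fit_ridge_logistic(X, y, lam, penalize_intercept=False)
beta_shrunk = fit_ridge_logistic(X, y, lam, penalize_intercept=True)

gap_free = mean_fitted_prob(X, beta_free) - y.mean()
gap_shrunk = mean_fitted_prob(X, beta_shrunk) - y.mean()
```

With the intercept left unpenalized, its score equation forces the average fitted probability to match the observed event rate, so `gap_free` is zero up to numerical tolerance. Penalizing the intercept pulls it toward zero and breaks that match, which is one way to see why the choice matters for interpretability.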

Results

We showed that, when the VERTIGO-CI algorithm is executed, the coordinating center can reconstruct all individual-level data using simple vector-matrix operations. When the VERTIGO algorithm is executed and a data node has binary covariates only, the coordinating center may be able to recover individual data when parameter estimates are shared. We adapted the VERTIGO algorithm to reduce the number of communications and proposed a variant that excludes the intercept from the penalty term.

Conclusions

While the use of VERTIGO-CI, or of VERTIGO with binary covariates, does not involve directly sharing raw data, confidentiality breaches may arise through reverse-engineering, illustrating that the distributed nature of an algorithm does not inherently guarantee data privacy. This work also proposed a new algorithm (RidgeLog-V) that reduces operational costs and enhances model interpretability.

Version published to 10.1101/2025.05.30.25328653 on medRxiv
May 31, 2025

The Relentless Two-Envelope Conundrum: A Paradox or Misapplication of Probability Theory

This article has 1 author:
1. Aris Spanos
This article has no evaluations
Latest version Dec 11, 2025
Navigating the Landscape of Stablecoins: Understanding Design, Volatility, and Regulatory Challenges

This article has 1 author:
1. Abhigyan Mukherjee
This article has no evaluations
Latest version Dec 31, 2025
A Novel Approach to Population Mean Estimation Using Two Auxiliary Variables Under PPS Sampling

This article has 3 authors:
1. Housila P. Singh
2. Rajesh Tailor
3. Akanksha Agrawal
This article has no evaluations
Latest version Jan 19, 2026
