Privacy-protecting, reliable response data discovery using COVID-19 patient observations

Jihoon Kim
Larissa Neumann
Paulina Paul
Michele E Day
Michael Aratow
Douglas S Bell
Jason N Doctor
Ludwig C Hinske
Xiaoqian Jiang
Katherine K Kim
Michael E Matheny
Daniella Meeker
Mark J Pletcher
Lisa M Schilling
Spencer SooHoo
Hua Xu
Kai Zheng
Lucila Ohno-Machado
R2D2 Consortium
David M Anderson
Nicholas R Anderson
Chandrasekar Balacha
Tyler Bath
Sally L Baxter
Andrea Becker-Pennrich
Elmer V Bernstam
William A Carter
Ngan Chau
Yong Choi
Steven Covington
Scott DuVall
Robert El-Kareh
Renato Florian
Robert W Follett
Benjamin P Geisler
Alessandro Ghigi
Assaf Gottlieb
Zhaoxian Hu
Diana Ir
Tara K Knight
Jejo D Koola
Tsung-Ting Kuo
Nelson Lee
Ulrich Mansmann
Zongyang Mou
Robert E Murphy
Larissa Neumann
Nghia H Nguyen
Sebastian Niedermayer
Eunice Park
Amy M Perkins
Kai W Post
Clemens Rieder
Clemens Scherer
Andrey Soares
Ekin Soysal
Brian Tep
Brian Toy
Baocheng Wang
Zhen R Wu
Yujia Zhou
Rachel A Zucker

Abstract

Objective

To use electronic health record (EHR) data from 202 hospitals, in a manner that preserves both individual and institutional privacy, to answer COVID-19-related questions and post the answers online.

Materials and Methods

We developed a distributed, federated network of 12 health systems that harmonized their EHRs and submitted aggregate answers to consortium questions posted at https://www.covid19questions.org. Our consortium developed processes and implemented distributed algorithms to produce answers to a variety of questions. We were able to generate counts and descriptive statistics, and to build a multivariate, iterative regression model, without centralizing individual-level data.
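As a minimal sketch of how counts and descriptive statistics can be produced from aggregates alone, the example below pools a mean and standard deviation from per-site sums. The names SiteSummary, summarize, and pool are hypothetical, and the sum and sum-of-squares approach is an assumption chosen for illustration, not a description of the consortium's software.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SiteSummary:
    """Aggregates a site shares; no patient-level rows leave the site."""
    n: int            # number of patients
    total: float      # sum of the variable (for example, age)
    total_sq: float   # sum of squared values


def summarize(values):
    """Runs locally inside one health system (hypothetical helper)."""
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pool(summaries):
    """Runs at the coordinator: combines site aggregates into pooled statistics."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return n, mean, sqrt(variance)


# Illustrative, made-up ages at two sites.
site_a = summarize([50, 60])
site_b = summarize([70, 80, 90])
count, mean_age, sd_age = pool([site_a, site_b])
```

The pooled values match what a centralized computation over all five observations would give, although only three numbers per site are exchanged. Subtracting two large sums can lose precision; it is used here only to keep the sketch short.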

Results

Our public website contains answers to various clinical questions, a web form for users to ask questions in natural language, and a list of questions that are currently pending responses. The results show, for example, that patients who were taking angiotensin-converting enzyme inhibitors and angiotensin II receptor blockers within the year before admission had lower unadjusted in-hospital mortality rates. We also showed that, after adjustment, age, sex, and ethnicity were not significantly associated with mortality. We demonstrated that it is possible to answer questions about COVID-19 using EHR data from systems that have different policies and must follow various regulations, without moving data out of their health systems.

Discussion and Conclusions

We present an alternative or a complement to centralized COVID-19 registries of EHR data. We can use multivariate distributed logistic regression on observations recorded in the process of care to generate results without transferring individual-level data outside the health systems.
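The idea of fitting a logistic regression without transferring individual-level data can be sketched as follows. Each site computes the gradient and information matrix of the log-likelihood on its own patients, and a coordinator sums these aggregates and takes a Newton-Raphson step. This is an illustrative assumption about how such a model may be fit, not the consortium's implementation; local_stats, solve, fit_distributed, and the toy data are all hypothetical.

```python
from math import exp


def local_stats(X, y, beta):
    """Runs inside one health system. Returns the gradient and the
    information matrix (X^T W X), both sums over local patients."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + exp(-z))        # predicted probability
        w = mu * (1.0 - mu)               # per-patient weight
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def solve(A, b):
    """Solves A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_distributed(sites, p, iterations=25, tol=1e-10):
    """Coordinator loop: sums site aggregates, then updates the coefficients."""
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for X, y in sites:
            g, h = local_stats(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            info = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(info, h)]
        step = solve(info, grad)          # Newton-Raphson step
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Made-up data: columns are intercept and one binary covariate.
sites = [
    ([[1, 0], [1, 0], [1, 1]], [0, 1, 1]),   # site A
    ([[1, 0], [1, 1], [1, 1]], [0, 0, 1]),   # site B
]
beta_hat = fit_distributed(sites, p=2)
```

Because the summed gradient and information matrix equal those of the pooled data, the estimates coincide with a centralized fit. In this toy example the event rates are 1/3 and 2/3 in the two covariate groups, so the coefficients converge to ln(1/2) and 2 ln 2.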

Version published to 10.1093/jamia/ocab054
May 29, 2021
ScreenIT
Mar 1, 2021
SciScore for 10.1101/2020.09.21.20196220:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.
Version published to 10.1101/2020.09.21.20196220 on medRxiv
Sep 23, 2020
