Biases in Race and Ethnicity Introduced by Filtering Electronic Health Records for ‘Complete Data’
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Objective
Integrated clinical databases from national biobanks have advanced the capacity for disease research. Data quality and completeness filters are used when building clinical cohorts to address limitations of data missingness. However, these filters may unintentionally introduce systemic biases when they are correlated with race and ethnicity. In this study, we examined the race/ethnicity biases introduced by applying common filters to four clinical records databases.
Materials and Methods
We used 19 filters commonly used in electronic health records research on the availability of demographics, medication records, visit details, observation periods, and other data types. We evaluated the effect of applying these filters on self-reported race and ethnicity. This assessment was performed across four databases comprising approximately 12 million patients.
Results
Applying the observation period filter led to a substantial reduction in data availability across all races and ethnicities in all four datasets. However, among those examined, the availability of data in the white group remained consistently higher compared to other racial groups after applying each filter. Conversely, the Black/African American group was the most impacted by each filter on these three datasets, Cedars-Sinai dataset, UK-Biobank, and Columbia University Dataset.
Discussion and Conclusion
Our findings underscore the importance of using only necessary filters as they might disproportionally affect data availability of minoritized racial and ethnic populations. Researchers must consider these unintentional biases when performing data-driven research and explore techniques to minimize the impact of these filters, such as probabilistic methods or the use of machine learning and artificial intelligence.