Patient Deduplication in Uganda’s Electronic Medical Records System: A Comparison of Three Classification Algorithms

Alex Mirugwe
Arthur G. Fitzmaurice
Alice Namale
Evelyn Akello
Simon Muhumuza
Milton Kaye
Samuel Lubwama
Jonathan Mpango
Paul Katongole
Solomon Ssevvume
Paul Mbaka
Clare Ashaba
Enos Sande
Kenneth Musenge


Abstract

Background: Duplicate patient records pose a significant challenge to healthcare registries and electronic medical record (EMR) systems in Uganda, primarily because there is no national unique patient identifier. These duplicates lead to fragmented patient care, misallocation of resources, and inaccuracies in data reporting, which hinder effective monitoring of disease progression, disrupt continuity of care, and complicate efforts to track patient outcomes.

Objective: To evaluate the performance of three classification algorithms in identifying duplicate records of people living with HIV (PLHIV) and to determine a combination of variables that can uniquely identify a PLHIV.

Methods: The study used a six-step deduplication process: dataset extraction, preprocessing, indexing, comparison, classification, and performance evaluation. Records of PLHIV who were active in care between June and November 2022 were extracted from the UgandaEMR system, an EMR installed at 15 public health facilities in six districts in the Rwenzori Region. The dataset included the demographic variables first name, middle name, last name, sex, age, date of birth, address, and phone number. Three classification algorithms were used to classify the client scores into matches, potential matches, and non-matches: i) a threshold-based algorithm, ii) a weighted average score-based algorithm, and iii) a decision tree. Because no labeled dataset was available, the decision tree was trained on data labeled using the two rule-based methods and evaluated on a synthetic reference dataset. Performance of the algorithms was evaluated using sensitivity, specificity, and F-score.

Results: A total of 44,717 records for PLHIV active in care in the Rwenzori Region from June to November 2022 were extracted. The weighted average score-based algorithm identified 447 (5.8%) records as duplicates and 2996 (10%) as potential duplicates. The threshold-based algorithm identified 118 (0.5%) duplicates and flagged 8560 (21.0%) as potential duplicates. The weighted average score-based algorithm achieved the highest performance, with sensitivity (99.0%), specificity (98.8%), and F-score (98.9%); followed by the threshold-based classification, with sensitivity (95.3%), specificity (89.1%), and F-score (92.1%); and the decision tree algorithm, with sensitivity (92.3%), specificity (93.9%), and F-score (93.1%).

Conclusions: The weighted average score-based algorithm achieved the best performance. The findings indicate that a combination of a few demographic variables can be used to differentiate PLHIV. However, improving duplicate record detection at scale will require training these algorithms on a larger dataset that is representative of the PLHIV population in Uganda.
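The two rule-based classifiers and the evaluation metrics named in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the field weights, the cut-off values, the string-similarity measure (Python's `difflib.SequenceMatcher`), and all function names are assumptions, since the abstract does not report them.

```python
from difflib import SequenceMatcher

# Hypothetical weights and cut-offs: the abstract does not report the values used.
WEIGHTS = {
    "first_name": 0.25,
    "last_name": 0.25,
    "sex": 0.10,
    "date_of_birth": 0.25,
    "phone_number": 0.15,
}
MATCH_CUTOFF = 0.9
POTENTIAL_CUTOFF = 0.7


def field_scores(rec_a, rec_b):
    """Return a similarity in [0, 1] for each compared field of two records."""
    return {
        field: SequenceMatcher(None, rec_a[field].lower(), rec_b[field].lower()).ratio()
        for field in WEIGHTS
    }


def classify_threshold(scores):
    """Assumed rule: every field must clear a cut-off for the pair to qualify."""
    lowest = min(scores.values())
    if lowest >= MATCH_CUTOFF:
        return "match"
    if lowest >= POTENTIAL_CUTOFF:
        return "potential match"
    return "non-match"


def classify_weighted(scores):
    """Assumed rule: bucket the weighted average of the field scores."""
    average = sum(WEIGHTS[f] * s for f, s in scores.items()) / sum(WEIGHTS.values())
    if average >= MATCH_CUTOFF:
        return "match"
    if average >= POTENTIAL_CUTOFF:
        return "potential match"
    return "non-match"


def evaluate(predicted, actual):
    """Sensitivity, specificity, and F-score for boolean duplicate labels.

    Assumes both classes occur in `actual` and at least one pair is predicted
    as a duplicate, so that no denominator is zero.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(not p and a for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_score
```

Under these assumed rules, a pair that agrees on every field except the phone number is a non-match for the threshold rule but a potential match for the weighted average, which shows how the two approaches can disagree on the same comparison scores.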

Version published to 10.20944/preprints202511.1450.v1
Nov 19, 2025

Quality of Medical Records in Sudanese Public Hospitals During Armed Conflict: A Multi-Centre Cross-Sectional Study

This article has 17 authors:
1. Tebyan Abdalgader Abdallah Mohmmed
2. Malaz Mohamed
3. Mohanned Salman
4. Mawada Osman
5. Abeer Elabid
6. Muath Ibrahim Mohamed Abusaada
7. Alaa Azhary Mohammed Ali
8. Fatima Afif
9. Roaa Bashir Alameen
10. Mohammed Jadain Mohamed Shareef
11. Israa Alsadig Ahmed Alfaki
12. Moez Salah
13. Ali Almadani
14. Ammar Elgadi
15. Danya Ibrahim
16. Mohanned Abdalkareem Mohamed Osman Idris
17. Mohamed Elmakki Ahmed
This article has no evaluations. Latest version: Dec 31, 2025

Automated Medication Dispensing System: Are We Meeting Patient Needs? Insights from People Living with HIV’s Perspectives in Eswatini

This article has 18 authors:
1. Deus Bazira
2. Thokozani Maseko
3. Weijun Yu
4. Liyandza Mamba
5. Samson Haumba
6. Victor Williams
7. Jiaqin Wu
8. Fezokuhle Khumalo
9. Buhle Mkhonta
10. Hugben Byarugaba
11. Normusa Musarapasi
12. Jaskeerat Thakral
13. Thembisile Chili
14. Pido Bongomin
15. Arnold Mafukidze
16. Sharon Kibwana
17. Clara Nyakopota
18. Sylvia Ojoo
This article has no evaluations. Latest version: Dec 22, 2025

Determinants and Histopathologic Patterns of Lung Cancer at St. Paul’s Hospital Millennium Medical College, Addis Ababa, Ethiopia: A Six Year Retrospective Case‒Control Study, 2024

This article has 4 authors:
1. Amanuel Yeneneh Teka
2. Bacha Mirkena Dhabi
3. Tsigehana Sisay Mekonnen
4. Yimer Seid Yimer
This article has no evaluations. Latest version: Dec 22, 2025
