A rigorous benchmarking of alignment-based HLA typing algorithms for RNA-seq data

Dottie Yu
Ram Ayyala
Sarah Hany Sadek
Likhitha Chittampalli
Hafsa Farooq
Junghyun Jung
Abdullah Al Nahid
Grigore Boldirev
Mina Jung
Sungmin Park
Austin Nguyen
Alex Zelikovsky
Nicholas Mancuso
Jong Wha J. Joo
Reid F. Thompson
Houda Alachkar
Serghei Mangul

Abstract

Accurate identification of human leukocyte antigen (HLA) alleles is essential for clinical and research applications such as transplant matching and the assessment of drug sensitivities. Recent advances in RNA-seq technology have made it possible to impute HLA types from sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, which limits the ability of clinicians and biomedical researchers to make informed choices about which tool to use. Here we report the study design of a comprehensive benchmark of 12 HLA callers across 682 RNA-seq samples from 8 datasets with a molecularly defined gold standard at 5 loci: HLA-A, -B, -C, -DRB1, and -DQB1. For each HLA typing tool, we will assess its accuracy, compare default with optimized parameters, and examine discrepancies in accuracy at the allele and locus levels. We will also evaluate the computational expense of each HLA caller, measured in terms of CPU time and RAM, and the influence of read length over the HLA region on the accuracy of each tool. Most notably, we will examine the performance of HLA callers across European and African groups to determine whether accuracy differs by ancestry. We hypothesize that RNA-seq HLA callers are capable of returning high-quality results, but that tools offering a good balance between accuracy and computational expense for all ancestry groups are yet to be developed. We believe that our study will provide clinicians and researchers with clear guidance to inform their selection of an appropriate HLA caller.
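
As an illustration of what locus-level accuracy could mean in a benchmark of this kind, the following is a minimal sketch that is not taken from the study. It assumes that each sample has two gold-standard alleles per locus, that calls are compared after trimming allele names to two fields (for example `A*02:01`), and that accuracy is the fraction of gold-standard alleles matched by a caller. The function names `truncate` and `locus_accuracy` and the input layout are hypothetical.

```python
from collections import Counter


def truncate(allele, fields=2):
    """Trim an HLA allele name such as 'A*02:01:01:01' to the first `fields` fields."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def locus_accuracy(truth, calls, fields=2):
    """Fraction of gold-standard alleles matched per locus.

    truth and calls map sample -> locus -> list of alleles (hypothetical layout).
    Allele order within a locus is ignored; a homozygous gold standard
    needs two matching calls to count as fully correct.
    """
    matched = Counter()
    total = Counter()
    for sample, loci in truth.items():
        for locus, gold in loci.items():
            gold_counts = Counter(truncate(a, fields) for a in gold)
            predicted = calls.get(sample, {}).get(locus, [])
            pred_counts = Counter(truncate(a, fields) for a in predicted)
            # Multiset intersection counts each gold allele at most once per call.
            matched[locus] += sum((gold_counts & pred_counts).values())
            total[locus] += len(gold)
    return {locus: matched[locus] / total[locus] for locus in total}


# Toy data, invented for illustration only.
truth = {"s1": {"A": ["A*02:01:01", "A*01:01"], "B": ["B*07:02", "B*08:01"]}}
calls = {"s1": {"A": ["A*01:01:01:01", "A*02:01"], "B": ["B*07:02", "B*44:02"]}}
accuracy = locus_accuracy(truth, calls)
print(accuracy)
```

With this toy input, both HLA-A alleles match at two-field resolution while only one of the two HLA-B alleles does. Stratifying the same calculation by ancestry group or by tool would only require grouping the samples before calling the function.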

Version published to 10.1101/2023.05.22.541750 on bioRxiv
May 24, 2023

