muat: portable transformer-based method for tumour classification and representation learning from somatic variants
Abstract
Motivation
Deep neural networks have proven effective in classifying tumour types using next-generation sequencing data. However, developing transferable models that work across heterogeneous operating environments remains challenging due to differences in cohort compositions and data generation protocols, privacy concerns, and limited computational capabilities.
Results
We introduce muat, a transformer-based software tool for tumour classification using somatic variant data from whole-genome sequencing (WGS) and whole-exome sequencing (WES). Building on the previously developed MuAt and MuAt2 models, we distribute the software via Docker containers and Bioconda for deployment in high-performance computing (HPC) systems and Secure Processing Environments (SPEs). Using a downloadable MuAt checkpoint, we reproduce the performance reported in the original study on whole-genome (PCAWG; 89% accuracy in histological tumour typing) and exome sequencing data (TCGA; 64% accuracy). Cross-cohort evaluation in the Genomics England SPE achieved 81% accuracy without retraining and 89% following fine-tuning. As a demonstration of the software’s adaptability, we also deployed muat within the iCAN Digital Precision Cancer Medicine Flagship’s SPE and integrated it into a Nextflow-managed workflow.
Availability and implementation
muat is available through conda (www.anaconda.org/bioconda/muat) and GitHub (https://github.com/primasanjaya/muat), under the Apache 2.0 License.
Contact
prima.sanjaya@helsinki.fi, esa.pitkanen@helsinki.fi; website: mlbiomed.net