The somatic and germline mutational landscape of HPV-negative oral cancer patients with a history of chewing tobacco and betel nut use
Abstract
Head and neck cancer (HNC) is highly prevalent in South Asia, driven by additional region-specific exposures such as chewing tobacco and betel nut. Despite therapeutic advances, the five-year survival rate remains around 50-60%, underscoring the urgent need to identify novel therapeutic targets and improve disease-free survival. This study was designed to identify both somatic and germline drivers contributing to HNC pathogenesis. Through whole-exome sequencing of 103 patients, we detected mutations in known HNC drivers (TP53, CDKN2A, NOTCH1) as well as novel hotspots in several genes, including TRIM48, MAP3K19, and CDC20. A recurrent hotspot mutation (p.A187T) in the POLQ gene was identified in patients with high tumor mutation burden and was absent in both the TCGA and ICGC cohorts. Among known hotspots, the MYC p.T73A mutation was highly prevalent, occurring in over 50% of patients. As MYC is considered an “undruggable” target, alternative strategies that target upstream regulators such as BRD4 with specific inhibitors may hold promise for South Asian HNSCC patients harboring the p.T73A mutation. Copy-number variation analysis further revealed EGFR amplification and TP73 deletion in the majority of patients, highlighting additional layers of genomic dysregulation. Comparative genomic analyses showed no recurrent mutations in epigenetic regulators (ARID2, EP300, KMT2B/MLL2, KMT2D/MLL4, NSD1, and TET1). We report a p.S456L germline variant in SDHA that recurs consistently across South Asian cohorts. Patients with the p.S456L variant were younger than those without it, reflecting the typical epidemiological signature of a genetic variant that increases susceptibility. Systematic molecular characterization of recurrent mutations is required to elucidate the mechanism of action of these variants and to find actionable therapeutic targets.
Article activity feed
This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/18728879.
Summary of main findings and contribution
This preprint profiles somatic and germline variation in largely HPV-negative head and neck/oral squamous cell carcinoma from South Asia with prominent chewing tobacco/areca nut exposure, using whole-exome sequencing (WES) across 103 patients aggregated from three South Asian cohorts (A–C) and compared to TCGA HNSCC (cohort D). The authors report recurrent alterations in canonical HNSCC drivers (e.g., TP53, CDKN2A, NOTCH1), nominate several potentially population/exposure-enriched hotspot mutations (e.g., TRIM48 p.I44T; POLQ p.A187T in high-TMB cases; MAP3K19 p.H1282Y; CDC20 p.R162Q), and identify recurrent copy-number changes including EGFR amplification and TP73-region deletion. They also propose a germline SDHA variant (p.S456L) as a potential South Asian susceptibility factor, motivated by its recurrence across cohorts and a trend toward younger diagnosis age among carriers.
Major issues
Cohort heterogeneity: The study merges (i) the prospectively collected cohort A, which contains both FFPE and fresh biopsies, (ii) two external cohorts (B/C) processed from raw FASTQ files, and (iii) TCGA "pre-annotated" MAF calls. It also mixes genome builds (hg38 for A/B/C vs hg19 for ICGC raw; plus liftover to run MutSig2CV). Together these create strong potential for batch effects and non-biological differences in mutation discovery rates and spectra. This concern is amplified by the authors' own observation that FFPE samples have much higher mutation counts than fresh tissues, which is consistent with known FFPE-associated artifact risk; yet the analysis does not clearly document artifact-mitigation steps (e.g., deamination/OxoG filtering) beyond "standard" preprocessing.
Tumor mutation burden (TMB) definition appears unclear and possibly inflated: The manuscript reports a median somatic variant count per patient of ~1050 and a median TMB of ~21 mutations/Mb across cohorts A–C, with some patients having >14,000 mutations, but does not specify the exact callable/exome target size, inclusion/exclusion rules (synonymous? indels? PASS-only?), or artifact handling used to compute TMB. Given the stated FFPE inflation of mutations and variable depths across cohorts, the central TMB estimate (and downstream claims about "high TMB" subsets and POLQ association) is difficult to interpret without a transparent, cohort-stratified TMB pipeline and QC.
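To make the TMB concern concrete: if the two reported medians (~1050 variants per patient and ~21 mutations/Mb) were computed over the same samples and the same variant set, which the manuscript does not state, they would imply a denominator of roughly 50 Mb. The Python sketch below shows one way a TMB definition could be written down explicitly. The `Variant` fields, the list of counted consequence classes, and the `callable_mb` argument are illustrative assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass

# Hypothetical set of consequence labels counted toward TMB.
# A real pipeline would need to state its own list explicitly.
COUNTED_CONSEQUENCES = {
    "missense", "nonsense", "frameshift_indel", "inframe_indel", "splice_site",
}


@dataclass(frozen=True)
class Variant:
    filter_status: str  # "PASS" or a failure reason (assumed vocabulary)
    consequence: str    # simplified consequence label (assumed vocabulary)


def tumor_mutation_burden(variants, callable_mb, include_synonymous=False):
    """Return mutations per callable megabase under explicit inclusion rules."""
    if callable_mb <= 0:
        raise ValueError("callable_mb must be positive")
    counted = 0
    for v in variants:
        if v.filter_status != "PASS":
            continue  # drop calls that failed any filter
        if v.consequence in COUNTED_CONSEQUENCES:
            counted += 1
        elif include_synonymous and v.consequence == "synonymous":
            counted += 1
    return counted / callable_mb


def implied_callable_mb(variant_count, tmb):
    """Back out the denominator implied by a variant count and a TMB value."""
    return variant_count / tmb
```

Reporting the inclusion rules and `callable_mb` per sample, as this sketch forces the caller to do, is what would let readers compare TMB across cohorts A–C and against TCGA.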
Minor issues
Reporting clarity and reproducibility gaps: Key parameters are missing or not explicit (e.g., Mutect2 settings and filters; panel-of-normals usage; contamination estimation; tumor purity estimates; handling of low-depth samples; thresholds for hotspot calling beyond "≥10 reads" in places). The rationale for restricting CNVs in cohorts B/C to those overlapping cohort A (and the effect on sensitivity/specificity) should be justified more explicitly.
Presentation/terminology: Several passages conflate "head and neck cancer," "HNSCC," and "oral cancer" without consistently defining the included subsites, even though cohort C is tongue-only and cohorts A and B are buccal-predominant, a difference that may strongly shape mutational patterns.
Evidence appraisal
The evidence is moderate for describing a South Asian, largely HPV-negative WES-based mutational landscape and confirming frequent alteration of canonical HNSCC genes (TP53, CDKN2A, NOTCH1) within the aggregated cohorts. The evidence is limited for claims of cohort-specific "novel drivers," for POLQ p.A187T as a distinctive high-TMB marker, and for SDHA p.S456L as a susceptibility variant, because key confounders (batch/FFPE artifacts, cohort integration effects, lack of external control comparisons, limited replication/validation) are not fully resolved.
Recommendations for improvement
Tighten cohort harmonization: Re-run key analyses with a unified pipeline across A–C (and, if possible, reprocess TCGA raw BAM/FASTQ rather than using pre-annotated MAF), and report batch-aware sensitivity analyses (by cohort, site, stage, tissue type FFPE vs fresh, and sequencing depth).
Make TMB rigorous and interpretable: Define TMB precisely, report callable Mb per sample, provide cohort-stratified TMB distributions, and show how high-TMB cases were identified and whether they remain high under stricter filtering.
Temper clinical inferences: Present EGFR amplification and cetuximab-related statements as hypotheses, and add validation of EGFR CNV calls (e.g., FISH, qPCR, or SNP array) plus any available clinical correlations if claiming therapeutic relevance.
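The batch-aware and cohort-stratified summaries recommended above could be sketched as follows. This is a minimal Python illustration under stated assumptions: each sample is a dictionary with hypothetical keys `id`, `cohort`, `tissue`, `tmb` (default filtering), and `tmb_strict` (stricter filtering), and the high-TMB threshold is left to the caller because the manuscript does not define one.

```python
from collections import defaultdict
from statistics import median


def stratified_median_tmb(samples, keys=("cohort", "tissue")):
    """Median TMB per stratum, e.g. per (cohort, tissue type) pair."""
    groups = defaultdict(list)
    for sample in samples:
        stratum = tuple(sample[k] for k in keys)
        groups[stratum].append(sample["tmb"])
    return {stratum: median(values) for stratum, values in groups.items()}


def stable_high_tmb(samples, threshold):
    """IDs of samples at or above the threshold under both default and strict filtering."""
    return [
        sample["id"]
        for sample in samples
        if sample["tmb"] >= threshold and sample["tmb_strict"] >= threshold
    ]
```

A sample that is high-TMB only under default filtering, and drops below the threshold once stricter artifact filters are applied, would be a candidate for FFPE-driven inflation rather than a biologically hypermutated tumor.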
Competing interests
The authors declare that they have no competing interests.
Use of Artificial Intelligence (AI)
The authors declare that they did not use generative AI to come up with new ideas for their review.