Deep learning for mass extinction detection on fossilized phylogenies: power, limitations, and lessons for simulation-based phylodynamic inference
Abstract
Detecting mass extinction events from phylogenies is a fundamental yet challenging task. While traditional likelihood-based methods are available, deep learning offers a powerful, simulation-based alternative. Here, we evaluate a deep learning approach using a novel hybrid model that combines Graph Neural Networks with Long Short-Term Memory networks. The model analyzes phylogenies, containing both extant species and fossils, simulated under a complex skyline Fossilized Birth-Death model that incorporates mass extinctions and fluctuating background rates. We validate the architecture's effectiveness through ablation studies. Our investigation revealed that the stochasticity of the simulation was a primary obstacle, creating significant "label noise" that initially limited performance. In a direct comparison, our deep learning approach performed slightly better than Bayesian methods. The approach is robust to uncertainty in phylogenetic branch lengths and topology and generalizes to larger trees, but its performance degrades under model mismatch with higher background extinction rates. Our work also highlights a critical limitation: the model is highly specific to the definition of mass extinction it was trained on, so any modification to this definition requires training a new model from scratch. We conclude by summarizing the challenges and lessons learned for simulation-based inference in phylodynamics.