LightPFP: A Lightweight Route to Ab Initio Accuracy at Scale
Abstract
Machine learning interatomic potentials (MLIPs) bridge the gap between quantum-mechanical accuracy and scalable molecular simulations. However, a fundamental trade-off persists: universal MLIPs (u-MLIPs) offer broad transferability but are computationally expensive, whereas task-specific MLIPs (ts-MLIPs) achieve high efficiency at the expense of costly DFT training data. Here we introduce LightPFP, a data-efficient knowledge distillation framework. Instead of relying on expensive DFT calculations, LightPFP constructs a distilled ts-MLIP by using a u-MLIP to generate high-quality, material-specific training data, while employing a pre-trained lightweight model to further enhance data efficiency. Across diverse material systems, LightPFP reduces model development time by approximately three orders of magnitude compared with DFT-based workflows, while maintaining DFT-level accuracy. The distilled models deliver 50-150x higher computational efficiency than u-MLIPs. When the teacher exhibits systematic bias, LightPFP supports few-shot precision correction using as few as 10 high-accuracy DFT configurations, as demonstrated by reproducing MgO's experimental melting point. This u-MLIP-driven distillation establishes a scalable route to high-fidelity, data-efficient MLIPs, accelerating the pace of materials discovery and design.
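The workflow the abstract describes — distilling a cheap student model from teacher-generated labels, then correcting systematic teacher bias with a handful of high-accuracy reference points — can be illustrated with a minimal toy sketch. Everything here is hypothetical: `teacher_energy` stands in for a u-MLIP, `dft_energy` for a DFT reference, and a polynomial fit plays the role of the lightweight ts-MLIP; none of these are LightPFP's actual models or APIs.

```python
import numpy as np

# Toy 1D "material": energy as a function of bond length r.
# The teacher (stand-in for a u-MLIP) carries a small systematic bias.
def teacher_energy(r):
    return (r - 1.0) ** 2 + 0.05  # constant +0.05 bias (hypothetical)

def dft_energy(r):
    return (r - 1.0) ** 2  # "ground truth" reference (hypothetical)

# Step 1: distillation data -- label many configurations with the
# cheap teacher instead of running expensive DFT.
rng = np.random.default_rng(0)
r_train = rng.uniform(0.5, 1.5, 500)
e_train = teacher_energy(r_train)

# Step 2: fit a lightweight student model (polynomial stand-in
# for a task-specific MLIP) on the teacher-labeled data.
student = np.poly1d(np.polyfit(r_train, e_train, deg=2))

# Step 3: few-shot precision correction -- use ~10 high-accuracy
# reference points to estimate and remove the teacher's bias.
r_few = np.linspace(0.7, 1.3, 10)
shift = np.mean(dft_energy(r_few) - student(r_few))

def corrected(r):
    return student(r) + shift
```

The point of the sketch is the data flow, not the models: the student inherits the teacher's bias through the distilled labels, and a constant-shift correction fit on a few reference points is enough to remove it here because the bias is systematic rather than configuration-dependent.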