MCNP-Simulated HPGe Soil Spectra: A Public Dataset and Machine Learning Benchmark for Multi-Isotope Quantification
Abstract
Accurate identification and quantification of radionuclides in soil are essential for environmental monitoring, nuclear safety, and emergency response. Machine-learning methods operating on full-spectrum gamma-ray data offer a promising alternative to traditional peak-based analysis, but their development is impeded by the lack of large, publicly accessible training datasets that reflect realistic soil matrices and high-resolution HPGe (High-Purity Germanium) detector characteristics. In this work, we introduce the first open Monte Carlo N-Particle transport (MCNP)-based HPGe soil gamma-ray spectroscopy dataset, comprising 6000 simulated spectra covering 41 prevalent radionuclides across diverse activity levels. We benchmark four regression approaches (Ridge Regression, Extreme Gradient Boosting Regression, Multilayer Perceptron, and Convolutional Neural Network) on quantification tasks using a held-out test set. Linear and ensemble methods achieve robust baselines, successfully predicting over 95% of isotopes within ±15% relative error, whereas the tested deep-learning architectures exhibit greater variability on low-intensity and overlapping-peak nuclides. These results demonstrate the dataset’s utility for reproducible research and highlight significant opportunities for architectural innovations and domain-adaptation strategies to enhance deep-learning performance. We anticipate that this resource will catalyze the development of more accurate, generalizable machine-learning solutions for multi-isotope activity quantification in environmental applications.
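To make the headline metric concrete, the following is a minimal sketch of how a "fraction of isotopes within ±15% relative error" score might be computed for a ridge-regression baseline. It is not the authors' pipeline. The abstract does not define the metric precisely, so the per-isotope mean absolute relative error used here is an assumption. The toy spectra (random response templates, 64 channels, 5 isotopes), the function names `fit_ridge` and `fraction_within_tolerance`, and the regularization strength are all hypothetical stand-ins for the real 41-nuclide MCNP dataset.

```python
import numpy as np


def fraction_within_tolerance(y_true, y_pred, tol=0.15):
    """Return (fraction of isotopes within tol, per-isotope mean relative error).

    y_true, y_pred: arrays of shape (n_spectra, n_isotopes) holding activities.
    Assumption: an isotope counts as "within tolerance" when its mean absolute
    relative error over all spectra is <= tol. True activities must be > 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = np.abs(y_pred - y_true) / y_true   # per spectrum, per isotope
    per_isotope = rel_err.mean(axis=0)           # average over spectra
    return float(np.mean(per_isotope <= tol)), per_isotope


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression (no intercept): W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)


# Toy stand-in data (NOT the published dataset): each spectrum is a linear mix
# of hypothetical unit-activity response templates plus Gaussian noise.
rng = np.random.default_rng(0)
n_spectra, n_channels, n_isotopes = 600, 64, 5
templates = rng.uniform(0.0, 1.0, size=(n_isotopes, n_channels))
activities = rng.uniform(1.0, 10.0, size=(n_spectra, n_isotopes))
spectra = activities @ templates + rng.normal(0.0, 0.05, size=(n_spectra, n_channels))

# Hold out the last 20% of spectra as a test set.
split = 480
W = fit_ridge(spectra[:split], activities[:split], alpha=1e-3)
pred = spectra[split:] @ W

frac, per_isotope = fraction_within_tolerance(activities[split:], pred)
print(f"{frac:.0%} of isotopes within ±15% mean relative error")
```

Other readings of the metric are equally plausible, for example counting individual (spectrum, isotope) predictions rather than isotopes, so the scoring function should be adjusted to match the definition given in the full paper.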