BPformer: An Interpretable Deep Learning Framework for Livestock Breed Proportion Analysis
Abstract
Introduction: Breed proportion analysis plays a crucial role in the conservation of cattle genetic resources and in breeding improvement. With the rapid development of genomic technologies, breed proportion prediction based on single nucleotide polymorphisms (SNPs) has become an active research area. However, existing methods still suffer from limited interpretability and a heavy reliance on feature engineering.
Methods: We developed BPformer, a model that combines convolutional neural networks with self-attention mechanisms and is designed specifically for livestock breed proportion prediction. We used SNP data from 15 Chinese indigenous cattle breeds and 12 foreign commercial breeds, with 39,868 high-quality SNP loci serving as the gold standard dataset. Dimensionality-reduced datasets were constructed using four feature selection methods (F_ST, In, BP_AVE, and BP_GRA). On these reduced datasets, BPformer was compared against traditional machine learning models (SVR, KNR, and RF) and other deep learning models (MLP and CNN); the three deep learning models were additionally evaluated on the gold standard dataset.
Results: BPformer outperformed the other models across all four detection methods with BBP SNPs = 4,000 and in the gold standard testing scenario. Using attention mechanism visualization and SHAP value analysis, we identified the key SNP loci that contributed most to the prediction of each breed proportion component, thereby enhancing the model's interpretability.
Conclusion: From a modeling perspective, BPformer effectively addresses the interpretability challenges faced by traditional methods and can efficiently capture long-range dependencies among SNP loci. It provides a powerful tool for the conservation of Chinese cattle breed resources and for genomic selection breeding, which is of great significance for maintaining genetic diversity in the Chinese livestock industry.
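As an illustration of the F_ST-based feature selection step mentioned in the Methods, the following minimal sketch ranks SNPs by a per-locus F_ST and keeps the top k. It is not the authors' implementation: the abstract does not state which F_ST estimator was used, so the sketch assumes Nei's unweighted form, F_ST = (H_T - H_S) / H_T, computed from per-breed allele frequencies. The function names `nei_fst` and `top_k_snps` and the toy frequency matrix are hypothetical.

```python
from typing import List, Sequence


def nei_fst(freqs: Sequence[float]) -> float:
    """Nei's F_ST for one biallelic SNP, given the allele frequency in each breed.

    Assumption: breeds are weighted equally (no sample-size correction).
    """
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity in the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # Monomorphic locus: no differentiation can be measured.
        return 0.0
    # Mean expected heterozygosity within breeds.
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def top_k_snps(freq_matrix: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the row indices of the k SNPs with the highest F_ST."""
    scores = [nei_fst(row) for row in freq_matrix]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data (hypothetical): 4 SNPs (rows) x 3 breeds (columns).
toy_freqs = [
    [0.5, 0.5, 0.5],    # identical frequencies, F_ST = 0
    [0.0, 1.0, 0.5],    # strongly differentiated
    [0.1, 0.2, 0.15],   # weakly differentiated
    [0.0, 0.0, 1.0],    # fixed differences, F_ST = 1
]
selected = top_k_snps(toy_freqs, k=2)
print(selected)
```

In a setting like the one described above, `k` would correspond to the size of the reduced SNP panel (for example 4,000), and the rows of the frequency matrix would be the 39,868 loci of the gold standard dataset.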