Beyond Accuracy: Reliability-Aware Cross-Farm Evaluation of Dairy Cow Vocalization Models

Mayuri Kate
Suresh Raja Neethirajan

Abstract

Automated analysis of dairy cow vocalizations has largely relied on supervised classifiers evaluated within a single farm, a setting that inflates apparent performance and gives no measure of how far predictions can be trusted. We address this with a three-layer framework that separates acoustic structure discovery, proxy-state inference, and reliability assessment, evaluated on 569 annotated clips from three commercial dairy farms. A frozen self-supervised speech encoder, latent-space segmentation, and stability-guided clustering convert continuous recordings into discrete acoustic units without behavioral labels. Proxy-state signal is then tested under audio-only, audio-plus-context, and leave-one-farm-out (LOFO) protocols designed to separate transferable acoustic structure from farm-specific shortcuts. The results suggest that cross-farm generalizability differs substantially across biologically distinct vocalization categories. Non-vocal physiological sounds transfer across farms (LOFO macro-F1 = 0.763) and calibrate well (expected calibration error reduced from 0.087 to 0.023), whereas resource-related calls collapse to a majority-class baseline (macro-F1 = 0.500) and distress-related calls degrade under farm holdout. Selective prediction improves the retained-set score of the multiclass functional proxy (0.407 to 0.430), and an end-to-end convolutional baseline matches or exceeds the framework on raw accuracy for the easier targets yet yields a roughly two- to six-fold larger calibration error and offers no abstention. Random cross-validation consistently overstates cross-farm utility. These findings show that acoustic models for livestock monitoring require reliability-aware evaluation rather than flat classification.
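
The evaluation ideas named in the abstract can be illustrated with a short Python sketch: an equal-width-bin expected calibration error, leave-one-farm-out splits, and abstention below a confidence threshold. This is not the authors' implementation. The function names, the ten-bin default, and the simple threshold rule are assumptions made for illustration, and the numbers in the usage lines are toy values, not results from the preprint.

```python
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    Assumption: confidences are top-class probabilities in [0, 1].
    """
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins.values():
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


def leave_one_farm_out(farm_ids):
    """Yield (held_out_farm, train_indices, test_indices), one split per farm."""
    for farm in sorted(set(farm_ids)):
        train = [i for i, f in enumerate(farm_ids) if f != farm]
        test = [i for i, f in enumerate(farm_ids) if f == farm]
        yield farm, train, test


def selective_predict(confidences, threshold):
    """Return indices of predictions that are kept; the rest are abstentions."""
    return [i for i, c in enumerate(confidences) if c >= threshold]


# Toy usage with made-up values (hypothetical, for illustration only).
toy_conf = [0.9, 0.9, 0.6, 0.6]
toy_correct = [True, True, True, False]
toy_farms = ["A", "A", "B", "C"]

toy_ece = expected_calibration_error(toy_conf, toy_correct)
toy_splits = list(leave_one_farm_out(toy_farms))
toy_retained = selective_predict(toy_conf, threshold=0.8)
```

Holding out a whole farm, rather than shuffling clips at random, keeps every recording from the test farm out of training, which is what lets the protocol distinguish transferable acoustic structure from farm-specific shortcuts.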

Version published to 10.64898/2026.06.17.732832 on bioRxiv
Jun 22, 2026

Related articles

Towards A Foundation Model for Clinical Voice Biomarkers

This article has 8 authors:
1. Olivier Elemento
2. Alexandros Sigaras
3. Joseph T. Colonel
4. Iman Hajirasouliha
5. Satrajit S. Ghosh
6. Yael Bensoussan
7. Bridge2AI-Voice Consortium
8. Anaïs Rameau
This article has no evaluations. Latest version: May 30, 2026

Bridging Acoustic and Semantic Spaces for Interpretable Voice Scoring via Zero-Shot Semantic Expansion

This article has 4 authors:
1. Chi Hsiao
2. Yuan-Ren Cheng
3. Chung-Yao Yang
4. Fu-Shun Hsu
This article has no evaluations. Latest version: Jun 1, 2026

Shorter FFT Windows Improve Cross-Domain Generalization in CNN-Based Cetacean Whistle Detection: A Controlled Sensitivity Analysis

This article has 1 author:
1. Rocco De Marco
This article has no evaluations. Latest version: May 6, 2026
