Sample Size Reporting in Human Cancer Microbiome Research is Inconsistent and Unstandardized

Abstract

Transparent and standardized sample size reporting is essential for reproducibility in microbiome research and for enabling AI-driven data aggregation tools. This is particularly important for cancer microbiome studies, where microbial profiles inform diagnosis, prognosis, and treatment response. However, variation in how studies define and report analytical sample sizes remains largely unexamined. Here, we collect and analyze 2,305 human microbiome studies from a comprehensive literature review spanning a wide range of diseases. From these, we derive a subset of 67 clinical cancer microbiome studies to assess how standardized sample size reporting is across the literature. Each paper was evaluated for whether it explicitly stated a sample size using “n=” notation, where in the manuscript this notation appeared, and what the “n=” referred to (i.e., biological samples or participants). Of the studies examined, 91.0% verbally stated a sample size in the manuscript text, yet only 37.3% (95% CI: 26.7–49.3%) included explicit “n=” notation tied to biological sample count. Moreover, most “n=” statements were located in the main text (64% in Results) rather than the abstract (8%), potentially reducing their visibility to machine reading or AI-based text extraction. Furthermore, the meaning of “n=” varied considerably between studies, referring inconsistently to participants, samples, or both. These findings highlight a lack of standardization in sample size reporting that may undermine reproducibility and hinder large-scale, automated data aggregation, underscoring the need for clearer sample size reporting conventions in future studies.
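
As a rough check on the reported proportion, the sketch below recomputes a 95% confidence interval for 37.3% of 67 studies. Two things here are assumptions rather than statements from the abstract: the count of 25 studies is inferred from 37.3% of 67, and the Wilson score interval is used because the abstract does not name its interval method. The function name `wilson_interval` is illustrative only.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Assumption: 25 of 67 studies, inferred from the reported 37.3%.
low, high = wilson_interval(25, 67)
print(f"{25 / 67:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

Under these assumptions the bounds round to 26.7% and 49.3%, which agrees with the interval given in the abstract; a different method (for example, a normal-approximation interval) would give slightly different bounds.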
