PED-X-Bench: A Benchmark of Adult-to-Pediatric Extrapolation Decisions in FDA Drug Labels
Abstract
Pediatric trials are ethically and logistically difficult, so the U.S. FDA often extrapolates adult data to children when justified. Yet no public resource systematically documents these decisions. We present PED-X-Bench, the first dataset and benchmark that encodes FDA pediatric-extrapolation outcomes as a four-way classification task (Full, Partial, None, Unlabeled). PED-X-Bench contains 737 FDA drug-label sections (≈ 1 M words of source text) for approvals issued 2007–2024 across all therapeutic areas. A two-stage o3-mini prompting pipeline mined the full FDA label text; nine domain reviewers then adjudicated a stratified sample of 135 labels, yielding an accuracy of 0.74 and an F1 of 0.63 (inter-annotator κ = 0.678), and spot-checked the remainder. For every drug we release the ground-truth label, concise efficacy and pharmacokinetic/safety summaries, and harmonized study metadata. To showcase utility, we release two baseline models: (i) a logistic-regression classifier that uses structured metadata from FDA's pediatric-labeling dataset, and (ii) a fine-tuned BigBird BERT that ingests the full label text. Both baselines perform modestly, leaving ample headroom for future work. PED-X-Bench enables research on pediatric drug development, clinical NLP, and drug safety. The dataset card and code are available at github.com/tatonetti-lab/PedXBench and huggingface.co/datasets/apoorvasrinivasan/Ped-X-Bench
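As a rough illustration of the kind of metrics quoted above, the following minimal sketch computes accuracy, F1, and a kappa statistic for the four-way task on toy data. It rests on several assumptions not stated in the abstract: F1 is macro-averaged over the four classes, κ is taken in its two-rater Cohen form (the abstract does not say which variant was used across the nine reviewers), and the function names and toy labels are hypothetical rather than taken from the released code.

```python
from collections import Counter

# The four extrapolation classes named in the abstract.
LABELS = ["Full", "Partial", "None", "Unlabeled"]


def accuracy(gold, pred):
    """Fraction of items where the prediction matches the reference label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class that never appears in gold or pred contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(a, b):
    """Two-rater Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data only; these are not labels from PED-X-Bench.
gold = ["Full", "Partial", "None", "Unlabeled", "Full", "None"]
pred = ["Full", "Partial", "None", "Unlabeled", "Partial", "None"]

print(accuracy(gold, pred), macro_f1(gold, pred), cohen_kappa(gold, pred))
```

The same `cohen_kappa` function can be applied to two reviewers' label lists to estimate pairwise agreement; a multi-rater statistic such as Fleiss' kappa would be needed to summarize all nine reviewers at once.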