A regulatory medical device dataset with risk labels and an image-linked subset from the NMPA registry

Abstract

We present NMPA-MedDevice, a regulatory dataset derived from China's National Medical Products Administration (NMPA) Unique Device Identification (UDI) registry. The release comprises four components: (1) a frozen raw snapshot of the NMPA UDI registry (66,472 records, July 2024); (2) a reproducibly cleaned text-and-metadata corpus of approximately 52,000 unique device records with risk class labels deterministically derived from the ninth character of the NMPA registration number; (3) a curated image-linked subset of 1,005 devices (Class I/II/III, 39/462/504) with precomputed text and image feature embeddings; and (4) an external temporal validation set of 300 devices from a later registry update (October to November 2025). All textual data, derived labels, the cleaned corpus, preprocessing scripts, dataset splits, and precomputed features are publicly deposited. Raw product images are not redistributed due to copyright restrictions; precomputed embeddings and image retrieval scripts are provided instead.
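
The label derivation described above can be illustrated with a minimal sketch. This is not the authors' preprocessing script. It assumes that the ninth character of a registration number is one of the digits 1, 2, or 3 and that these map directly to Class I, II, and III. The function name `risk_class_from_registration` and the sample registration numbers are hypothetical and made up for illustration.

```python
from typing import Optional

# Assumed mapping from the ninth character to a risk class label.
# The abstract states only that the label comes from the ninth character;
# the digit-to-class correspondence here is an assumption.
_CLASS_BY_DIGIT = {"1": "I", "2": "II", "3": "III"}


def risk_class_from_registration(reg_no: str) -> Optional[str]:
    """Return 'I', 'II', or 'III' from the ninth character, else None."""
    s = reg_no.strip()
    if len(s) < 9:
        return None  # too short to carry a ninth character
    # Python indexes strings by code point, so index 8 is the ninth
    # character even when the prefix is written in Chinese characters.
    return _CLASS_BY_DIGIT.get(s[8])


# Made-up example numbers, used only to exercise the function.
examples = ["国械注准20173401234", "粤械注准20152140001", "short"]
labels = [risk_class_from_registration(x) for x in examples]
```

Returning `None` for any record whose ninth character is not a recognized digit keeps the rule deterministic and leaves unlabeled records visible for later inspection instead of guessing a class.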
