A FAIR Resource Recommender System for Smart Open Scientific Inquiries

Abstract

A large proportion of scientific data remains locked behind dynamic web interfaces, often called deep web databases, which makes it inaccessible to conventional search engines and standard web crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. Registries such as FAIRsharing offer structured metadata describing data standards, repositories, and policies aligned with the FAIR principles, but they do not provide seamless, programmatic access to the underlying datasets. In response, we present FAIRFind, a system designed to close this accessibility gap. FAIRFind autonomously discovers, interprets, and operationalizes access paths to biological databases on the deep web, regardless of whether those databases comply with the FAIR principles. Central to our approach is the Deep Web Communication Protocol (DWCP), a resource description language that represents web forms, HTML tables, and file-based data interfaces in a machine-actionable format. Using large language models (LLMs), FAIRFind employs a specialized deep web crawler and a web-form comprehension engine to transform passive web metadata into executable access workflows. By indexing and embedding these workflows, FAIRFind supports natural language querying over diverse biological data sources and returns structured, source-resolved results. Our evaluation across multiple open-source LLMs and database types shows a success rate above 90% in structured data extraction and high semantic retrieval accuracy. FAIRFind extends existing registries by turning linked resources from static references into actionable endpoints, laying a foundation for intelligent, autonomous data discovery across scientific domains.
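
To make the idea of a machine-actionable access description and natural language retrieval over indexed workflows more concrete, the following is a minimal sketch only. It does not reproduce the DWCP syntax or FAIRFind's implementation, neither of which is specified in this abstract. The class names, field names, example resources, and example.org endpoints are all hypothetical, and a simple bag-of-words cosine similarity stands in for the learned embeddings the abstract mentions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FormField:
    """One input of a web form (hypothetical structure, not DWCP syntax)."""
    name: str
    kind: str
    description: str


@dataclass
class AccessDescriptor:
    """Hypothetical stand-in for a machine-actionable resource description."""
    resource: str
    interface: str  # assumed values: "web_form", "html_table", or "file"
    endpoint: str
    summary: str
    fields: list = field(default_factory=list)

    def text(self):
        # Flatten the descriptor into text that can be indexed.
        parts = [self.resource, self.summary]
        parts += [f.description for f in self.fields]
        return " ".join(parts)


def bow(text):
    """Bag-of-words vector; a toy substitute for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, descriptors):
    """Return (score, descriptor) pairs, best match first."""
    q = bow(query)
    scored = [(cosine(q, bow(d.text())), d) for d in descriptors]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical catalog; neither resource nor endpoint is real.
CATALOG = [
    AccessDescriptor(
        resource="ExampleProteinTables",
        interface="html_table",
        endpoint="https://example.org/proteins",
        summary="Browse protein structure entries listed in an HTML table",
    ),
    AccessDescriptor(
        resource="ExampleGeneDB",
        interface="web_form",
        endpoint="https://example.org/genes/search",
        summary="Search gene expression records by gene symbol",
        fields=[
            FormField("symbol", "text", "Official gene symbol"),
            FormField("organism", "select", "Organism name"),
        ],
    ),
]

if __name__ == "__main__":
    for score, descriptor in rank("gene symbol expression", CATALOG):
        print(f"{score:.3f}  {descriptor.resource}  {descriptor.endpoint}")
```

In this sketch, a query about gene symbols ranks the form-based gene resource above the table-based protein resource because only the former shares vocabulary with the query. A real system would presumably replace the word-overlap score with dense embeddings and attach an executable workflow to each descriptor.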
