A FAIR Resource Recommender System for Smart Open Scientific Inquiries

Syed N. Sakib
Sajratul Y. Rubaiat
Kallol Naha
Hasan H. Rahman
Hasan M. Jamil


Abstract

A vast proportion of scientific data remains locked behind dynamic web interfaces, often called the deep web, where it is inaccessible to conventional search engines and standard crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. While registries like FAIRsharing offer structured metadata describing data standards, repositories, and policies aligned with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, they do not enable seamless, programmatic access to the underlying datasets. We present FAIRFind, a system designed to bridge this accessibility gap. FAIRFind autonomously discovers, interprets, and operationalizes access paths to biological databases on the deep web, regardless of their FAIR compliance. Central to our approach is the Deep Web Communication Protocol (DWCP), a resource description language that represents web forms, HyperText Markup Language (HTML) tables, and file-based data interfaces in a machine-actionable format. Leveraging large language models (LLMs), FAIRFind combines a specialized deep web crawler with a web-form comprehension engine to transform passive web metadata into executable workflows. By indexing and embedding these workflows, FAIRFind enables natural language querying over diverse biological data sources and returns structured, source-resolved results. Evaluation across multiple open-source LLMs and database types demonstrates over 90% success in structured data extraction and high semantic retrieval accuracy. FAIRFind advances existing registries by turning linked resources from static references into actionable endpoints, laying a foundation for intelligent, autonomous data discovery across scientific domains.

Version published to 10.3390/app15158334
Jul 26, 2025
Version published to 10.20944/preprints202506.0282.v1
Jun 4, 2025

Related articles

QModel: A Time-Aware GitHub Mining Framework for Empirical Software Quality Studies

This article has 1 author:
1. Dmytro Polishchuk
This article has no evaluations
Latest version Jan 12, 2026

Standardized API Call Protocols for implementing Federated Learning in FAIRDatabase

This article has 3 authors:
1. Sem de Regt
2. Roland V. Bumbuc
3. Vivek M. Sheraton
This article has no evaluations
Latest version Jan 27, 2026

BH25DE report: On the path to machine-actionable training materials

This article has 11 authors:
1. Phil Reed
2. Nick Juty
3. Petra Steiner
4. Leyla Jael Castro
5. Charles Tapley Hoyt
6. Oliver Knodel
7. Martin Voigt
8. Roman Baum
9. Dilfuza Djamalova
10. Jacobo Miranda
11. Alban Gaignard
This article has no evaluations
Latest version Jan 26, 2026
