LinkPulse: A Hybrid Retrieval-Augmented Generation Platform for Autonomous Multi-Source Knowledge Synthesis
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a principled approach to grounding large language model (LLM) responses in external knowledge, yet deployed systems face three persistent limitations: (1) single-modality retrieval that degrades on semantically diverse queries; (2) fragmented knowledge sources that cannot unify static corpora with real-time web signals; and (3) monolithic architectures that resist component-level optimisation and scaling. We propose LinkPulse, an autonomous knowledge-synthesis platform that addresses these limitations through three technical contributions. First, the Tri-Modal Fusion Retriever (TMFR) dynamically weights dense vector similarity, sparse BM25 keyword matching, and live web-search signals via a lightweight learned gating network. Second, the Multi-Agent Ingestion Pipeline (MAIP) abstracts heterogeneous sources (web pages, GitHub repositories, PDFs, and multimedia) into a unified vector-indexed knowledge store with source-aware provenance tracking. Third, the Adaptive Context Window (ACW) combines cross-encoder re-ranking with extractive sentence compression to reduce context dilution before generation. We evaluate LinkPulse on LinkPulse-Bench, a curated multi-domain benchmark of 150 queries over more than 1,000 indexed documents spanning three content categories. On this benchmark, LinkPulse achieves an F1 score of 97.1, outperforming TF-IDF (+13.3 points), vector-only RAG (+3.4 points), and Self-RAG (+1.2 points), with a mean response latency of 230 ms and an observed hallucination rate of 3.2% on our test set. Ablation experiments confirm that each component contributes independently to performance. We release LinkPulse-Bench and all code to support reproducibility.
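The abstract describes the Tri-Modal Fusion Retriever only at a high level ("dynamically weights dense vector similarity, sparse BM25 keyword matching, and live web-search signals via a lightweight learned gating network"). The sketch below shows one plausible reading of that idea, under stated assumptions: a single linear layer over a small query-feature vector produces one logit per modality, a softmax turns the logits into fusion weights, and each modality's scores are min-max normalised before the weighted sum. The feature vector, the hand-set gate parameters, the modality names, the normalisation scheme, and every function name here are hypothetical illustrations, not the LinkPulse implementation. In the described system the gate parameters would be learned, and web-search hits would first have to be mapped into the same document space as the indexed corpus.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical names for the three retrieval signals, in gate-output order.
MODALITIES = ("dense", "sparse", "web")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(query_features: Sequence[float],
                 weight_matrix: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Assumed linear gate: one row of weight_matrix per modality,
    one column per query feature; returns weights that sum to 1."""
    logits = [sum(w * f for w, f in zip(row, query_features)) + b
              for row, b in zip(weight_matrix, bias)]
    return softmax(logits)


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores to [0, 1]; BM25 scores are unbounded,
    so raw values are not comparable with cosine similarities."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(per_modality: Dict[str, Dict[str, float]],
         weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Weighted sum of normalised scores, sorted best-first. A document absent
    from one modality's result list contributes 0 for that modality."""
    fused: Dict[str, float] = {}
    for name, w in zip(MODALITIES, weights):
        for doc, s in min_max(per_modality.get(name, {})).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical query features: [is_time_sensitive, keyword_density].
    feats = [1.0, 0.2]
    W = [[0.0, 0.5],   # dense
         [0.0, 1.5],   # sparse
         [2.0, 0.0]]   # web: a time-sensitive query pushes weight toward live search
    b = [0.5, 0.0, 0.0]
    weights = gate_weights(feats, W, b)
    results = {
        "dense": {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40},
        "sparse": {"doc_b": 12.1, "doc_c": 9.4},
        "web": {"doc_d": 0.9, "doc_a": 0.3},
    }
    ranked = fuse(results, weights)
    print([doc for doc, _ in ranked])  # ['doc_d', 'doc_b', 'doc_a', 'doc_c']
```

In this toy run the time-sensitivity feature gives the web modality roughly 70% of the weight, so the web-only hit `doc_d` ranks first. Min-max normalisation is only one option; it maps the lowest-scoring document in each list to 0, and rank-based schemes such as reciprocal rank fusion avoid that at the cost of discarding score magnitudes.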