A Retrieval Model with Contextual Correlation Analysis for Verbose Queries


Abstract

Retrieving relevant documents in response to verbose queries is a key challenge in information retrieval, as such queries often include extraneous terms. Traditional retrieval models treat all query terms equally, which limits their effectiveness. Existing methods for verbose queries are typically supervised or rely on costly two-stage ranking pipelines. We propose a fully unsupervised, single-phase retrieval model that estimates the centrality of each query term by analyzing its contextual correlation with the entire query. A fully connected term graph is constructed, where edge weights capture the relative correlation of each term with the query context compared to the other terms. Centrality scores are computed via power iteration over this graph. Dense representations of query terms and context are obtained using a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. To further reduce the influence of non-informative document terms, an additional weight based on term information content is introduced. These two weights are combined and integrated into a modified Markov Random Field Sequential Dependence Model (SDM) for ranking. Experiments show that our model outperforms unsupervised baselines, performs comparably to supervised baselines, and surpasses several neural rankers in zero-shot settings. Comparable results with both GloVe and BERT embeddings highlight its independence from the choice of embedding. The model shows larger gains on longer queries and modest improvements on shorter ones, but never underperforms SDM. Therefore, the model's independence from relevance judgments and top-ranked documents, along with its consistent, embedding-agnostic performance across query lengths, makes it well-suited for low-resource scenarios.
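
To make the centrality computation concrete, the following is a minimal Python sketch of power iteration over a fully connected term graph, assuming precomputed term embeddings (from BERT or GloVe) and a single query-context embedding. All names (term_centrality, damping, etc.) and the exact edge-weight formula are illustrative assumptions, not taken from the paper.

import numpy as np

def term_centrality(term_vecs, context_vec, damping=0.85, iters=100, tol=1e-8):
    # term_vecs: (n_terms, dim) embeddings of the query terms.
    # context_vec: (dim,) embedding of the whole query (its context).
    n = len(term_vecs)

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # Correlation of each term with the query context, clipped so that
    # edge weights stay non-negative.
    ctx_sim = np.maximum([cos(v, context_vec) for v in term_vecs], 0.0)

    # Fully connected graph: the weight of edge i -> j reflects how
    # strongly term j correlates with the context relative to the
    # other terms (one plausible reading of the abstract's edge weights).
    W = np.tile(ctx_sim, (n, 1))
    np.fill_diagonal(W, 0.0)
    W = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # row-stochastic

    # PageRank-style power iteration until convergence.
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        updated = (1.0 - damping) / n + damping * (scores @ W)
        if np.abs(updated - scores).sum() < tol:
            scores = updated
            break
        scores = updated
    return scores / scores.sum()

Per the abstract, such centrality scores would then be combined with a term information-content weight and plugged into the term potentials of the modified SDM during ranking; the details of that combination are given in the full paper.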
