Linguistic Polarity and Decision Architecture in Large Language Model–Based Abstract Screening in the Dental Field
Abstract
Large language models (LLMs) are increasingly investigated for abstract screening in systematic reviews, yet it remains unclear whether screening errors attributed to linguistic complexity reflect intrinsic semantic limitations or the decision architecture in which the model is embedded. We investigated how five polarity variants of logically equivalent eligibility criteria—affirmative inclusion, antonymic exclusion, predicate negation, verb-level negation, and double negation—affect screening outcomes in a controlled biomedical task. Using 1,000 abstracts derived from a reconstructed Cochrane review corpus (50 eligible TARGET studies; 950 non-targets), we implemented four abstract-visible criteria within a sequential hard-gated pipeline, where failure at any step triggered irreversible exclusion. Under hard gating, linguistic polarity alone produced substantial sensitivity shifts. For GPT-5.1, recall ranged from 0.72 to 0.32 despite identical logical predicates and input data. Replication with GPT-3.5 Turbo yielded a similar polarity-dependent divergence (recall range 0.92–0.18), confirming that the effect generalizes across model generations. TARGET losses were highly concentrated at criteria frequently satisfied but inconsistently reported in abstracts, consistent with conservative exclusion under evidential underspecification. To assess whether this effect was semantic or architectural, we reimplemented screening using a scoring-based evidence-accumulation framework in which each criterion contributed graded support (YES/NO/UNCLEAR) and inclusion was determined by a tunable score threshold. Scoring substantially reduced polarity-driven recall divergence and transformed it into an explicit precision–recall trade-off.
These findings indicate that negation sensitivity in LLM screening is strongly mediated by decision architecture: irreversible Boolean gating amplifies linguistic asymmetries under uncertainty, whereas cumulative scoring preserves uncertainty and enables controllable operating points.
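The contrast between the two decision architectures can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's implementation: the per-vote scores and the 0.75 threshold are assumed values chosen for the example, and the vote labels follow the YES/NO/UNCLEAR scheme described in the abstract.

```python
# Illustrative sketch of hard-gated vs. scoring-based screening.
# VOTE_SCORE values and the threshold are assumptions for this example.
VOTE_SCORE = {"YES": 1.0, "UNCLEAR": 0.5, "NO": 0.0}

def hard_gated(votes):
    """Sequential Boolean gating: any non-YES verdict on a criterion
    triggers irreversible exclusion (UNCLEAR is treated as failure)."""
    return all(v == "YES" for v in votes)

def scoring(votes, threshold=0.75):
    """Evidence accumulation: each criterion contributes graded support;
    inclusion is decided by a tunable score threshold."""
    score = sum(VOTE_SCORE[v] for v in votes) / len(votes)
    return score >= threshold

# A TARGET abstract satisfying all criteria, one of which is under-reported:
votes = ["YES", "UNCLEAR", "YES", "YES"]
hard_gated(votes)  # False: the single UNCLEAR causes irreversible exclusion
scoring(votes)     # True: 0.875 clears the threshold, preserving recall
```

Lowering or raising `threshold` moves the operating point along the precision–recall curve, which is the controllability that hard gating lacks.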