TeLL-Me What You Can’t See: A Vision-Language Framework for Forensic Mugshot Augmentation

Abstract

During criminal investigations, the availability of images depicting persons of interest directly influences the success of identification procedures. However, law enforcement agencies often face challenges related to the scarcity or obsolescence of high-quality images, which can affect the accuracy of people-search processes. This paper introduces a novel forensic mugshot augmentation framework aimed at addressing these limitations. To assist law enforcement in identification procedures, our approach enhances visual evidence by creating synthetic, high-quality pictures through customizable data augmentation techniques. These techniques combine generative AI models and are structured to preserve biometric identity and visual coherence with respect to the original data. Experimental results demonstrate that our method consistently enriches multimedia data quality for forensic identification and provides robust enhancements across multiple investigative scenarios. This effectiveness has been validated by means of both vision-based and evidence-based metrics, supporting its potential as a tool for law enforcement applications. Attribute extraction reached 84.1% (+2.3 percentage points over the original mugshots), and re-identification indicated strong identity preservation: similarity of ~0.89 for same-subject pairs vs. ~0.19 for different-subject pairs. These results suggest that the framework reliably extracts and leverages the required target characteristics, with no notable hallucinations observed.
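The re-identification scores reported above are typically computed as a similarity between face-embedding vectors of two images, with same-subject pairs expected to score high and different-subject pairs low. The abstract does not specify the embedding model or metric, so the sketch below assumes cosine similarity and a hypothetical decision threshold; the function names and the threshold value are illustrative, not from the paper.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def same_identity(emb_original, emb_augmented, threshold=0.5):
    """Decide whether two embeddings depict the same subject.

    `threshold` is a hypothetical operating point chosen between the
    reported same-subject (~0.89) and different-subject (~0.19) scores;
    the paper does not state the threshold it uses.
    """
    return cosine_similarity(emb_original, emb_augmented) >= threshold
```

In this setup, an augmented mugshot that preserves biometric identity would yield an embedding whose similarity to the original subject's embedding stays near the same-subject regime (~0.89), well above the different-subject regime (~0.19).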
