Human Detection of AI-Generated Faces and Voices is Not Domain-General

Abstract

Human decision-making plays a critical role in distinguishing real from artificially generated synthetic content. These decisions are particularly important when classifying faces and voices because of the rich personal and social information they carry; synthetic faces and voices, commonly known as "deepfakes," are frequently used for identity theft, financial fraud, and misinformation campaigns. It is currently unknown whether detection of real versus synthetic content is modality-specific or whether it generalizes across sensory domains. We conducted a preregistered study in which participants classified real and AI-generated synthetic faces and voices. Overall classification accuracy was 65% for faces and 54% for voices. Using signal detection theory to analyze individuals' ability to classify stimuli, we observed no evidence of a domain-general effect, indicating that detection ability may not generalize across face and voice domains and is instead domain-specific. Participants' confidence tracked accuracy for faces but not for voices, suggesting that metacognitive insight may also be modality-specific. Findings are discussed in terms of whether detection of synthetic content is underpinned by its own cognitive and social mechanisms, or whether classification is driven purely by domain-specific abilities. In applied settings, including forensic, legal, and security contexts, it is vital to recognize that expertise in detecting synthetic content may be modality-specific.
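
The abstract reports a signal detection theory analysis of individual classification ability. As a minimal sketch of how sensitivity (d') is typically computed in such designs, assuming "real" is treated as the signal category and using a standard log-linear correction for extreme rates; the trial counts below are hypothetical and not taken from the study:

```python
from scipy.stats import norm


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Compute sensitivity (d') from raw response counts.

    Applies a log-linear correction (adding 0.5 to each cell) so that
    hit and false-alarm rates of exactly 0 or 1 do not produce
    infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    # d' is the difference between the z-transformed hit and false-alarm rates
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)


# Hypothetical participant: 20 real and 20 synthetic stimuli per modality,
# with "real" responses to real stimuli counted as hits.
print(d_prime(hits=14, misses=6, false_alarms=8, correct_rejections=12))
```

Under this framing, a domain-general effect would appear as a positive correlation between participants' d' for faces and their d' for voices; the study's null result suggests no such correlation.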
