Computer Vision Scoring of Figure Copy and Recall

Abstract

Objective

Figure copy and recall tests are sensitive measures of visuoconstruction and visual episodic memory, but their clinical use is constrained by labor-intensive manual scoring. We developed and validated an automated, element-level scoring pipeline using Vertex AI object detection for the tablet-based figure copy and recall tasks in the California Cognitive Assessment Battery (CCAB). The automated scoring pipeline duplicated the scoring procedures used by expert manual raters.

Methods

A normative sample of 2,011 community-dwelling adults aged 18-90 completed figure copy and delayed recall trials at baseline, with subsamples retested at 1 day and at 6, 18, and 30 months. Participants completed the drawings with their index finger on a tablet computer, and finger position was digitized to analyze the speed and timing of individual drawing strokes. A convolutional object-detection model trained on the Vertex AI AutoML Vision platform identified each of twelve canonical figure elements in rendered drawings. Separate element presence and location scores were computed after homographically warping drawings onto a canonical template to produce trial-level Element, Location, and Total scores. To evaluate inter-rater agreement, Vertex AI and expert human raters independently scored 1,500 randomly selected drawings, including a common subset of 100 drawings scored by Vertex AI and all raters.
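The warp-then-score step can be illustrated with a minimal sketch. It is not the authors' implementation: the four anchor correspondences, the element names, the tolerance `tol`, the one-point-per-element scoring, and the rule Total = Element + Location are all assumptions made for illustration, and the detector output is replaced by a plain dictionary of element centers.

```python
from math import dist

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_homography(src, dst):
    """Homography (h33 fixed to 1) from exactly four point pairs src -> dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def warp_point(h, pt):
    """Map one (x, y) drawing coordinate into template coordinates."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def score_trial(detected, template, h, tol=0.1):
    """Score presence and location (hypothetical rule: one point each).

    detected: element name -> (x, y) center in drawing coordinates
    template: element name -> (x, y) center in canonical coordinates
    """
    element = sum(1 for name in template if name in detected)
    location = sum(
        1 for name, target in template.items()
        if name in detected and dist(warp_point(h, detected[name]), target) <= tol
    )
    return {"element": element, "location": location, "total": element + location}

# Toy example: a drawing whose outer corners map onto a unit-square template.
corners_drawn = [(10, 10), (30, 10), (30, 30), (10, 30)]
corners_template = [(0, 0), (1, 0), (1, 1), (0, 1)]
H = fit_homography(corners_drawn, corners_template)
template = {"a": (0.5, 0.5), "b": (0.25, 0.75), "c": (0.9, 0.1)}
detected = {"a": (20, 20), "b": (28, 12)}  # "c" missing, "b" misplaced
scores = score_trial(detected, template, H)
```

In this toy case two of three elements are present and only one falls within tolerance of its canonical position after warping, so presence and location are credited separately.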

Results

Agreement between Vertex AI and human raters on Total scores (r = 0.966) was virtually indistinguishable from human-human agreement (mean r = 0.971), as was agreement on Element presence scores (mean r = 0.959 vs. r = 0.963). Location-score agreement (r = 0.951) was slightly below the human-human mean (r = 0.972) due to pixel-level analysis by Vertex AI that was impossible for human raters. The Vertex pipeline showed no preferential advantage for the single expert rater who categorized Elements during training. Automated scores showed strong demographic gradients: age effects on Recall (r = -0.32) were approximately twice those in Copy conditions (r = -0.16). A Memory Cost score (Recall - Copy) showed a monotonic age-related decline from +0.40 z in the youngest subjects to -0.54 z in the oldest. Kinetic analysis revealed that drawing speed and efficiency showed significant age-related changes. Overnight test-retest reliability was high (Recall r = 0.72), and the Recall trial showed a large overnight learning effect (Δz = +1.18) that continued with repeated tests up to 30 months (Δz = +0.75).
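As a rough illustration of the Memory Cost score (Recall - Copy in z units), the sketch below standardizes each condition within a sample and subtracts. The within-sample standardization and the toy score values are assumptions; the abstract does not state the normative reference used for the z transform.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values against their own sample mean and SD (assumed norm)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

def memory_cost(copy_totals, recall_totals):
    """Per-participant Memory Cost: z(Recall) minus z(Copy)."""
    return [zr - zc for zc, zr in zip(z_scores(copy_totals), z_scores(recall_totals))]

# Toy values for three hypothetical participants (not data from the study).
copy_totals = [22, 23, 24]
recall_totals = [10, 18, 20]
cost = memory_cost(copy_totals, recall_totals)
```

A negative value marks a participant whose Recall performance is lower, relative to the sample, than their Copy performance would predict.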

Conclusions

The computer vision pipeline described here preserves the element-level structure of expert manual scoring and recovers the memory-specific clinical signals of the delayed recall trial. Overall agreement with human raters was approximately an order of magnitude greater than that of previously described automated scoring approaches. Automated computer vision scoring of figure copy and delayed recall removes scoring barriers that impede the more extensive use of these tests.

Key Findings

  • A fully automated computer vision pipeline incorporated Vertex AI to score element presence and location in a figure copy and recall task in 9,117 drawings from 2,011 cognitively normal adults.

  • In validation samples, agreement between Vertex AI scores and those of three expert human raters was indistinguishable from human–human agreement (r = 0.966 vs. r = 0.971).

  • Age effects on delayed Recall scores were twice those of Copy scores (r = −0.32 vs. r = −0.16). As a result, a Memory Cost score (Recall − Copy in z units) declined monotonically with age from +0.40 z in the youngest stratum to −0.54 z in the oldest.

  • Test–retest reliability was high at a 1-day retest interval for Recall (r = 0.72) but obscured by ceiling effects for Copy (r = 0.48). The Recall trial showed a large overnight learning effect (Δz = +1.18) that persisted up to 30 months (Δz = +0.75).

  • A parallel motor-temporal analysis of drawing time, stroke count, and pause structure revealed that older subjects produced jerkier strokes (r = 0.19) and showed longer intersegment pauses during recall (r = 0.19).
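The motor-temporal measures named in the last finding can be sketched from digitized finger positions. The data layout (a list of strokes, each a time-ordered list of `(t_seconds, x, y)` samples) and the function name `stroke_summary` are hypothetical, and the jerkiness measure is not reproduced because the abstract does not define it.

```python
from math import dist

def stroke_summary(strokes):
    """Summarize timing for one drawing.

    strokes: list of strokes, each a time-ordered list of (t_seconds, x, y).
    """
    # Pause = gap between the end of one stroke and the start of the next.
    pauses = [nxt[0][0] - cur[-1][0] for cur, nxt in zip(strokes, strokes[1:])]
    # Path length and finger-down time, summed over all strokes.
    path = sum(dist(a[1:], b[1:]) for s in strokes for a, b in zip(s, s[1:]))
    pen_down = sum(s[-1][0] - s[0][0] for s in strokes)
    return {
        "stroke_count": len(strokes),
        "total_time": strokes[-1][-1][0] - strokes[0][0][0],
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
        "mean_speed": path / pen_down if pen_down > 0 else 0.0,
    }

# Toy drawing with two strokes (illustrative values only).
strokes = [
    [(0.0, 0, 0), (1.0, 3, 4)],
    [(1.5, 3, 4), (2.5, 3, 10)],
]
summary = stroke_summary(strokes)
```

Separating pause time from finger-down time keeps hesitation between strokes distinct from slow movement within a stroke.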
