Authenticating Matryoshka Nesting Dolls via MML-LLM-Zero-shot 3D Reconstruction

Abstract

This work presents a multimodal machine learning (MML) pipeline with zero-shot 3D completion for the digital preservation and authentication of Matryoshka nesting dolls (MND). A private collection is digitized into a novel multimodal dataset centered on turntable videos and augmented with single and group images and auxiliary physical and textual cues. A text modality is generated from Qwen3VL captions to enable video-text fusion and semantic motif analysis. A unimodal 2D baseline is established for fine-grained 8-way style recognition and a 3-way authenticity task, and is compared against multimodal configurations that incorporate learned text embeddings. To use geometry as direct evidence, the pipeline integrates a silhouette-to-skeleton branch based on the Blum medial axis (BMA) and a convolutional autoencoder (CA) that reconstructs dense silhouettes from sparse skeletons, yielding a compact representation suitable for downstream 3D reasoning. The 3D pipeline is implemented along two complementary branches: zero-shot completion with a pretrained 3D prior (Hunyuan3D) and mesh-oriented skeletonization via a custom BMA procedure. Late fusion combines geometric and textual signals to improve decision confidence beyond that of appearance-only models. The framework supports authentication decisions with explicit geometric and semantic evidence and is transferable to other cultural artifacts. Potential applications include AR/VR, education, gaming, and assistive technologies. The code for this project is available upon request.
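
The abstract does not state which late-fusion rule is used, so the following is only a minimal sketch of one common choice: a weighted average of per-branch class probabilities for the 3-way authenticity task. The function name `late_fusion`, the branch names, the weights, and the probability values are all hypothetical and chosen for illustration; they are not taken from the authors' code.

```python
from typing import Dict, List


def late_fusion(branch_probs: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Fuse per-branch class probabilities by a weighted average.

    Assumption: each branch (e.g. appearance, geometry, text) has already
    produced a probability vector over the same classes. The averaging rule
    itself is an illustrative choice, not the paper's confirmed method.
    """
    if not branch_probs:
        raise ValueError("at least one branch is required")
    n_classes = len(next(iter(branch_probs.values())))
    total_weight = sum(weights[name] for name in branch_probs)
    if total_weight <= 0:
        raise ValueError("branch weights must sum to a positive value")

    fused = [0.0] * n_classes
    for name, probs in branch_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"branch {name!r} has a mismatched class count")
        w = weights[name] / total_weight  # normalize weights to sum to 1
        for i, p in enumerate(probs):
            fused[i] += w * p

    # Renormalize so the fused vector is a valid distribution even if the
    # inputs were only approximately normalized.
    norm = sum(fused)
    return [p / norm for p in fused]


# Hypothetical scores for one doll over three authenticity classes.
scores = {
    "appearance": [0.5, 0.3, 0.2],
    "geometry": [0.2, 0.6, 0.2],
    "text": [0.3, 0.5, 0.2],
}
equal_weights = {"appearance": 1.0, "geometry": 1.0, "text": 1.0}
fused = late_fusion(scores, equal_weights)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this made-up example the appearance branch alone favors the first class, while the geometric and textual branches together shift the fused decision to the second. That is the kind of effect a late-fusion stage is meant to capture. In practice the weights would have to be tuned on held-out data.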
