Can LLMs Bridge Domain and Visualization? A Case Study on High-Dimension Data Visualization in Single-Cell Transcriptomics

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

While many visualizations are built for domain users (e.g., biologists, machine learning developers), understanding how visualizations are used in the domain has long been a challenging task. Previous research has relied on either interviewing a limited number of domain users or reviewing relevant application papers in the visualization community, neither of which provides comprehensive insight into visualizations ``in the wild'' of a specific domain. This paper aims to fill this gap by examining the potential of using Large Language Models (LLM) to analyze visualization usage in domain literature. We use high-dimension (HD) data visualization in sing-cell transcriptomics as a test case, analyzing 1,203 papers that describe 2,056 HD visualizations with highly specialized domain terminologies (e.g., biomarkers, cell lineage). To facilitate this analysis, we introduce a human-in-the-loop LLM workflow that can effectively analyze a large collection of papers and translate domain-specific terminology into standardized data and task abstractions. Instead of relying solely on LLMs for end-to-end analysis, our workflow enhances analytical quality through 1) integrating image processing and traditional NLP methods to prepare well-structured inputs for three targeted LLM subtasks (\ie, translating domain terminology, summarizing analysis tasks, and performing categorization), and 2) establishing checkpoints for human involvement and validation throughout the process.The analysis results, validated with expert interviews and a test set, revealed three often overlooked aspects in HD visualization: trajectories in HD spaces, inter-cluster relationships, and dimension clustering.This research provides a stepping stone for future studies seeking to use LLMs to bridge the gap between visualization design and domain-specific usage.

Article activity feed