What are People Talking about in High-Dimension Data Visualization? LLM-supported Analysis of Domain Literature
Abstract
Visualizing high-dimensional (HD) data is a common yet challenging task in various application domains. Previous surveys on HD visualization have been conducted either within the visualization community or through interviews with a limited number of domain users. A comprehensive understanding of how HD visualizations are used in the wild is missing. To fill this gap, we analyzed more than 1,000 papers from one representative domain (single-cell transcriptomics) that extensively employed HD data visualizations. To analyze this large corpus, which is filled with highly domain-specific terminology, we propose a pipeline for collaborating with an LLM annotator to interpret and summarize the usage of HD visualizations in the collected papers. This pipeline includes machine learning techniques for figure detection, traditional NLP methods for text cleaning, and LLM prompt engineering for nuanced interpretation. With this pipeline, we categorized HD visualizations based on how users referred to and mentioned them in their papers. We then discussed representative visualizations for each category, as well as current practices and potential misuses. These analyses can assist the visualization community in designing and evaluating future HD visualizations.