STARCall integrates image stitching, alignment, and read calling to enable scalable analysis of in situ sequencing data
Abstract
Fluorescent in situ sequencing involves imaging-based sequencing by synthesis in intact cells or tissues to reveal target nucleotide sequences inside each cell. Often, the target sequences are barcodes that indicate a perturbation (e.g., a CRISPR guide or genetic variant) delivered to the cell. However, processing in situ sequencing data is a considerable challenge. It requires stitching and aligning tens of thousands of images containing millions of cells, detecting small amplicon colonies across sequencing cycles, and calling reads. To address these challenges, we introduce STARCall (STitching, Alignment and Read Calling for in situ sequencing), a software package that analyzes raw in situ sequencing images to produce a genotype-to-phenotype mapping for each cell. STARCall improves upon previous solutions by combining stitching and alignment of images into a single step that minimizes both inter-cycle and intra-cycle alignment error. STARCall also improves detection and extraction of sequencing reads, incorporating filters and normalization to combat background fluorophore signal. We compare STARCall to other methods using a diverse set of images that include commonly encountered imaging problems, such as variable intensity across channels and cycles and high levels of background. Specifically, this set comprises ∼250,000 images from a pooled screen of ∼3,500 barcoded LMNA variants expressed in U2OS cells and ∼1,200 barcoded PTEN variants in induced pluripotent stem cells (iPSCs) and iPSC-derived neurons. Overall, STARCall aligned more than 50% of tiles with <1 pixel error on all nine image sets, while alternative packages had higher error on four of them. STARCall also yielded an 8–40% increase in genotyped cells due to improved filtering and normalization methods that address background fluorescence. STARCall can call tools such as CellPose to segment cells and CellProfiler to compute cell features from the phenotyping images.
STARCall is open-source and freely available, providing a robust solution for the analysis of in situ sequencing data.
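The abstract does not spell out how stitching and alignment are combined into one step. One common way to make a single fit minimize both kinds of error is to treat every tile in every cycle as an unknown position. All pairwise offset measurements are then solved together in one least-squares problem: those between overlapping tiles within a cycle and those between the same field imaged in different cycles. The sketch below illustrates that idea with NumPy. It is an assumption for illustration, not STARCall's published algorithm, and the function name `solve_tile_positions` and the `(i, j, dy, dx)` offset format are hypothetical.

```python
import numpy as np


def solve_tile_positions(n_tiles, offsets, anchor=0):
    """Jointly estimate 2-D tile positions from pairwise offset measurements.

    Hypothetical helper, a sketch of global least-squares stitching.
    n_tiles: number of tiles across all cycles (e.g. n_fields * n_cycles).
    offsets: iterable of (i, j, dy, dx) meaning position[j] - position[i] ~ (dy, dx).
        Pairs may mix intra-cycle neighbors (overlapping tiles in one cycle)
        and inter-cycle matches (the same field imaged in two cycles).
    anchor: tile pinned at (0, 0) to remove the global translation ambiguity.

    Returns (positions, errors): an (n_tiles, 2) array of fitted positions and
    the per-measurement residual length in pixels.
    """
    offsets = list(offsets)
    A = np.zeros((len(offsets) + 1, n_tiles))
    b = np.zeros((len(offsets) + 1, 2))
    for r, (i, j, dy, dx) in enumerate(offsets):
        # Each measurement contributes one row: position[j] - position[i].
        A[r, i] = -1.0
        A[r, j] = 1.0
        b[r] = (dy, dx)
    # Final row pins the anchor tile at the origin.
    A[-1, anchor] = 1.0
    # Solve the y and x coordinates together (b has two columns).
    positions, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Residuals show how far each measured offset disagrees with the fit.
    residuals = A[:-1] @ positions - b[:-1]
    errors = np.linalg.norm(residuals, axis=1)
    return positions, errors
```

Because every tile in every cycle is solved at once, a slightly wrong inter-cycle offset is spread across the whole system instead of accumulating along a chain of pairwise registrations. Large entries in `errors` flag measurements that could be down-weighted or dropped. In practice, the pairwise offsets might come from a registration routine such as scikit-image's `phase_cross_correlation`.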
Author summary
Short regions of RNA or DNA can be sequenced inside intact cells or tissues (i.e., in situ) by combining a microscope with sequencing by DNA synthesis. Multiple cycles of sequencing are performed. In each cycle, the incorporation of a single fluorescently labeled nucleotide is imaged and the corresponding base is detected. Recently, in situ sequencing has proved useful in optical pooled screens. In these screens, a library of perturbations such as CRISPR-mediated gene knockouts or genetic variants is introduced into cells, and in situ sequencing reveals the specific perturbation in each cell. However, processing in situ sequencing data is a considerable challenge. It requires stitching and aligning tens of thousands of images, detecting small amplicon colonies across sequencing cycles, and calling reads. To address these challenges, we introduce STARCall (STitching, Alignment and Read Calling for in situ sequencing). STARCall uses a novel stitching algorithm that minimizes both inter-cycle and intra-cycle alignment error, together with improved filters and normalization for base calling. When applied to nine in situ sequencing image sets, STARCall yielded an 8–40% increase in genotyped cells. STARCall is open-source and applicable to a variety of experiments, providing a robust pipeline for in situ sequencing data.
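In the generic form of base calling described above, each amplicon spot yields one intensity per fluorescence channel per cycle, and the brightest channel gives the base. The sketch below assumes one simple normalization scheme: per-cycle, per-channel median background subtraction followed by rescaling to a high percentile. It shows why normalization matters when channels differ in brightness. It is not STARCall's actual filter set, and `call_reads`, the `ACGT` channel order, and the 99th-percentile scale are hypothetical choices. The median background also assumes that most spots do not carry any given base in a given cycle.

```python
import numpy as np

# Hypothetical channel-to-base order; the real order depends on the chemistry and filter set.
BASES = np.array(list("ACGT"))


def call_reads(intensities, scale_percentile=99.0):
    """Call one read per amplicon spot from per-channel intensities.

    Hypothetical helper, a sketch of normalized argmax base calling.
    intensities: array of shape (n_spots, n_cycles, 4), one value per
        fluorescence channel per sequencing cycle.
    Returns (reads, quality): a list of read strings and an
        (n_spots, n_cycles) array of per-base quality in [0, 1].
    """
    x = np.asarray(intensities, dtype=float)
    # Background: median over spots, separately for each cycle and channel.
    background = np.median(x, axis=0, keepdims=True)
    x = np.clip(x - background, 0.0, None)
    # Rescale each cycle/channel by a high percentile so dim channels are not
    # systematically out-voted by bright ones.
    scale = np.percentile(x, scale_percentile, axis=0, keepdims=True)
    x = x / np.where(scale > 0, scale, 1.0)
    # The brightest normalized channel in each cycle gives the base.
    calls = np.argmax(x, axis=2)
    reads = ["".join(BASES[c]) for c in calls]
    # Quality: fraction of normalized signal in the winning channel.
    total = x.sum(axis=2)
    quality = np.where(total > 0, x.max(axis=2) / np.where(total > 0, total, 1.0), 0.0)
    return reads, quality
```

Suppose one channel (say T) is imaged at a fraction of the brightness of the others. Then per-channel rescaling keeps its true signal comparable to the other channels, so background or crosstalk in a brighter channel is less likely to win the argmax. A low `quality` value marks ambiguous cycles that a downstream filter could reject.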