A scalable and modular automated pipeline for stitching of large electron microscopy datasets

Curation statements for this article:
  • Curated by eLife


Abstract

Serial-section electron microscopy (ssEM) is the method of choice for studying macroscopic biological samples at extremely high resolution in three dimensions. In the nervous system, nanometer-scale images are necessary to reconstruct dense neural wiring diagrams in the brain, so-called connectomes. The data, which can comprise up to 10⁸ individual EM images, must be assembled into a volume, requiring seamless 2D stitching of the images from each physical section followed by 3D alignment of the stitched sections. The high throughput of ssEM necessitates that 2D stitching be done at the pace of imaging, which currently produces tens of terabytes per day. To achieve this, we present a modular volume assembly software pipeline, ASAP (Assembly Stitching and Alignment Pipeline), that is scalable to datasets containing petabytes of data and parallelized to work in a distributed computational environment. The pipeline is built on top of the Render services (Trautman and Saalfeld 2019) used in the volume assembly of the brain of adult Drosophila melanogaster (Zheng et al. 2018). It achieves high throughput by operating only on image meta-data and transformations. ASAP is modular, allowing for easy incorporation of new algorithms without significant changes in the workflow. The entire software pipeline includes a complete set of tools for stitching, automated quality control, 3D section alignment, and final rendering of the assembled volume to disk. ASAP has been deployed for continuous stitching of several large-scale datasets of the mouse visual cortex and human brain samples, including one cubic millimeter of mouse visual cortex (Yin et al. 2020; Microns Consortium et al. 2021), at speeds that exceed those of imaging. The pipeline also has multi-channel processing capabilities and can be applied to fluorescence and multi-modal datasets such as array tomography.
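
To make the meta-data-driven design concrete, the following is a minimal, illustrative sketch of the idea that stitching operates on per-tile records rather than on pixel data. The field names below are assumptions loosely modeled on the tile specifications used by the Render services; they are not ASAP's actual schema.

    # Illustrative only: one tile described by its image location and a chain of
    # transformations. Stitching and alignment update these records; the raw
    # image files on disk are never modified or duplicated.
    tile_spec = {
        "tileId": "S0001_T0042",                  # hypothetical identifier
        "z": 1,                                   # serial-section index
        "width": 4096,
        "height": 4096,
        "imageUrl": "file:///data/raw/S0001/T0042.tif",
        "transforms": [
            # per-microscope lens-distortion correction, shared by reference
            {"type": "lens_correction", "refId": "scope_A_2020_06"},
            # montage placement as six affine parameters [a, b, c, d, tx, ty]
            {"type": "affine", "params": [1.0, 0.0, 0.0, 1.0, 12034.5, 8071.2]},
        ],
    }

Because each tile's transforms are stored independently of its pixels, tiles and sections can be processed in parallel and the assembled volume can be re-rendered at any time from the original images.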

Article activity feed

  1. Evaluation Summary:

    Mahalingam et al. report on a new software suite, ASAP, the assembly stitching and alignment pipeline, capable of montaging and aligning serial sections at a speed such that the total processing time is shorter than the image acquisition time. The software applies to both electron microscopy and array tomography, and more generally to any data set consisting of collections of 2D images in need of in-section montaging and cross-section registration. The result is a coarsely registered volume, ready for refinement with existing software suites such as SEAMLESS by Macrina et al. (2021) and for subsequent processing, such as image segmentation and neuronal arbor reconstruction for cellular connectomics. This paper will be of special interest to researchers within the field of connectomics, but also to the broad class of scientists who perform large-scale microscopy. The establishment of fast, reliable and scalable image alignment software to process the millions of images produced by modern microscopes at the same speed as they are acquired is key to accelerating research in neuroscience and other fields. The key claims of the manuscript are well supported by the data, and the approaches used are thoughtful and rigorous.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their name with the authors.)

  2. Reviewer #1 (Public Review):

    The software is open source and runs on a desktop computer, a computer cluster, or a cloud computing cluster. The approach minimizes costly image I/O by storing extracted image features and their correspondences to those of other images, both in-section and cross-section. Costly image data storage is minimized by avoiding image duplication on disk: images are rendered on demand by loading them from disk and applying transformations on the fly.
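
    As a rough, hedged illustration of this rendering-on-demand idea (a minimal sketch assuming numpy and scikit-image; the file path and transform values are invented, and this is not ASAP's code), a tile's stored affine transform can be applied only when a view of the assembled volume is requested:

        import numpy as np
        from skimage import io, transform

        def render_tile(image_path, affine_params, out_shape):
            """Load a raw tile from disk and warp it into the montage frame."""
            tile = io.imread(image_path)
            a, b, tx, c, d, ty = affine_params   # assumed row-major 2x3 layout
            matrix = np.array([[a, b, tx],
                               [c, d, ty],
                               [0.0, 0.0, 1.0]])
            tform = transform.AffineTransform(matrix=matrix)
            # skimage warps by pulling pixels from the source, hence the inverse map
            return transform.warp(tile, tform.inverse,
                                  output_shape=out_shape, preserve_range=True)

        # render one tile into a 4096 x 4096 view of the stitched section
        view = render_tile("/data/raw/S0001/T0042.tif",
                           [1.0, 0.0, 120.5, 0.0, 1.0, 80.2],
                           out_shape=(4096, 4096))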

    The ASAP software can operate concurrently with image acquisition, thereby fulfilling a much sought-after requirement: not slowing down connectomics.

    The basic steps are: first, computing the lens distortion correction specific to each microscope; second, extracting scale-invariant image features, which are used to estimate affine transformations for in-section and coarse cross-section alignment; and third, estimating elastic section-wide 3D transformations. All operations run on scaled-down versions of the images, both for performance and to match the effective feature size across multiple adjacent sections. The overall approach builds on prior work by Kaynig et al. (2010), Saalfeld et al. (2010, 2012) and Cardona et al. (2012), among others, with an implementation impressively capable of scaling to petabytes of images.
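
    To make the second step more tangible for non-experts, the sketch below estimates a pairwise affine between two downsampled, overlapping tiles using OpenCV SIFT features and a RANSAC fit. It illustrates the general technique under stated assumptions and is not the implementation used by ASAP.

        import cv2
        import numpy as np

        def estimate_pairwise_affine(img_a, img_b, scale=0.25):
            # work on scaled-down images, as noted above, for speed
            a = cv2.resize(img_a, None, fx=scale, fy=scale)
            b = cv2.resize(img_b, None, fx=scale, fy=scale)

            sift = cv2.SIFT_create()
            kp_a, des_a = sift.detectAndCompute(a, None)
            kp_b, des_b = sift.detectAndCompute(b, None)

            # keep the better correspondences (Lowe's ratio test)
            matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
            good = [m for m, n in matches if m.distance < 0.75 * n.distance]

            # scale matched point coordinates back to full resolution
            src = np.float32([kp_a[m.queryIdx].pt for m in good]) / scale
            dst = np.float32([kp_b[m.trainIdx].pt for m in good]) / scale

            # robust affine estimate between the two tiles
            affine, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
            return affine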

  3. Reviewer #2 (Public Review):

    The authors of this paper propose a high-throughput software pipeline to stitch and align millions of microscopy images, which is scalable to petabyte-sized datasets and can be executed in distributed computing environments. Although the software was originally designed for electron microscopy (EM) images and research on connectomics, it has been successfully tested on other image modalities. This is an impressive engineering work that opens the door to the analysis of previously unfeasible large-scale datasets by facilitating their assembly at the same pace as acquisition, or even faster. Although the resources needed for projects of these dimensions are nowadays accessible to only a few labs and institutions in the world, the proposed software pipeline and tools will have an impact on more modest (but still large-scale) bioimage analysis research. The quality control tools provided within the software are crucial in a framework like this, where errors can easily propagate and human proofreading therefore needs to be facilitated.

    The quality of the resulting assembled datasets, together with their processing speed, demonstrates the suitability of the proposed method. Overall, this is an exciting paper that makes significant contributions to the field of connectomics and to large-scale bioimage analysis in general.

  4. Reviewer #3 (Public Review):

    The authors present a scalable workflow for the stitching of petabyte-sized image data. This is a relevant achievement, necessary for the processing of the large electron microscopy datasets that occur, for example, in the field of neuronal connectomics, where sufficiently large brain regions must be imaged for biologically meaningful results. The key concept in the approach is to modularize the individual processing steps and to manage the transformation metadata for each tile separately in a database structure, enabling massive parallelization. The authors demonstrate that the workflow runs fast, automatically and robustly, with built-in quality control algorithms, and can be deployed on scalable (cloud-based) hardware infrastructure.
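
    A toy sketch of why this design parallelizes well (the function body, worker count and section range here are hypothetical, not part of the published workflow): because each section's tile transforms are independent records, sections can be montaged concurrently without sharing pixel data between workers.

        from concurrent.futures import ProcessPoolExecutor

        def montage_section(z):
            """Placeholder for one section's stitching job: read that section's
            tile metadata, compute matches and per-tile transforms, write back."""
            ...

        with ProcessPoolExecutor(max_workers=32) as pool:
            # each serial section is an independent unit of work
            list(pool.map(montage_section, range(1, 1001)))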

    While the presented software tools are of very high quality and great utility, the practical challenge of setting up the same software and hardware infrastructure in other labs is not to be underestimated and will likely require technically skilled and dedicated personnel.

    For non-experts, the article in its current form is partially challenging to read as it requires detailed technical knowledge of previous publications or documents on GitHub repositories. A recommendation would be to rewrite the article to differentiate between the expected readerships. The manuscript could start with a section discussing the challenges and conceptual solutions along with an example data set. This first section could inform the interested reader whether the presented solutions are of general interest, e.g. for the reader's host institution or microscopy facility without going into any technicalities such as abbreviated software tools or discussing specific client-server architectures. Then, a second section should follow where the actual implementations and the respective software tools that solve bespoke challenges are introduced. A third section may then outline what it takes to actually implement the software stack on a specific hardware infrastructure. Such a structure will make the great achievements of this work more accessible to different readerships with more or less software development background and/or interest.