GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets
Listed in
- Evaluated articles (GigaScience)
Abstract
Background
The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances make it easy to generate datasets of hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. Analyzing this volume of high-dimensional data places heavy demands on both the hardware and the software of high-performance computing resources. Current software tools often do not scale to datasets of this size; users are thus forced to downsample the data to manageable sizes, which costs accuracy and the ability to detect many underlying complex phenomena.
Results
We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study.
Conclusions
GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.
Article activity feed
Now published in GigaScience doi: 10.1093/gigascience/giaa127
Authors:
- Miroslav Kratochvíl (1, 2), for correspondence: exa.exa@gmail.com
- Oliver Hunewald (3)
- Laurent Heirendt (4)
- Vasco Verissimo (4)
- Jiří Vondrášek (1)
- Venkata P. Satagopam (4, 5)
- Reinhard Schneider (4, 5)
- Christophe Trefois (4, 5)
- Markus Ollert (3, 6)
Affiliations:
- (1) Institute of Organic Chemistry and Biochemistry, Prague, Czech Republic
- (2) Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- (3) Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
- (4) Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, Belvaux, Luxembourg
- (5) ELIXIR Luxembourg, University of Luxembourg, Campus Belval, Belvaux, Luxembourg
- (6) Department of Dermatology and Allergy Center, Odense Research Center for Anaphylaxis, Odense University Hospital, University of Southern Denmark, Odense, Denmark
A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giaa127), where the paper and peer reviews are published openly under a CC-BY 4.0 license.
These peer reviews were as follows:
- Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102479
- Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102480