Pipeline for FlowCam data processing with modular open-source software and optional machine learning classification
Abstract
1. FlowCam is an established instrument for Flow Imaging Microscopy (FIM) widely used in plankton research. FIM significantly increases the number of organisms that can be detected and measured while reducing the analysis time and individual human bias compared to classical microscopy. However, processing the large number of images produced still poses a challenge. Workflows with VisualSpreadsheet (VSP), the commercial software licensed with the FlowCam instruments, offer options for image analysis, but they are limited in implementing machine learning-supported classification and incur a cost for each required licence.
2. Therefore, we developed a freely available modular pipeline for processing FlowCam data. First, a preprocessing Python script normalises the output of different VSP versions and detects and labels imaging artefacts (air bubbles, beads, duplicate images). The size range of particles relevant for analysis – defined as objects – is determined based on user-defined thresholds. The preprocessed data is summarised in a CSV file that can be opened in LabelChecker, an open-source, cross-platform program, which enables the validation of labels generated by the preprocessing step and allows for manual annotation of FlowCam images for further processing.
3. The processing pipeline can be paired with machine learning approaches for automatic object classification. Annotated objects can be used to train a shallow, multi-input classification model that uses both the images (via a convolutional neural network) and the properties measured by the FlowCam (via a multilayer perceptron). The resulting classified data can be analysed in various ways, including taxonomic and trait-based approaches.
4. We demonstrate a robust workflow of this pipeline with a FlowCam-derived plankton dataset, achieving 86% accuracy in classification by a machine learning model, despite using a small training dataset with high class imbalance, heterogeneous classes, and variable measured features.
Focussing on accessibility, this pipeline (preprocessing, LabelChecker, and machine learning) paves the way for fast and reproducible plankton analysis by FIM, enabling high-throughput analyses adaptable to a wide range of plankton studies.
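
As a rough illustration of the labelling step described in point 2, the sketch below flags duplicate images and particles outside a user-defined size range before writing labels to a table. It is not the authors' preprocessing script: the column names (`id`, `diameter_um`, `image_hash`), the label strings, and the function `label_particles` are hypothetical, and real VSP exports will differ between versions.

```python
import csv
import io

# Hypothetical export; real VSP column names and units vary by version.
SAMPLE_CSV = """id,diameter_um,image_hash
1,4.0,a1
2,25.0,b2
3,25.0,b2
4,310.0,c3
5,80.0,d4
"""


def label_particles(rows, min_um, max_um):
    """Return copies of rows with a 'label' field added.

    Labels (hypothetical names): 'duplicate' for a repeated image hash,
    'out_of_range' for particles outside [min_um, max_um], else 'object'.
    """
    seen_hashes = set()
    labelled = []
    for row in rows:
        diameter = float(row["diameter_um"])
        if row["image_hash"] in seen_hashes:
            label = "duplicate"
        elif not (min_um <= diameter <= max_um):
            label = "out_of_range"
        else:
            label = "object"
        seen_hashes.add(row["image_hash"])
        labelled.append({**row, "label": label})
    return labelled


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labelled = label_particles(rows, min_um=10.0, max_um=300.0)
labels = [r["label"] for r in labelled]
```

In this toy setup, only rows labelled `object` would be passed on for validation and annotation; the thresholds of 10 and 300 µm are arbitrary values chosen for the example.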