Assessing the reliability of immunofluorescence image analysis with artificial intelligence
Abstract
Given the outstanding progress of machine learning (ML) and the growing cost of health systems, incorporating artificial intelligence tools into actual medical practice is a current challenge. Here we explored the feasibility and reliability of using ML to perform an important immunological investigation that currently requires experienced biologists: anti-neutrophil cytoplasmic antibodies (ANCAs) are important markers for vasculitis, and they may be detected by microscopic examination of cells labeled with patients’ sera. A reliable ML classifier that discriminates between positive and negative samples would speed up immunofluorescence-based ANCA detection and reduce its cost.
Here, we tested seven well-documented ML algorithms, ranging from simple models such as k-nearest neighbors to more complex convolutional neural networks involving millions of adjustable parameters. We studied the feasibility and reliability of classifying 1114 serum samples that had been collected over about 3 years and assayed with the conventional procedure. We compared four strategies, consisting of analyzing either whole microscope fields or individual cell images, and either natural images or histograms. We reached the following conclusions: (i) Several different strategies allowed us to build models stable enough to discriminate between positive and negative samples collected over about 27 months; comparison with human classification yielded a kappa index of about 0.7, which may be considered fairly good and is intermediate between the performance of junior and senior biologists. (ii) Simpler ML models combined with theoretical thinking might provide the most rapid and efficient way of developing a reliable test within the framework of a single institution. (iii) In addition, the interpretability of the simplest model provided some theoretical insight into important classification parameters. (iv) An important caveat is that, given the multiplicity and versatility of currently available tools, a given model must be kept as simple as possible and tested repeatedly to achieve a reliability compatible with medical use.
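For readers unfamiliar with the kappa index mentioned in conclusion (i), the following minimal sketch shows how Cohen's kappa between a human reading and a model prediction can be computed. The function name `cohen_kappa` and the toy labels are illustrative assumptions and are not taken from the study; the abstract does not state how the authors computed their index.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences of equal length.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)
    # Fraction of samples on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two raters' marginal frequencies,
    # summed over all classes.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical class; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (not from the study): 10 sera read by a biologist
# and classified by a model.
human = ["pos"] * 4 + ["neg"] * 6
model = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "pos"]
print(f"kappa = {cohen_kappa(human, model):.2f}")  # kappa = 0.58
```

In this toy case the two readings agree on 8 of 10 samples (observed agreement 0.8), while chance agreement is 0.52, giving a kappa of about 0.58. This illustrates why kappa is more conservative than raw accuracy when classes are imbalanced.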
We conclude that our study provides a strong incentive to incorporate ML tools into well-defined medical tests, which might reduce the risk of human error and pave the way to fully automatic procedures.