Mag-Net: Rapid enrichment of membrane-bound particles enables high coverage quantitative analysis of the plasma proteome

Abstract

Membrane-bound particles in plasma are composed of exosomes, microvesicles, and apoptotic bodies and represent ∼1-2% of the total plasma protein composition. Proteomic interrogation of this subset of plasma proteins augments the representation of tissue-specific proteins, acting as a “liquid biopsy,” while enabling the detection of proteins that would otherwise lie beyond the dynamic range of liquid chromatography-tandem mass spectrometry of unfractionated plasma. We have developed an enrichment strategy (Mag-Net) that uses hyper-porous strong-anion exchange magnetic microparticles to sieve membrane-bound particles from plasma. The Mag-Net method is robust, reproducible, and inexpensive, and it requires <100 μL of plasma input. Coupled to a quantitative data-independent acquisition mass spectrometry strategy, it yields high-precision results for >37,000 peptides from >4,000 plasma proteins. Applying this analytical pipeline to a small cohort of patients with neurodegenerative disease and healthy age-matched controls, we discovered 204 proteins that differentiate (q-value < 0.05) patients with Alzheimer’s disease dementia (ADD) from those without ADD. Our method also discovered 310 proteins that differed between patients with Parkinson’s disease and either patients with ADD or healthy, cognitively normal individuals. Using machine learning, we distinguished ADD from non-ADD with a mean ROC AUC = 0.98 ± 0.06.

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/13334307.

    Do the beads not also enrich unwanted negatively charged proteins (those with a low isoelectric point)? It is not clear what principle would favor particle enrichment over protein enrichment.

    The thresholds used to call proteins differentially expressed (the lines in the volcano plot) are stated in the results but should also be stated in the figure legend, together with the statistical test that was used.
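
As a minimal sketch of what such a legend could spell out, the Python below flags proteins that pass both a q-value cutoff and a fold-change cutoff. It assumes per-protein p-values have already been computed by some test and that q-values come from the Benjamini-Hochberg procedure; the 0.05 cutoff mirrors the q-value threshold quoted in the abstract, while the |log2 fold change| ≥ 1 cutoff and the function names are hypothetical, not taken from the preprint.

```python
"""Sketch: volcano-plot calls from a q-value cutoff plus a fold-change cutoff.

Assumptions (not from the preprint): p-values are precomputed per protein,
q-values use Benjamini-Hochberg, and the log2 fold-change cutoff of 1.0 is
a hypothetical example value.
"""
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted q-values in the original order of `pvalues`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def flag_differential(log2fc: List[float], pvalues: List[float],
                      q_cutoff: float = 0.05, fc_cutoff: float = 1.0) -> List[bool]:
    """True for proteins beyond both volcano-plot threshold lines."""
    qvalues = benjamini_hochberg(pvalues)
    return [q < q_cutoff and abs(fc) >= fc_cutoff
            for q, fc in zip(qvalues, log2fc)]


if __name__ == "__main__":
    print(flag_differential([2.0, 0.3, 4.0], [0.001, 0.002, 0.5]))
```

In this toy call the second protein passes the q-value cutoff but not the fold-change cutoff, which is exactly the kind of decision a legend should make reproducible.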

    It is a strength that they assessed the enrichment of markers, the size distribution of the enriched particles, and TEM images of the beads, which increases confidence that they have captured the particles of interest.

    Figure 3D: does it make sense that the EVs are visible outside the magnetic microparticles? My impression was that the EVs would be captured inside the beads.

    It is not clear how the standard PAC workflow in Figure 2A/B differs from that in Figure 4A. Why would they obtain over 4k proteins in Figure 4A but only ~1k proteins in Figure 2A/B?

    Some figures, such as 6b and 6d, are small and difficult to read.

    It appears that some plasma samples in each dementia group may have been collected with different protocols as part of separate studies. Samples from different studies can carry major batch effects that confound data interpretation, so the strong differentiation they observe may reflect differences in sample collection rather than disease.

    Some details of the classification task, for example which model was used and how it was fit, should be given in the results text rather than only in the methods.

    Because their class distributions are imbalanced (sometimes 10 out of 40 samples, or 0.25 of the data, are in the positive class), they should not use AUROC to assess performance. Precision and recall, or summaries of them such as F1 or AUPR, would be more appropriate.
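
To illustrate the concern, here is a small plain-Python sketch on invented scores for 10 positives and 30 negatives (the 0.25 prevalence mentioned above); none of it is data from the preprint. Three negatives are ranked above every positive: AUROC stays at 0.9, while average precision (one common estimate of AUPR) falls to about 0.60, and F1 at a hypothetical 0.6 cutoff is 20/23 ≈ 0.87. For reference, a classifier with no skill has an expected AUROC of 0.5 but an expected AUPR equal to the positive-class prevalence, here 0.25.

```python
"""Sketch: AUROC versus precision/recall summaries on imbalanced toy data.

The scores below are invented for illustration (10 positives, 30 negatives).
"""


def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / sum(labels)


def f1_at(labels, scores, threshold):
    """F1 after calling every sample with score >= threshold positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Three negatives outrank every positive; the other 27 negatives score lowest.
SCORES = ([0.99, 0.98, 0.97]
          + [0.90 - 0.01 * i for i in range(10)]
          + [0.50 - 0.01 * i for i in range(27)])
LABELS = [0] * 3 + [1] * 10 + [0] * 27

if __name__ == "__main__":
    print("prevalence", sum(LABELS) / len(LABELS))
    print("AUROC     ", auroc(LABELS, SCORES))
    print("AUPR (AP) ", average_precision(LABELS, SCORES))
    print("F1 @ 0.6  ", f1_at(LABELS, SCORES, 0.6))
```

The three top-ranked false positives cost AUROC only 3 of 30 negatives per positive, but they sit exactly where a clinical user would look first, which is what AUPR and F1 penalize.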

    Averaging the results of plain k-fold cross-validation is also not the best way to compute metrics. It would be better to use nested k-fold cross-validation and report the average of the metrics on the outer (held-out) test folds. Given the simple model, this may not substantially change the results, but it would give a truer assessment of generalizability.
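
A minimal sketch of nested cross-validation in plain Python, assuming a hypothetical k-nearest-neighbours classifier whose only hyperparameter is `n_neighbors` and F1 as the metric. The inner folds choose the hyperparameter using only the outer training data, and the reported number is the mean F1 over the outer test folds. Folds are stratified so each one contains both classes. This is a generic illustration, not the authors' pipeline.

```python
import random
from statistics import mean


def stratified_folds(y, k, seed=0):
    """Split indices 0..len(y)-1 into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def knn_predict(train_X, train_y, test_X, n_neighbors):
    """Hypothetical stand-in model: majority vote among the nearest training points."""
    preds = []
    for x in test_X:
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, tx)), ty)
            for tx, ty in zip(train_X, train_y)
        )
        votes = [ty for _, ty in ranked[:n_neighbors]]
        preds.append(1 if 2 * sum(votes) > len(votes) else 0)
    return preds


def f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def split(X, y, fold):
    """Return (train_X, train_y, test_X, test_y) for one held-out fold."""
    held = set(fold)
    train = [i for i in range(len(y)) if i not in held]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in fold], [y[i] for i in fold])


def plain_cv_f1(X, y, n_neighbors, k, seed=0):
    """Ordinary k-fold CV: mean F1 over k held-out folds for one fixed setting."""
    scores = []
    for fold in stratified_folds(y, k, seed):
        tr_X, tr_y, te_X, te_y = split(X, y, fold)
        scores.append(f1(te_y, knn_predict(tr_X, tr_y, te_X, n_neighbors)))
    return mean(scores)


def nested_cv_f1(X, y, candidates=(1, 3, 5), outer_k=5, inner_k=3, seed=0):
    """Nested CV: inner folds pick n_neighbors, outer test folds score that choice."""
    outer_scores = []
    for fold in stratified_folds(y, outer_k, seed):
        tr_X, tr_y, te_X, te_y = split(X, y, fold)
        # The outer test fold is never seen while choosing the hyperparameter.
        best = max(candidates,
                   key=lambda c: plain_cv_f1(tr_X, tr_y, c, inner_k, seed))
        outer_scores.append(f1(te_y, knn_predict(tr_X, tr_y, te_X, best)))
    return mean(outer_scores)
```

The key design point is that `plain_cv_f1` is only ever called on the outer training split, so the hyperparameter search cannot peek at the samples used for the reported score.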

    The details of how statistically significant proteins or peptides were used in the ML are a little sparse. For example, if the significant proteins were determined using the whole dataset and then used for training in the 5-fold CV, this constitutes test-set leakage.
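
The sketch below contrasts the two procedures on pure-noise data, assuming a hypothetical pipeline that ranks features by the absolute difference in class means, keeps the top 20, and classifies by nearest centroid. In the leaky version, features are selected once on all samples; in the honest version, they are re-selected inside each training fold. Balanced accuracy (chance level 0.5 regardless of class balance) is computed on the pooled out-of-fold predictions. On noise, the leaky estimate is typically far above chance while the honest one stays near it; exact values depend on the random seed.

```python
import random


def stratified_folds(y, k, seed=0):
    """Split indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def top_features(X, y, rows, n_select):
    """Rank features by |mean(class 1) - mean(class 0)| using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]
    gap = [abs(sum(X[i][f] for i in pos) / len(pos)
               - sum(X[i][f] for i in neg) / len(neg))
           for f in range(len(X[0]))]
    return sorted(range(len(gap)), key=lambda f: -gap[f])[:n_select]


def nearest_centroid(X, y, train, test, feats):
    """Predict each test row by its distance to class centroids from `train`."""
    cent = {}
    for label in (0, 1):
        rows = [i for i in train if y[i] == label]
        cent[label] = [sum(X[i][f] for i in rows) / len(rows) for f in feats]
    preds = {}
    for i in test:
        d = {lab: sum((X[i][f] - c) ** 2 for f, c in zip(feats, cent[lab]))
             for lab in (0, 1)}
        preds[i] = 1 if d[1] < d[0] else 0
    return preds


def balanced_accuracy(y, preds):
    """Mean of sensitivity and specificity over the predicted rows."""
    pos = [i for i in preds if y[i] == 1]
    neg = [i for i in preds if y[i] == 0]
    sens = sum(preds[i] == 1 for i in pos) / len(pos)
    spec = sum(preds[i] == 0 for i in neg) / len(neg)
    return (sens + spec) / 2


def cross_validate(X, y, n_select=20, k=5, leaky=False, seed=0):
    all_rows = list(range(len(y)))
    # LEAKY: features chosen once from every sample, including future test folds.
    global_feats = top_features(X, y, all_rows, n_select) if leaky else None
    preds = {}
    for fold in stratified_folds(y, k, seed):
        held = set(fold)
        train = [i for i in all_rows if i not in held]
        # HONEST: features re-selected from this fold's training rows only.
        feats = global_feats if leaky else top_features(X, y, train, n_select)
        preds.update(nearest_centroid(X, y, train, fold, feats))
    return balanced_accuracy(y, preds)


if __name__ == "__main__":
    rng = random.Random(42)
    y = [1] * 10 + [0] * 30
    # Pure noise: no feature carries real information about the label.
    X = [[rng.gauss(0, 1) for _ in range(1000)] for _ in y]
    print("leaky :", cross_validate(X, y, leaky=True))
    print("honest:", cross_validate(X, y, leaky=False))
```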

    It is also not clear how the single-protein AUROC was determined. Was this also the average over a 5-fold CV? Again, this should be AUPR or F1 score to reflect the imbalanced classes.

    Competing interests

    The authors declare that they have no competing interests.