A sulfatide-centered ultra-high resolution magnetic resonance MALDI imaging benchmark dataset for MS1-based lipid annotation tools
Abstract
Spatial ‘omics techniques are indispensable for studying complex biological systems and for the discovery of spatial biomarkers. While several current matrix-assisted laser desorption/ionization (MALDI) mass spectrometry imaging (MSI) instruments are capable of localizing numerous metabolites at high spatial and spectral resolution, the majority of MSI data is acquired at the MS1 level only. Assigning molecular identities based on MS1 data presents significant analytical and computational challenges, as the inherent limitations of MS1 data preclude confident annotations beyond the sum formula level. To enable future advancements of computational lipid annotation tools, well-characterized benchmark (or ground truth) datasets that exceed the scope of synthetic data or data derived from mimetic tissue models are crucial. To this end, we provide two sulfatide-centered, biology-driven magnetic resonance MSI (MR-MSI) datasets at different mass resolving powers that characterize lipids in a mouse model of human metachromatic leukodystrophy. The data include an ultra-high-resolution (R ∼1,230,000) quantum cascade laser mid-infrared imaging-guided MR-MSI dataset that enables isotopic fine structure analysis and therefore substantially enhances the level of confidence. To highlight the usefulness of the data, we compared 118 manual sulfatide annotations with the number of decoy database-controlled sulfatide annotations performed in Metaspace (67 at FDR < 10%). Overall, our datasets can be used to benchmark annotation algorithms, validate spatial biomarker discovery pipelines, and serve as a reference for future studies that explore sulfatide metabolism and its spatial regulation.
Competing Interest Statement: Bruker Daltonics co-funded the BMBF-funded projects Drugs4Future and DrugsData within the framework M2Aind, as mandated by BMBF, but did not influence this study. All other authors declare no competing interests.
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf150), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 2: Hikmet Budak
I believe that the dataset produced is a great contribution to the community. My major concerns are as follows:
- The data described is good, but please clarify how the discrepancy between the manual annotations and the computational annotations could be resolved, and what challenges affect annotation quality for the sulfatide-centered MSI dataset.
- Please remove outdated references unless they are pioneering work, and replace them with newer ones.
- Please try to present some of the material as supplementary figures instead of text.
- Is the algorithm fully optimized or not?
- How did you recover the missing annotations? Please clarify and elaborate on this.
Would be happy to review after revisions.
Reviewer 1: Morteza Akbari
This manuscript by Gruber et al. provides a Data Note detailing a high-value, sulfatide-focused benchmark dataset for the mass spectrometry imaging (MSI) community. The project is well thought out, technically advanced, and directly meets a major need for biologically relevant, deeply characterized ground-truth data to test MS1-level metabolite annotation software. It is a big technical achievement to create an ultra-high-resolution dataset (R∼1,230,000) with a 7T FT-ICR instrument. The use of isotopic fine structure (IFS) to boost annotation confidence is a major strength. Using QCL-MIR imaging strategically to guide the MSI acquisition is a smart and effective way to do things. It's great that the authors are committed to FAIR principles.
The writing in the manuscript is excellent, and the data is very good. It makes a big difference in the field. There are, however, several changes that should be made to make it clearer, more scientifically complete, and more useful as a stand-alone benchmark resource for the community. The following points are given to help make the manuscript stronger for publication.
Major Revisions
Provision of the "Ground Truth" Annotation List: The benchmark dataset is the most important part of this Data Note. The manuscript's supplementary information, on the other hand, doesn't seem to have the final, curated list of manual annotations that make up the "ground truth." For this dataset to be truly reusable for benchmarking third-party software, it needs another table. This table should show all of the manually annotated sulfatides for each replicate, along with their experimental m/z, proposed sum formula, lipid annotation, mass error (ppm), and a way to tell if IFS was used to confirm them.
Strengthening the "Ground Truth" Justification: The manuscript depends on an earlier publication (Ref) to validate the sulfatide structures using MS/MS. It is acceptable to reference previous work, but a benchmark Data Note should be as self-sufficient as possible. Please add a short paragraph to the "Data Validation and Quality Control" section that sums up the main MS/MS fragmentation evidence from Ref that backs up the sulfatide identifications. This will give users of the dataset a more complete and clear chain of evidence.
Deeper Analysis of Automated Annotation Discrepancies: The comparison with Metaspace shows how important this dataset is by showing that even a top-of-the-line tool can't annotate 14 high-confidence sulfatides. The discussion needs to be longer so that it can look at why these failures could be happening. Please explain why Metaspace's scoring algorithm, which only looks at the four most intense isotopic peaks, might not work well with this kind of ultra-high-resolution data where low-intensity IFS peaks (like ³⁴S) are very important. Talking about how future algorithms could make better use of this information would make the paper much more useful and help with the development of new tools.
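To make the IFS point concrete, the following is a minimal sketch of why the ³⁴S isotopologue matters at ultra-high resolution. It computes how far the ³⁴S and ¹⁸O contributions to the M+2 peak sit from the ¹³C₂ contribution, and the ratio m/Δm needed to tell them apart. The isotope masses are rounded values from standard atomic mass tables; the example m/z of 890, the singly charged ion, and the helper names are assumptions made for illustration. The ratio m/Δm is only a rough separation criterion, and the sketch does not reproduce Metaspace's scoring.

```python
# Isotope masses in Da (rounded values from standard atomic mass tables).
MASS = {
    "12C": 12.0,
    "13C": 13.00335483507,
    "16O": 15.99491461957,
    "18O": 17.99915961286,
    "32S": 31.9720711744,
    "34S": 33.967867004,
}


def isotope_shift(light: str, heavy: str, count: int = 1) -> float:
    """Mass shift caused by swapping `count` light isotopes for heavy ones."""
    return count * (MASS[heavy] - MASS[light])


def resolving_power_needed(mz: float, delta_m: float) -> float:
    """Rough m / delta_m criterion for separating two peaks (z = 1 assumed)."""
    return mz / abs(delta_m)


# Three different contributions to the nominal M+2 isotopologue.
shift_13c2 = isotope_shift("12C", "13C", count=2)
shift_34s = isotope_shift("32S", "34S")
shift_18o = isotope_shift("16O", "18O")

# Distance of the 34S and 18O peaks from the 13C2 peak.
split_34s = shift_13c2 - shift_34s
split_18o = shift_13c2 - shift_18o

example_mz = 890.0  # assumed m/z in the sulfatide mass range, illustration only
r_34s = resolving_power_needed(example_mz, split_34s)
r_18o = resolving_power_needed(example_mz, split_18o)

print(f"34S vs 13C2: {split_34s * 1000:.2f} mDa, m/dm ~ {r_34s:,.0f}")
print(f"18O vs 13C2: {split_18o * 1000:.2f} mDa, m/dm ~ {r_18o:,.0f}")
```

Under these assumptions the ³⁴S peak lies about 11 mDa below the ¹³C₂ peak, so a scoring scheme that merges the M+2 signals, or ignores low-intensity peaks, discards the evidence that confirms the presence of sulfur.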
Minor Revisions
Clarification of Table 1: The row headers for the R2 dataset ("all" vs. "QCL-MIR-guided") are slightly confusing, as all R2 data is QCL-MIR-guided. Please revise these for clarity (e.g., "Total Annotations in ROIs" and "Annotations with Confirmed IFS Evidence").
Definition of "Internal Error": The legend for Figure 1g should include a brief definition or reference for how "internal error" was calculated to ensure the metric is fully understood.
Confirmation of Database Contents: In the Methods section, please add a sentence explicitly confirming that all manually annotated sulfatide species were included in the custom database file used for the Metaspace analysis. This is a crucial detail for a fair comparison.
Explicit Statement of Dataset Limitations: In the "Re-use Potential" section, it would be beneficial to explicitly state the inherent trade-off of the ultra-high-resolution approach. Please add a sentence acknowledging that the dataset is optimized for high-confidence annotation and that this comes at the cost of reduced sensitivity and comprehensive spatial coverage compared to a standard MSI experiment.
Link to Custom Database: The Methods section mentions the creation of a custom database of 780 theoretical sulfatides. Please explicitly state in the text that this database is available as Supplementary Dataset 3.
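The requested ground-truth table and the database-content check could be sketched as follows. This is a hypothetical layout, not the authors' file format: the field names, the helper `missing_from_database`, and all values are placeholders chosen for illustration and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthRow:
    """One manually annotated sulfatide (field names are hypothetical)."""

    replicate: str
    experimental_mz: float
    theoretical_mz: float
    sum_formula: str
    lipid_annotation: str
    ifs_confirmed: bool

    @property
    def mass_error_ppm(self) -> float:
        """Relative mass error in parts per million."""
        return (self.experimental_mz - self.theoretical_mz) / self.theoretical_mz * 1e6


def missing_from_database(rows, database_formulas):
    """Sum formulas that were annotated manually but are absent from the database."""
    return sorted({row.sum_formula for row in rows} - set(database_formulas))


# Placeholder rows for illustration only; not taken from the dataset.
rows = [
    GroundTruthRow("R1", 888.6249, 888.6240, "C48H91NO11S", "example sulfatide A", True),
    GroundTruthRow("R1", 806.5460, 806.5458, "C42H81NO11S", "example sulfatide B", False),
]

# Placeholder stand-in for the custom database of theoretical sulfatides.
database = {"C48H91NO11S"}

missing = missing_from_database(rows, database)
print("Not in database:", missing)
for row in rows:
    print(row.sum_formula, f"{row.mass_error_ppm:.2f} ppm", "IFS" if row.ifs_confirmed else "no IFS")
```

An empty `missing` list would confirm that every manual annotation could, in principle, have been found by the database search, which is the precondition for a fair comparison.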
Addressing these points will significantly enhance the manuscript's value and ensure its lasting impact as a key resource for the computational mass spectrometry community.
