Reframing natural organic matter research through compositional data analysis

Abstract

Compositional data (CoDa) are prevalent in environmental research. They represent parts of a whole, such as percentages, proportions, and relative or absolute abundances. They are arrays of positive values whose relevant information lies in the ratios between their components. Standard statistical techniques developed for real-valued random observations often yield spurious results and are therefore unsuitable for CoDa, which have unique geometric properties. CoDa analysis is now widely acknowledged across research fields ranging from geoscience to social science, with a recent surge in popularity in microbial genomics. However, its adoption remains limited in natural organic matter (NOM) research, even though NOM data from key analytical tools such as mass spectrometry, fluorescence spectroscopy, and nuclear magnetic resonance spectroscopy are all compositional. Given the structural similarity between NOM data and high-throughput sequencing data, for which CoDa analysis has been successfully adopted, we argue that CoDa analysis should also be consistently integrated into NOM research to prevent analytical pitfalls and misleading inferences. A few pioneering studies have applied CoDa analysis to NOM data, and a wide array of useful open-source tools is already available. This paper walks step by step through the application of CoDa analysis to NOM research, using ultrahigh-resolution mass spectrometry data as an illustrative example. The goal of the study is to give the community an overview of CoDa analysis and practical guidance on how to use it.
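To make concrete the idea that compositional information lies in ratios, the following minimal Python sketch shows closure (rescaling to proportions) and the centered log-ratio (clr) transform, a common CoDa transformation. It is an illustration under stated assumptions, not the workflow of the paper: the function names and the peak intensities are hypothetical, and zeros are assumed to have been handled beforehand.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1 (proportions)."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    if any(p <= 0 for p in parts):
        # Assumption: zeros were replaced or removed before this step.
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical peak intensities for three molecular formulas in one sample.
raw = [120.0, 30.0, 50.0]
# The same sample measured at ten times the total signal.
scaled = [10.0 * x for x in raw]

clr_raw = clr(raw)
clr_scaled = clr(scaled)
clr_closed = clr(closure(raw))
```

Because clr depends only on ratios between parts, `clr_raw`, `clr_scaled`, and `clr_closed` agree up to floating-point error, and each sums to zero. The total signal, which is often an artifact of the measurement, carries no weight in the transformed data.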
