Integration of diverse bioactivity data into the Chemical Checker compound universe

Abstract

Chemical signatures encode the physicochemical and structural properties of small molecules into numerical descriptors, forming the basis for chemical comparisons and search algorithms. The increasing availability of bioactivity data has extended compound representations to include biological effects, although bioactivity descriptors are often limited to a few well-documented molecules. To address this issue, we implemented a collection of deep neural networks that leverage the experimentally determined bioactivity data associated with small molecules to infer the missing bioactivity signatures for any compound of interest. However, unlike static chemical descriptors, these bioactivity signatures evolve dynamically with new data and processing strategies. Here, we present a computational protocol to modify existing bioactivity spaces and signatures or generate novel ones, describing the main steps needed to integrate diverse bioactivity data with the current knowledge catalogued in the Chemical Checker (CC), using the predefined data curation pipeline. We illustrate the protocol through four specific examples, including the incorporation of new compounds into an already existing bioactivity space, a change in the data pre-processing without altering the underlying experimental data, and the creation of two novel bioactivity spaces from scratch, which are completed in under 9 hours using GPU computing. Overall, this protocol offers a guideline for installing, testing and running the CC data integration approach on user-provided data, with the aim of extending the annotations available for a limited number of small molecules to a larger chemical landscape.

Key points

  • The Chemical Checker is a large collection of processed, harmonized and integrated bioactivity signatures for over 1 million small molecules. Data are organized into 5 levels of increasing biological complexity and degree of curation: from chemical properties to clinical outcomes, and from raw data representing explicit knowledge to embedded signatures inferred from observed bioactivity patterns.

  • The Chemical Checker package provides a predefined and versatile framework to integrate user-provided data into the Chemical Checker universe of small molecules and generate novel customized bioactivity signatures.

Key references

Duran-Frigola et al. “Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker.” Nature Biotechnology 38.9 (2020): 1087-1096.

Bertoni et al. “Bioactivity descriptors for uncharacterized chemical compounds.” Nature Communications 12.1 (2021): 3932.
