LIGHTHOUSE illuminates therapeutics for a variety of diseases including COVID-19

Abstract

No abstract available

Article activity feed

  1. SciScore for 10.1101/2021.09.25.461785

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Cell Lines
    Sentences | Resources
    Stocks of these viruses were prepared by inoculation of Vero-TMPRSS2 cell cultures as described previously [54].
    Vero-TMPRSS2
    suggested: (JCRB Cat# JCRB1818, RRID:CVCL_YQ48)
    Recombinant DNA
    Sentences | Resources
    AAC is an 8,420-length vector in which each position corresponds to a sequence of three amino acids [11].
    8,420-length
    suggested: None
    In addition, the JM109 strain was transformed with 1 μg of the pBlueScript II SK+ plasmid (Invitrogen), which harbors an ampicillin resistance gene as a selection marker, and was then spread on LB agar plates containing ampicillin (100 μg/ml).
    pBlueScript II SK+
    suggested: None
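The AAC sentence above gives only the vector length. One decomposition consistent with 8,420 is amino acid composition over all 1-, 2- and 3-residue sequences of the 20 standard amino acids (20 + 400 + 8,000 = 8,420). Under that reading, only the last 8,000 positions correspond to three-amino-acid sequences. The sketch below assumes this layout and normalizes each k-mer block to relative frequencies. The function and constant names are hypothetical, and this is not LIGHTHOUSE's own encoder.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed feature layout: all 1-mers, then all 2-mers, then all 3-mers (20 + 400 + 8,000).
KMERS = ["".join(p) for k in (1, 2, 3) for p in product(AMINO_ACIDS, repeat=k)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def aac_vector(sequence: str) -> list[float]:
    """Hypothetical AAC encoder: relative frequency of each 1-, 2- and 3-mer."""
    seq = sequence.upper()
    vec = [0.0] * len(KMERS)
    for k in (1, 2, 3):
        # Overlapping windows of length k; windows with non-standard residues (e.g. X) are skipped.
        valid = [seq[i:i + k] for i in range(len(seq) - k + 1)
                 if seq[i:i + k] in KMER_INDEX]
        for kmer in valid:
            vec[KMER_INDEX[kmer]] += 1
        # Turn counts into frequencies so each k-block sums to 1.
        for kmer in set(valid):
            vec[KMER_INDEX[kmer]] /= len(valid)
    return vec
```

For example, `aac_vector("AAA")` places 1.0 at the positions for "A", "AA" and "AAA" (indices 0, 20 and 420) and zeros elsewhere.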
    Software and Algorithms
    Sentences | Resources
    Generation of a data set for the training phase of LIGHTHOUSE: The compound SMILES strings of the data set were extracted from the PubChem compound database on the basis of compound names and PubChem compound IDs (CIDs).
    PubChem
    suggested: (PubChem, RRID:SCR_004284)
    The protein sequences of the data set were extracted from the UniProt protein database on the basis of gene names/RefSeq accession numbers or the UniProt IDs.
    UniProt
    suggested: (UniProtKB, RRID:SCR_004426)
    With regard to IC50, we downloaded data from BindingDB, obtained SMILES expressions and amino acid sequences similarly, and again separated the data into training (80%), validation (10%), and test (10%) data sets (Extended Data Fig.
    BindingDB
    suggested: (BindingDB, RRID:SCR_000390)
    Generation of virtual chemical libraries and prediction by LIGHTHOUSE: We prepared nearly 1 billion purchasable substances, which were downloaded from the ZINC database [23] as of 30 July 2020, for virtual PPAT inhibitor screening.
    ZINC
    suggested: (Zinc, RRID:SCR_008596)
    The resulting top 500 potential targets were then subjected to enrichment analyses with the use of the Metascape Web server [53].
    Metascape
    suggested: (Metascape, RRID:SCR_016620)
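The sentences above mention three computational steps without spelling them out:

- an 80/10/10 split of the BindingDB-derived data;
- selection of the top 500 potential targets;
- enrichment analysis of those targets.

The Python sketch below shows one plausible form of each step. The function names, the fixed random seed and the one-sided hypergeometric test are assumptions for illustration. This is not the authors' code or Metascape's implementation; enrichment servers of this kind commonly report hypergeometric p-values, but the exact statistics Metascape uses are not given here.

```python
import heapq
import math
import random
from typing import Iterable, Sequence, TypeVar

T = TypeVar("T")


def split_80_10_10(records: Sequence[T], seed: int = 0) -> tuple[list[T], list[T], list[T]]:
    """Shuffle records, then split into training (80%), validation (10%) and test (10%) sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def top_k(scored: Iterable[tuple[str, float]], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (name, score) pairs, best first.

    heapq.nlargest keeps only k items in memory, so `scored` may be a generator."""
    return heapq.nlargest(k, scored, key=lambda item: item[1])


def enrichment_pvalue(hits_in_term: int, n_hits: int, term_size: int, background: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits_in_term).

    X counts term members among n_hits genes drawn from `background` genes,
    of which `term_size` are annotated to the term."""
    total = math.comb(background, n_hits)
    tail = sum(math.comb(term_size, x) * math.comb(background - term_size, n_hits - x)
               for x in range(hits_in_term, min(n_hits, term_size) + 1))
    return tail / total
```

In this sketch, `top_k(predictions, 500)` would yield the candidate target list. `enrichment_pvalue` is then evaluated once per annotation term, and a multiple-testing correction would normally be applied across terms.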

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    A limitation of LIGHTHOUSE is the generation of false positives, which is due in part to the fact that the confidence score provided by STITCH is not based solely on experimental data but also on other factors such as co-occurrence in the literature. Well-studied molecules are thus prone to score higher than others. This drawback can be mitigated partially by combining the three different models (CNN, AAC, and Transformer). It may also be important to perform a counter-virtual screening to determine whether an identified small molecule reacts specifically with the target protein or whether it scores highly with many proteins. Such an approach has the potential to reduce the number of false positives and provide more accurate guidance. Despite this limitation, LIGHTHOUSE proved to be effective for the identification of lead compounds for all conditions tested. It can theoretically be applied to any protein of any organism, and even to proteins that do not exist naturally. This is an advantage over 3D docking simulation methods, which require prior 3D structural knowledge of the protein of interest. LIGHTHOUSE computes and embeds structural information in numerical vectors, which are then readily retrieved by the subsequent decoding module. Given the accelerating development of protein embedding technologies [49] and graph-based chemoinformatics approaches, LIGHTHOUSE has the potential to be a cornerstone of drug discovery. In summary, we have developed LIGHTHOUSE as a means to di...
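The passage names two mitigations for false positives but gives no formulas:

1. Combining the CNN, AAC and Transformer models.
2. A counter-virtual screen that asks whether a compound scores highly against many proteins rather than specifically against the target.

The sketch below illustrates one possible form of each, under assumptions of our own. The models are combined by a plain average, and specificity is judged from the fraction of off-target proteins scoring at or above a threshold plus the target's margin over the mean off-target score. The thresholds (0.8 and 10%), names and example scores are hypothetical and are not taken from the paper.

```python
from statistics import mean
from typing import NamedTuple


def ensemble_score(scores_by_model: dict[str, float]) -> float:
    """Average per-model scores for one compound-protein pair (assumed combination rule)."""
    return mean(scores_by_model.values())


class CounterScreenResult(NamedTuple):
    promiscuity: float  # fraction of off-target proteins scoring at or above `high`
    margin: float       # target score minus the mean off-target score
    specific: bool      # True if both hypothetical criteria are met


def counter_screen(target_score: float, off_target_scores: list[float],
                   high: float = 0.8, max_promiscuity: float = 0.1) -> CounterScreenResult:
    """Flag compounds that score highly on many proteins (hypothetical criteria)."""
    promiscuity = sum(s >= high for s in off_target_scores) / len(off_target_scores)
    margin = target_score - mean(off_target_scores)
    return CounterScreenResult(promiscuity, margin,
                               promiscuity <= max_promiscuity and margin > 0)


# Example with made-up scores: same target score, different off-target profiles.
selective = counter_screen(0.9, [0.2, 0.3, 0.1, 0.4])
promiscuous = counter_screen(0.9, [0.85, 0.9, 0.95, 0.2])
```

Here `selective.specific` is True and `promiscuous.specific` is False. Comparing against off-targets partly offsets the tendency of well-studied molecules to score high everywhere, though it does not address the literature-derived component of the STITCH score itself.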

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.