NucBalancer: streamlining barcode sequence selection for optimal sample pooling for sequencing

Curation statements for this article:
  • Curated by GigaByte

    Editors Assessment:

    This paper presents NucBalancer, an R pipeline and Shiny app designed for the optimal selection of barcode sequences for sample multiplexing in sequencing. It provides a user-friendly interface that aims to make this process accessible to both bioinformaticians and experimental researchers, enhancing its utility in adapting libraries prepared for one sequencing platform to be compatible with others. This is important now that the introduction of additional sequencing platforms by Element Biosciences (AVITI System) and Ultima Genomics (UG100) is increasing the diversity and capability of the genomic research tools available. NucBalancer’s incorporation of dynamic parameters, including customizable red flag thresholds, allows for precise and practical barcode sequencing strategies. This adaptability is key to ensuring uniform nucleotide distribution, particularly in MGI sequencing and single-cell genomic studies, leading to more reliable and cost-effective sequencing outcomes across various experimental conditions. All the code is available under an open source license, and upon review the authors have also shared the code for the Shiny app.

    This evaluation refers to version 1 of the preprint

Abstract

Recent advancements in next-generation sequencing (NGS) technologies have brought to the forefront the necessity for versatile, cost-effective tools capable of adapting to a rapidly evolving landscape. The emergence of numerous new sequencing platforms, each with unique sample preparation and sequencing requirements, underscores the importance of efficient barcode balancing for successful pooling and accurate demultiplexing of samples. Recently launched sequencing systems that claim affordability better than or comparable to that of more established platforms further exemplify these challenges, especially when libraries originally prepared for one platform need conversion to another. In response to this dynamic environment, we introduce NucBalancer, a Shiny app developed for the optimal selection of barcode sequences. While initially tailored to meet the nucleotide composition challenges specific to G400 and T7 series sequencers, NucBalancer’s utility broadens to accommodate the varied demands of these new sequencing technologies. Its application is particularly crucial in single-cell genomics, enabling the adaptation of libraries, such as those prepared for 10x technology, to various sequencers including G400 and T7 series sequencers. NucBalancer efficiently balances nucleotide composition and sample concentrations, reducing biases and enhancing the reliability of NGS data across platforms. Its adaptability makes it invaluable for addressing sequencing challenges, ensuring effective barcode balancing for sample pooling on any platform.

Availability and implementation: NucBalancer is implemented in R and is available at https://github.com/ersgupta/NucBalancer. Additionally, a Shiny interface is available at https://ersgupta.shinyapps.io/NucBalancer/.
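To make the idea of barcode balancing concrete, the following is a minimal Python sketch, not NucBalancer's actual R implementation. It computes, for each sequencing cycle, the fraction of A, C, G and T across a pool of barcodes, optionally weighted by sample concentration, and reports any base whose fraction falls outside a permitted range. The function names and the `low`/`high` threshold values are illustrative assumptions, not values taken from the tool or from any vendor guideline.

```python
def position_fractions(barcodes, weights=None):
    """Per-position base fractions for a pool of equal-length barcodes.

    `weights` (hypothetical) are relative sample concentrations;
    equal weighting is assumed when they are omitted.
    """
    if not barcodes:
        raise ValueError("need at least one barcode")
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")
    if weights is None:
        weights = [1.0] * len(barcodes)
    total = sum(weights)

    fractions = []
    for pos in range(length):
        counts = dict.fromkeys("ACGT", 0.0)
        for barcode, weight in zip(barcodes, weights):
            # Assumes barcodes contain only A, C, G and T.
            counts[barcode[pos].upper()] += weight
        fractions.append({base: c / total for base, c in counts.items()})
    return fractions


def red_flags(fractions, low=0.1, high=0.6):
    """List (position, base, fraction) entries outside [low, high].

    The default thresholds are placeholders chosen for illustration only.
    """
    return [
        (pos, base, frac)
        for pos, per_base in enumerate(fractions)
        for base, frac in per_base.items()
        if frac < low or frac > high
    ]


if __name__ == "__main__":
    balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
    print(red_flags(position_fractions(balanced)))    # no flags expected
    unbalanced = ["AAAA", "AAAC"]
    for flag in red_flags(position_fractions(unbalanced)):
        print(flag)
```

A pool in which every base appears equally often at every position produces no flags, whereas a pool dominated by one base is flagged at each affected position. A selection procedure would search for a subset of candidate barcodes that minimises such flags.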

Article activity feed

  2. Abstract: Recent advancements in next-generation sequencing (NGS) technologies have brought to the forefront the necessity for versatile, cost-effective tools capable of adapting to a rapidly evolving landscape. The emergence of numerous new sequencing platforms, each with unique sample preparation and sequencing requirements, underscores the importance of efficient barcode balancing for successful pooling and accurate demultiplexing of samples. Recently launched sequencing systems that claim affordability better than or comparable to that of more established platforms further exemplify these challenges, especially when libraries originally prepared for one platform need conversion to another. In response to this dynamic environment, we introduce NucBalancer, a Shiny app developed for the optimal selection of barcode sequences. While initially tailored to meet the nucleotide composition challenges specific to G400 and T7 series sequencers, NucBalancer’s utility broadens to accommodate the varied demands of these new sequencing technologies. Its application is particularly crucial in single-cell genomics, enabling the adaptation of libraries, such as those prepared for 10x technology, to various sequencers including G400 and T7 series sequencers. By facilitating the efficient balancing of nucleotide composition and the accommodation of differing sample concentrations, NucBalancer plays a pivotal role in reducing biases in nucleotide composition. This enhances the fidelity and reliability of NGS data across multiple platforms. As the NGS field continues to expand with the introduction of new sequencing technologies, the adaptability and wide-ranging applicability of NucBalancer render it an invaluable asset in genomic research. This tool addresses current sequencing challenges, ensuring that researchers can effectively balance barcodes for sample pooling regardless of the sequencing platform used.

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.138). These reviews are as follows.

    Reviewer 1. Aamir Khan

    Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

    Yes. The tool has novel features not reported in previous tools for barcoding.

    Is the source code available, and has an appropriate Open Source Initiative license been assigned to the code?

    Yes. The tool is available as an R script as well as a shiny app.

    Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

    Yes. I would suggest mentioning a few features that are novel or superior to other tools. Perhaps adding a table specifying these novel features that are not part of existing tools will add value to the MS.

    Is the documentation provided clear and user friendly?

    Yes. The documentation is clear and user friendly. The input file formats are given on the GitHub page. It would be better to add an example to the Shiny app page.

    Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

    Yes. Dependencies are mentioned on the tool documentation page and can be installed if R is already installed.

    Additional Comments: The authors have a well-written MS describing the NucBalancer tool. The tool adds value for sequencing by pooling samples and will be useful as we make technological advancements in the sequencing space.

    Reviewer 2. Hugo Varet

    Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

    Yes. The manuscript explains the constraints to be satisfied when looking for barcodes but more details about the context (Illumina chemistry for instance) would be appreciated. Moreover, is the software compatible with dual-indexing?

    Is the source code available, and has an appropriate Open Source Initiative license been assigned to the code?

    Yes. The source code of the program is available on GitHub as an R script, but the source code of the Shiny application is not available.

    As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

    Yes. Support can be requested by emailing the authors, as stated at the end of the README on GitHub.

    Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

    Yes. The example command line works well. However, the R script requires the shiny and xtable packages to be loaded even though none of their functions is actually called in the script.

    Is the documentation provided clear and user friendly?

    No. More detailed documentation would improve the proposed application, in particular more details about the different chemistries used by Illumina, MGI... and the constraints for finding compatible barcodes.

    Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

    No. The strategy used to find barcodes seems very simple, but more details would improve the manuscript.

    Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

    No. The manuscript cites several packages developed to find compatible sequencing barcodes, but their performance is not compared. Moreover, we do not know whether NucBalancer still works with a high number of samples/barcodes.

    Are there (ideally real world) examples demonstrating use of the software?

    No. A real world example would be appreciated to illustrate the software, especially in a scenario where the other cited solutions were not able to find compatible barcodes.

    Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

    No.

    Additional Comments: I would suggest that the authors improve the design of the Shiny app, as at the moment it only runs an R script and prints the result. Moreover, I think the quality of the R code could easily be improved (e.g. loops with strange counters or comparisons with booleans).

    Re-review: I thank the authors for the improvements they made in this new version of the manuscript. At this stage, I'm not totally satisfied, for the following reasons:

    - The authors state that the source code of the Shiny app is now available on GitHub, but I have not been able to find it.
    - In the manuscript, the sentence "The tool does not have any dependency other than the utilities from the base R package" is no longer true, as the tool now uses optparse.
    - In Table 1, checkMyIndex is listed as having no web interface, while one actually exists (https://checkmyindex.pasteur.fr/).

    Moreover, the proposed web interface could still be improved. For instance:

    - It would be great to add something to show that the algorithm is currently looking for a solution.
    - Check that the input files have a valid structure.
    - Display the input files when they are loaded, so the user can make sure the correct file was uploaded.
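The request for input-file checking could be approached along the following lines. This is a hedged Python sketch rather than the tool's R code. It assumes a hypothetical two-column layout (barcode name, then sequence, separated by whitespace), which may differ from the formats documented on the NucBalancer GitHub page. The function name `validate_barcode_lines` is likewise illustrative.

```python
VALID_BASES = set("ACGT")


def validate_barcode_lines(lines):
    """Return a list of problems found; an empty list means the input looks usable.

    Assumed (hypothetical) format: one barcode per line, "name sequence".
    """
    problems = []
    lengths = set()
    seen = set()
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            problems.append(
                f"line {number}: expected 2 columns, found {len(fields)}"
            )
            continue
        name, seq = fields[0], fields[1].upper()
        if name in seen:
            problems.append(f"line {number}: duplicate barcode name {name!r}")
        seen.add(name)
        bad = sorted(set(seq) - VALID_BASES)
        if bad:
            problems.append(
                f"line {number}: invalid characters {''.join(bad)} in {name!r}"
            )
        lengths.add(len(seq))
    if not seen:
        problems.append("no barcodes found")
    if len(lengths) > 1:
        problems.append(f"barcodes have differing lengths: {sorted(lengths)}")
    return problems


if __name__ == "__main__":
    for problem in validate_barcode_lines(["bc1 ACGT", "bc2 ACNT", "bc3 AC", "oops"]):
        print(problem)
```

Collecting all problems in a list, rather than stopping at the first one, lets an interface show the user every issue with an uploaded file in one pass instead of crashing on malformed input.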

    Reviewer 3. Wen Yao

    The authors report a new tool for barcode sequence design. The tool is developed using R/Shiny and is available for online use. Below are my comments for further improvement of the manuscript and the tool.

    1. Please provide a “load example data” button in the Shiny app. With this button, users can easily load the example data to test NucBalancer.
    2. This URL (http://146.118.68.98:8888/) for NucBalancer should also be given in the manuscript.
    3. The “Download Table” button is not working.
    4. The format of the input data should be checked, as input data in the wrong format caused NucBalancer to crash.
    5. The authors should compare NucBalancer with similar published tools in this field. More details are required.

    Re-review: The authors have addressed all my concerns.