A Database of Restriction Maps to Expand the Utility of Bacterial Artificial Chromosomes

Eamon Winden
Alejandro Vasquez-Echeverri
Susana Calle-Casteneda
Yumin Lian
Juan Pablo Hernández-Ortiz
David C. Schwartz

Curated by GigaByte
**Editor's Assessment:**

While Bacterial Artificial Chromosome (BAC) libraries were once a key resource for the Human Genome Project, over time they have been rendered relatively obsolete by long-read technologies. In the era of CRISPR-Cas systems, pairing this data with one of the many guide-RNA libraries to find targets for manipulation with CRISPR tools is bringing back the advantages of BACs for genomics. With this in mind, the authors have developed a BAC restriction map database containing the restriction maps for both uniquely placed and insert-sequenced BACs from 11 libraries, covering the recognition sequences of available restriction enzymes. It is accompanied by a set of Python functions to reconstruct the database and access it more easily (these were debugged and given improved documentation during review). The presented data should be valuable for researchers simply using BACs, as well as for those working with larger sections of the genome in synthetic genes, large-scale editing, and mapping.

*This evaluation refers to version 1 of the preprint

Abstract

While Bacterial Artificial Chromosomes were once a key resource for the genomic community, they have been obviated, for sequencing purposes, by long-read technologies. Such libraries may now serve as a valuable resource for manipulating and assembling large genomic constructs. To enhance accessibility and comparison, we have developed a BAC restriction map database.

GigaByte
Sep 25, 2023
GigaByte
Sep 21, 2023

This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.93), and the reviews have been published under the same license. These are as follows.

**Reviewer 1. Po-Hsiang Hung**

Are all data available and do they match the descriptions in the paper?

No. The dataset on the FTP server includes all the BAC sequences and the restriction enzyme recognition sites in CSV files. However, I could not find the database of pairs of BACs whose overlaps are generated by restriction enzymes that linearize the BACs. The makePairs function gave me an error when I tried running it locally, so I was not able to verify what is in these datasets. Personally, I find this function to be one of the most useful features described in this manuscript.

Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide

Yes. This manuscript contains the necessary minimal information (Submitting author, Author list, Dataset title, Dataset description, and Funding information)

Is there sufficient detail in the methods and data-processing steps to allow reproduction?

No. The authors provide their code on GitHub so that researchers can download the datasets and analyze the sequences locally. However, I felt that the descriptions in the readme.md file are often insufficient to reproduce the data presented in the manuscript, especially for researchers with little to no programming experience. The detailed information should include examples of how to use each function, the input format, and the location of the output folder/files. I also encountered software version issues during the installation of bacmapping. Please re-test the code in a new environment and state the version of each piece of software. For instance, I found that Python version 3.11 is incompatible with this package while Python version 3.7 is compatible.

Is there sufficient data validation and statistical analyses of data quality?

No. The authors used the Bio.Restriction package from Biopython to get the digestion site information. No extra validation is conducted in this manuscript. Because of the errors I encountered in re-running the code (see details in Any Additional Overall Comments to the Author), I suggest using an independent method to check several digestion sites in some BAC clones. The suggested independent method is to perform enzyme digestion on some BAC clones, or to upload some BAC sequences to other software and compare the digestion sites.
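One way to picture the software half of this suggestion is a plain string search for a recognition sequence, which shares no code with Biopython. The following is a minimal sketch under stated assumptions, not the authors' method: the function name `find_sites` is hypothetical, only the forward strand is searched, and the returned positions are 0-based recognition-site starts rather than cut positions, so an offset has to be reconciled before comparing with any tool that reports cut coordinates.

```python
def find_sites(sequence, site, circular=False):
    """Return 0-based start positions of `site` on the forward strand.

    `find_sites` is a hypothetical helper for illustration only.
    """
    seq = sequence.upper()
    site = site.upper()
    if not site:
        raise ValueError("site must be non-empty")
    n = len(seq)
    # For a circular molecule, append the first len(site) - 1 bases so
    # that sites spanning the origin are also found.
    search = seq + seq[: len(site) - 1] if circular else seq
    positions = []
    start = search.find(site)
    while start != -1 and start < n:
        positions.append(start)
        # Step by one base so overlapping occurrences are not missed.
        start = search.find(site, start + 1)
    return positions


# GAATTC is the EcoRI recognition sequence; the test sequences are made up.
linear_hits = find_sites("AAGAATTCTTGAATTC", "GAATTC")
circular_hits = find_sites("TTCAAAGAA", "GAATTC", circular=True)
```

Because a palindromic site such as GAATTC reads the same on both strands, a forward-strand search is enough for it; non-palindromic sites would also need a search with the reverse complement.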

In the output files that contain the digestion sites for each enzyme, some of the entries are either NA or []. What is the difference between the two? If they mean the same thing (no cutting by the enzyme), bugs or other coding errors may be causing this inconsistency. Please check the code again and also verify some of them using the independent methods suggested above. Examples of this issue are the files in maps>sequenced>CEPHB. Here I list two enzymes that show different results in each file:

3.csv: Ragl ([]), SchI (NA)
6.csv: EspEI (NA), AccII ([])
13.csv: EcoT22I ([]), Hsp92II (NA)
X.csv: PacI ([]), AcIWI (NA)
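The NA versus [] question could be surveyed across a whole map file with a short script. This is only a sketch: the layout it assumes (one row per clone, one column per enzyme, a `Name` column, and cells holding `NA`, `[]`, or a list of positions) is an assumption for illustration and may not match the real files, and the sample data is invented.

```python
import csv
import io
from collections import Counter


def classify_cell(cell):
    """Label a cell as 'missing', 'empty_list', or 'has_sites'."""
    text = cell.strip()
    if text in ("", "NA", "NaN", "nan"):
        return "missing"
    if text == "[]":
        return "empty_list"
    return "has_sites"


def tally_columns(handle, skip_columns=("Name",)):
    """Count cell labels per column of a CSV file-like object."""
    tallies = {}
    for row in csv.DictReader(handle):
        for column, cell in row.items():
            # Skip identifier columns and ragged rows.
            if column is None or column in skip_columns or cell is None:
                continue
            tallies.setdefault(column, Counter())[classify_cell(cell)] += 1
    return tallies


def mixed_columns(tallies):
    """Return columns that contain both missing values and empty lists."""
    return sorted(
        column
        for column, counts in tallies.items()
        if counts["missing"] and counts["empty_list"]
    )


# Invented sample in the assumed layout.
sample = io.StringIO(
    'Name,EcoRI,PacI,SchI\n'
    'cloneA,"[100, 250]",[],NA\n'
    'cloneB,[],[],NA\n'
    'cloneC,NA,[],NA\n'
)
tallies = tally_columns(sample)
flagged = mixed_columns(tallies)
```

A column that is flagged here uses both encodings within one file, which is the case most likely to point to a bug rather than to a deliberate distinction.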

Is the validation suitable for this type of data?

No. No validation in this manuscript. See the answer above.

Additional Comments: The authors have built a database of enzyme digestion site information for BAC clones to help people put the clones to further use. I think it is useful to have this information, along with the code to do further analysis locally. For that reason, a very detailed user manual (or readme.md) is important to help people use this dataset. Below I summarize the issues I encountered in running the code, together with some suggestions.

Major points:

(1) I tested some bacmapping functions and found that some are not working as intended because of typos or bugs.

The software versions need to be stated so that people can install this package properly.

Refining the code and providing a better user manual would help people without much coding experience to use it. The detailed information should include examples of how to use each function, the input format, and the location of the output folder/files. Descriptions of some functions in the readme file are not detailed enough and often do not describe what the input needs to be. For example, getCuts() requires 'row' as input, but the authors never describe in the readme file what 'row' is. I had to look in bacmapping.py to understand what 'row' is. If a function requires the variable 'row', show a few examples of how 'row' can be extracted from the proper input file.

mapPlacedClones() requires an input file ('/home/eamon/BACPlay/longboys.csv', line 335) that is located on the author's local computer and is not available through GitHub.

Typo in line 814 in getMap(). Should be: name = cloneLine['CloneName']

Inconsistency in output variable type in getMap() (lines 830 and 851). When local == 'sequenced', the output variable is a tuple, which causes issues in downstream functions such as getRestrictionMap() (line 869).

(2) Add pairs of BACs to the dataset.

(3) In the output files of digestion sites for each enzyme, some of the entries show NA or []. Please double-check this and explain the difference.

(4) Validation of the digestion maps by an independent method is suggested.

Minor points:

(1) Adding a title to each column of sequencedStats.csv would make the table easier to understand.

Re-review:

The authors have addressed the majority of my points. The software installation works well now that the versions are specified. The updated readme provides detailed information for each function and its required input variables, and the examples in the Jupyter notebook are a great help for running the code. I did, however, encounter two minor errors when I tested Ch19_bacmapping_example.ipynb on a Mac system. Please check these and update the code.

(1) The .DS_Store file that is automatically generated on a Mac system in the bacmapping/Examples/Ch19_example/maps/placed folder causes an error when running bmap.mapPlacedClones(cpustouse=cpus, chunk_size=chunksize). The same problem happened when I ran bmap.mapSequencedClones(cpustouse=cpus). After I deleted .DS_Store from the folder, the code worked.

Here is the error message when I ran bmap.mapSequencedClones(cpustouse=cpus). NotADirectoryError: [Errno 20] Not a directory: '/Users/user_nsame/bacmapping/Examples/Ch19_example/maps/sequenced/.DS_Store'
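A NotADirectoryError of this kind typically arises when every entry of a folder is treated as a subdirectory. The sketch below shows one defensive pattern, keeping only real directories and ignoring hidden entries such as .DS_Store. It is not taken from bacmapping; the function name `list_library_dirs` and the folder names in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path


def list_library_dirs(root):
    """Return the visible subdirectories of `root`, sorted by path.

    Hypothetical helper: plain files and hidden entries (names starting
    with a dot, such as .DS_Store) are skipped instead of being opened
    as directories.
    """
    return sorted(
        entry
        for entry in Path(root).iterdir()
        if entry.is_dir() and not entry.name.startswith(".")
    )


# Small demonstration in a throwaway folder with made-up names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "CEPHB").mkdir()
    (Path(tmp) / "RP11").mkdir()
    (Path(tmp) / ".DS_Store").write_text("finder metadata")
    library_names = [path.name for path in list_library_dirs(tmp)]
```

Filtering at the point where the folder is listed means users do not have to delete operating-system metadata files by hand before each run.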

(2) The second error comes from running bmap.getRestrictionMap(name,enzyme). I got the error message 'list' object has no attribute 'item'. I was able to run this function after changing maps[nenzyme].item() to maps[nenzyme] in line 779 of bacmapping.py. I encountered the same error with the drawMap function. I was able to run that function after changing line 847 of bacmapping.py from rmap = maps[nenzyme].item() to rmap = maps[nenzyme].

Here is the error message

AttributeError Traceback (most recent call last)
Cell In[20], line 5
3 maps = bmap.getMaps(name)
4 #print(maps) #this is a big dataframe of all the maps, uncomment to check it out
----> 5 rmap = bmap.getRestrictionMap(name,enzyme)
6 print('Sites in ' + name + ' where ' + enzyme + ' cuts: '+ str(rmap))
7 plt = bmap.drawMap(name, enzyme)

File ~/miniconda3/envs/bacmapping/lib/python3.11/site-packages/bacmapping/bacmapping.py:779, in getRestrictionMap(name, enzyme)
777 maps = getMaps(name)
778 nenzyme, r = getRightIsoschizomer(enzyme)
--> 779 return(maps[nenzyme].item())

AttributeError: 'list' object has no attribute 'item'
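The traceback shows .item() being called on a value that is already a plain list. One way to make such an accessor tolerant of both cases is sketched below. This is an illustration, not the fix the authors applied: `as_site_list` and the stand-in class `OneCell` are hypothetical names, and the sketch assumes that any wrapper exposing .item() holds exactly one value.

```python
def as_site_list(value):
    """Return restriction sites as a plain list, whatever holds them.

    Hypothetical helper: accepts a list or tuple directly, or any
    single-element wrapper that exposes .item() (for example a
    one-element pandas Series), and unwraps it first.
    """
    if isinstance(value, (list, tuple)):
        return list(value)
    if hasattr(value, "item"):
        # Unwrap once, then re-check what was inside.
        return as_site_list(value.item())
    raise TypeError(f"cannot interpret {type(value).__name__} as a site list")


class OneCell:
    """Stand-in for a one-element container, used only in this demo."""

    def __init__(self, payload):
        self.payload = payload

    def item(self):
        return self.payload


from_list = as_site_list([120, 4500])
from_wrapper = as_site_list(OneCell([120, 4500]))
```

Routing both getRestrictionMap-style and drawMap-style callers through one such helper would keep the two code paths consistent, whichever container the lookup returns.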

**Reviewer 2. Wei Dong**

Is there sufficient data validation and statistical analyses of data quality? Not my area of expertise

Is the validation suitable for this type of data? I am not sure about this. This is not my specialty.

Overall comments: This is a great idea, fully exploring, integrating, and utilizing existing data for new research.
Version published to 10.1101/2023.03.31.535162v1 on bioRxiv
Apr 2, 2023
