FetchM: Streamlining Genome and Metadata Integration for Microbial Comparative Genomics
Abstract
FetchM is a Python-based tool for fetching, analyzing, and combining bacterial genomic metadata from the NCBI Genome database with the associated sample metadata from NCBI BioSample records. When running bulk-genome analyses, such as comparative genomics or pangenome studies, you often need a unified dataset that captures the full context of a particular bacterial species population. You can obtain genomic metadata by downloading the ncbi_dataset.tsv file from the NCBI Genome database for a specific bacterial species. However, this file lacks key metadata fields such as Collection Date, Host, Geographic Location, and Isolation Source. FetchM fills this gap by using the NCBI Entrez API to link each genome accession to its corresponding BioSample record and automatically retrieve the missing fields. FetchM not only helps you compile a complete, metadata-rich dataset but also provides visualizations and summaries of both genomic and contextual metadata features. You can filter and download sequences by year, host, isolation source, country, continent, and subcontinent, which makes FetchM a flexible companion for large-scale genomic studies. FetchM is available as an open-source tool at https://github.com/Tasnimul-Arabi-Anik/FetchM. It can also be installed as a package from PyPI.
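The linking step described above can be illustrated with a minimal offline sketch. It is not FetchM's actual code. It assumes that BioSample XML records carry `Attribute` elements with a `harmonized_name` such as `collection_date`, `host`, `geo_loc_name`, and `isolation_source`, and that the genome table has a column named `Assembly BioSample Accession`. The function names, the column name, and all accessions and values below are hypothetical, and the network call to Entrez is deliberately left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Fields the abstract lists as missing from ncbi_dataset.tsv, keyed by the
# harmonized attribute names ASSUMED to be used in BioSample XML.
WANTED = {
    "collection_date": "Collection Date",
    "host": "Host",
    "geo_loc_name": "Geographic Location",
    "isolation_source": "Isolation Source",
}


def parse_biosample_xml(xml_text):
    """Return {BioSample accession: {field label: value}} for the wanted fields."""
    records = {}
    root = ET.fromstring(xml_text)
    for sample in root.iter("BioSample"):
        fields = {label: "" for label in WANTED.values()}
        for attr in sample.iter("Attribute"):
            label = WANTED.get(attr.get("harmonized_name"))
            if label:
                fields[label] = (attr.text or "").strip()
        records[sample.get("accession")] = fields
    return records


def merge_metadata(tsv_text, biosample_records,
                   biosample_column="Assembly BioSample Accession"):
    """Append the wanted fields to each genome row (column name is an assumption)."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        extra = biosample_records.get(row.get(biosample_column), {})
        for label in WANTED.values():
            row[label] = extra.get(label, "")  # stays blank if no record matched
        rows.append(row)
    return rows


# Made-up example data, standing in for an Entrez response and a genome table.
SAMPLE_XML = """
<BioSampleSet>
  <BioSample accession="SAMN00000001">
    <Attributes>
      <Attribute harmonized_name="collection_date">2019-05</Attribute>
      <Attribute harmonized_name="host">Homo sapiens</Attribute>
      <Attribute harmonized_name="geo_loc_name">Bangladesh: Dhaka</Attribute>
      <Attribute harmonized_name="isolation_source">blood</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>
"""

SAMPLE_TSV = (
    "Assembly Accession\tAssembly BioSample Accession\n"
    "GCF_000000001.1\tSAMN00000001\n"
    "GCF_000000002.1\tSAMN00000002\n"
)

merged = merge_metadata(SAMPLE_TSV, parse_biosample_xml(SAMPLE_XML))

# Filtering on a contextual field, here the host.
human_isolates = [row for row in merged if row["Host"] == "Homo sapiens"]
for row in human_isolates:
    print(row["Assembly Accession"], row["Geographic Location"])
```

Rows whose BioSample accession has no matching record keep empty values for the four fields, so they remain in the table and can be reported or excluded later.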