CluSeek: Bioinformatics Tool to Identify and Analyze Gene Clusters
Abstract
Gene clusters are key structural and functional units that encode diverse phenotypes in genomes, from metabolism to pathogenesis. As genome sequencing expands, tools for systematic exploration of this growing data are increasingly needed. We present CluSeek, an open-source, Python-based platform for discovering, visualizing, and analyzing gene clusters across all GenBank data. Unlike existing tools, CluSeek does not rely on predefined cluster types or reference libraries; instead, it enables mining of any gene neighborhood containing colocalized homologs of user-specified genes. It features an intuitive graphical interface suitable for non-bioinformaticians and is freely available at https://cluseek.com. We demonstrate the versatility of CluSeek in two distinct case studies: (i) mining of specialized metabolites, where CluSeek uncovered over 16 new classes containing the bioactivity-enhancing 4-alkyl-L-proline moiety, previously known in only three Golden Era antibiotic classes; and (ii) analysis of type III secretion systems present in Bordetella species, revealing a previously unrecognized taxonomic distribution and genetic variants, including gene multiplications and novel components with potential functional significance.
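The idea of mining neighborhoods that contain colocalized homologs of user-specified genes can be illustrated with a minimal Python sketch. This is not CluSeek's implementation: the `Hit` record, the `find_colocalized` function, and the `max_gap` and `min_queries` parameters are hypothetical names chosen for illustration, and the grouping rule (chain hits on the same sequence whose gap is at most `max_gap` base pairs, then keep groups covering at least `min_queries` distinct query genes) is one simple assumption about what "colocalized" could mean.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One homolog hit (hypothetical record, not a CluSeek data type)."""
    query: str   # user-specified query gene this hit is homologous to
    contig: str  # sequence record carrying the hit, e.g. an accession
    start: int   # start coordinate of the hit on the contig
    end: int     # end coordinate of the hit on the contig


def find_colocalized(hits, max_gap=10_000, min_queries=2):
    """Group hits lying close together on the same contig.

    Hits are chained into one group while the distance from the group's
    current end to the next hit's start is at most `max_gap`. A group is
    returned only if it covers at least `min_queries` distinct query genes.
    """
    by_contig = defaultdict(list)
    for hit in hits:
        by_contig[hit.contig].append(hit)

    groups = []
    for contig_hits in by_contig.values():
        contig_hits.sort(key=lambda h: h.start)
        current = [contig_hits[0]]
        current_end = contig_hits[0].end
        for hit in contig_hits[1:]:
            if hit.start - current_end <= max_gap:
                # Close enough: extend the current neighborhood.
                current.append(hit)
                current_end = max(current_end, hit.end)
            else:
                # Gap too large: close this group and start a new one.
                groups.append(current)
                current = [hit]
                current_end = hit.end
        groups.append(current)

    # Keep only groups containing homologs of enough distinct query genes.
    return [g for g in groups if len({h.query for h in g}) >= min_queries]


# Illustrative input; gene names and coordinates are made up.
example_hits = [
    Hit("geneA", "contig1", 1_000, 2_000),
    Hit("geneB", "contig1", 2_500, 3_500),
    Hit("geneA", "contig1", 90_000, 91_000),  # isolated, far from geneB
    Hit("geneB", "contig2", 500, 1_500),      # alone on its contig
]
neighborhoods = find_colocalized(example_hits)
```

With this input, only the first two hits on `contig1` form a neighborhood, since they are the only place where homologs of both query genes sit within the gap threshold. Because nothing in the rule depends on a predefined cluster type, the same function applies to any set of query genes, which is the property the abstract highlights.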