GENE-FAM: An automated pipeline for mining gene families and its application to MADS-box genes in Cannabis sativa

Louise Ryan
Nina Trubanová
Grace Pender
Rainer Melzer
Graham M Hughes
Susanne Schilling

Abstract

Understanding how gene families evolve can offer insight into adaptation at the phenotypic and ecological levels. This is particularly true in plants, where transcription factor gene families are often targeted in breeding programs to improve the agronomic traits of economically important crops. Although recent advances in next-generation sequencing have rapidly expanded the volume of available genomic data, accessible and reproducible genome mining pipelines tailored to gene family characterisation are still lacking.

Here, we address this gap by developing GENE-FAM, an automated, scalable and open-source pipeline designed to mine and predict gene families based on conserved domains and motifs. To illustrate its application, we apply GENE-FAM to annotate MADS-box transcription factor genes across multiple Cannabis sativa genomes.

A comprehensive set of MADS-box genes was identified across three C. sativa cultivars, including both previously annotated and newly predicted genes. Through phylogenetic analyses, we confirm that all type II MADS-box gene subfamilies represented in flowering plants are present in C. sativa. Comparing our annotations with those of Arabidopsis thaliana and Solanum lycopersicum revealed that while most MADS type II families are highly conserved, SEPALLATA-like genes have undergone diversification in C. sativa. Together, these results demonstrate the application of GENE-FAM for genome-wide identification and characterisation of gene families in non-model species, revealing novel insights into MADS-box gene family evolution in C. sativa.
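The abstract describes mining gene families from conserved domains and motifs but does not say how GENE-FAM implements this. As a rough illustration of the general idea only, the sketch below scans protein sequences for a conserved motif and reports candidate family members. Everything in it is an assumption made for illustration: the function names `parse_fasta` and `find_candidates`, the toy sequences, and the pattern `TOY_MOTIF`, which is invented and is not a real MADS-domain signature. Domain searches in practice commonly rely on profile hidden Markov models rather than simple patterns, so this is a simplified stand-in and not the pipeline's method.

```python
import re

# Hypothetical toy motif for illustration only.
# This is NOT a real MADS-domain signature and is not taken from GENE-FAM.
TOY_MOTIF = re.compile(r"R[IV]E.K")


def parse_fasta(text):
    """Parse FASTA-formatted text into a {sequence_id: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the sequence ID.
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def find_candidates(records, motif=TOY_MOTIF):
    """Return {sequence_id: [0-based motif start positions]} for sequences with a hit."""
    hits = {}
    for seq_id, seq in records.items():
        positions = [match.start() for match in motif.finditer(seq)]
        if positions:
            hits[seq_id] = positions
    return hits


# Made-up protein sequences, not real Cannabis sativa data.
EXAMPLE_FASTA = """>toy_protein_1
MGRGRIEIKRIENKT
>toy_protein_2
MSTNPKPQRKTKRNT
>toy_protein_3
AAARVEAKLLL
"""

if __name__ == "__main__":
    candidates = find_candidates(parse_fasta(EXAMPLE_FASTA))
    for seq_id, positions in candidates.items():
        print(seq_id, positions)
```

Sequences with at least one match are kept as candidates. A real workflow would then typically validate them, for example by phylogenetic placement, as the abstract describes for the MADS-box subfamilies.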

Version published to 10.64898/2026.06.10.731441 on bioRxiv
Jun 15, 2026

Related articles

BAT: an integrated pipeline for gene tree construction, annotation, and functional inference

This article has 3 authors:
1. Benjamin D. Sheppard
2. Brian Behnken
3. Adam Steinbrenner
This article has no evaluations. Latest version: May 12, 2026

EffectorGeneP: accurate gene annotation in pathogen genomes from infection transcriptomes

This article has 12 authors:
1. Jana Sperschneider
2. Camilla Langlands-Perry
3. Jian Chen
4. Jibril Lubega
5. Taj Arndell
6. David Lewis
7. Eva Henningsen
8. Cheryl Blundell
9. Thomas Vanhercke
10. Kostya Kanyuka
11. Melania Figueroa
12. Peter Dodds
This article has no evaluations. Latest version: May 5, 2026

Functional genomic map of local adaptation in sorghum to guide allele mining

This article has 5 authors:
1. Yuxing Xu
2. Aayudh Das
3. Clara Cruet-Burgos
4. Geoffrey P. Morris
5. Jesse R. Lasky
This article has no evaluations. Latest version: May 18, 2026