GENE-FAM: An automated pipeline for mining gene families and its application to MADS-box genes in Cannabis sativa

Abstract

Understanding how gene families evolve can offer great insight into adaptation at the phenotypic and ecological levels. This is particularly true in plants, where transcription factor gene families are often targeted in breeding programs to improve the agronomic traits of economically important crops. While recent advances in next-generation sequencing have accelerated the generation of genomic data, there remains a lack of accessible and reproducible genome mining pipelines tailored to gene family characterisation.

Here, we address this gap by developing GENE-FAM, an automated, scalable and open-source pipeline designed to mine and predict gene families based on conserved domains and motifs. To illustrate its application, we apply GENE-FAM to annotate MADS-box transcription factor genes across multiple Cannabis sativa genomes.
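The abstract does not describe how GENE-FAM searches for domains and motifs internally. The snippet below is a minimal, hypothetical sketch of the general idea of motif-based screening: it converts a simple PROSITE-style pattern into a regular expression and reports which protein sequences contain it. The motif `R-[KR]-x-K-K-A`, the helper names, and the sequences are invented toy values, not the real MADS-box signature. Pipelines of this kind more commonly rely on profile-HMM domain searches than on regular expressions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (e.g. 'R-[KR]-x-K-K') to a regex.

    Handles single residues, bracketed alternatives, 'x' wildcards and
    repeat counts such as x(2) or x(2,4). Exclusions ({...}) are not supported.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element}")
        token, lo, hi = m.groups()
        token = "." if token == "x" else token  # 'x' means any residue
        if lo and hi:
            token += f"{{{lo},{hi}}}"  # variable-length repeat
        elif lo:
            token += f"{{{lo}}}"  # fixed-length repeat
        parts.append(token)
    return "".join(parts)


def find_candidates(proteins: dict[str, str], pattern: str) -> dict[str, list[int]]:
    """Return {protein_id: [0-based start positions]} for non-overlapping motif hits."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, seq in proteins.items():
        starts = [m.start() for m in regex.finditer(seq)]
        if starts:  # keep only proteins containing the motif
            hits[name] = starts
    return hits


# Hypothetical toy motif and sequences, for illustration only.
TOY_MOTIF = "R-[KR]-x-K-K-A"
toy_proteins = {"geneA": "MAARKGKKAELL", "geneB": "MSTLLPQ"}
print(find_candidates(toy_proteins, TOY_MOTIF))  # {'geneA': [3]}
```

In a real genome-wide screen, candidates found this way would typically be filtered further, for example by domain coverage and phylogenetic placement, before they are assigned to subfamilies.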

A comprehensive set of MADS-box genes was identified across three C. sativa cultivars, including both previously annotated and newly predicted genes. Through phylogenetic analyses, we confirm that all type II MADS-box gene subfamilies represented in flowering plants are present in C. sativa. Comparing our annotations with those of Arabidopsis thaliana and Solanum lycopersicum revealed that, while most type II MADS-box subfamilies are highly conserved, SEPALLATA-like genes have undergone diversification in C. sativa. Together, these results demonstrate the utility of GENE-FAM for genome-wide identification and characterisation of gene families in non-model species, and reveal novel insights into MADS-box gene family evolution in C. sativa.
