Evolution of transcription factor-containing superfamilies in Eukaryotes
Abstract
Regulation of gene expression helps determine various phenotypes in most cellular life forms. It is orchestrated at several levels; at the point of transcription initiation, it is carried out by transcription factors (TFs). TFs bind to DNA through domains that are evolutionarily related, through shared membership of the same superfamilies (TF-SFs), to domains found in proteins with other nucleic acid-binding and protein-binding functions (nTFs, for non-TFs). Here we ask how TF DNA-binding sequence families in eukaryotes have evolved in relation to their nTF relatives. TF numbers scale as a power law with the total number of protein-coding genes, with exponents that differ between clades: fungi usually show sub-linear scaling, whereas chordates show super-linear scaling. The last eukaryotic common ancestor (LECA) probably encoded a complex regulatory machinery with both TFs and nTFs, but with an excess of nTFs when compared to the relative distribution of TFs and nTFs in extant organisms. Losses drive the evolution of TFs and nTFs, with the possible exception of TFs in Animalia for some tree topologies. TFs are highly dynamic in evolution, showing higher gain and loss rates than nTFs, though both are conserved to similar extents. Gains of TFs and nTFs are driven by the appearance of a large number of new sequence clusters in a small number of nodes, which determine the presence of as many as a third of extant TFs and nTFs as well as the relative presence of TFs and nTFs. Whereas nodes showing an explosion of TF numbers belong to multicellular clades, those for nTFs lie among the fungi and the protists.
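To make the scaling statement concrete: a power law N_TF = c · G^α (G being the number of protein-coding genes) becomes a straight line in log-log space, and the fitted slope α separates sub-linear (α < 1) from super-linear (α > 1) scaling. The sketch below is illustrative only. The function name `fit_power_law`, the ordinary least-squares fit, and all numbers are assumptions made for this example; they are not the study's data or necessarily its fitting procedure.

```python
import math

def fit_power_law(gene_counts, tf_counts):
    """Fit tf = c * genes**alpha by least squares on log-transformed values.

    Hypothetical helper for illustration; returns (alpha, c).
    """
    xs = [math.log(g) for g in gene_counts]
    ys = [math.log(t) for t in tf_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the scaling exponent alpha.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    # Intercept of the regression line is log(c).
    c = math.exp(mean_y - alpha * mean_x)
    return alpha, c

# Synthetic genomes (made-up gene counts, not data from the study).
genes = [5000, 10000, 20000, 40000]
tfs_super = [0.001 * g ** 1.3 for g in genes]  # constructed with alpha = 1.3
tfs_sub = [2.0 * g ** 0.7 for g in genes]      # constructed with alpha = 0.7

alpha_super, c_super = fit_power_law(genes, tfs_super)
alpha_sub, c_sub = fit_power_law(genes, tfs_sub)

for label, alpha in (("super-linear example", alpha_super),
                     ("sub-linear example", alpha_sub)):
    regime = "super-linear" if alpha > 1 else "sub-linear"
    print(f"{label}: alpha = {alpha:.2f} ({regime})")
```

Because the synthetic counts are generated from exact power laws, the fit recovers the exponents used to construct them. With real genome data the points would scatter around the line, and the uncertainty on α would need to be reported alongside the estimate.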