Overcoming the widespread flaws in the annotation of vertebrate selenoprotein genes in public databases

Max Ticó
Emerson Sullivan
Roderic Guigó
Marco Mariotti

Abstract

Selenocysteine (Sec) is a non-canonical amino acid incorporated into selenoproteins, oxidoreductase enzymes with essential roles in redox homeostasis. Sec insertion occurs in response to UGA, which is normally interpreted as a stop codon but is recoded in selenoprotein mRNAs. Owing to this dual function of UGA, the identification of selenoprotein genes poses a challenge.
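
The dual reading of UGA can be illustrated with a minimal sketch. It is not the authors' method or any part of Selenoprofiles: the function name `translate`, the `sec_codons` parameter, and the toy coding sequence are all assumptions made for illustration. A standard translation stops at the first in-frame UGA, while a recoding-aware translation reads the marked UGA as Sec (one-letter code U).

```python
# Illustrative sketch only: names and the toy sequence are hypothetical.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds, sec_codons=frozenset()):
    """Translate a coding sequence, stopping at the first stop codon.

    sec_codons holds 0-based codon indices whose TGA (UGA in the mRNA)
    should be read as selenocysteine ('U') instead of a stop.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    protein = []
    for index in range(len(seq) // 3):
        codon = seq[3 * index : 3 * index + 3]
        if codon == "TGA" and index in sec_codons:
            protein.append("U")  # recoded UGA: insert Sec
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # ordinary stop codon ends translation
        protein.append(residue)
    return "".join(protein)


toy_cds = "ATGGCTTGAGGTTAA"  # made-up sequence: ATG GCT TGA GGT TAA
print(translate(toy_cds))                  # UGA read as stop
print(translate(toy_cds, sec_codons={2}))  # UGA at codon 2 read as Sec
```

With the toy sequence, the first call yields a truncated product and the second reads through the UGA, which mirrors why an annotation that ignores recoding loses the Sec residue and everything downstream of it.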

We show that the vertebrate selenoprotein genes are widely misannotated in major public databases. Only 12% and 6% of selenoprotein genes are well annotated in Ensembl and NCBI GenBank, respectively, due to the lack of dedicated selenoprotein annotation pipelines. In most cases (81% and 84%), overlapping flawed annotations are present which lack the Sec-encoding UGA. In contrast, NCBI RefSeq employs a dedicated selenoprotein pipeline, yet with some shortcomings: its selenoprotein annotations are correct in 76% of cases, and most errors affect families with a C-terminal Sec residue.

We argue that selenoproteins must be correctly annotated in public databases, and that this must be done via automated pipelines to keep pace with genome sequencing. To facilitate this task, we present a new version of Selenoprofiles, a homology-based tool for selenoprotein prediction that produces predictions with accuracy comparable to manual curation and can be easily deployed and integrated into existing annotation pipelines.

Version published to 10.1101/2024.10.30.620813 on bioRxiv
Nov 3, 2024

Related articles

Adaptive laboratory evolution with ethionine identifies novel genetic determinants for enhanced protein and methionine accumulation in Saccharomyces cerevisiae

This article has 7 authors:
1. Tae Hoon Lee
2. Sang-Hun Do
3. Hyun-Jae Lee
4. Kun-Jae Lee
5. Jonghyeok Shin
6. Yong-Cheol Park
7. Sun-Ki Kim
This article has no evaluations. Latest version: Jan 13, 2026

Ensembl’s regulatory annotation for human, mouse, livestock, and aquaculture species

This article has 5 authors:
1. Garth R. Ilsley
2. Paulo R. Branco Lins
3. Gabriela A. Merino
4. David Urbina-Gómez
5. Peter W. Harrison
This article has no evaluations. Latest version: Jan 6, 2026

Bioinformatic Analyses of the Ataxin-2 Family Since Algae Emphasize Its Small Isoforms, Large Chimerisms, and the Importance of Human Exon 1B as Target of Therapies to Prevent Neurodegeneration

This article has 11 authors:
1. Georg Auburger
2. Jana Key
3. Suzana Gispert
4. Isabel Lastres-Becker
5. Luis-Enrique Almaguer-Mederos
6. Carole Bassa
7. Antonius Auburger
8. Georg Auburger
9. Aleksandar Arsovic
10. Thomas Deller
11. Nesli-Ece Sen
This article has no evaluations. Latest version: Jan 12, 2026
